ProgrammingGuide
Programming guide.
Getting unpersist right in GraphLab is tricky.
Reverts to 04d83fc37f9eef89c20331c85291a0a169f75e6d:examples/src/main/scala/org/apache/spark/examples/bagel/PageRankUtils.scala.
Reverts to 7210257ba3038d5e22d4b60fe9c3113dc45c3dff:README.md.
The zip{Edge,Vertex}Partitions methods created doubly-nested closures
and passed them to zipPartitions. For some reason this caused an
AbstractMethodError when zipPartitions tried to invoke the closure. This
commit works around the problem by inlining these methods wherever they
are called, eliminating the doubly-nested closure.
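A minimal, self-contained sketch of the two closure shapes involved, using plain RDDs rather than GraphX's internal partition types (the helper name zipWith is an assumption for illustration only):

    import org.apache.spark.{SparkConf, SparkContext}

    object ZipPartitionsInlineSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("sketch"))
        val a = sc.parallelize(1 to 4, 2)
        val b = sc.parallelize(5 to 8, 2)

        // Problematic shape (simplified): a helper returns a closure that itself
        // captures the user function f -- a doubly-nested closure.
        def zipWith[T](f: (Int, Int) => T): (Iterator[Int], Iterator[Int]) => Iterator[T] =
          (x, y) => x.zip(y).map { case (l, r) => f(l, r) }

        // val sums0 = a.zipPartitions(b)(zipWith(_ + _))  // the shape that triggered the error

        // Workaround shape: inline the logic at the call site so zipPartitions
        // receives a single, flat closure.
        val sums = a.zipPartitions(b) { (x, y) =>
          x.zip(y).map { case (l, r) => l + r }
        }
        println(sums.collect().mkString(","))
        sc.stop()
      }
    }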
Conflicts:
README.md
core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
pom.xml
project/SparkBuild.scala
repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
Fix make-distribution.sh showing "version: command not found".
Set boolean param name for call to SparkHadoopMapReduceUtil.newTaskAttemptID
Set the boolean param name in the call to SparkHadoopMapReduceUtil.newTaskAttemptID to make it clear which param is being set.
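A short sketch of the readability win from Scala's named arguments; the signature below is a hypothetical stand-in, not the real SparkHadoopMapReduceUtil trait:

    // Hypothetical signature for illustration only.
    def newTaskAttemptID(jtIdentifier: String, jobId: Int, isMap: Boolean,
                         taskId: Int, attemptId: Int): String =
      s"attempt_${jtIdentifier}_${jobId}_${if (isMap) "m" else "r"}_${taskId}_$attemptId"

    // Positional call: the reader cannot tell what `false` controls.
    newTaskAttemptID("jt", 1, false, 0, 0)

    // Named call: the intent is explicit at the call site.
    newTaskAttemptID("jt", 1, isMap = false, taskId = 0, attemptId = 0)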
Add CDH Repository to Maven Build
At some point this was removed from the Maven build, so I'm adding it back. It's needed for the Hadoop 2 tests we run on Jenkins, and it's also included in the SBT build (see the resolver sketch below).
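For reference, an sbt-style resolver of the kind the SBT build already carries; the repository id and URL here are assumptions, not copied from SparkBuild.scala:

    // Hypothetical sbt resolver entry; verify the URL against the actual build file.
    resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"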
Remove calls to the deprecated mapred OutputCommitter.cleanupJob
Since Hadoop 1.0.4, the mapred OutputCommitter.commitJob should do the cleanup itself via a call to OutputCommitter.cleanupJob.
Remove SparkHadoopWriter.cleanup, since it is used only by PairRDDFunctions.
In fact, the implementation of the mapred OutputCommitter.commitJob looks like this:

    public void commitJob(JobContext jobContext) throws IOException {
      cleanupJob(jobContext);
    }
Since Hadoop 1.0.4, the mapred OutputCommitter.commitJob should do the cleanup job itself; as shown above, its implementation simply calls cleanupJob(jobContext). (The jobContext input argument is of type org.apache.hadoop.mapred.JobContext.)
Support distributing extra files to workers for YARN client mode
This way, users don't need to package every dependency into one assembly jar to serve as the Spark app jar.
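A hedged sketch of what using such a feature could look like from application code; the property names below come from later Spark-on-YARN documentation and are an assumption with respect to this particular commit:

    import org.apache.spark.SparkConf

    // Hypothetical usage: ship side files and archives to the YARN workers
    // instead of baking them into the assembly jar.
    val conf = new SparkConf()
      .setAppName("dist-files-sketch")
      .set("spark.yarn.dist.files", "hdfs:///user/alice/lookup.dat")
      .set("spark.yarn.dist.archives", "hdfs:///user/alice/deps.zip")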
SPARK-1009 Updated MLlib docs to show how to use it in Python
In addition, added detailed examples for regression, clustering, and recommendation algorithms in a separate Scala section, and fixed a few minor issues with the existing documentation.
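A minimal Scala clustering example in the spirit of those docs; it assumes the later RDD-based MLlib API (Vectors and KMeans.train), which may postdate this commit:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    object KMeansSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("kmeans-sketch"))
        // Two well-separated point clouds.
        val data = sc.parallelize(Seq(
          Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
          Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))
        val model = KMeans.train(data, k = 2, maxIterations = 10)
        model.clusterCenters.foreach(println)
        sc.stop()
      }
    }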
Update README.md
The link does not work otherwise.
Refactored the streaming project to separate out external libraries like Twitter, Kafka, Flume, etc.
At a high level, these are the changes:
1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and the pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of the root and streaming projects/modules.
2. To use the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._`. For the Scala API, the developer calls `TwitterUtils.createStream(streamingContext, ...)`; for the Java API, `TwitterUtils.createStream(javaStreamingContext, ...)`. See the sketch after this list.
3. Each external project has its own Scala and Java unit tests. Note that the unit tests of each external library use classes from the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in SparkBuild.scala. In streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see the comment inside the pom.xml for more information).
4. Jars of the external projects have been added to the examples project but not to the assembly project.
5. In some files, imports have been rearranged to conform to the Spark coding guidelines.
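A minimal sketch of the new import and call pattern for the Scala API; the stream configuration shown (None, so twitter4j reads OAuth credentials from system properties) is an assumption for illustration:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.twitter.TwitterUtils

    object TwitterStreamSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[2]").setAppName("twitter-sketch")
        val ssc = new StreamingContext(conf, Seconds(10))
        // None => let twitter4j pick up OAuth credentials from system properties.
        val tweets = TwitterUtils.createStream(ssc, None)
        tweets.map(_.getText).print()
        ssc.start()
        ssc.awaitTermination()
      }
    }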