| Commit message | Author | Age | Files | Lines |
- Added a StorageLevels class for easy access to StorageLevel constants in Java
- Added doc comments on Function classes in Java
- Updated Accumulator and HadoopWriter docs slightly
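The first item above, the StorageLevels class, mirrors the constants that the Scala side already exposes. A minimal sketch of the kind of usage involved; the import path, constant name, and the SparkContext named sc are assumptions, not taken from this commit:

  import spark.storage.StorageLevel   // package path assumed

  // minimal sketch, assuming a SparkContext named sc; constant names assumed
  val lines = sc.textFile("hdfs://host:9000/data.txt")
  lines.persist(StorageLevel.MEMORY_ONLY)   // pick an explicit storage level
  lines.count()                             // first action computes and caches the RDD
  lines.count()                             // later actions reuse the cached copy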
- Add override keywords.
- Cache RDDs and counts in TC example.
- Clean up JavaRDDLike's abstract methods.
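The caching change matters because the transitive-closure (TC) example reuses the same RDDs and their counts on every iteration. A rough sketch of that pattern in Scala; this is not the actual example code, and the edge data and loop here are made up for illustration:

  // minimal sketch, assuming a SparkContext named sc and that the pair-RDD
  // operations (join, distinct) are in scope; not the actual TC example
  val edges = Seq((1, 2), (2, 3), (3, 4))
  var tc = sc.parallelize(edges).cache()      // reused in every iteration
  var oldCount = 0L
  var nextCount = tc.count()
  while (nextCount != oldCount) {
    oldCount = nextCount
    // derive (x, z) from (x, y) and (y, z), add it to the closure, cache the result
    val derived = tc.map { case (x, y) => (y, x) }
                    .join(tc)
                    .map { case (_, (x, z)) => (x, z) }
    tc = tc.union(derived).distinct().cache()
    nextCount = tc.count()
  }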
- Replace JavaLR example with JavaHdfsLR example.
- Use anonymous classes in JavaWordCount; add options.
- Remove @Override annotations.
Add distinct() method to RDD.
Fix bug in DoubleRDDFunctions.
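A minimal usage sketch of the new distinct() method, assuming a SparkContext named sc:

  // minimal sketch, assuming a SparkContext named sc
  val nums = sc.parallelize(Seq(1, 2, 2, 3, 3, 3))
  val unique = nums.distinct()
  println(unique.collect().mkString(", "))   // 1, 2, 3 (in some order)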
Made improvements to takeSample. Also renamed SparkLocalKMeans to SparkKMeans.
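A minimal usage sketch of takeSample; the (withReplacement, num, seed) signature shown here is an assumption about the API at this point, not taken from the commit:

  // minimal sketch, assuming a SparkContext named sc and a
  // takeSample(withReplacement, num, seed) signature
  val data = sc.parallelize(1 to 1000)
  val sample = data.takeSample(false, 10, 42)   // 10 elements without replacement, fixed seed
  println(sample.mkString(", "))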
LocalKMeans runs locally with a randomly generated dataset.
SparkLocalKMeans takes an input file and runs KMeans on it.
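Both examples revolve around the same assignment step: find the closest current center for each point. A rough, self-contained sketch of that step in plain Scala; the Array[Double] point representation and the helper names are made up, not the examples' actual code:

  // minimal sketch of the closest-center step shared by the k-means examples
  def squaredDistance(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def closestCenter(point: Array[Double], centers: Array[Array[Double]]): Int =
    centers.indices.minBy(i => squaredDistance(point, centers(i)))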
When a task throws an exception, the Spark executor previously just
logged it to a local file on the slave and exited. This commit causes
Spark to also report the exception back to the driver using a Mesos
status update, so the user doesn't have to look through a log file on
the slave.
Here's what the reporting currently looks like:
# ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050
[...]
11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling
[...]
11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s
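For reference, a job only needs to throw from inside a task to exercise this path. A minimal sketch in the spirit of the ExceptionHandlingTest run above, assuming a SparkContext named sc; this is not the actual example source:

  // minimal sketch: throwing inside a task makes the executor report the
  // failure back to the driver, as in the log output above
  sc.parallelize(0 until 100).foreach { i =>
    if (math.random > 0.75)
      throw new Exception("Testing exception handling")
  }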
(you can no longer iterate over a Source multiple times).
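One way to work with that restriction, as a sketch: materialize the lines once, then iterate the resulting collection as often as needed (the file name here is made up):

  import scala.io.Source
  // getLines() is a one-shot Iterator, so read it into a strict collection once
  val lines = Source.fromFile("input.txt").getLines().toList
  // the List can now be traversed any number of times
  val nonEmpty = lines.count(_.nonEmpty)
  val words = lines.flatMap(_.split(" "))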
Note that we use scala.Serializable, introduced in Scala 2.9, instead of
java.io.Serializable. Also, case classes inherit from scala.Serializable by
default.
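A minimal illustration of both points in that note:

  // ordinary classes opt in explicitly; unqualified Serializable here resolves
  // to scala.Serializable, which is available from Scala 2.9 on
  class Centroid(val coords: Array[Double]) extends Serializable

  // case classes are already serializable, so no extends clause is needed
  case class Point(x: Double, y: Double)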
This merge keeps only the broadcast work in mos-bt because the structure
of the shuffle has changed with the new RDD design. We still need some kind
of parallel shuffle, but that will be added later.
Conflicts:
core/src/main/scala/spark/BitTorrentBroadcast.scala
core/src/main/scala/spark/ChainedBroadcast.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/Utils.scala
core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala
core/src/main/scala/spark/shuffle/DfsShuffle.scala
Conflicts:
core/src/main/scala/spark/Broadcast.scala
Conflicts:
.gitignore
core/src/main/scala/spark/LocalFileShuffle.scala
src/scala/spark/BasicLocalFileShuffle.scala
src/scala/spark/Broadcast.scala
src/scala/spark/LocalFileShuffle.scala
tabulate is used if the index is used by the function, and fill otherwise.
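A short illustration of the distinction with the standard library methods:

  // tabulate: the element initializer depends on the index
  val squares = Array.tabulate(5)(i => i * i)   // Array(0, 1, 4, 9, 16)
  // fill: the element initializer ignores the index
  val zeros   = Array.fill(5)(0.0)              // Array(0.0, 0.0, 0.0, 0.0, 0.0)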