path: root/core
Commit message (Author, Date, Files changed, Lines -/+)
* Incorporated Patrick's feedback comment on #211 and made the Maven build/dependency resolution at least a bit faster. (Prashant Sharma, 2013-12-07, 1 file, -1/+1)
* Fix long lines (Aaron Davidson, 2013-12-06, 4 files, -10/+12)
|
* Rename SparkActorSystem to IndestructibleActorSystem (Aaron Davidson, 2013-12-06, 5 files, -14/+23)
|\
| * Merge branch 'wip-scala-2.10' into akka-bug-fix (Prashant Sharma, 2013-12-06, 1 file, -1/+1)
| |\
| | * A left over akka -> akka.tcp changes (Prashant Sharma, 2013-12-06, 1 file, -1/+1)
| | |
| * | Made running SparkActorSystem specific to executors only. (Prashant Sharma, 2013-12-03, 2 files, -3/+11)
| | |
| * | Cleanup and documentation of SparkActorSystem (Aaron Davidson, 2013-12-03, 1 file, -85/+29)
| | |
* | | Cleanup and documentation of SparkActorSystem (Aaron Davidson, 2013-12-02, 1 file, -85/+29)
|/ /
* / Made akka capable of tolerating fatal exceptions and moving on. (Prashant Sharma, 2013-12-02, 2 files, -2/+114)
|/
* Merge branch 'master' into wip-scala-2.10 (Prashant Sharma, 2013-11-29, 1 file, -2/+8)
|\
| * Merge pull request #210 from haitaoyao/http-timeout (Matei Zaharia, 2013-11-27, 1 file, -2/+8)
| |\
| | |   add http timeout for httpbroadcast
| | |
| | |   While pulling task bytecode from the HttpBroadcast server, no timeout value is set. This can make Spark executor code hang while other tasks in the same executor process wait for the lock. I have encountered the issue in my cluster. Here is the stack trace I captured: https://gist.github.com/haitaoyao/7655830. So add a timeout value to ensure the task fails fast.
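The fix described above amounts to putting connect and read timeouts on the HTTP fetch. A minimal Scala sketch of that idea, assuming a hypothetical broadcast URL and illustrative timeout values (this is not the actual HttpBroadcast code):

    import java.net.URL

    // Open the connection with explicit timeouts so a stuck or unreachable
    // server cannot hang the fetching task indefinitely.
    val connection = new URL("http://driver-host:33333/broadcast_0").openConnection()
    connection.setConnectTimeout(60000) // milliseconds allowed to establish the connection
    connection.setReadTimeout(60000)    // milliseconds allowed to wait for data once connected
    val in = connection.getInputStream()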
| | * add http timeout for httpbroadcast (haitao.yao, 2013-11-26, 1 file, -2/+8)
| | |
* | | Changed defaults for akka to almost disable failure detector. (Prashant Sharma, 2013-11-29, 1 file, -4/+6)
| | |
* | | Fixed the broken build. (Prashant Sharma, 2013-11-28, 1 file, -2/+3)
| | |
* | | Merge branch 'master' into wip-scala-2.10 (Prashant Sharma, 2013-11-27, 20 files, -243/+686)
|\| |
| | |   Conflicts:
| | |     core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
| | |     core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala
| | |     core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithContextRDD.scala
| | |     core/src/main/scala/org/apache/spark/rdd/RDD.scala
| | |     python/pyspark/rdd.py
| * | Merge pull request #146 from JoshRosen/pyspark-custom-serializers (Matei Zaharia, 2013-11-26, 1 file, -104/+45)
| |\ \
| | | |   Custom Serializers for PySpark
| | | |
| | | |   This pull request adds support for custom serializers to PySpark. For now, all Python-transformed (or parallelize()d) RDDs are serialized with the same serializer that is specified when creating the SparkContext. PySpark includes `PickleSerDe` and `MarshalSerDe` classes for using Python's `pickle` and `marshal` serializers. It is pretty easy to add support for other serializers, although instructions for this still need to be added.
| | | |
| | | |   A few notable changes:
| | | |   - The Scala `PythonRDD` class no longer manipulates pickled objects; data from `textFile` is written to Python as MUTF-8 strings. The Python code performs the appropriate bookkeeping to track which deserializer should be used when reading an underlying JavaRDD. This mechanism could also be used to support other data exchange formats, such as MsgPack.
| | | |   - Several magic numbers were refactored into constants.
| | | |   - Batching is implemented by wrapping / decorating an unbatched SerDe.
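The "MUTF-8 strings" mentioned above are Java's modified UTF-8 encoding, which is what DataOutputStream.writeUTF produces. A small self-contained Scala sketch of writing a string that way (the in-memory stream target is illustrative, not the actual PythonRDD code):

    import java.io.{ByteArrayOutputStream, DataOutputStream}

    // writeUTF emits a 2-byte length prefix followed by the string encoded in
    // Java's modified UTF-8 (MUTF-8), which the Python side can then decode.
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    out.writeUTF("hello from the JVM side")
    out.flush()
    println(s"wrote ${buffer.size()} bytes")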
| | * | Send PySpark commands as bytes instead of strings. (Josh Rosen, 2013-11-10, 1 file, -20/+4)
| | | |
| | * | Add custom serializer support to PySpark. (Josh Rosen, 2013-11-10, 1 file, -22/+1)
| | | |   For now, this only adds MarshalSerializer, but it lays the groundwork for supporting other custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack. This also fixes a bug in SparkContext.union().
| | * | Remove Pickle-wrapping of Java objects in PySpark. (Josh Rosen, 2013-11-03, 1 file, -67/+39)
| | | |   If we support custom serializers, the Python worker will know what type of input to expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings.
| | * | Replace magic lengths with constants in PySpark. (Josh Rosen, 2013-11-03, 1 file, -10/+16)
| | | |   Write the length of the accumulators section up front rather than terminating it with a negative length. I find this easier to read.
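The change described above swaps a sentinel-terminated section for a length-prefixed one. A hedged Scala sketch of the two framing styles, purely for illustration (not the actual PySpark protocol code):

    import java.io.DataOutputStream

    // Old style: write each item, then a negative length as an end marker.
    def writeWithSentinel(out: DataOutputStream, items: Seq[Array[Byte]]): Unit = {
      items.foreach { bytes => out.writeInt(bytes.length); out.write(bytes) }
      out.writeInt(-1) // sentinel marking the end of the section
    }

    // New style: announce the item count up front, then write the items.
    def writeWithCount(out: DataOutputStream, items: Seq[Array[Byte]]): Unit = {
      out.writeInt(items.length)
      items.foreach { bytes => out.writeInt(bytes.length); out.write(bytes) }
    }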
| * | | Merge pull request #207 from henrydavidge/master (Matei Zaharia, 2013-11-26, 4 files, -0/+19)
| |\ \ \
| | | | |   Log a warning if a task's serialized size is very big
| | | | |
| | | | |   As per Reynold's instructions, we now create a warning-level log entry if a task's serialized size is too big. "Too big" is currently defined as 100 KB. This warning message is generated at most once for each stage.
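The behavior described above boils down to a threshold check plus per-stage deduplication. A rough Scala sketch of that logic, with made-up names (warnedStages, maybeWarn) rather than the actual scheduler fields:

    import scala.collection.mutable

    val warnedStages = mutable.HashSet[Int]()   // stages we have already warned about
    val warningThresholdBytes = 100 * 1024      // "too big" = 100 KB

    def maybeWarn(stageId: Int, serializedTaskBytes: Int): Unit = {
      // HashSet.add returns true only the first time, so each stage warns at most once.
      if (serializedTaskBytes > warningThresholdBytes && warnedStages.add(stageId)) {
        println(s"WARN: stage $stageId has a serialized task of ${serializedTaskBytes / 1024} KB, " +
          "which exceeds the recommended maximum of 100 KB")
      }
    }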
| | * | | Emit warning when task size > 100KB (hhd, 2013-11-26, 4 files, -0/+19)
| | | | |
| * | | | [SPARK-963] Wait for SparkListenerBus eventQueue to be empty before checking jobLogger state (Mark Hamstra, 2013-11-26, 1 file, -1/+6)
| * | | | Merge pull request #209 from pwendell/better-docs (Reynold Xin, 2013-11-26, 1 file, -10/+13)
| |\ \ \ \
| | |_|_|/
| |/| | |
| | | | |   Improve docs for shuffle instrumentation
| | * | | Improve docs for shuffle instrumentation (Patrick Wendell, 2013-11-25, 1 file, -10/+13)
| | | | |
| * | | | Merge pull request #86 from holdenk/master (Matei Zaharia, 2013-11-26, 4 files, -0/+451)
| |\ \ \ \
| | |_|/ /
| |/| | |
| | | | |   Add histogram functionality to DoubleRDDFunctions
| | | | |
| | | | |   This pull request adds histogram functionality to the DoubleRDDFunctions.
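Usage of the histogram API this pull request adds looks roughly like the sketch below (a hedged example for the Spark of this era; exact signatures may differ slightly, and the implicit import is assumed to bring DoubleRDDFunctions into scope):

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // implicit conversion to DoubleRDDFunctions

    val sc = new SparkContext("local", "histogram-example")
    val data = sc.parallelize(Seq(1.0, 2.5, 3.7, 4.2, 9.9))

    // Evenly spaced buckets: returns the bucket boundaries and the count per bucket.
    val (buckets, counts) = data.histogram(4)

    // Caller-supplied bucket boundaries: returns only the counts.
    val customCounts = data.histogram(Array(0.0, 5.0, 10.0))

    sc.stop()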
| | * | | Fix the test (Holden Karau, 2013-11-25, 2 files, -5/+5)
| | | | |
| | * | | Add spaces (Holden Karau, 2013-11-18, 1 file, -0/+14)
| | | | |
| | * | | Remove explicit boxing (Holden Karau, 2013-11-18, 1 file, -2/+2)
| | | | |
| | * | | Remove extraneous type declarations (Holden Karau, 2013-10-21, 1 file, -2/+2)
| | | | |
| | * | | Remove extraneous type definitions from inside of tests (Holden Karau, 2013-10-21, 1 file, -86/+86)
| | | | |
| | * | | CR feedback (Holden Karau, 2013-10-21, 3 files, -101/+125)
| | | | |
| | * | | Add tests for the Java implementation. (Holden Karau, 2013-10-20, 1 file, -0/+14)
| | | | |
| | * | | Initial commit of adding histogram functionality to the DoubleRDDFunctions. (Holden Karau, 2013-10-19, 3 files, -0/+399)
| | | | |
| * | | | Merge pull request #204 from rxin/hash (Matei Zaharia, 2013-11-25, 4 files, -54/+103)
| |\ \ \ \
| | | | | |   OpenHashSet fixes
| | | | | |
| | | | | |   Incorporated ideas from pull request #200:
| | | | | |   - Use the Murmur Hash 3 finalization step to scramble the bits of the hash code instead of the simpler version in java.util.HashMap; the latter one had trouble with ranges of consecutive integers. Murmur Hash 3 is used by fastutil.
| | | | | |   - Don't check keys for equality when re-inserting due to growing the table; the keys will already be unique.
| | | | | |   - Remember the grow threshold instead of recomputing it on each insert.
| | | | | |
| | | | | |   Also added unit tests for size estimation for specialized hash sets and maps.
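For reference, the Murmur Hash 3 finalization step mentioned above is the standard fmix32 "avalanche" routine; a Scala sketch of it is shown here as background, not as a copy of the OpenHashSet code:

    // Scrambles the bits of a hash code so that nearby inputs, such as runs of
    // consecutive integers, spread evenly across the hash table.
    def fmix32(hash: Int): Int = {
      var h = hash
      h ^= h >>> 16
      h *= 0x85ebca6b
      h ^= h >>> 13
      h *= 0xc2b2ae35
      h ^= h >>> 16
      h
    }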
| | * | | | Incorporated ideas from pull request #200. (Reynold Xin, 2013-11-25, 1 file, -50/+57)
| | | | | |   - Use the Murmur Hash 3 finalization step to scramble the bits of the hash code instead of the simpler version in java.util.HashMap; the latter one had trouble with ranges of consecutive integers. Murmur Hash 3 is used by fastutil.
| | | | | |   - Don't check keys for equality when re-inserting due to growing the table; the keys will already be unique.
| | | | | |   - Remember the grow threshold instead of recomputing it on each insert.
| | * | | | Added unit tests for size estimation for specialized hash sets and maps. (Reynold Xin, 2013-11-25, 3 files, -4/+46)
| | | |/ /
| | |/| |
| * | | | Merge pull request #201 from rxin/mappartitions (Matei Zaharia, 2013-11-25, 4 files, -70/+22)
| |\ \ \ \
| | | | | |   Use the proper partition index in mapPartitionsWithIndex
| | | | | |
| | | | | |   mapPartitionsWithIndex uses TaskContext.partitionId as the partition index. TaskContext.partitionId used to be identical to the partition index in an RDD, but pull request #186 introduced a scenario (with partition pruning) in which the two can differ. This pull request uses the right partition index in all mapPartitionsWithIndex related calls. It also removes the extra MapPartitionsWithContextRDD and puts all the mapPartitions related functionality into MapPartitionsRDD.
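mapPartitionsWithIndex hands each partition's index to the user function, which is why it matters that the index matches the RDD's own partition numbering. A small usage sketch, assuming an existing SparkContext named sc:

    // Tag every element with the index of the RDD partition it came from.
    val rdd = sc.parallelize(1 to 10, numSlices = 3)
    val tagged = rdd.mapPartitionsWithIndex { (partitionIndex, iterator) =>
      iterator.map(value => (partitionIndex, value))
    }
    tagged.collect().foreach(println)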
| | * | | | Consolidated both mapPartitions related RDDs into a single MapPartitionsRDD. (Reynold Xin, 2013-11-24, 4 files, -70/+22)
| | | | | |   Also changed the semantics of the index parameter in mapPartitionsWithIndex from the partition index of the output partition to the partition index in the current RDD.
| * | | | | Merge pull request #101 from colorant/yarn-client-scheduler (Matei Zaharia, 2013-11-25, 1 file, -0/+25)
| |\ \ \ \ \
| | |_|/ / /
| |/| | | |   For SPARK-527, support spark-shell when running on YARN (synced to trunk and resubmitted here)
| | | | | |
| | | | | |   In the current YARN mode, the application runs inside the Application Master as a user program, so the whole SparkContext lives on the remote side. That approach cannot support applications that involve local interaction and need to run where they are launched. This pull request adds a YarnClientClusterScheduler and backend. With this scheduler the user application is launched locally, while the executors are launched by YARN on remote nodes through a thin AM that only launches the executors and monitors the Driver Actor's status, so that when the client app is done it can finish the YARN application as well. This enables spark-shell to run on YARN, and it also lets other Spark applications keep the SparkContext local by using the master URL "yarn-client". For example, SparkPi can print its result on the local console instead of in the log on the remote machine where the AM runs. Docs are also updated to show how to use this yarn-client mode.
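The yarn-client mode described above is selected purely through the master URL, so a driver-local program looks much like a local one. A hedged sketch (the Pi estimate is illustrative; the two-argument constructor form matches Spark of this era):

    import org.apache.spark.SparkContext

    // With "yarn-client" the driver and SparkContext stay on the launching machine,
    // while YARN only hosts the executors.
    val sc = new SparkContext("yarn-client", "SparkPi")
    val n = 100000
    val inside = sc.parallelize(1 to n).filter { _ =>
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      x * x + y * y < 1
    }.count()
    println(s"Pi is roughly ${4.0 * inside / n}")  // printed locally, not in the remote AM log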
| | * | | | Add YarnClientClusterScheduler and Backend. (Raymond Liu, 2013-11-22, 1 file, -0/+25)
| | | | | |   With this scheduler, the user application is launched locally, while the executors are launched by YARN on remote nodes. This enables spark-shell to run on YARN.
* | | | | | Improvements from the review comments and followed Boy Scout Rule. (Prashant Sharma, 2013-11-27, 8 files, -35/+21)
| | | | | |
* | | | | | Restored master address for client. (Prashant Sharma, 2013-11-26, 2 files, -6/+9)
| | | | | |
* | | | | | Fixed compile time warnings and formatting post merge. (Prashant Sharma, 2013-11-26, 4 files, -8/+5)
| | | | | |
* | | | | | Merge branch 'master' into scala-2.10-wip (Prashant Sharma, 2013-11-25, 10 files, -56/+365)
|\| | | | |
| | | | | |   Conflicts:
| | | | | |     core/src/main/scala/org/apache/spark/rdd/RDD.scala
| | | | | |     project/SparkBuild.scala
| * | | | | Merge pull request #151 from russellcardullo/add-graphite-sink (Matei Zaharia, 2013-11-24, 2 files, -0/+86)
| |\ \ \ \ \
| | | | | | |   Add graphite sink for metrics
| | | | | | |
| | | | | | |   This adds a metrics sink for graphite. The sink must be configured with the host and port of a graphite node and optionally may be configured with a prefix that will be prepended to all metrics that are sent to graphite.
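Metrics sinks are wired up through Spark's metrics properties file; the snippet below is an illustrative sketch of how this sink might be configured (the property keys and values shown are assumptions for the example, not taken from the patch):

    # conf/metrics.properties (illustrative)
    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=graphite.example.com
    *.sink.graphite.port=2003
    *.sink.graphite.prefix=spark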
| | * | | | | Cleanup GraphiteSink.scala based on feedback (Russell Cardullo, 2013-11-18, 1 file, -5/+5)
| | | | | | |   * Reorder imports according to the style guide
| | | | | | |   * Consistently use propertyToOption in all places
| | * | | | | Add graphite sink for metrics (Russell Cardullo, 2013-11-08, 2 files, -0/+86)
| | | | | | |   This adds a metrics sink for graphite. The sink must be configured with the host and port of a graphite node and optionally may be configured with a prefix that will be prepended to all metrics that are sent to graphite.
| * | | | | | Merge pull request #185 from mkolod/random-number-generator (Matei Zaharia, 2013-11-24, 3 files, -0/+194)
| |\ \ \ \ \ \
| | | | | | | |   XORShift RNG with unit tests and benchmark
| | | | | | | |
| | | | | | | |   This patch was introduced to address SPARK-950; the discussion below the ticket explains not only the rationale but also the design and testing decisions: https://spark-project.atlassian.net/browse/SPARK-950
| | | | | | | |
| | | | | | | |   To run the unit test, start an SBT console and type:
| | | | | | | |     compile
| | | | | | | |     test-only org.apache.spark.util.XORShiftRandomSuite
| | | | | | | |   To run the benchmark, type:
| | | | | | | |     project core
| | | | | | | |     console
| | | | | | | |   Once the Scala console starts, type:
| | | | | | | |     org.apache.spark.util.XORShiftRandom.benchmark(100000000)
| | | | | | | |   XORShiftRandom is also an object with a main method taking the number of iterations as an argument, so you can also run it from the command line.
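For background, an xorshift generator advances its state with a few shift-and-xor steps, which is what makes it so much cheaper than java.util.Random. A minimal Scala sketch of one common 64-bit variant (an illustration of the technique, not a copy of the XORShiftRandom class added here):

    // Each call scrambles the 64-bit seed with three cheap shift-and-xor operations.
    class SimpleXorShift(private var seed: Long) {
      def nextLong(): Long = {
        seed ^= (seed << 21)
        seed ^= (seed >>> 35)
        seed ^= (seed << 4)
        seed
      }
    }

    val rng = new SimpleXorShift(System.nanoTime())
    println(rng.nextLong())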
| | * | | | | | Make XORShiftRandom explicit in KMeans and roll it back for RDD (Marek Kolodziej, 2013-11-20, 1 file, -1/+3)
| | | | | | | |