| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| | |
Conflicts:
yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
|
| |\
| | |
| | |
| | | |
Improve docs for shuffle instrumentation
|
| | | |
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | | |
Add histogram functionality to DoubleRDDFunctions
This pull request add histogram functionality to the DoubleRDDFunctions.
|
| | | | |
|
| | | | |
|
| | | | |
|
| | | | |
|
| | | | |
|
| | | | |
|
| | | | |
|
| | | | |
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
OpenHashSet fixes
Incorporated ideas from pull request #200.
- Use Murmur Hash 3 finalization step to scramble the bits of HashCode
instead of the simpler version in java.util.HashMap; the latter one
had trouble with ranges of consecutive integers. Murmur Hash 3 is used
by fastutil.
- Don't check keys for equality when re-inserting due to growing the
table; the keys will already be unique.
- Remember the grow threshold instead of recomputing it on each insert
Also added unit tests for size estimation for specialized hash sets and maps.
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
- Use Murmur Hash 3 finalization step to scramble the bits of HashCode
instead of the simpler version in java.util.HashMap; the latter one
had trouble with ranges of consecutive integers. Murmur Hash 3 is used
by fastutil.
- Don't check keys for equality when re-inserting due to growing the
table; the keys will already be unique
- Remember the grow threshold instead of recomputing it on each insert
|
| | | |/
| | |/| |
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Update tuning.md
Clarify when serializer is used based on recent user@ mailing list discussion.
|
| | | | |
| | | | |
| | | | | |
Clarify when serializer is used based on recent user@ mailing list discussion.
|
| |\ \ \ \
| | |/ / /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Use the proper partition index in mapPartitionsWIthIndex
mapPartitionsWithIndex uses TaskContext.partitionId as the partition index. TaskContext.partitionId used to be identical to the partition index in a RDD. However, pull request #186 introduced a scenario (with partition pruning) that the two can be different. This pull request uses the right partition index in all mapPartitionsWithIndex related calls.
Also removed the extra MapPartitionsWIthContextRDD and put all the mapPartitions related functionality in MapPartitionsRDD.
|
| | | | |
| | | | |
| | | | |
| | | | | |
Also changed the semantics of the index parameter in mapPartitionsWithIndex from the partition index of the output partition to the partition index in the current RDD.
|
| |\ \ \ \
| | |_|/ /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
For SPARK-527, Support spark-shell when running on YARN
sync to trunk and resubmit here
In current YARN mode approaching, the application is run in the Application Master as a user program thus the whole spark context is on remote.
This approaching won't support application that involve local interaction and need to be run on where it is launched.
So In this pull request I have a YarnClientClusterScheduler and backend added.
With this scheduler, the user application is launched locally,While the executor will be launched by YARN on remote nodes with a thin AM which only launch the executor and monitor the Driver Actor status, so that when client app is done, it can finish the YARN Application as well.
This enables spark-shell to run upon YARN.
This also enable other Spark applications to have the spark context to run locally with a master-url "yarn-client". Thus e.g. SparkPi could have the result output locally on console instead of output in the log of the remote machine where AM is running on.
Docs also updated to show how to use this yarn-client mode.
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
With this scheduler, the user application is launched locally,
While the executor will be launched by YARN on remote nodes.
This enables spark-shell to run upon YARN.
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | | |
Fix Maven build for metrics-graphite
|
| |/ / / / |
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Add graphite sink for metrics
This adds a metrics sink for graphite. The sink must
be configured with the host and port of a graphite node
and optionally may be configured with a prefix that will
be prepended to all metrics that are sent to graphite.
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
* Reorder imports according to the style guide
* Consistently use propertyToOption in all places
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This adds a metrics sink for graphite. The sink must
be configured with the host and port of a graphite node
and optionally may be configured with a prefix that will
be prepended to all metrics that are sent to graphite.
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
XORShift RNG with unit tests and benchmark
This patch was introduced to address SPARK-950 - the discussion below the ticket explains not only the rationale, but also the design and testing decisions: https://spark-project.atlassian.net/browse/SPARK-950
To run unit test, start SBT console and type:
compile
test-only org.apache.spark.util.XORShiftRandomSuite
To run benchmark, type:
project core
console
Once the Scala console starts, type:
org.apache.spark.util.XORShiftRandom.benchmark(100000000)
XORShiftRandom is also an object with a main method taking the
number of iterations as an argument, so you can also run it
from the command line.
|
| | | | | | | |
|
| | | | | | | |
|
| | | | | | | |
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
To run unit test, start SBT console and type:
compile
test-only org.apache.spark.util.XORShiftRandomSuite
To run benchmark, type:
project core
console
Once the Scala console starts, type:
org.apache.spark.util.XORShiftRandom.benchmark(100000000)
|
| |\ \ \ \ \ \
| | |_|_|_|/ /
| |/| | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Fix 'timeWriting' stat for shuffle files
Due to concurrent git branches, changes from shuffle file consolidation patch
caused the shuffle write timing patch to no longer actually measure the time,
since it requires time be measured after the stream has been closed.
|
| | |/ / / /
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Due to concurrent git branches, changes from shuffle file consolidation patch
caused the shuffle write timing patch to no longer actually measure the time,
since it requires time be measured after the stream has been closed.
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
AppendOnlyMap fixes
- Chose a more random reshuffling step for values returned by Object.hashCode to avoid some long chaining that was happening for consecutive integers (e.g. `sc.makeRDD(1 to 100000000, 100).map(t => (t, t)).reduceByKey(_ + _).count`)
- Some other small optimizations throughout (see commit comments)
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
- Don't check keys for equality when re-inserting due to growing the
table; the keys will already be unique
- Remember the grow threshold instead of recomputing it on each insert
|
| |/ / / / /
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
- Use Murmur Hash 3 finalization step to scramble the bits of HashCode
instead of the simpler version in java.util.HashMap; the latter one
had trouble with ranges of consecutive integers. Murmur Hash 3 is used
by fastutil.
- Use Object.equals() instead of Scala's == to compare keys, because the
latter does extra casts for numeric types (see the equals method in
https://github.com/scala/scala/blob/master/src/library/scala/runtime/BoxesRunTime.java)
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Support preservesPartitioning in RDD.zipPartitions
In `RDD.zipPartitions`, add support for a `preservesPartitioning` option (similar to `RDD.mapPartitions`) that reuses the first RDD's partitioner.
|
| |/ / / / / |
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Fix Kryo Serializer buffer documentation inconsistency
The documentation here is inconsistent with the coded default and other documentation.
|
| | | |_|/ /
| | |/| | |
| | | | | | |
The documentation here is inconsistent with the coded default and other documentation.
|
| |\ \ \ \ \
| | |/ / / /
| |/| | | |
| | | | | |
| | | | | |
| | | | | | |
TimeTrackingOutputStream should pass on calls to close() and flush().
Without this fix you get a huge number of open files when running shuffles.
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Without this fix you get a huge number of open shuffles after running
shuffles.
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
If the Hadoop used is version 2.2 or derived from it, then Spark
will be compiled against protobuf-2.5 and a protobuf-2.5 version of
Akka 2.0.5.
|
| | | | | | |
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
changes.
|
| | | | | | |
|
|\| | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Conflicts:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
yarn/src/main/scala/org/apache/spark/deploy/yarn/WorkerRunnable.scala
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
PartitionPruningRDD is using index from parent
I was getting a ArrayIndexOutOfBoundsException exception after doing union on pruned RDD. The index it was using on the partition was the index in the original RDD not the new pruned RDD.
|
| | | | | | | |
|
| | | |/ / /
| | |/| | | |
|