| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Currently PythonPartitioner determines partition ID by hashing a
byte-array representation of PySpark's key. This PR lets
PythonPartitioner use the actual partition ID, which is required e.g.
for sorting via PySpark.
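Why the actual partition ID matters for sorting can be illustrated with a small stand-alone sketch. This is not PySpark or PythonPartitioner code: the function names, the boundary list, and the in-memory "partitions" are all hypothetical, chosen only to show that when the Python side picks a range-based partition ID and the partitioner honors it without re-hashing, concatenating the sorted partitions in ID order yields a globally sorted result.

```python
import bisect


def range_partition_id(key, bounds):
    """Hypothetical helper: pick a partition ID from sorted upper bounds."""
    # Keys <= bounds[0] go to partition 0, keys <= bounds[1] to partition 1, etc.
    return bisect.bisect_left(bounds, key)


def passthrough_partition(pairs, num_partitions):
    """Place each (partition_id, record) pair in the partition it names.

    No hashing is involved; the ID computed upstream is used as-is.
    """
    parts = [[] for _ in range(num_partitions)]
    for pid, record in pairs:
        parts[pid].append(record)
    return parts


def sort_via_partitions(keys, bounds):
    """Range-partition the keys, sort each partition, and concatenate in ID order."""
    pairs = [(range_partition_id(k, bounds), k) for k in keys]
    parts = passthrough_partition(pairs, len(bounds) + 1)
    return [k for part in parts for k in sorted(part)]


if __name__ == "__main__":
    print(sort_via_partitions([5, 42, 17, 99, 3, 60], [20, 50]))
```

If the partition were instead derived from a hash of the serialized key bytes, neighboring keys could land in arbitrary partitions, and concatenating the partitions would no longer give a sorted sequence.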
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
fixed a wildcard bug in make-distribution.sh; ask sbt to check local
maven repo in project/SparkBuild.scala
(1) Fixed a wildcard bug in make-distribution.sh:
With the wildcard * inside the quotes, the cp command failed. It worked
after moving the wildcard out of the quotes.
(2) Ask sbt to check the local maven repo in SparkBuild.scala:
To build Spark (0.9.0-SNAPSHOT) with the HEAD of mesos (0.15.0), I must
do "make maven-install" under mesos/build, which publishes the java .jar
file under ~/.m2. However, when building Spark (after pointing mesos to
version 0.15.0), sbt uses ivy which by default only checks ~/.ivy2. This
change is to tell sbt to also check ~/.m2.
|
| | |
|
| | |
|
|\ \
| | |
| | |
| | | |
Update README: updated the link
|
| |/ |
|
|\ \
| | |
| | |
| | | |
Allow users to set the application name for Spark on Yarn
|
| | | |
|
| |/ |
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Send Task results through the block manager when larger than Akka frame size (fixes SPARK-669).
This change requires adding an extra failure mode: tasks can complete
successfully, but the result gets lost or flushed from the block manager
before it's been fetched.
This change also moves the deserialization of tasks into a separate thread, so it's no longer part of the DAG scheduler's tight loop. This should improve scheduler throughput, particularly when tasks are sending back large results.
Thanks Josh for writing the original version of this patch!
This is duplicated from the mesos/spark repo: https://github.com/mesos/spark/pull/835
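The direct-versus-indirect result path and the extra failure mode described above can be sketched roughly as follows. This is a simplified illustration, not Spark's implementation: `BlockStore`, `send_result`, `fetch_result`, `ResultLost`, the block ID format, and the 1024-byte frame limit are all hypothetical stand-ins for the block manager and the Akka frame size.

```python
import pickle

FRAME_SIZE = 1024  # hypothetical limit, standing in for the Akka frame size


class ResultLost(Exception):
    """The task succeeded, but its stored result is gone before being fetched."""


class BlockStore:
    """Toy in-memory stand-in for a block manager."""

    def __init__(self):
        self._blocks = {}

    def put(self, block_id, data):
        self._blocks[block_id] = data

    def get(self, block_id):
        return self._blocks.get(block_id)

    def remove(self, block_id):
        self._blocks.pop(block_id, None)


def send_result(task_id, value, store, frame_size=FRAME_SIZE):
    """Return the message a finished task would send back to the scheduler."""
    payload = pickle.dumps(value)
    if len(payload) <= frame_size:
        # Small result: ship the serialized bytes in the message itself.
        return ("direct", payload)
    # Large result: park the bytes in the store and send only a reference.
    block_id = "taskresult_%d" % task_id
    store.put(block_id, payload)
    return ("indirect", block_id)


def fetch_result(message, store):
    """Resolve a result message, raising ResultLost if the block has vanished."""
    kind, body = message
    if kind == "direct":
        return pickle.loads(body)
    data = store.get(body)
    if data is None:
        raise ResultLost(body)
    return pickle.loads(data)
```

In this sketch a caller would treat `ResultLost` like a task failure and re-run the task, which mirrors the new failure mode the commit message mentions.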
|
| | |
|
| | |
|
| |\
| |/
|/|
| |
| |
| |
| | |
Conflicts:
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/org/apache/spark/scheduler/local/LocalTaskSetManager.scala
|
|\ \
| | |
| | |
| | | |
Remove -optimize flag
|
| | | |
|
|\ \ \
| | | |
| | | |
| | | | |
Bug fix in master build
|
| | | | |
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Improved organization of scheduling packages.
This commit does not change any code -- only file organization.
Please let me know if there was some masterminded strategy behind
the existing organization that I failed to understand!
There are two components of this change:
(1) Moving files out of the cluster package, and down
a level to the scheduling package. These files are all used by
the local scheduler in addition to the cluster scheduler(s), so
should not be in the cluster package. As a result of this change,
none of the files in the local package reference files in the
cluster package.
(2) Moving the mesos package to within the cluster package.
The mesos scheduling code is for a cluster, and represents a
specific case of cluster scheduling (the Mesos-related classes
often subclass cluster scheduling classes). Thus, the most logical
place for it seems to be within the cluster package.
The one thing about the scheduling code that seems a little funny to me
is the naming of the SchedulerBackends. The StandaloneSchedulerBackend
is not just for Standalone mode, but instead is used by Mesos coarse grained
mode and Yarn, and the backend that *is* just for Standalone mode is
instead called SparkDeploySchedulerBackend. I didn't change this because
I wasn't sure if there was a reason for this naming that I'm just not
aware of.
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This commit does not change any code -- only file organization.
There are two components of this change:
(1) Moving files out of the cluster package, and down
a level to the scheduling package. These files are all used by
the local scheduler in addition to the cluster scheduler(s), so
should not be in the cluster package. As a result of this change,
none of the files in the local package reference files in the
cluster package.
(2) Moving the mesos package to within the cluster package.
The mesos scheduling code is for a cluster, and represents a
specific case of cluster scheduling (the Mesos-related classes
often subclass cluster scheduling classes). Thus, the most logical
place for it is within the cluster package.
|
|\ \ \ \ \
| |_|_|/ /
|/| | | | |
EC2 SSH improvements
|
| | | | | |
|
| | | | | |
|
| | | | | |
|
|\ \ \ \ \
| | | | | |
| | | | | | |
Add mapPartitionsWithIndex
|
| | | | | | |
|
| | | | | | |
|
| | | | | | |
|
| |\ \ \ \ \ |
|
| | | | | | | |
|
| | | | | | | |
|
|\ \ \ \ \ \ \
| |_|_|_|_|/ /
|/| | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
some minor fixes to MemoryStore
This is a repeat of #5, moved to its own branch in my repo.
This makes all updates to on ; it skips on synchronizing the reads where it can get away with it.
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Make "currentMemory" @volatile, so that its reads in ensureFreeSpace() are atomic and up-to-date, i.e., currentMemory can't increase while putLock is held (though it could decrease, which would only help ensureFreeSpace()).
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Remove unnecessary entries.get() call.
|
| | | | | | | |
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Smarter take/limit implementation.
|
| | | | | | | | |
|
|\ \ \ \ \ \ \ \
| |_|_|_|_|_|/ /
|/| | | | | | | |
|
|/ / / / / / / |
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Fix spacing so java.io.tmpdir doesn't run on with SPARK_JAVA_OPTS
|
| |/ / / / / / |
|
|\| | | | | | |
|
|\ \ \ \ \ \ \ |
|
| | |/ / / / /
| |/| | | | | |
|
|\ \ \ \ \ \ \ |
|
| |\ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | | |
Refactor FairSchedulableBuilder
|
| | | | | | | | | |
|
| | | | | | | | | |
|
| |/ / / / / / /
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
1. Configuration can be read from classpath if not set explicitly.
2. Add missing close handler.
|
| |\ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | | |
Fix PR926 local properties issues in Spark Streaming like scenarios
|
| | | | | | | | | |
|