Commit log, newest first. Each entry: subject (author, date, files changed, lines -/+).
* Fixed a typo in Hadoop version in README. (Reynold Xin, 2013-11-02, 1 file, -1/+1)
* Merge pull request #132 from Mistobaan/doc_fix (Reynold Xin, 2013-11-01, 1 file, -1/+1)
    fix persistent-hdfs
  * fix persistent-hdfs (Fabrizio (Misto) Milo, 2013-11-01, 1 file, -1/+1)
* Merge pull request #129 from velvia/2013-11/document-local-uris (Matei Zaharia, 2013-11-01, 2 files, -2/+15)
    Document & finish support for local: URIs. Documents all of the supported URI schemes for addJar / addFile on the Cluster Overview page, and adds support for the local: URI to addFile.
  * Add local: URI support to addFile as well (Evan Chan, 2013-11-01, 1 file, -1/+2)
  * Document all the URIs for addJar/addFile (Evan Chan, 2013-11-01, 1 file, -1/+13)
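A minimal usage sketch of the local: scheme these commits document (introduced for addJars by pull request #125 below), assuming a JAR and a data file already present at the same path on every worker node; the master URL, app name, and paths here are illustrative:

    import org.apache.spark.SparkContext

    val sc = new SparkContext("spark://master:7077", "LocalUriExample")

    // http: and file: URIs are shipped to executors through the driver's
    // HTTP fileserver; a local: URI is instead opened directly from each
    // worker's own filesystem, avoiding that copy for large files.
    sc.addJar("local:/opt/libs/big-library.jar")
    sc.addFile("local:/opt/data/lookup-table.txt")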
* Merge pull request #117 from stephenh/avoid_concurrent_modification_exception (Matei Zaharia, 2013-10-30, 2 files, -3/+12)
    Handle ConcurrentModificationExceptions in SparkContext init. System.getProperties.toMap will fail fast when concurrently modified, and it seems that some other thread started by SparkContext calls System.setProperty during its initialization. Handle this by just looping on ConcurrentModificationException, which seems the safest option, since the non-fail-fast methods (Hashtable.entrySet) have undefined behavior under concurrent modification.
  * Avoid match errors when filtering for spark.hadoop settings. (Stephen Haberman, 2013-10-30, 1 file, -2/+4)
  * Use Properties.clone() instead. (Stephen Haberman, 2013-10-29, 1 file, -5/+2)
  * Handle ConcurrentModificationExceptions in SparkContext init. (Stephen Haberman, 2013-10-27, 2 files, -3/+13)
    System.getProperties.toMap will fail fast when concurrently modified, and it seems that some other thread started by SparkContext calls System.setProperty during its initialization. Handle this by just looping on ConcurrentModificationException, which seems the safest option, since the non-fail-fast methods (Hashtable.entrySet) have undefined behavior under concurrent modification.
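A minimal sketch of the technique these commits converge on, assuming the Properties.clone() route from the second commit; the helper name is illustrative:

    import java.util.Properties
    import scala.collection.JavaConverters._

    // Hashtable.clone() is synchronized, so the snapshot cannot be corrupted
    // by a concurrent System.setProperty() from another thread, whereas
    // iterating the live Properties object directly can throw
    // ConcurrentModificationException.
    def systemPropertiesSnapshot(): Map[String, String] = {
      val snapshot = System.getProperties.clone().asInstanceOf[Properties]
      snapshot.asScala.toMap
    }

    // One way to pick out the spark.hadoop.* settings mentioned above.
    val hadoopSettings = systemPropertiesSnapshot().collect {
      case (key, value) if key.startsWith("spark.hadoop.") =>
        (key.stripPrefix("spark.hadoop."), value)
    }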
* Merge pull request #126 from kayousterhout/local_fix (Matei Zaharia, 2013-10-30, 1 file, -1/+1)
    Fixed incorrect log message in local scheduler. This change is especially relevant at the moment, because some users are seeing this failure, and the log message is misleading/incorrect (because for the tests, the max failures is set to 0, not 4).
  * Fixed incorrect log message in local scheduler (Kay Ousterhout, 2013-10-30, 1 file, -1/+1)
* Merge pull request #124 from tgravescs/sparkHadoopUtilFix (Matei Zaharia, 2013-10-30, 8 files, -38/+43)
    Pull SparkHadoopUtil out of SparkEnv (jira SPARK-886). Having the logic to initialize the correct SparkHadoopUtil in SparkEnv prevents it from being used until after the SparkContext is initialized. This causes issues like https://spark-project.atlassian.net/browse/SPARK-886. It also makes it hard to use in singleton objects; for instance, I want to use it in the security code.
  * move the hadoopJobMetadata back into SparkEnv (tgravescs, 2013-10-30, 3 files, -10/+8)
  * Merge remote-tracking branch 'upstream/master' into sparkHadoopUtilFix (tgravescs, 2013-10-30, 3 files, -58/+100)
  * fix sparkhdfs lr test (tgravescs, 2013-10-29, 1 file, -1/+2)
  * Remove SparkHadoopUtil stuff from SparkEnv (tgravescs, 2013-10-29, 7 files, -32/+38)
* Merge pull request #125 from velvia/2013-10/local-jar-uri (Matei Zaharia, 2013-10-30, 2 files, -1/+21)
    Add support for the local:// URI scheme for addJars(). This PR adds support for a new URI scheme for SparkContext.addJars(): `local://file/path`. The *local* scheme indicates that the `/file/path` exists on every worker node. It exists for big library JARs, which would be very expensive to serve using the standard HTTP fileserver distribution method, especially for big clusters. Today the only inexpensive method of doing this (assuming such a file is on every host via, say, NFS or rsync) is to add the JAR to SPARK_CLASSPATH, but we want a method where the user does not need to modify the Spark configuration. I would add something to the docs, but it's not obvious where to add it. It would also be great if this could be merged in time for 0.8.1.
  * Add support for local:// URI scheme for addJars() (Evan Chan, 2013-10-30, 2 files, -1/+21)
    This indicates that a jar is available locally on each worker node.
* Merge pull request #118 from JoshRosen/blockinfo-memory-usage (Matei Zaharia, 2013-10-29, 2 files, -57/+99)
    Reduce the memory footprint of BlockInfo objects. This pull request reduces the memory footprint of all BlockInfo objects and makes additional optimizations for shuffle blocks. For all BlockInfo objects, these changes remove two boolean fields and one Object field. For shuffle blocks, we additionally remove an Object field and a boolean field. When storing tens of thousands of these objects, this may add up to significant memory savings. A ShuffleBlockInfo now only needs to wrap a single long. This was motivated by a report of high blockInfo memory usage during shuffles (https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3C20131026134353.202b2b9b%40sh9%3E). I haven't run benchmarks to measure the exact memory savings. /cc @aarondav
  * Extract BlockInfo classes from BlockManager. (Josh Rosen, 2013-10-29, 2 files, -75/+97)
    This saves space, since the inner classes needed to keep a reference to the enclosing BlockManager.
  * Store fewer BlockInfo fields for shuffle blocks. (Josh Rosen, 2013-10-29, 1 file, -7/+25)
  * Restructure BlockInfo fields to reduce memory use. (Josh Rosen, 2013-10-27, 1 file, -15/+17)
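An illustrative sketch (not the actual Spark code) of the kind of packing this PR describes: folding two boolean flags and a size into a single Long, so each record carries one primitive field instead of several, and defining the class at the top level so it does not keep the hidden reference to an enclosing BlockManager that an inner class would:

    // Top-level class: no implicit $outer pointer to an enclosing instance.
    class CompactBlockInfo {
      // Bit 63 = pending, bit 62 = failed, bits 0-61 = size in bytes.
      private val SizeMask = (1L << 62) - 1
      private var packed: Long = 1L << 63 // initially pending, size 0

      def pending: Boolean = (packed >>> 63) == 1L
      def failed: Boolean = ((packed >>> 62) & 1L) == 1L
      def size: Long = packed & SizeMask

      // Marking ready stores the final size and clears both flags.
      def markReady(bytes: Long): Unit = { packed = bytes & SizeMask }

      // Marking failed sets the failed bit and clears the pending bit.
      def markFailed(): Unit = { packed = (1L << 62) | (packed & SizeMask) }
    }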
* Merge pull request #119 from soulmachine/master (Reynold Xin, 2013-10-29, 1 file, -1/+1)
    A small revision to the document.
  * A small revision to the document (soulmachine, 2013-10-29, 1 file, -1/+1)
* Merge pull request #112 from kayousterhout/ui_task_attempt_id (Matei Zaharia, 2013-10-27, 1 file, -2/+3)
    Display both task ID and task attempt ID in the UI, and rename taskId to taskAttemptId. Previously only the task attempt ID was shown in the UI; this was confusing because the job can be shown as complete while there are tasks still running. Showing the task ID in addition to the attempt ID makes it clear which tasks are redundant. This commit also renames taskId to taskAttemptId in TaskInfo and in the local/cluster schedulers. This identifier was used to uniquely identify attempts, not tasks, so the previous naming was confusing. The new naming is also more consistent with MapReduce.
  * Display both task ID and task index in UI (Kay Ousterhout, 2013-10-26, 1 file, -2/+3)
* Merge pull request #115 from aarondav/shuffle-fix (Reynold Xin, 2013-10-27, 2 files, -5/+12)
    Eliminate extra memory usage when shuffle file consolidation is disabled. Otherwise, we see SPARK-946 even when shuffle file consolidation is disabled. Fixing SPARK-946 is still forthcoming.
  * Use flag instead of name check. (Aaron Davidson, 2013-10-26, 1 file, -2/+1)
  * Eliminate extra memory usage when shuffle file consolidation is disabled (Aaron Davidson, 2013-10-26, 2 files, -5/+13)
    Otherwise, we see SPARK-946 even when shuffle file consolidation is disabled. Fixing SPARK-946 is still forthcoming.
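A hedged sketch of the "flag instead of name check" idea: consult the configuration flag directly instead of inferring the mode from a block's name. The property name and the helper below are assumptions for illustration, not taken from the patch:

    // Read the consolidation flag once; it defaulted to off at the time.
    val consolidateShuffleFiles: Boolean =
      System.getProperty("spark.shuffle.consolidateFiles", "false").toBoolean

    // Hypothetical helper standing in for consolidation-only bookkeeping.
    def trackConsolidatedGroup(blockId: String): Unit = ()

    def onShuffleBlockWritten(blockId: String): Unit = {
      // Only allocate consolidation state when the feature is enabled,
      // so the disabled path costs no extra memory.
      if (consolidateShuffleFiles) {
        trackConsolidatedGroup(blockId)
      }
    }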
* Merge pull request #113 from pwendell/master (Patrick Wendell, 2013-10-26, 1 file, -4/+11)
    Improve the error message shown when multiple assembly jars are present. This can happen easily when building for different Hadoop versions. Right now it gives a ClassNotFoundException.
  * Adding improved error message when multiple assembly jars are present. (Patrick Wendell, 2013-10-25, 1 file, -4/+11)
    This can happen easily when building for different Hadoop versions.
* Merge pull request #114 from soulmachine/master (Reynold Xin, 2013-10-26, 1 file, -3/+4)
    A small revision to the document.
  * A small revision to the document (soulmachine, 2013-10-26, 1 file, -3/+4)
* Merge pull request #108 from alig/master (Matei Zaharia, 2013-10-25, 7 files, -8/+123)
    Changes to enable execution using HDFS as a synchronization point between the driver and executors, as well as to ensure executors exit properly.
  * fixing comments on PR (Ali Ghodsi, 2013-10-25, 3 files, -29/+18)
  * Makes Spark SIMR ready. (Ali Ghodsi, 2013-10-24, 7 files, -5/+131)
* Merge pull request #102 from tdas/transform (Matei Zaharia, 2013-10-25, 13 files, -162/+1037)
    Added new Spark Streaming operations:
    - transformWith, which allows an arbitrary 2-to-1 DStream transform; added to the Scala and Java APIs
    - StreamingContext.transform, to allow an arbitrary n-to-1 DStream transform
    - leftOuterJoin and rightOuterJoin between two DStreams; added to the Scala and Java APIs
    - missing variations of join and cogroup; added to the Scala and Java APIs
    - the missing JavaStreamingContext.union
    Also updated a number of Java and Scala API docs. A usage sketch follows this entry.
  * Merge branch 'apache-master' into transform (Tathagata Das, 2013-10-25, 17 files, -15/+272)
  * Fixed accidental bug. (Tathagata Das, 2013-10-24, 1 file, -1/+1)
  * Merge branch 'apache-master' into transform (Tathagata Das, 2013-10-24, 19 files, -10/+417)
  * Added JavaStreamingContext.transform (Tathagata Das, 2013-10-24, 5 files, -33/+169)
  * Removed Function3.call() based on Josh's comment. (Tathagata Das, 2013-10-23, 1 file, -2/+0)
  * Merge branch 'apache-master' into transform (Tathagata Das, 2013-10-22, 90 files, -3300/+2058)
  * Fixed bug in Java transformWith; added more Java test cases for transform and transformWith; added missing variations of Java join and cogroup; updated various Scala and Java API docs. (Tathagata Das, 2013-10-22, 8 files, -179/+424)
  * Updated TransformDStream to allow n-ary DStream transform. Added transformWith, leftOuterJoin, and rightOuterJoin operations to DStream for the Scala and Java APIs. Also added n-ary union and n-ary transform operations to StreamingContext for the Scala and Java APIs. (Tathagata Das, 2013-10-21, 11 files, -33/+529)
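A hedged Scala usage sketch of the new operations; the two pair DStreams are assumed to be built elsewhere, and the stream names are illustrative (import paths may vary slightly across Spark versions):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.StreamingContext._
    import org.apache.spark.streaming.dstream.DStream

    def demo(clicks: DStream[(String, Long)],
             impressions: DStream[(String, Long)]): Unit = {
      // leftOuterJoin between two DStreams, one of the new join variations.
      val joined: DStream[(String, (Long, Option[Long]))] =
        clicks.leftOuterJoin(impressions)

      // transformWith: an arbitrary 2-to-1 transform expressed on the
      // underlying RDDs of each batch.
      val merged: DStream[(String, Long)] =
        clicks.transformWith(impressions,
          (c: RDD[(String, Long)], i: RDD[(String, Long)]) => c.union(i))
    }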
* Merge pull request #111 from kayousterhout/ui_name (Matei Zaharia, 2013-10-25, 2 files, -3/+1)
    Properly display the name of a stage in the UI. This fixes a bug introduced by the fix for SPARK-940, which changed the UI to display the RDD name rather than the stage name. As a result, no name was shown for the stage when using the Spark shell, which meant there was no way to click on the stage to see more details (e.g., the running tasks). This commit changes the UI back to using the stage name. @pwendell, let me know if this change was intentional.
  * Properly display the name of a stage in the UI. (Kay Ousterhout, 2013-10-25, 2 files, -3/+1)
    This fixes a bug introduced by the fix for SPARK-940, which changed the UI to display the RDD name rather than the stage name. As a result, no name was shown for the stage when using the Spark shell, which meant there was no way to click on the stage to see more details (e.g., the running tasks). This commit changes the UI back to using the stage name.
* Merge pull request #110 from pwendell/master (Reynold Xin, 2013-10-25, 2 files, -0/+5)
    Exclude jopt from kafka dependency. Kafka uses an older version of jopt that causes bad conflicts with the version used by spark-perf. It's not easy to remove this downstream because of the way that spark-perf uses Spark (by including a spark assembly as an unmanaged jar). This fixes the problem at its source by just never including it.
  * Exclude jopt from kafka dependency. (Patrick Wendell, 2013-10-25, 2 files, -0/+5)
    Kafka uses an older version of jopt that causes bad conflicts with the version used by spark-perf. It's not easy to remove this downstream because of the way that spark-perf uses Spark (by including a spark assembly as an unmanaged jar). This fixes the problem at its source by just never including it.
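A hedged sbt sketch of excluding a transitive dependency at the source, as this commit describes; the Kafka coordinates and the jopt-simple organization shown here are assumptions for illustration, not copied from Spark's build:

    // In build.sbt: depend on Kafka but never pull in its older,
    // conflicting jopt-simple transitively.
    libraryDependencies += "org.apache.kafka" % "kafka" % "0.8.0-beta1" excludeAll (
      ExclusionRule(organization = "net.sf.jopt-simple")
    )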