Commit message (Author, Age, Files, Lines)
* Allow files added through SparkContext.addFile() to be overwritten (Yinan Li, 2014-01-18, 2 files, -15/+42)
| | | | | | | This is useful for the cases when a file needs to be refreshed and downloaded by the executors periodically. Signed-off-by: Yinan Li <liyinan926@gmail.com>
* Merge pull request #461 from pwendell/master (Patrick Wendell, 2014-01-18, 1 file, -1/+1)
|\ | | | | | | | | | | Use renamed shuffle spill config in CoGroupedRDD.scala. This one got missed when it was renamed.
| * Use renamed shuffle spill config in CoGroupedRDD.scala (Patrick Wendell, 2014-01-18, 1 file, -1/+1)
|/
* Merge pull request #451 from Qiuzhuang/master (Patrick Wendell, 2014-01-16, 2 files, -3/+3)
|\ | | | | | | | | | | Fixed Windows spark shell launch script error. JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
| * Fixed Windows spark shell launch script error. (Qiuzhuang Lian, 2014-01-16, 2 files, -3/+3)
| | | | | | | | JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
* | Merge pull request #438 from ScrapCodes/clone-records-java-api (Patrick Wendell, 2014-01-16, 1 file, -2/+114)
|\ \ | | | | | | | | | Clone records java api
| * | adding clone records field to equivalent java apis (Prashant Sharma, 2014-01-17, 1 file, -2/+114)
| | |
* | | Merge pull request #445 from kayousterhout/exec_lost (Reynold Xin, 2014-01-15, 2 files, -1/+18)
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fail rather than hanging if a task crashes the JVM. Prior to this commit, if a task crashes the JVM, the task (and all other tasks running on that executor) is marked as KILLED rather than FAILED. As a result, the TaskSetManager will retry the task indefinitely rather than failing the job after maxFailures. Eventually, this makes the job hang, because the Standalone Scheduler removes the application after 10 workers have failed, and then the app is left in a state where it's disconnected from the master and waiting to reconnect. This commit fixes that problem by marking tasks as FAILED rather than KILLED when an executor is lost. The downside of this commit is that if task A fails because another task running on the same executor caused the JVM to crash, the failure will incorrectly be counted as a failure of task A. This should not be an issue because we typically set maxFailures to 3, and it is unlikely that a task will be co-located with a JVM-crashing task multiple times.
| * | Updated unit test comment (Kay Ousterhout, 2014-01-15, 1 file, -1/+3)
| | |
| * | Fail rather than hanging if a task crashes the JVM. (Kay Ousterhout, 2014-01-15, 2 files, -1/+16)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this commit, if a task crashes the JVM, the task (and all other tasks running on that executor) is marked as KILLED rather than FAILED. As a result, the TaskSetManager will retry the task indefinitely rather than failing the job after maxFailures. This commit fixes that problem by marking tasks as FAILED rather than KILLED when an executor is lost. The downside of this commit is that if task A fails because another task running on the same executor caused the JVM to crash, the failure will incorrectly be counted as a failure of task A. This should not be an issue because we typically set maxFailures to 3, and it is unlikely that a task will be co-located with a JVM-crashing task multiple times.
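The retry accounting described in the commit message above can be illustrated with a toy model. This is only a sketch, not Spark's TaskSetManager code: the names `TaskState`, `simulate_retries` and `max_attempts` are hypothetical, and the model assumes that only FAILED attempts count toward maxFailures and that the job aborts once that count is reached.

```python
from enum import Enum


class TaskState(Enum):
    """Hypothetical stand-in for the two task end states discussed above."""
    FAILED = "FAILED"
    KILLED = "KILLED"


def simulate_retries(state_on_executor_loss, max_failures=3, max_attempts=20):
    """Toy model of a task whose every attempt crashes the executor's JVM.

    state_on_executor_loss is the state recorded for the task each time its
    executor is lost. Returns (attempts_made, job_aborted). max_attempts is an
    artificial cap so the "retry forever" case terminates in this sketch.
    """
    failures = 0
    for attempt in range(1, max_attempts + 1):
        # Assumption: only FAILED attempts count toward max_failures;
        # KILLED attempts are simply rescheduled.
        if state_on_executor_loss is TaskState.FAILED:
            failures += 1
            if failures >= max_failures:
                return attempt, True
    return max_attempts, False


# Old behaviour (KILLED): the failure counter never moves, so the task is
# retried until the artificial cap is hit.
print(simulate_retries(TaskState.KILLED))
# New behaviour (FAILED): the job is aborted once max_failures is reached.
print(simulate_retries(TaskState.FAILED))
```

In this model the KILLED case never aborts on its own, which corresponds to the hang described in the message, while the FAILED case stops after maxFailures attempts.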
* | | Merge pull request #414 from soulmachine/code-style (Reynold Xin, 2014-01-15, 16 files, -48/+28)
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Code clean up for mllib * Removed unnecessary parentheses * Removed unused imports * Simplified `filter...size()` to `count ...` * Removed obsoleted parameters' comments
| * | | Added parentheses since getDouble() also has a side effect (Frank Dai, 2014-01-14, 1 file, -1/+1)
| | | |
| * | | Merge remote-tracking branch 'upstream/master' into code-style (Frank Dai, 2014-01-14, 135 files, -271/+7842)
| |\ \ \
| * | | | Indent two spaces (Frank Dai, 2014-01-14, 4 files, -6/+6)
| | | | |
| * | | | Since getLong() and getInt() have side effects, restore parentheses, and remove an empty line (Frank Dai, 2014-01-14, 2 files, -10/+9)
| | | | |
| * | | | Code clean up for mllib (Frank Dai, 2014-01-14, 16 files, -63/+44)
| | | | |
* | | | | | Merge pull request #439 from CrazyJvm/master (Reynold Xin, 2014-01-15, 1 file, -2/+1)
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SPARK-1024 Remove "-XX:+UseCompressedStrings" option from tuning guide remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.
| * | | | | | remove "-XX:+UseCompressedStrings" option (CrazyJvm, 2014-01-15, 1 file, -2/+1)
| | | | | | | | | | | | | | | | | | remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.
* | | | | | | Merge pull request #444 from mateiz/py-version (Patrick Wendell, 2014-01-15, 2 files, -3/+4)
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | Clarify that Python 2.7 is only needed for MLlib
| * | | | | | Clarify that Python 2.7 is only needed for MLlib (Matei Zaharia, 2014-01-15, 2 files, -3/+4)
|/ / / / / /
* | | | | | | Merge pull request #442 from pwendell/standalone (Patrick Wendell, 2014-01-15, 1 file, -1/+4)
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Workers should use working directory as spark home if it's not specified If users don't set SPARK_HOME in their environment file when launching an application, the standalone cluster should default to the spark home of the worker.
| * | | | | | Workers should use working directory as spark home if it's not specified (Patrick Wendell, 2014-01-15, 1 file, -1/+4)
| | | | | | |
* | | | | | | Merge pull request #443 from tdas/filestream-fix (Patrick Wendell, 2014-01-15, 5 files, -4/+11)
|\ \ \ \ \ \ \ | |_|_|_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Made some classes private[streaming] and deprecated a method in JavaStreamingContext. Classes `RawTextHelper`, `RawTextSender` and `RateLimitedOutputStream` are not useful in the streaming API. They are not used by the core functionality and were there as support classes for an obscure example. One of the classes, RawTextSender, has a main function which can be executed using bin/spark-class even if it is made private[streaming]. In future, I will probably completely remove these classes. For the time being, I am just converting them to private[streaming]. Accessing the underlying JavaSparkContext in JavaStreamingContext was through `JavaStreamingContext.sc`. This is deprecated, and the preferred method is `JavaStreamingContext.sparkContext`, to keep it consistent with `StreamingContext.sparkContext`.
| * | | | | | Made some classes private[streaming] and deprecated a method in JavaStreamingContext. (Tathagata Das, 2014-01-15, 5 files, -4/+11)
| | | | | | |
* | | | | | | Merge pull request #441 from pwendell/graphx-build (Patrick Wendell, 2014-01-15, 1 file, -1/+0)
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GraphX shouldn't list Spark as provided. I noticed this when building an application against GraphX to audit the released artifacts.
| * | | | | | | GraphX shouldn't list Spark as provided (Patrick Wendell, 2014-01-15, 1 file, -1/+0)
| | |/ / / / / | |/| | | | |
* | | | | | | Merge pull request #433 from markhamstra/debFix (Patrick Wendell, 2014-01-15, 9 files, -298/+120)
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | Updated Debian packaging
| * | | | | | | Removed repl-bin and updated maven build doc. (Mark Hamstra, 2014-01-14, 7 files, -305/+3)
| | | | | | | |
| * | | | | | | Add deb profile to assembly/pom.xml (Mark Hamstra, 2014-01-14, 3 files, -1/+125)
| | | | | | | |
* | | | | | | | Merge pull request #366 from colorant/yarn-dev (Thomas Graves, 2014-01-15, 6 files, -951/+627)
|\ \ \ \ \ \ \ \ | |_|_|_|/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | More yarn code refactor. Extract the code common to yarn alpha/stable for client and workerRunnable to reduce duplication, by putting it into a trait in the common dir that both extend. The same could be done for the remaining files in alpha/stable, but those files have much more overlapping code with different API calls here and there within functions and would need much closer review; it might also divide functions into pieces too small to be worthwhile. So this is done for these two files first.
| * | | | | | | Address comments to fix code formats (Raymond Liu, 2014-01-14, 4 files, -24/+22)
| | | | | | | |
| * | | | | | | Yarn workerRunnable refactor (Raymond Liu, 2014-01-14, 3 files, -247/+184)
| | | | | | | |
| * | | | | | | Yarn Client refactor (Raymond Liu, 2014-01-14, 5 files, -709/+450)
| | |_|_|/ / / | |/| | | | |
* | | | | | | Merge pull request #436 from ankurdave/VertexId-case (Reynold Xin, 2014-01-14, 33 files, -244/+244)
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | Rename VertexID -> VertexId in GraphX
| * | | | | | | VertexID -> VertexId (Ankur Dave, 2014-01-14, 33 files, -244/+244)
| | | | | | | |
* | | | | | | | Merge pull request #435 from tdas/filestream-fix (Patrick Wendell, 2014-01-14, 16 files, -23/+66)
|\ \ \ \ \ \ \ \ | | |_|_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixed the flaky tests by making SparkConf not serializable SparkConf was being serialized with CoGroupedRDD and Aggregator, which somehow caused OptionalJavaException while being deserialized as part of a ShuffleMapTask. SparkConf should not even be serializable (according to conversation with Matei). This change fixes that. @mateiz @pwendell
| * | | | | | | Merge remote-tracking branch 'apache/master' into filestream-fix (Tathagata Das, 2014-01-14, 66 files, -52/+1114)
| |\ \ \ \ \ \ \ | | | |_|/ / / / | | |/| | | | |
| * | | | | | | Changed SparkConf to not be serializable. And also fixed unit-test log paths in log4j.properties of external modules. (Tathagata Das, 2014-01-14, 16 files, -23/+66)
| | | | | | | |
* | | | | | | | Merge pull request #434 from rxin/graphxmaven (Patrick Wendell, 2014-01-14, 2 files, -7/+19)
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixed SVDPlusPlusSuite in Maven build. This should go into 0.9.0 also.
| * | | | | | | | Fixed SVDPlusPlusSuite in Maven build. (Reynold Xin, 2014-01-14, 2 files, -7/+19)
| | |/ / / / / / | |/| | | | | |
* | | | | | | | Merge pull request #424 from jegonzal/GraphXProgrammingGuide (Reynold Xin, 2014-01-14, 1 file, -52/+121)
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Additional edits for clarity in the graphx programming guide. Added an overview of the Graph and GraphOps functions and fixed numerous typos.
| * | | | | | | | Additional edits for clarity in the graphx programming guide. (Joseph E. Gonzalez, 2014-01-14, 1 file, -52/+121)
| | |_|_|_|/ / / | |/| | | | | |
* | | | | | | | Merge pull request #431 from ankurdave/graphx-caching-doc (Reynold Xin, 2014-01-14, 1 file, -1/+10)
|\ \ \ \ \ \ \ \ | |_|/ / / / / / |/| | | / / / / | | |_|/ / / / | |/| | | | | Describe caching and uncaching in GraphX programming guide
| * | | | | | Describe GraphX caching and uncaching in guide (Ankur Dave, 2014-01-14, 1 file, -1/+10)
|/ / / / / /
* | | | | | | Merge pull request #428 from pwendell/writeable-objects (Reynold Xin, 2014-01-14, 1 file, -2/+2)
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | Don't clone records for text files
| * | | | | | Style fix (Patrick Wendell, 2014-01-14, 1 file, -1/+1)
| | | | | | |
| * | | | | | Don't clone records for text files (Patrick Wendell, 2014-01-14, 1 file, -2/+2)
| | | | | | |
* | | | | | | Merge pull request #429 from ankurdave/graphx-examples-pom.xml (Reynold Xin, 2014-01-14, 1 file, -0/+6)
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | Add GraphX dependency to examples/pom.xml
| * | | | | | | Add GraphX dependency to examples/pom.xml (Ankur Dave, 2014-01-14, 1 file, -0/+6)
| | |/ / / / / | |/| | | | |
* | | | | | | Merge pull request #427 from pwendell/deprecate-aggregator (Reynold Xin, 2014-01-14, 1 file, -5/+17)
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | Deprecate rather than remove old combineValuesByKey function