spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	Required AM memory is "amMem", not "args.amMemory"	derek ma	2014-07-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	"ERROR yarn.Client: Required AM memory (1024) is above the max threshold (1048) of this cluster" appears if this code is not changed. obviously, 1024 is less than 1048, so change this Author: derek ma <maji3@asiainfo-linkage.com> Closes #1494 from maji2014/master and squashes the following commits: b0f6640 [derek ma] Required AM memory is "amMem", not "args.amMemory"
*	[SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile fix)	Cheng Lian	2014-07-28	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410) Another try for #1399 & #1600. Those two PR breaks Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module is defined outside the `hive-thriftserver` profile. Thus every time a pull request that doesn't touch SQL code will also execute test suites defined in `hive-thriftserver`, but tests fail because related .class files are not included in the assembly jar. In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits: 629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
*	Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"	Patrick Wendell	2014-07-27	3	-3/+3
\| \| \| \|	This reverts commit f6ff2a61d00d12481bfb211ae13d6992daacdcc2.
*	[SPARK-2410][SQL] Merging Hive Thrift/JDBC server	Cheng Lian	2014-07-27	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.) JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410) Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc). Thanks chenghao-intel for his initial contribution of the Spark SQL CLI. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1600 from liancheng/jdbc and squashes the following commits: ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds 090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR 21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd] 199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver 1083e9d [Cheng Lian] Fixed failed test suites 7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic 9cc0f06 [Cheng Lian] Starts beeline with spark-submit cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile 061880f [Cheng Lian] Addressed all comments by @pwendell 7755062 [Cheng Lian] Adapts test suites to spark-submit settings 40bafef [Cheng Lian] Fixed more license header issues e214aab [Cheng Lian] Added missing license headers b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft 3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit 61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit 2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
*	Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"	Michael Armbrust	2014-07-25	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 06dc0d2c6b69c5d59b4d194ced2ac85bfe2e05e2. #1399 is making Jenkins fail. We should investigate and put this back after its passing tests. Author: Michael Armbrust <michael@databricks.com> Closes #1594 from marmbrus/revertJDBC and squashes the following commits: 59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
*	[SPARK-2410][SQL] Merging Hive Thrift/JDBC server	Cheng Lian	2014-07-25	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	JIRA issue: - Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410) - Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678) Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc). (Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.) TODO - [x] Use `spark-submit` to launch the server, the CLI and beeline - [x] Migration guideline draft for Shark users ---- Hit by a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example: ```bash $ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help ``` This actually shows usage information of `SparkSubmit` rather than `BeeLine`. ~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~ UPDATE The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert changes to this bug since it involves more subtle considerations and worth a separate PR. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1399 from liancheng/thriftserver and squashes the following commits: 090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR 21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd] 199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver 1083e9d [Cheng Lian] Fixed failed test suites 7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic 9cc0f06 [Cheng Lian] Starts beeline with spark-submit cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile 061880f [Cheng Lian] Addressed all comments by @pwendell 7755062 [Cheng Lian] Adapts test suites to spark-submit settings 40bafef [Cheng Lian] Fixed more license header issues e214aab [Cheng Lian] Added missing license headers b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft 3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit 61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit 2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
*	[SPARK-2037]: yarn client mode doesn't support spark.yarn.max.executor.failures	GuoQiang Li	2014-07-24	3	-38/+115
\| \| \| \| \| \| \| \| \| \|	Author: GuoQiang Li <witgo@qq.com> Closes #1180 from witgo/SPARK-2037 and squashes the following commits: 3d52411 [GuoQiang Li] review commit 7058f4d [GuoQiang Li] Correctly stop SparkContext 6d0561f [GuoQiang Li] Fix: yarn client mode doesn't support spark.yarn.max.executor.failures
*	SPARK-2150: Provide direct link to finished application UI in yarn resou...	Rahul Singhal	2014-07-24	6	-6/+26
\| \| \| \| \| \| \| \| \| \| \| \| \|	...rce manager UI Use the event logger directory to provide a direct link to finished application UI in yarn resourcemanager UI. Author: Rahul Singhal <rahul.singhal@guavus.com> Closes #1094 from rahulsinghaliitd/SPARK-2150 and squashes the following commits: 95f230c [Rahul Singhal] SPARK-2150: Provide direct link to finished application UI in yarn resource manager UI
*	[YARN] SPARK-2577: File upload to viewfs is broken due to mount point re...	Gera Shegalov	2014-07-22	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	Opting to the option 2 defined in SPARK-2577, i.e., retrieve and pass the correct file system object to addResource. Author: Gera Shegalov <gera@twitter.com> Closes #1483 from gerashegalov/master and squashes the following commits: 90c9087 [Gera Shegalov] [YARN] SPARK-2577: File upload to viewfs is broken due to mount point resolution
*	SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler	Sandy Ryza	2014-07-21	6	-99/+11
\| \| \| \| \| \| \| \| \| \| \|	Author: Sandy Ryza <sandy@cloudera.com> Closes #634 from sryza/sandy-spark-1707 and squashes the following commits: 2f6e358 [Sandy Ryza] Default min registered executors ratio to .8 for YARN 354c630 [Sandy Ryza] Remove outdated comments c744ef3 [Sandy Ryza] Take out waitForInitialAllocations 2a4329b [Sandy Ryza] SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler
*	SPARK-1291: Link the spark UI to RM ui in yarn-client mode	witgo	2014-07-15	3	-6/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Author: witgo <witgo@qq.com> Closes #1112 from witgo/SPARK-1291 and squashes the following commits: 6022bcd [witgo] review commit 1fbb925 [witgo] add addAmIpFilter to yarn alpha 210299c [witgo] review commit 1b92a07 [witgo] review commit 6896586 [witgo] Add comments to addWebUIFilter 3e9630b [witgo] review commit 142ee29 [witgo] review commit 1fe7710 [witgo] Link the spark UI to RM ui in yarn-client mode
*	[SPARK-1946] Submit tasks after (configured ratio) executors have been ↵	li-zhihui	2014-07-14	7	-1/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	registered Because submitting tasks and registering executors are asynchronous, in most situation, early stages' tasks run without preferred locality. A simple solution is sleeping few seconds in application, so that executors have enough time to register. The PR add 2 configuration properties to make TaskScheduler submit tasks after a few of executors have been registered. \# Submit tasks only after (registered executors / total executors) arrived the ratio, default value is 0 spark.scheduler.minRegisteredExecutorsRatio = 0.8 \# Whatever minRegisteredExecutorsRatio is arrived, submit tasks after the maxRegisteredWaitingTime(millisecond), default value is 30000 spark.scheduler.maxRegisteredExecutorsWaitingTime = 5000 Author: li-zhihui <zhihui.li@intel.com> Closes #900 from li-zhihui/master and squashes the following commits: b9f8326 [li-zhihui] Add logs & edit docs 1ac08b1 [li-zhihui] Add new configs to user docs 22ead12 [li-zhihui] Move waitBackendReady to postStartHook c6f0522 [li-zhihui] Bug fix: numExecutors wasn't set & use constant DEFAULT_NUMBER_EXECUTORS 4d6d847 [li-zhihui] Move waitBackendReady to TaskSchedulerImpl.start & some code refactor 0ecee9a [li-zhihui] Move waitBackendReady from DAGScheduler.submitStage to TaskSchedulerImpl.submitTasks 4261454 [li-zhihui] Add docs for new configs & code style ce0868a [li-zhihui] Code style, rename configuration property name of minRegisteredRatio & maxRegisteredWaitingTime 6cfb9ec [li-zhihui] Code style, revert default minRegisteredRatio of yarn to 0, driver get --num-executors in yarn/alpha 812c33c [li-zhihui] Fix driver lost --num-executors option in yarn-cluster mode e7b6272 [li-zhihui] support yarn-cluster 37f7dc2 [li-zhihui] support yarn mode(percentage style) 3f8c941 [li-zhihui] submit stage after (configured ratio of) executors have been registered
*	[SPARK-1776] Have Spark's SBT build read dependencies from Maven.	Prashant Sharma	2014-07-10	3	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch introduces the new way of working also retaining the existing ways of doing things. For example build instruction for yarn in maven is `mvn -Pyarn -PHadoop2.2 clean package -DskipTests` in sbt it can become `MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly` Also supports `sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly` Author: Prashant Sharma <prashant.s@imaginea.com> Author: Patrick Wendell <pwendell@gmail.com> Closes #772 from ScrapCodes/sbt-maven and squashes the following commits: a8ac951 [Prashant Sharma] Updated sbt version. 62b09bb [Prashant Sharma] Improvements. fa6221d [Prashant Sharma] Excluding sql from mima 4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default. 72651ca [Prashant Sharma] Addresses code reivew comments. acab73d [Prashant Sharma] Revert "Small fix to run-examples script." ac4312c [Prashant Sharma] Revert "minor fix" 6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit. 65cf06c [Prashant Sharma] Servelet API jars mess up with the other servlet jars on the class path. 446768e [Prashant Sharma] minor fix 89b9777 [Prashant Sharma] Merge conflicts d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups. dccc8ac [Prashant Sharma] updated mima to check against 1.0 a49c61b [Prashant Sharma] Fix for tools jar a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies. cf88758 [Prashant Sharma] cleanup 9439ea3 [Prashant Sharma] Small fix to run-examples script. 96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven. 36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins. 4973dbd [Patrick Wendell] Example build using pom reader.
*	[SPARK-2318] When exiting on a signal, print the signal name first.	Reynold Xin	2014-06-30	2	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Author: Reynold Xin <rxin@apache.org> Closes #1260 from rxin/signalhandler1 and squashes the following commits: 8e73552 [Reynold Xin] Uh add Logging back in ApplicationMaster. 0402ba8 [Reynold Xin] Synchronize SignalLogger.register. dc70705 [Reynold Xin] Added SignalLogger to YARN ApplicationMaster. 79a21b4 [Reynold Xin] Added license header. 0da052c [Reynold Xin] Added the SignalLogger itself. e587d2e [Reynold Xin] [SPARK-2318] When exiting on a signal, print the signal name first.
*	Remove use of spark.worker.instances	Kay Ousterhout	2014-06-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	spark.worker.instances was added as part of this commit: https://github.com/apache/spark/commit/1617816090e7b20124a512a43860a21232ebf511 My understanding is that SPARK_WORKER_INSTANCES is supported for backwards compatibility, but spark.worker.instances is never used (SparkSubmit.scala sets spark.executor.instances) so should not have been added. @sryza @pwendell @tgravescs LMK if I'm understanding this correctly Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #1214 from kayousterhout/yarn_config and squashes the following commits: 3d7c491 [Kay Ousterhout] Remove use of spark.worker.instances
*	[SPARK-1395] Fix "local:" URI support in Yarn mode (again).	Marcelo Vanzin	2014-06-23	4	-101/+249
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recent changes ignored the fact that path may be defined with "local:" URIs, which means they need to be explicitly added to the classpath everywhere a remote process is started. This change fixes that by: - Using the correct methods to add paths to the classpath - Creating SparkConf settings for the Spark jar itself and for the user's jar - Propagating those two settings to the remote processes where needed This ensures that both in client and in cluster mode, the driver has the necessary info to build the executor's classpath and have things still work when they contain "local:" references. The change also fixes some confusion in ClientBase about whether to use SparkConf or system properties to propagate config options to the driver and executors, by standardizing on using data held by SparkConf. On the cleanup front, I removed the hacky way that log4j configuration was being propagated to handle the "local:" case. It's much more cleanly (and generically) handled by using spark-submit arguments (--files to upload a config file, or setting spark.executor.extraJavaOptions to pass JVM arguments and use a local file). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #560 from vanzin/yarn-local-2 and squashes the following commits: 4e7f066 [Marcelo Vanzin] Correctly propagate SPARK_JAVA_OPTS to driver/executor. 6a454ea [Marcelo Vanzin] Use constants for PWD in test. 6dd5943 [Marcelo Vanzin] Fix propagation of config options to driver / executor. b2e377f [Marcelo Vanzin] Review feedback. 93c3f85 [Marcelo Vanzin] Fix ClassCastException in test. e5c682d [Marcelo Vanzin] Fix cluster mode, restore SPARK_LOG4J_CONF. 1dfbb40 [Marcelo Vanzin] Add documentation for spark.yarn.jar. bbdce05 [Marcelo Vanzin] [SPARK-1395] Fix "local:" URI support in Yarn mode (again).
*	[SPARK-2051]In yarn.ClientBase spark.yarn.dist.* do not work	witgo	2014-06-19	3	-6/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Author: witgo <witgo@qq.com> Closes #969 from witgo/yarn_ClientBase and squashes the following commits: 8117765 [witgo] review commit 3bdbc52 [witgo] Merge branch 'master' of https://github.com/apache/spark into yarn_ClientBase 5261b6c [witgo] fix sys.props.get("SPARK_YARN_DIST_FILES") e3c1107 [witgo] update docs b6a9aa1 [witgo] merge master c8b4554 [witgo] review commit 2f48789 [witgo] Merge branch 'master' of https://github.com/apache/spark into yarn_ClientBase 8d7b82f [witgo] Merge branch 'master' of https://github.com/apache/spark into yarn_ClientBase 1048549 [witgo] remove Utils.resolveURIs 871f1db [witgo] add spark.yarn.dist.* documentation 41bce59 [witgo] review commit 35d6fa0 [witgo] move to ClientArguments 55d72fc [witgo] Merge branch 'master' of https://github.com/apache/spark into yarn_ClientBase 9cdff16 [witgo] review commit 8bc2f4b [witgo] review commit 20e667c [witgo] Merge branch 'master' into yarn_ClientBase 0961151 [witgo] merge master ce609fc [witgo] Merge branch 'master' into yarn_ClientBase 8362489 [witgo] yarn.ClientBase spark.yarn.dist.* do not work
*	[SPARK-1930] The Container is running beyond physical memory limits, so as ↵	witgo	2014-06-16	6	-18/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	to be killed Author: witgo <witgo@qq.com> Closes #894 from witgo/SPARK-1930 and squashes the following commits: 564307e [witgo] Update the running-on-yarn.md 3747515 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1930 172647b [witgo] add memoryOverhead docs a0ff545 [witgo] leaving only two configs a17bda2 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1930 478ca15 [witgo] Merge branch 'master' into SPARK-1930 d1244a1 [witgo] Merge branch 'master' into SPARK-1930 8b967ae [witgo] Merge branch 'master' into SPARK-1930 655a820 [witgo] review commit 71859a7 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1930 e3c531d [witgo] review commit e16f190 [witgo] different memoryOverhead ffa7569 [witgo] review commit 5c9581f [witgo] Merge branch 'master' into SPARK-1930 9a6bcf2 [witgo] review commit 8fae45a [witgo] fix NullPointerException e0dcc16 [witgo] Adding configuration items b6a989c [witgo] Fix container memory beyond limit, were killed
*	[SPARK-1516]Throw exception in yarn client instead of run system.exit directly.	John Zhao	2014-06-12	4	-26/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	All the changes is in the package of "org.apache.spark.deploy.yarn": 1) Throw exception in ClinetArguments and ClientBase instead of exit directly. 2) in Client's main method, if exception is caught, it will exit with code 1, otherwise exit with code 0. After the fix, if user integrate the spark yarn client into their applications, when the argument is wrong or the running is finished, the application won't be terminated. Author: John Zhao <jzhao@alpinenow.com> Closes #490 from codeboyyong/jira_1516_systemexit_inyarnclient and squashes the following commits: 138cb48 [John Zhao] [SPARK-1516]Throw exception in yarn clinet instead of run system.exit directly. All the changes is in the package of "org.apache.spark.deploy.yarn": 1) Add a ClientException with an exitCode 2) Throws exception in ClinetArguments and ClientBase instead of exit directly 3) in Client's main method, catch exception and exit with the exitCode.
*	[SPARK-2080] Yarn: report HS URL in client mode, correct user in cluster mode.	Marcelo Vanzin	2014-06-12	2	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Yarn client mode was not setting the app's tracking URL to the History Server's URL when configured by the user. Now client mode behaves the same as cluster mode. In SparkContext.scala, the "user.name" system property had precedence over the SPARK_USER environment variable. This means that SPARK_USER was never used, since "user.name" is always set by the JVM. In Yarn cluster mode, this means the application always reported itself as being run by user "yarn" (or whatever user was running the Yarn NM). One could argue that the correct fix would be to use UGI.getCurrentUser() here, but at least for Yarn that will match what SPARK_USER is set to. Author: Marcelo Vanzin <vanzin@cloudera.com> This patch had conflicts when merged, resolved by Committer: Thomas Graves <tgraves@apache.org> Closes #1002 from vanzin/yarn-client-url and squashes the following commits: 4046e04 [Marcelo Vanzin] Set HS link in yarn-alpha also. 4c692d9 [Marcelo Vanzin] Yarn: report HS URL in client mode, correct user in cluster mode.
*	SPARK-1639. Tidy up some Spark on YARN code	Sandy Ryza	2014-06-11	7	-178/+161
\| \| \| \| \| \| \| \| \| \| \| \| \|	This contains a bunch of small tidyings of the Spark on YARN code. I focused on the yarn stable code. @tgravescs, let me know if you'd like me to make these for the alpha code as well. Author: Sandy Ryza <sandy@cloudera.com> Closes #561 from sryza/sandy-spark-1639 and squashes the following commits: 72b6a02 [Sandy Ryza] Fix comment and set name on driver thread c2190b2 [Sandy Ryza] SPARK-1639. Tidy up some Spark on YARN code
*	[SPARK-1978] In some cases, spark-yarn does not automatically restart the ↵	witgo	2014-06-10	2	-27/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	failed container Author: witgo <witgo@qq.com> Closes #921 from witgo/allocateExecutors and squashes the following commits: bc3aa66 [witgo] review commit 8800eba [witgo] Merge branch 'master' of https://github.com/apache/spark into allocateExecutors 32ac7af [witgo] review commit 056b8c7 [witgo] Merge branch 'master' of https://github.com/apache/spark into allocateExecutors 04c6f7e [witgo] Merge branch 'master' into allocateExecutors aff827c [witgo] review commit 5c376e0 [witgo] Merge branch 'master' of https://github.com/apache/spark into allocateExecutors 1faf4f4 [witgo] Merge branch 'master' into allocateExecutors 3c464bd [witgo] add time limit to allocateExecutors e00b656 [witgo] In some cases, yarn does not automatically restart the container
*	Make sure that empty string is filtered out when we get the secondary jars ↵	DB Tsai	2014-06-09	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \|	from conf Author: DB Tsai <dbtsai@dbtsai.com> Closes #1027 from dbtsai/dbtsai-classloader and squashes the following commits: 9ac6be3 [DB Tsai] Fixed line too long c9c7ad7 [DB Tsai] Make sure that empty string is filtered out when we get the secondary jars from conf.
*	[SPARK-1522] : YARN ClientBase throws a NPE if there is no YARN Application CP	Bernardo Gomez Palacio	2014-06-09	2	-34/+167
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current implementation of ClientBase.getDefaultYarnApplicationClasspath inspects the MRJobConfig class for the field DEFAULT_YARN_APPLICATION_CLASSPATH when it should be really looking into YarnConfiguration. If the Application Configuration has no yarn.application.classpath defined a NPE exception will be thrown. Additional Changes include: * Test Suite for ClientBase added [ticket: SPARK-1522] : https://issues.apache.org/jira/browse/SPARK-1522 Author : bernardo.gomezpalacio@gmail.com Testing : SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true ./sbt/sbt test Author: Bernardo Gomez Palacio <bernardo.gomezpalacio@gmail.com> Closes #433 from berngp/feature/SPARK-1522 and squashes the following commits: 2c2e118 [Bernardo Gomez Palacio] [SPARK-1522]: YARN ClientBase throws a NPE if there is no YARN Application specific CP
*	SPARK-1898: In deploy.yarn.Client, use YarnClient not YarnClientImpl	Colin Patrick McCabe	2014-06-08	2	-10/+17
\| \| \| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-1898 Author: Colin Patrick McCabe <cmccabe@cloudera.com> Closes #850 from cmccabe/master and squashes the following commits: d66eddc [Colin Patrick McCabe] SPARK-1898: In deploy.yarn.Client, use YarnClient rather than YarnClientImpl
*	[SPARK-2029] Bump pom.xml version number of master branch to 1.1.0-SNAPSHOT.	Takuya UESHIN	2014-06-05	3	-3/+3
\| \| \| \| \| \| \| \|	Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #974 from ueshin/issues/SPARK-2029 and squashes the following commits: e19e8f4 [Takuya UESHIN] Bump version number to 1.1.0-SNAPSHOT.
*	Improve maven plugin configuration	witgo	2014-05-31	1	-30/+0
\| \| \| \| \| \| \| \| \|	Author: witgo <witgo@qq.com> Closes #786 from witgo/maven_plugin and squashes the following commits: 5de86a2 [witgo] Merge branch 'master' of https://github.com/apache/spark into maven_plugin c35ef73 [witgo] Improve maven plugin configuration
*	[SPARK-1870] Make spark-submit --jars work in yarn-cluster mode.	Xiangrui Meng	2014-05-22	2	-53/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sent secondary jars to distributed cache of all containers and add the cached jars to classpath before executors start. Tested on a YARN cluster (CDH-5.0). `spark-submit --jars` also works in standalone server and `yarn-client`. Thanks for @andrewor14 for testing! I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested mesos yet. CC: @dbtsai @sryza Author: Xiangrui Meng <meng@databricks.com> Closes #848 from mengxr/yarn-classpath and squashes the following commits: 23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid confliction apped $CWD/ and $CWD/* to the classpath remove unused methods a40f6ed [Xiangrui Meng] standalone -> cluster 65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client 11e5354 [Xiangrui Meng] minor changes 3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn
*	[Typo] Stoped -> Stopped	Andrew Or	2014-05-21	1	-1/+1
\| \| \| \| \| \| \| \|	Author: Andrew Or <andrewor14@gmail.com> Closes #847 from andrewor14/yarn-typo and squashes the following commits: c1906af [Andrew Or] Stoped -> Stopped
*	[SPARK-1774] Respect SparkSubmit --jars on YARN (client)	Andrew Or	2014-05-10	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SparkSubmit ignores `--jars` for YARN client. This is a bug. This PR also automatically adds the application jar to `spark.jar`. Previously, when running as yarn-client, you must specify the jar additionally through `--files` (because `--jars` didn't work). Now you don't have to explicitly specify it through either. Tested on a YARN cluster. Author: Andrew Or <andrewor14@gmail.com> Closes #710 from andrewor14/yarn-jars and squashes the following commits: 35d1928 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars c27bf6c [Andrew Or] For yarn-cluster and python, do not add primaryResource to spark.jar c92c5bf [Andrew Or] Minor cleanups 269f9f3 [Andrew Or] Fix format 013d840 [Andrew Or] Fix tests 1407474 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars 3bb75e8 [Andrew Or] Allow SparkSubmit --jars to take effect in yarn-client mode
*	[SPARK-1631] Correctly set the Yarn app name when launching the AM.	Marcelo Vanzin	2014-05-08	1	-3/+3
\| \| \| \| \| \| \| \|	Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #539 from vanzin/yarn-app-name and squashes the following commits: 7d1ca4f [Marcelo Vanzin] [SPARK-1631] Correctly set the Yarn app name when launching the AM.
*	SPARK-1569 Spark on Yarn, authentication broken by pr299	Thomas Graves	2014-05-07	1	-19/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pass the configs as java options since the executor needs to know before it registers whether to create the connection using authentication or not. We could see about passing only the authentication configs but for now I just had it pass them all. I also updating it to use a list to construct the command to make it the same as ClientBase and avoid any issues with spaces. Author: Thomas Graves <tgraves@apache.org> Closes #649 from tgravescs/SPARK-1569 and squashes the following commits: 0178ab8 [Thomas Graves] add akka settings 22a8735 [Thomas Graves] Change to only path spark.auth* configs 8ccc1d4 [Thomas Graves] SPARK-1569 Spark on Yarn, authentication broken
*	SPARK-1474: Spark on yarn assembly doesn't include AmIpFilter	Thomas Graves	2014-05-06	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	We use org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter in spark on yarn but are not included it in the assembly jar. I tested this on yarn cluster by removing the yarn jars from the classpath and spark runs fine now. Author: Thomas Graves <tgraves@apache.org> Closes #406 from tgravescs/SPARK-1474 and squashes the following commits: 1548bf9 [Thomas Graves] SPARK-1474: Spark on yarn assembly doesn't include org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
*	The default version of yarn is equal to the hadoop version	witgo	2014-05-03	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a part of [PR 590](https://github.com/apache/spark/pull/590) Author: witgo <witgo@qq.com> Closes #626 from witgo/yarn_version and squashes the following commits: c390631 [witgo] restore the yarn dependency declarations f8a4ad8 [witgo] revert remove the dependency of avro in yarn-alpha 2df6cf5 [witgo] review commit a1d876a [witgo] review commit 20e7e3e [witgo] review commit c76763b [witgo] The default value of yarn.version is equal to hadoop.version
*	[WIP] SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak	Thomas Graves	2014-05-03	4	-15/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the doAs in Executor higher up so that we only have 1 ugi and aren't leaking filesystems. Fix spark on yarn to work when the cluster is running as user "yarn" but the clients are launched as the user and want to read/write to hdfs as the user. Note this hasn't been fully tested yet. Need to test in standalone mode. Putting this up for people to look at and possibly test. I don't have access to a mesos cluster. This is alternative to https://github.com/apache/spark/pull/607 Author: Thomas Graves <tgraves@apache.org> Closes #621 from tgravescs/SPARK-1676 and squashes the following commits: 244d55a [Thomas Graves] fix line length 44163d4 [Thomas Graves] Rework 9398853 [Thomas Graves] change to have doAs in executor higher up.
*	SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.	Sandy Ryza	2014-04-29	2	-5/+15
\| \| \| \| \| \| \| \| \|	Author: Sandy Ryza <sandy@cloudera.com> Closes #586 from sryza/sandy-spark-1588 and squashes the following commits: 35eb38e [Sandy Ryza] Scalify b361684 [Sandy Ryza] SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.
*	Improved build configuration	witgo	2014-04-28	3	-37/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1, Fix SPARK-1441: compile spark core error with hadoop 0.23.x 2, Fix SPARK-1491: maven hadoop-provided profile fails to build 3, Fix org.scala-lang: * ,org.apache.avro:* inconsistent versions dependency 4, A modified on the sql/catalyst/pom.xml,sql/hive/pom.xml,sql/core/pom.xml (Four spaces formatted into two spaces) Author: witgo <witgo@qq.com> Closes #480 from witgo/format_pom and squashes the following commits: 03f652f [witgo] review commit b452680 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom bee920d [witgo] revert fix SPARK-1629: Spark Core missing commons-lang dependence 7382a07 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom 6902c91 [witgo] fix SPARK-1629: Spark Core missing commons-lang dependence 0da4bc3 [witgo] merge master d1718ed [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom e345919 [witgo] add avro dependency to yarn-alpha 77fad08 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom 62d0862 [witgo] Fix org.scala-lang: * inconsistent versions dependency 1a162d7 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom 934f24d [witgo] review commit cf46edc [witgo] exclude jruby 06e7328 [witgo] Merge branch 'SparkBuild' into format_pom 99464d2 [witgo] fix maven hadoop-provided profile fails to build 0c6c1fc [witgo] Fix compile spark core error with hadoop 0.23.x 6851bec [witgo] Maintain consistent SparkBuild.scala, pom.xml
*	SPARK-1652: Remove incorrect deprecation warning in spark-submit	Patrick Wendell	2014-04-28	2	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a straightforward fix. Author: Patrick Wendell <pwendell@gmail.com> This patch had conflicts when merged, resolved by Committer: Patrick Wendell <pwendell@gmail.com> Closes #578 from pwendell/spark-submit-yarn and squashes the following commits: 96027c7 [Patrick Wendell] Test fixes b5be173 [Patrick Wendell] Review feedback 4ac9cac [Patrick Wendell] SPARK-1652: spark-submit for yarn prints warnings even though calling as expected
*	SPARK-1607. HOTFIX: Fix syntax adapting Int result to Short	Sean Owen	2014-04-25	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Sorry folks. This should make the change for SPARK-1607 compile again. Verified this time with the yarn build enabled. Author: Sean Owen <sowen@cloudera.com> Closes #556 from srowen/SPARK-1607.2 and squashes the following commits: e3fe7a3 [Sean Owen] Fix syntax adapting Int result to Short
*	SPARK-1607. Replace octal literals, removed in Scala 2.11, with hex literals	Sean Owen	2014-04-24	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	Octal literals like "0700" are deprecated in Scala 2.10, generating a warning. They have been removed entirely in 2.11. See https://issues.scala-lang.org/browse/SI-7618 This change simply replaces two uses of octals with hex literals, which seemed the next-best representation since they express a bit mask (file permission in particular) Author: Sean Owen <sowen@cloudera.com> Closes #529 from srowen/SPARK-1607 and squashes the following commits: 1ee0e67 [Sean Owen] Use Integer.parseInt(...,8) for octal literal instead of hex equivalent 0102f3d [Sean Owen] Replace octal literals, removed in Scala 2.11, with hex literals
*	SPARK-1586 Windows build fixes	Mridul Muralidharan	2014-04-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unfortunately, this is not exhaustive - particularly hive tests still fail due to path issues. Author: Mridul Muralidharan <mridulm80@apache.org> This patch had conflicts when merged, resolved by Committer: Matei Zaharia <matei@databricks.com> Closes #505 from mridulm/windows_fixes and squashes the following commits: ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy appparently cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch 3267f4b [Mridul Muralidharan] Fix build failures 35b277a [Mridul Muralidharan] Fix Scalastyle failures bc69d14 [Mridul Muralidharan] Change from hardcoded path separator 10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes 1337abd [Mridul Muralidharan] fix classpath while running in windows
*	Fix Scala Style	Sandeep	2014-04-24	3	-38/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Any comments are welcome Author: Sandeep <sandeep@techaddict.me> Closes #531 from techaddict/stylefix-1 and squashes the following commits: 7492730 [Sandeep] Pass 4 98b2428 [Sandeep] fix rxin suggestions b5e2e6f [Sandeep] Pass 3 05932d7 [Sandeep] fix if else styling 2 08690e5 [Sandeep] fix if else styling
*	Assorted clean-up for Spark-on-YARN.	Patrick Wendell	2014-04-22	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \|	In particular when the HADOOP_CONF_DIR is not not specified. Author: Patrick Wendell <pwendell@gmail.com> Closes #488 from pwendell/hadoop-cleanup and squashes the following commits: fe95f13 [Patrick Wendell] Changes based on Andrew's feeback 18d09c1 [Patrick Wendell] Review comments from Andrew 17929cc [Patrick Wendell] Assorted clean-up for Spark-on-YARN.
*	Fix compilation on Hadoop 2.4.x.	Marcelo Vanzin	2014-04-22	1	-1/+2
\| \| \| \| \| \| \| \|	Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #483 from vanzin/yarn-2.4 and squashes the following commits: 0fc57d8 [Marcelo Vanzin] Fix compilation on Hadoop 2.4.x.
*	Clean up and simplify Spark configuration	Patrick Wendell	2014-04-21	11	-80/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Over time as we've added more deployment modes, this have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements: 1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file. 2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath. 3. Adds ability to set these same variables for the driver using `spark-submit`. 4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`. 5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node. Author: Patrick Wendell <pwendell@gmail.com> Closes #299 from pwendell/config-cleanup and squashes the following commits: 127f301 [Patrick Wendell] Improvements to testing a006464 [Patrick Wendell] Moving properties file template. b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf 0086939 [Patrick Wendell] Minor style fixes af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide af0adf7 [Patrick Wendell] Automatically add user jar a56b125 [Patrick Wendell] Responses to Tom's review d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup a762901 [Patrick Wendell] Fixing test failures ffa00fe [Patrick Wendell] Review feedback fda0301 [Patrick Wendell] Note 308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN e83cd8f [Patrick Wendell] Changes to allow re-use of test applications be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set c2a2909 [Patrick Wendell] Test compile fixes 4ee6f9d [Patrick Wendell] Making YARN doc changes consistent afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors. b08893b [Patrick Wendell] Additional improvements. ace4ead [Patrick Wendell] Responses to review feedback. b72d183 [Patrick Wendell] Review feedback for spark env file 46555c1 [Patrick Wendell] Review feedback and import clean-ups 437aed1 [Patrick Wendell] Small fix 761ebcd [Patrick Wendell] Library path and classpath for drivers 7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script 5b0ba8e [Patrick Wendell] Don't ship executor envs 84cc5e5 [Patrick Wendell] Small clean-up 1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings 4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH 6eaf7d0 [Patrick Wendell] executorJavaOpts 0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
*	SPARK-1408 Modify Spark on Yarn to point to the history server when app ...	Thomas Graves	2014-04-17	2	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	...finishes Note this is dependent on https://github.com/apache/spark/pull/204 to have a working history server, but there are no code dependencies. This also fixes SPARK-1288 yarn stable finishApplicationMaster incomplete. Since I was in there I made the diagnostic message be passed properly. Author: Thomas Graves <tgraves@apache.org> Closes #362 from tgravescs/SPARK-1408 and squashes the following commits: ec89705 [Thomas Graves] Fix typo. 446122d [Thomas Graves] Make config yarn specific f5d5373 [Thomas Graves] SPARK-1408 Modify Spark on Yarn to point to the history server when app finishes
*	[SPARK-1395] Allow "local:" URIs to work on Yarn.	Marcelo Vanzin	2014-04-17	5	-77/+140
\| \| \| \| \| \| \| \| \| \| \| \| \|	This only works for the three paths defined in the environment (SPARK_JAR, SPARK_YARN_APP_JAR and SPARK_LOG4J_CONF). Tested by running SparkPi with local: and file: URIs against Yarn cluster (no "upload" shows up in logs in the local case). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #303 from vanzin/yarn-local and squashes the following commits: 82219c1 [Marcelo Vanzin] [SPARK-1395] Allow "local:" URIs to work on Yarn.
*	SPARK-1465: Spark compilation is broken with the latest hadoop-2.4.0 release	xuan	2014-04-16	3	-15/+85
\| \| \| \| \| \| \| \| \| \| \| \| \|	YARN-1824 changes the APIs (addToEnvironment, setEnvFromInputString) in Apps, which causes the spark build to break if built against a version 2.4.0. To fix this, create the spark own function to do that functionality which will not break compiling against 2.3 and other 2.x versions. Author: xuan <xuan@MacBook-Pro.local> Author: xuan <xuan@macbook-pro.home> Closes #396 from xgong/master and squashes the following commits: 42b5984 [xuan] Remove two extra imports bc0926f [xuan] Remove usage of org.apache.hadoop.util.Shell be89fa7 [xuan] fix Spark compilation is broken with the latest hadoop-2.4.0 release
*	SPARK-1497. Fix scalastyle warnings in YARN, Hive code	Sean Owen	2014-04-16	5	-20/+30
\| \| \| \| \| \| \| \| \| \| \| \|	(I wasn't sure how to automatically set `SPARK_YARN=true` and `SPARK_HIVE=true` when running scalastyle, but these are the errors that turn up.) Author: Sean Owen <sowen@cloudera.com> Closes #413 from srowen/SPARK-1497 and squashes the following commits: f0c9318 [Sean Owen] Fix more scalastyle warnings in yarn 80bf4c3 [Sean Owen] Add YARN alpha / YARN profile to scalastyle check 026319c [Sean Owen] Fix scalastyle warnings in YARN, Hive code
*	SPARK-1417: Spark on Yarn - spark UI link from resourcemanager is broken	Thomas Graves	2014-04-11	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Author: Thomas Graves <tgraves@apache.org> Closes #344 from tgravescs/SPARK-1417 and squashes the following commits: c450b5f [Thomas Graves] fix test e1c1d7e [Thomas Graves] add missing $ to appUIAddress e982ddb [Thomas Graves] use appUIHostPort in appUIAddress 0803ec2 [Thomas Graves] Review comment updates - remove extra newline, simplify assert in test 658a8ec [Thomas Graves] Add a appUIHostPort routine 0614208 [Thomas Graves] Fix test 2a6b1b7 [Thomas Graves] SPARK-1417: Spark on Yarn - spark UI link from resourcemanager is broken