path: root/yarn
Commit message | Author | Date | Files | Lines
...
* [SPARK-14756][CORE] Use parseLong instead of valueOf | Azeem Jiva | 2016-04-26 | 1 file | -1/+1
## What changes were proposed in this pull request?
Use Long.parseLong, which returns a primitive. Using a series of append() calls reduces the creation of an extra StringBuilder.
## How was this patch tested?
Unit tests.
Author: Azeem Jiva <azeemj@gmail.com>
Closes #12520 from javawithjiva/minor.
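An illustrative Scala sketch of the two points in this change (my example, not the patched code):

```scala
// Long.parseLong returns a primitive long; Long.valueOf returns a boxed java.lang.Long.
val primitive: Long = java.lang.Long.parseLong("12345")      // no wrapper allocation
val boxed: java.lang.Long = java.lang.Long.valueOf("12345")  // allocates a wrapper object

// Chaining append() calls reuses one StringBuilder instead of
// creating intermediate strings via concatenation.
val sb = new StringBuilder
sb.append("executor-").append(primitive).append("-log")
println(sb.toString)
```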
* [SPARK-14731][SHUFFLE] Revert SPARK-12130 to make 2.0 shuffle service compatible with 1.x | Lianhui Wang | 2016-04-25 | 1 file | -6/+7
## What changes were proposed in this pull request?
SPARK-12130 made the 2.0 shuffle service incompatible with 1.x. Following the discussion at http://apache-spark-developers-list.1001551.n3.nabble.com/YARN-Shuffle-service-and-its-compatibility-td17222.html, we should maintain compatibility between Spark 1.x and Spark 2.x's shuffle service. The string comparison is done once at executor registration, to avoid a string comparison in getBlockData on every call.
## How was this patch tested?
N/A
Author: Lianhui Wang <lianhuiwang09@gmail.com>
Closes #12568 from lianhuiwang/SPARK-14731.
* [SPARK-10001] Consolidate Signaling and SignalLogger | Reynold Xin | 2016-04-22 | 1 file | -1/+1
## What changes were proposed in this pull request?
This is a follow-up to #12557, with the following changes:
1. Fixes some of the style issues.
2. Merges Signaling and SignalLogger into a new class called SignalUtils. It was pretty confusing to have Signaling and Signal in one file, and to have two classes where one simply called the other.
3. Made logging registration idempotent.
## How was this patch tested?
N/A.
Author: Reynold Xin <rxin@databricks.com>
Closes #12605 from rxin/SPARK-10001.
* [SPARK-6429] Implement hashCode and equals together | Joan | 2016-04-22 | 1 file | -0/+1
## What changes were proposed in this pull request?
Implement `hashCode` and `equals` together in order to enable the corresponding scalastyle rule. This is a first batch; I will continue to implement them, but I wanted to know your thoughts.
Author: Joan <joan@goyeau.com>
Closes #12157 from joan38/SPARK-6429-HashCode-Equals.
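A minimal sketch of what the style rule enforces: whenever `equals` is overridden, a consistent `hashCode` is defined next to it (class and fields invented for illustration):

```scala
class BlockRef(val name: String, val index: Int) {
  override def equals(other: Any): Boolean = other match {
    case that: BlockRef => name == that.name && index == that.index
    case _ => false
  }

  // Must agree with equals: equal objects must yield equal hash codes,
  // or HashMap/HashSet lookups on this type will misbehave.
  override def hashCode: Int = 31 * name.hashCode + index
}
```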
* [SPARK-14602][YARN] Use SparkConf to propagate the list of cached files | Marcelo Vanzin | 2016-04-20 | 10 files | -175/+235
This change avoids using the environment to pass this information, since with many jars it's easy to hit limits on certain OSes. Instead, it encodes the information into the Spark configuration propagated to the AM.
The first problem that needed to be solved is a chicken & egg issue: the config file is distributed using the cache, and it needs to contain information about the files that are being distributed. To solve that, the code now treats the config archive specially, and uses slightly different code to distribute it, so that only its cache path needs to be saved to the config file.
The second problem is that the extra information would show up in the Web UI, which made the environment tab even more noisy than it already is when lots of jars are listed. This is solved by two changes: the list of cached files is now read only once in the AM and propagated down to the ExecutorRunnable code (which actually sends the list to the NMs when starting containers), and those config entries are unset after the list is read, so that the SparkContext never sees them.
Tested in both client and cluster mode by running "run-example SparkPi". This uploads a whole lot of files when run from a build dir (instead of a distribution, where the list is cleaned up), and I verified that the configs do not show up in the UI.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #12487 from vanzin/SPARK-14602.
* [SPARK-14705][YARN] Support multiple FileSystems for the YARN staging dir | Lianhui Wang | 2016-04-19 | 2 files | -36/+25
## What changes were proposed in this pull request?
SPARK-13063 made the Spark YARN staging dir configurable, but it only supported the default FileSystem. With many clusters, different clusters may need different FileSystems.
## How was this patch tested?
Tested successfully with the following commands:
MASTER=yarn-client ./bin/spark-shell --conf spark.yarn.stagingDir=hdfs:namenode2/temp
$SPARK_HOME/bin/spark-submit --conf spark.yarn.stagingDir=hdfs:namenode2/temp
cc tgravescs vanzin andrewor14
Author: Lianhui Wang <lianhuiwang09@gmail.com>
Closes #12473 from lianhuiwang/SPARK-14705.
* [SPARK-14423][YARN] Avoid adding same-name files to the distributed cache again | jerryshao | 2016-04-18 | 2 files | -4/+42
## What changes were proposed in this pull request?
In the current implementation of assembly-free Spark deployment, jars under `assembly/target/scala-xxx/jars` are uploaded to the distributed cache by default. Their names can conflict with the names of jars specified in `--jars`, which causes an exception when starting the application:
```
client token: N/A
diagnostics: Application application_1459907402325_0004 failed 2 times due to AM Container for appattempt_1459907402325_0004_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://hw12100.local:8088/proxy/application_1459907402325_0004/ Then, click on links to logs of each attempt.
Diagnostics: Resource hdfs://localhost:8020/user/sshao/.sparkStaging/application_1459907402325_0004/avro-mapred-1.7.7-hadoop2.jar changed on src filesystem (expected 1459909780508, was 1459909782590
java.io.IOException: Resource hdfs://localhost:8020/user/sshao/.sparkStaging/application_1459907402325_0004/avro-mapred-1.7.7-hadoop2.jar changed on src filesystem (expected 1459909780508, was 1459909782590
  at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
  at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
  at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
  at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
  at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
  at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
```
This change checks file names to avoid uploading same-name files again.
## How was this patch tested?
Unit test and manual integration test done locally.
Author: jerryshao <sshao@hortonworks.com>
Closes #12203 from jerryshao/SPARK-14423.
* [SPARK-14508][BUILD] Add a new ScalaStyle rule `OmitBracesInCase` | Dongjoon Hyun | 2016-04-12 | 4 files | -24/+12
## What changes were proposed in this pull request?
According to the [Spark Code Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) and [Scala Style Guide](http://docs.scala-lang.org/style/control-structures.html#curlybraces), we should enforce the following rule:
```
case: Always omit braces in case clauses.
```
This PR adds a new ScalaStyle rule, `OmitBracesInCase`, and enforces it across the code.
## How was this patch tested?
Pass the Jenkins tests (including Scala style checking).
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12280 from dongjoon-hyun/SPARK-14508.
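An illustrative before/after for the rule (my example, not from the PR):

```scala
// Flagged by OmitBracesInCase: redundant braces around a case body.
def describe(x: Any): String = x match {
  case n: Int => {
    s"int: $n"
  }
  case _ => "other"
}

// Preferred: a case body is already a block, so the braces are omitted.
def describeClean(x: Any): String = x match {
  case n: Int => s"int: $n"
  case _ => "other"
}
```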
* [SPARK-12384] Enables spark-clients to set the min(-Xms) and max(*.memory config) j… | Dhruve Ashar | 2016-04-07 | 2 files | -5/+4
## What changes were proposed in this pull request?
Currently Spark clients are started with the same memory setting for -Xms and -Xmx, reserving unnecessarily large amounts of memory up front. This behavior is changed so that clients can specify an initial heap size using extraJavaOptions in the config for driver, executor and AM individually. Note that only -Xms can be provided through this config option; if the client wants to set the max size (-Xmx), this has to be done via the *.memory configuration knobs that are currently supported.
## How was this patch tested?
Monitored executor and YARN logs in debug mode to verify the commands through which they are launched in client and cluster mode. The driver memory was verified locally using jps -v. Setting an -Xmx parameter in extraJavaOptions raises an exception with explanatory info.
Author: Dhruve Ashar <dhruveashar@gmail.com>
Closes #12115 from dhruve/impr/SPARK-12384.
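A hedged sketch of passing an initial heap size this way through the launcher API (paths, class name, and sizes are placeholders; `spark.driver.memory` still controls -Xmx as described above):

```scala
import org.apache.spark.launcher.SparkLauncher

// Driver JVM starts at 1g (-Xms) and may grow to 4g (-Xmx from spark.driver.memory).
val handle = new SparkLauncher()
  .setAppResource("/path/to/app.jar")                  // placeholder
  .setMainClass("com.example.Main")                    // placeholder
  .setConf("spark.driver.memory", "4g")
  .setConf("spark.driver.extraJavaOptions", "-Xms1g")
  .startApplication()
```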
* [SPARK-14134][CORE] Change the package name used for shading classes | Marcelo Vanzin | 2016-04-06 | 2 files | -2/+2
The current package name uses a dash, which is a little weird but seemed to work. That is, until a new test tried to mock a class that references one of those shaded types, and then things started failing. Most changes are just noise to fix the logging configs.
For reference, SPARK-8815 also raised this issue, although at the time it did not cause any issues in Spark, so it was not addressed.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #11941 from vanzin/SPARK-14134.
* [SPARK-14416][CORE] Add thread-safety comments for CoarseGrainedSchedulerBackend's fields | Shixiong Zhu | 2016-04-05 | 1 file | -3/+6
## What changes were proposed in this pull request?
While reviewing #12078, I found that most of CoarseGrainedSchedulerBackend's mutable fields don't have any comments about their thread-safety assumptions, making it hard to figure out which parts of the code should be protected by the lock. This PR adds comments/annotations for them and also adds stricter access modifiers for some fields.
## How was this patch tested?
Existing unit tests.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #12188 from zsxwing/comments.
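A sketch of the annotation style such comments typically use (class and field invented; `@GuardedBy` is the JSR-305 annotation):

```scala
import javax.annotation.concurrent.GuardedBy

class BackendState {
  // Document which lock protects the field; reads and writes must hold it.
  @GuardedBy("this")
  private var totalCoreCount: Int = 0

  def addCores(n: Int): Unit = synchronized {
    totalCoreCount += n
  }
}
```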
* [HOTFIX] Fix `optional` to `createOptional` | Dongjoon Hyun | 2016-04-05 | 1 file | -1/+1
## What changes were proposed in this pull request?
This PR fixes the following line:
```
private[spark] val STAGING_DIR = ConfigBuilder("spark.yarn.stagingDir")
  .doc("Staging directory used while submitting applications.")
  .stringConf
-  .optional
+  .createOptional
```
## How was this patch tested?
Pass the build.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12187 from dongjoon-hyun/hotfix.
* [SPARK-529][SQL] Modify SQLConf to use the new config API from core | Marcelo Vanzin | 2016-04-05 | 1 file | -46/+46
Because SQL keeps track of all known configs, some customization was needed in SQLConf to allow that, since the core API does not have that feature.
Tested via existing (and slightly updated) unit tests.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #11570 from vanzin/SPARK-529-sql.
* [SPARK-13063][YARN] Make the Spark YARN staging dir configurable | Devaraj K | 2016-04-05 | 2 files | -3/+20
## What changes were proposed in this pull request?
Made the Spark YARN staging dir configurable via the configuration 'spark.yarn.staging-dir'.
## How was this patch tested?
Verified manually by running applications on YARN: if 'spark.yarn.staging-dir' is configured, its value is used as the staging directory; otherwise the default, the file system's home directory for the user, is used.
Author: Devaraj K <devaraj@apache.org>
Closes #12082 from devaraj-kavali/SPARK-13063.
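A hedged sketch of the lookup-with-fallback behavior described (simplified, not the actual Client code; the key spelling follows the `spark.yarn.stagingDir` hotfix entry above):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkConf

// Use the configured staging dir if present; otherwise fall back to the
// user's home directory on the default FileSystem.
def stagingBaseDir(sparkConf: SparkConf, hadoopConf: Configuration): Path =
  sparkConf.getOption("spark.yarn.stagingDir")
    .map(new Path(_))
    .getOrElse(FileSystem.get(hadoopConf).getHomeDirectory)
```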
* [SPARK-13579][BUILD] Stop building the main Spark assembly | Marcelo Vanzin | 2016-04-04 | 2 files | -4/+1
This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and remove duplicate jars packaged in the examples module).
I also made some minor adjustments to dependencies to remove some test jars from the final packaging, and remove jars that conflict with each other when packaged separately (e.g. servlet api).
Also note that this change restores guava in applications' classpaths, even though it's still shaded inside Spark. This is now needed for the Hadoop libraries that are packaged with Spark, which now are not processed by the shade plugin.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #11796 from vanzin/SPARK-13579.
* [MINOR][DOCS] Use multi-line JavaDoc comments in Scala code | Dongjoon Hyun | 2016-04-02 | 1 file | -8/+8
## What changes were proposed in this pull request?
This PR converts Scala-style multiline comments into Java-style multiline comments in Scala code. (Comment-only changes over 77 files: +786 lines, −747 lines.)
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12130 from dongjoon-hyun/use_multiine_javadoc_comments.
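An illustrative contrast of the two comment styles (my example):

```scala
// Scala style (what the change moves away from): continuation asterisks
// line up under the second `*` of the opener.
/** Computes a thing.
  * Second line of the description.
  */
def before(): Unit = ()

// Java/JavaDoc style (what the change adopts): continuation asterisks
// line up under the first `*` of the opener.
/**
 * Computes a thing.
 * Second line of the description.
 */
def after(): Unit = ()
```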
* [SPARK-12864][YARN] Initialize executorIdCounter after ApplicationMaster killed for max n… | zhonghaihua | 2016-04-01 | 2 files | -2/+21
Currently, when the number of executor failures reaches `maxNumExecutorFailures`, the `ApplicationMaster` is killed and another one registers in its place. A new `YarnAllocator` instance is then created, but its `executorIdCounter` property resets to `0`, so IDs of new executors start from `1` again. These collide with the IDs of executors created before, which causes FetchFailedException. This situation only arises in yarn-client mode, so it is a yarn-client-mode issue. For more details, see [SPARK-12864](https://issues.apache.org/jira/browse/SPARK-12864).
This PR introduces a mechanism to initialize `executorIdCounter` after the `ApplicationMaster` is killed.
Author: zhonghaihua <793507405@qq.com>
Closes #10794 from zhonghaihua/initExecutorIdCounterAfterAMKilled.
* [SPARK-12343][YARN] Simplify Yarn client and client argument | jerryshao | 2016-04-01 | 12 files | -306/+138
## What changes were proposed in this pull request?
Currently in Spark on YARN, configuration can be passed through SparkConf, env variables and command-line arguments, and some parts are duplicated, e.g. between client arguments and SparkConf. This proposes to simplify the command-line arguments.
## How was this patch tested?
This patch is tested manually and with unit tests. CC vanzin tgravescs, please help to review this proposal. The original purpose of this JIRA was to remove `ClientArguments`; through refactoring, some arguments like `--class` and `--arg` turned out not to be so easy to replace, so here I remove most of the command-line arguments and keep only a minimal set.
Author: jerryshao <sshao@hortonworks.com>
Closes #11603 from jerryshao/SPARK-12343.
* [SPARK-14062][YARN] Fix log4j and upload metrics.properties automatically with distributed cache | jerryshao | 2016-03-31 | 3 files | -50/+31
## What changes were proposed in this pull request?
1. Currently a log4j file shipped via the distributed cache is only added to the AM's classpath, not the executors'. This was introduced in #9118 and breaks the original intent of that PR, so here the log4j file is added to the classpath of both the AM and the executors.
2. Automatically upload metrics.properties to the distributed cache, so that it can be used by the remote driver and executors implicitly.
## How was this patch tested?
Unit tests and integration tests are done.
Author: jerryshao <sshao@hortonworks.com>
Closes #11885 from jerryshao/SPARK-14062.
* [SPARK-13955][YARN] Also look for Spark jars in the build directory | Marcelo Vanzin | 2016-03-30 | 3 files | -7/+16
Move the logic that finds Spark jars to CommandBuilderUtils and make it available to the YARN code, so that it's possible to easily launch Spark on YARN from a build directory.
Tested by running SparkPi from the build directory on YARN.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #11970 from vanzin/SPARK-13955.
* [MINOR][DOCS] Add proper periods and spaces for CLI help messages and `config` doc | Dongjoon Hyun | 2016-03-21 | 1 file | -1/+1
## What changes were proposed in this pull request?
This PR adds proper periods and spaces to Spark CLI help messages and SQL/YARN conf docs for consistency.
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #11848 from dongjoon-hyun/add_proper_period_and_space.
* [SPARK-13885][YARN] Fix attempt id regression for Spark running on Yarn | jerryshao | 2016-03-18 | 2 files | -6/+4
## What changes were proposed in this pull request?
This regression was introduced in #9182. Previously the attempt id was simply a counter, "1" or "2"; with #9182 it changed to a full name like "appattempt-xxx-00001". This affects everything that uses the attempt id, such as the event log file name and the history server app URL link, so here it is changed back to the counter to stay consistent with the previous code.
This also reverts #11518, which fixed the history log URL link for the new attempt-id format; since we are going back to the previous format, that patch is no longer necessary.
It also removes "spark.yarn.app.id" and "spark.yarn.app.attemptId", since they are now unused.
## How was this patch tested?
Tested with unit tests and manually in different scenarios:
1. application running in yarn-client mode.
2. application running in yarn-cluster mode.
3. application running in yarn-cluster mode with multiple attempts.
Checked both the event log file name and the URL link. CC vanzin tgravescs, please help to review, thanks a lot.
Author: jerryshao <sshao@hortonworks.com>
Closes #11721 from jerryshao/SPARK-13885.
* [SPARK-13928] Move org.apache.spark.Logging into org.apache.spark.internal.Logging | Wenchen Fan | 2016-03-17 | 16 files | -12/+27
## What changes were proposed in this pull request?
Logging was made private in Spark 2.0. With this move, users can create a Logging trait themselves to avoid changing their own code.
## How was this patch tested?
Existing tests.
Author: Wenchen Fan <wenchen@databricks.com>
Closes #11764 from cloud-fan/logger.
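A hedged sketch of the user-side shim this enables (assuming code that previously mixed in the public org.apache.spark.Logging; the slf4j-based body is my own):

```scala
package com.example.compat

import org.slf4j.{Logger, LoggerFactory}

// A user-maintained stand-in for the now-private Logging trait, so existing
// classes can keep writing `extends Logging` without other changes.
trait Logging {
  @transient private lazy val log: Logger =
    LoggerFactory.getLogger(getClass.getName.stripSuffix("$"))

  protected def logInfo(msg: => String): Unit =
    if (log.isInfoEnabled) log.info(msg)

  protected def logWarning(msg: => String): Unit =
    if (log.isWarnEnabled) log.warn(msg)
}
```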
* [SPARK-13360][PYSPARK][YARN] PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON is not picked up in yarn-cluster mode | Jeff Zhang | 2016-03-16 | 1 file | -0/+3
Author: Jeff Zhang <zjffdu@apache.org>
Closes #11238 from zjffdu/SPARK-13360.
* [SPARK-13889][YARN] Fix integer overflow when calculating the max number of executor failures | Carson Wang | 2016-03-16 | 1 file | -1/+4
## What changes were proposed in this pull request?
The max number of executor failures before failing the application defaults to twice the maximum number of executors when dynamic allocation is enabled. The default for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue, which causes an integer overflow and a wrong result: the calculated default max number of executor failures was 3. This PR adds a check to avoid the overflow.
## How was this patch tested?
It tests whether the value is greater than Int.MaxValue / 2 to avoid the overflow before multiplying by 2.
Author: Carson Wang <carson.wang@intel.com>
Closes #11713 from carsonwang/IntOverflow.
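A minimal sketch of the overflow-safe doubling described (illustrative, not the exact patch):

```scala
// 2 * Int.MaxValue wraps around to -2 in two's complement arithmetic,
// so guard before multiplying.
def safeDouble(maxExecutors: Int): Int =
  if (maxExecutors > Int.MaxValue / 2) Int.MaxValue else maxExecutors * 2

// safeDouble(Int.MaxValue) == Int.MaxValue  (capped, no overflow)
// safeDouble(100)          == 200
```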
* [SPARK-13642][YARN] Changed the default application exit state to failed for yarn cluster mode | jerryshao | 2016-03-15 | 1 file | -7/+8
## What changes were proposed in this pull request?
Change the default exit state to `failed` for any application running in yarn cluster mode.
## How was this patch tested?
Unit test done locally. CC tgravescs and vanzin.
Author: jerryshao <sshao@hortonworks.com>
Closes #11693 from jerryshao/SPARK-13642.
* [SPARK-13848][SPARK-5185] Update to Py4J 0.9.2 in order to fix classloading issue | Josh Rosen | 2016-03-14 | 2 files | -3/+3
This patch upgrades Py4J from 0.9.1 to 0.9.2 in order to include a patch which modifies Py4J to use the current thread's ContextClassLoader when performing reflection / class loading. This is necessary in order to fix [SPARK-5185](https://issues.apache.org/jira/browse/SPARK-5185), a longstanding issue affecting the use of `--jars` and `--packages` in PySpark.
In order to demonstrate that the fix works, I removed the workarounds which were added as part of [SPARK-6027](https://issues.apache.org/jira/browse/SPARK-6027) / #4779 and other patches.
Py4J diff: https://github.com/bartdag/py4j/compare/0.9.1...0.9.2
/cc zsxwing tdas davies brkyvz
Author: Josh Rosen <joshrosen@databricks.com>
Closes #11687 from JoshRosen/py4j-0.9.2.
* [SPARK-13779][YARN] Avoid cancelling non-local container requests | Ryan Blue | 2016-03-14 | 1 file | -12/+46
To maximize locality, the YarnAllocator would cancel any requests with a stale locality preference or no locality preference. This assumed that the majority of tasks had locality preferences, which may not be the case when scanning S3, and caused container requests for S3 tasks to be constantly cancelled and resubmitted.
This changes the allocator's logic to cancel only stale requests, plus just enough requests without locality preferences to make room for requests with locality preferences. It avoids cancelling requests without locality preferences that would only be resubmitted without locality preferences.
We've deployed this patch on our clusters and verified that jobs that couldn't get executors because they kept cancelling and resubmitting requests are fixed. Large jobs are running fine.
Author: Ryan Blue <blue@apache.org>
Closes #11612 from rdblue/SPARK-13779-fix-yarn-allocator-requests.
* [MINOR][DOCS] Fix more typos in comments/strings | Dongjoon Hyun | 2016-03-14 | 4 files | -8/+8
## What changes were proposed in this pull request?
This PR fixes 135 typos over 107 files:
* 121 typos in comments
* 11 typos in testcase names
* 3 typos in log messages
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #11689 from dongjoon-hyun/fix_more_typos.
* [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> byte[] conversions (and remaining Coverity items) | Sean Owen | 2016-03-13 | 5 files | -15/+16
## What changes were proposed in this pull request?
- Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on the platform default encoding, to use UTF-8
- Same for `InputStreamReader` and `OutputStreamWriter` constructors
- Standardizes on UTF-8 everywhere
- Standardizes on specifying the encoding with `StandardCharsets.UTF_8`, not the Guava constant or "UTF-8" (which means handling `UnsupportedEncodingException`)
- (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit https://github.com/srowen/spark/commit/1deecd8d9ca986d8adb1a42d315890ce5349d29c)
## How was this patch tested?
Jenkins tests.
Author: Sean Owen <sowen@cloudera.com>
Closes #11657 from srowen/SPARK-13823.
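An illustrative before/after of the conversion pattern (my example):

```scala
import java.nio.charset.StandardCharsets

val s = "héllo"

// Before: relies on the JVM's platform default charset, which varies by OS/locale.
val risky: Array[Byte] = s.getBytes
val riskyBack = new String(risky)

// After: explicit UTF-8 via the Charset overload, which (unlike
// getBytes("UTF-8")) declares no checked UnsupportedEncodingException.
val utf8: Array[Byte] = s.getBytes(StandardCharsets.UTF_8)
val roundTrip = new String(utf8, StandardCharsets.UTF_8)
```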
* [SPARK-13577][YARN] Allow Spark jar to be multiple jars, archive | Marcelo Vanzin | 2016-03-11 | 5 files | -71/+206
In preparation for the demise of assemblies, this change allows the YARN backend to use multiple jars and globs as the "Spark jar". The config option has been renamed to "spark.yarn.jars" to reflect that. A second option, "spark.yarn.archive", was also added; if set, it takes precedence and uploads an archive expected to contain the jar files with the Spark code and its dependencies.
Existing deployments should keep working, mostly. This change drops support for the "SPARK_JAR" environment variable, and also does not fall back to using "jarOfClass" if no configuration is set, falling back to finding files under SPARK_HOME instead. This should be fine since "jarOfClass" probably wouldn't work unless you were using spark-submit anyway.
Tested with the unit tests, and by trying the different config options on a YARN cluster.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #11500 from vanzin/SPARK-13577.
* [HOTFIX][YARN] Fix yarn cluster mode fire-and-forget regression | jerryshao | 2016-03-08 | 1 file | -1/+1
## What changes were proposed in this pull request?
Fire-and-forget is disabled by default; with patch #10205 it became enabled by default, so this is a regression that should be fixed.
## How was this patch tested?
Manually verified this change.
Author: jerryshao <sshao@hortonworks.com>
Closes #11577 from jerryshao/hot-fix-yarn-cluster.
* [SPARK-529][CORE][YARN] Add type-safe config keys to SparkConf | Marcelo Vanzin | 2016-03-07 | 14 files | -241/+443
This is, in a way, the groundwork to enable SPARK-529 (which was closed as won't-fix, but I think is still valuable). In fact, Spark SQL created something for that, and this change basically factors out that code and inserts it into SparkConf, with some extra bells and whistles.
To showcase the usage of this pattern, I modified the YARN backend to use the new config keys (defined in the new `config` package object under `o.a.s.deploy.yarn`). Most of the changes are mechanical, although the logic had to be slightly modified in a handful of places.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10205 from vanzin/conf-opts.
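A self-contained sketch of the typed-key idea (this is not Spark's internal ConfigBuilder, whose real shape is visible in the hotfix entry above; the key name and default here are invented):

```scala
import org.apache.spark.SparkConf

// A typed config key carries its name, default, and parser in one place,
// so call sites read a value of the right type instead of a raw string.
final case class ConfigEntry[T](key: String, default: T, parse: String => T) {
  def readFrom(conf: SparkConf): T =
    conf.getOption(key).map(parse).getOrElse(default)
}

object YarnKeys {
  val MaxAppAttempts: ConfigEntry[Int] =
    ConfigEntry("example.yarn.maxAppAttempts", 2, _.toInt)  // hypothetical key
}

val attempts: Int = YarnKeys.MaxAppAttempts.readFrom(new SparkConf())
```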
* [SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle rule | Dongjoon Hyun | 2016-03-03 | 6 files | -12/+6
## What changes were proposed in this pull request?
After SPARK-6990, `dev/lint-java` keeps the Java code healthy and helps PR review by saving much time. This issue removes unused imports from Java/Scala code and adds an `UnusedImports` checkstyle rule to help developers.
## How was this patch tested?
```
./dev/lint-java
./build/sbt compile
```
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #11438 from dongjoon-hyun/SPARK-13583.
* [SPARK-13423][WIP][CORE][SQL][STREAMING] Static analysis fixes for 2.x | Sean Owen | 2016-03-03 | 1 file | -1/+1
## What changes were proposed in this pull request?
Make some cross-cutting code improvements according to static analysis. These are individually up for discussion, since they exist in separate commits that can be reverted. The changes are broadly:
- Inner class should be static
- Mismatched hashCode/equals
- Overflow in compareTo
- Unchecked warnings
- Misuse of assert vs junit.assert
- get(a) + getOrElse(b) -> getOrElse(a, b)
- Array/String .size -> .length (occasionally -> .isEmpty / .nonEmpty) to avoid implicit conversions
- Dead code
- tailrec
- exists(_ == x) -> contains(x)
- find + nonEmpty -> exists
- filter + size -> count
- reduce(_+_) -> sum
- map + flatten -> map
The most controversial may be .size -> .length, simply because of its extent. It is intended to avoid implicits that might be expensive in some places.
## How was this patch tested?
Existing Jenkins unit tests.
Author: Sean Owen <sowen@cloudera.com>
Closes #11292 from srowen/SPARK-13423.
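A few of the listed rewrites shown concretely (illustrative):

```scala
val xs = Seq(1, 2, 3)

// exists(_ == x) -> contains(x)
xs.exists(_ == 2)        // before
xs.contains(2)           // after

// filter + size -> count
xs.filter(_ > 1).size    // before: builds an intermediate collection
xs.count(_ > 1)          // after

// reduce(_ + _) -> sum
xs.reduce(_ + _)         // before: throws on an empty collection
xs.sum                   // after: 0 for an empty numeric collection
```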
* [SPARK-13627][SQL][YARN] Fix simple deprecation warnings | Dongjoon Hyun | 2016-03-02 | 1 file | -1/+1
## What changes were proposed in this pull request?
This PR aims to fix the following deprecation warnings:
* MethodSymbolApi.paramss -> paramLists
* AnnotationApi.tpe -> tree.tpe
* BufferLike.readOnly -> toList
* StandardNames.nme -> termNames
* scala.tools.nsc.interpreter.AbstractFileClassLoader -> scala.reflect.internal.util.AbstractFileClassLoader
* TypeApi.declarations -> decls
## How was this patch tested?
Check the compile build log and pass the tests.
```
./build/sbt
```
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #11479 from dongjoon-hyun/SPARK-13627.
* [SPARK-13478][YARN] Use real user when fetching delegation tokens | Marcelo Vanzin | 2016-02-29 | 2 files | -12/+36
The Hive client library is not smart enough to notice that the current user is a proxy user; so when using a proxy user, it fails to fetch delegation tokens from the metastore because of a missing kerberos TGT for the current user. To fix it, just run the code that fetches the delegation token as the real logged-in user.
Tested on a kerberos cluster both submitting normally and with a proxy user; Hive and HBase tokens are retrieved correctly in both cases.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #11358 from vanzin/SPARK-13478.
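A hedged sketch of the "run as the real user" pattern on Hadoop's UGI API (the token-fetching body is a placeholder):

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// For a proxy user, getRealUser returns the underlying logged-in user
// (the one holding the kerberos TGT); for a normal user it is null.
val currentUser = UserGroupInformation.getCurrentUser
val realUser = Option(currentUser.getRealUser).getOrElse(currentUser)

realUser.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    // placeholder: fetch the Hive metastore delegation token here,
    // executing with the real user's credentials
  }
})
```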
* [SPARK-12523][YARN] Support long-running Spark on HBase and the hive metastore | huangzhaowei | 2016-02-26 | 3 files | -40/+42
Obtain the hive metastore and hbase tokens as well as the hdfs token in DelegationTokenRenewer, to support long-running applications of Spark on HBase or the thriftserver.
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes #10645 from SaintBacchus/SPARK-12523.
* [SPARK-12009][YARN] Avoid re-allocating yarn containers while the driver wants to stop all executors | hushan | 2016-02-25 | 1 file | -0/+3
Author: hushan <hushan@xiaomi.com>
Closes #9992 from suyanNone/tricky.
* [SPARK-12316] Wait a minute to avoid cyclic calling | huangzhaowei | 2016-02-25 | 1 file | -1/+4
When the application ends, the AM cleans the staging dir. But if the driver then triggers a delegation token update, it can't find the right token file and endlessly calls the method 'updateCredentialsIfRequired', which leads to a driver StackOverflowError. https://issues.apache.org/jira/browse/SPARK-12316
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes #10475 from SaintBacchus/SPARK-12316.
* [SPARK-13441][YARN] Fix NPE in yarn Client.createConfArchive method | Terence Yim | 2016-02-25 | 1 file | -3/+8
## What changes were proposed in this pull request?
Instead of using the result of File.listFiles() directly, which may be null and throw an NPE, check for null first. If it is null, log a warning instead.
## How was this patch tested?
Ran ./dev/run-tests locally. Tested manually on a cluster.
Author: Terence Yim <terence@cask.co>
Closes #11337 from chtyim/fixes/SPARK-13441-null-check.
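A minimal sketch of the null check (illustrative, not the patched Client code):

```scala
import java.io.File

// File.listFiles() returns null if the path is not a directory or an I/O
// error occurs, so wrap it in Option before iterating.
def listConfFiles(dir: File): Seq[File] =
  Option(dir.listFiles()) match {
    case Some(files) => files.toSeq
    case None =>
      println(s"WARN: failed to list files under $dir")  // stand-in for logWarning
      Seq.empty
  }
```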
* [SPARK-13220][CORE] Deprecate yarn-client and yarn-cluster mode | jerryshao | 2016-02-23 | 3 files | -5/+8
Author: jerryshao <sshao@hortonworks.com>
Closes #11229 from jerryshao/SPARK-13220.
* [MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments | Dongjoon Hyun | 2016-02-22 | 1 file | -1/+1
## What changes were proposed in this pull request?
This PR fixes all typos in all markdown files under the `docs` module, and fixes similar typos in other comments, too.
## How was this patch tested?
Manual tests.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #11300 from dongjoon-hyun/minor_fix_typos.
* [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version | Josh Rosen | 2016-01-30 | 1 file | -2/+2
This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10608 from JoshRosen/SPARK-6363.
* [SPARK-7997][CORE] Remove Akka from Spark Core and Streaming | Shixiong Zhu | 2016-01-22 | 1 file | -1/+1
- Remove the Akka dependency from core. Note: the streaming-akka project still uses Akka.
- Remove HttpFileServer.
- Remove Akka configs from SparkConf and SSLOptions.
- Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth keeping this config because choosing between `DirectTaskResult` and `IndirectTaskResult` depends on it.
- Update comments and docs.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10854 from zsxwing/remove-akka.
* [SPARK-12652][PYSPARK] Upgrade Py4J to 0.9.1 | Shixiong Zhu | 2016-01-12 | 2 files | -4/+4
- [x] Upgrade Py4J to 0.9.1
- [x] SPARK-12657: Revert SPARK-12617
- [x] SPARK-12658: Revert SPARK-12511
  - Still keep the change that only reads the checkpoint once. This is a manual change and worth a careful look: https://github.com/zsxwing/spark/commit/bfd4b5c040eb29394c3132af3c670b1a7272457c
- [x] Verify there are no more leaks after reverting our workarounds
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10692 from zsxwing/py4j-0.9.1.
* [SPARK-12692][BUILD][YARN] Scala style: Fix the style violation (space before "," or ":") | Kousuke Saruta | 2016-01-11 | 1 file | -1/+1
Fix the style violation (space before "," and ":"). This PR is a follow-up to #10643.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #10686 from sarutak/SPARK-12692-followup-yarn.
* [SPARK-3873][TESTS] Import ordering fixes | Marcelo Vanzin | 2016-01-05 | 5 files | -20/+13
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10582 from vanzin/SPARK-3873-tests.
* [SPARK-7995][SPARK-6280][CORE] Remove AkkaRpcEnv and remove systemName from setupEndpointRef | Shixiong Zhu | 2015-12-31 | 1 file | -7/+5
### Remove AkkaRpcEnv
Keep `SparkEnv.actorSystem` because Streaming still uses it. Will remove it and AkkaUtils after refactoring Streaming's actorStream API.
### Remove systemName
There are 2 places using `systemName`:
* `RpcEnvConfig.name`. Actually, although it's used as `systemName` in `AkkaRpcEnv`, `NettyRpcEnv` uses it as the service name in the log output `Successfully started service *** on port ***`. Since the service name in logs is useful, I keep `RpcEnvConfig.name`.
* `def setupEndpointRef(systemName: String, address: RpcAddress, endpointName: String)`. Each `ActorSystem` has a `systemName`; Akka requires the `systemName` in its URI and will refuse a connection if it doesn't match. However, `NettyRpcEnv` doesn't use it, so we can remove `systemName` from `setupEndpointRef` since we are removing `AkkaRpcEnv`.
### Remove RpcEnv.uriOf
`uriOf` exists because Akka uses different URI formats with and without authentication, e.g., `akka.ssl.tcp://...` and `akka.tcp://...`. But `NettyRpcEnv` uses the same format, so it's not necessary after removing `AkkaRpcEnv`.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10459 from zsxwing/remove-akka-rpc-env.
* [SPARK-3873][YARN] Fix import ordering | Marcelo Vanzin | 2015-12-30 | 12 files | -26/+23
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10536 from vanzin/SPARK-3873-yarn.