spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SPARK-17979][SPARK-14453] Remove deprecated SPARK_YARN_USER_ENV and ↵	Yong Tang	2017-03-10	3	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SPARK_JAVA_OPTS This fix removes deprecated support for config `SPARK_YARN_USER_ENV`, as is mentioned in SPARK-17979. This fix also removes deprecated support for the following: ``` SPARK_YARN_USER_ENV SPARK_JAVA_OPTS SPARK_CLASSPATH SPARK_WORKER_INSTANCES ``` Related JIRA: [SPARK-14453]: https://issues.apache.org/jira/browse/SPARK-14453 [SPARK-12344]: https://issues.apache.org/jira/browse/SPARK-12344 [SPARK-15781]: https://issues.apache.org/jira/browse/SPARK-15781 Existing tests should pass. Author: Yong Tang <yong.tang.github@outlook.com> Closes #17212 from yongtang/SPARK-17979.
*	[SPARK-19534][TESTS] Convert Java tests to use lambdas, Java 8 features	Sean Owen	2017-02-19	2	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Convert tests to use Java 8 lambdas, and modest related fixes to surrounding code. ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #16964 from srowen/SPARK-19534.
*	[SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support	Sean Owen	2017-02-16	11	-167/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Move external/java8-tests tests into core, streaming, sql and remove - Remove MaxPermGen and related options - Fix some reflection / TODOs around Java 8+ methods - Update doc references to 1.7/1.8 differences - Remove Java 7/8 related build profiles - Update some plugins for better Java 8 compatibility - Fix a few Java-related warnings For the future: - Update Java 8 examples to fully use Java 8 - Update Java tests to use lambdas for simplicity - Update Java internal implementations to use lambdas ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #16871 from srowen/SPARK-19493.
*	[SPARK-19227][SPARK-19251] remove unused imports and outdated comments	uncleGen	2017-01-18	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? remove ununsed imports and outdated comments, and fix some minor code style issue. ## How was this patch tested? existing ut Author: uncleGen <hustyugm@gmail.com> Closes #16591 from uncleGen/SPARK-19227.
*	[SPARK-18842][TESTS][LAUNCHER] De-duplicate paths in classpaths in commands ↵	hyukjinkwon	2016-12-14	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	for local-cluster mode to work around the path length limitation on Windows ## What changes were proposed in this pull request? Currently, some tests are being failed and hanging on Windows due to this problem. For the reason in SPARK-18718, some tests using `local-cluster` mode were disabled on Windows due to the length limitation by paths given to classpaths. The limitation seems roughly 32K (see the [blog in MS](https://blogs.msdn.microsoft.com/oldnewthing/20031210-00/?p=41553/) and [another reference](https://support.thoughtworks.com/hc/en-us/articles/213248526-Getting-around-maximum-command-line-length-is-32767-characters-on-Windows)) but in `local-cluster` mode, executors were being launched as processes with the command such as [here](https://gist.github.com/HyukjinKwon/5bc81061c250d4af5a180869b59d42ea) in (only) tests. This length is roughly 40K due to the classpaths given to `java` command. However, it seems duplicates are almost half of them. So, if we deduplicate the paths, it seems reduced to roughly 20K with the command, [here](https://gist.github.com/HyukjinKwon/dad0c8db897e5e094684a2dc6a417790). Maybe, we should consider as some more paths are added in the future but it seems better than disabling all the tests for now with minimised changes. Therefore, this PR proposes to deduplicate the paths in classpaths in case of launching executors as processes in `local-cluster` mode. ## How was this patch tested? Existing tests in `ShuffleSuite` and `BroadcastJoinSuite` manually via AppVeyor Author: hyukjinkwon <gurwls223@gmail.com> Closes #16266 from HyukjinKwon/disable-local-cluster-tests.
*	[SPARK-18662][HOTFIX] Add new resource-managers directories to SparkLauncher.	Marcelo Vanzin	2016-12-08	1	-2/+3
\| \| \| \| \| \| \| \| \| \|	These directories are added to the classpath of applications when testing or using SPARK_PREPEND_CLASSES, otherwise updated classes are not seen. Also, add the mesos directory which was missing. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #16202 from vanzin/SPARK-18662.
*	[SPARK-1267][SPARK-18129] Allow PySpark to be pip installed	Holden Karau	2016-11-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129). Done: - pip installable on conda [manual tested] - setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested] - Automated testing of this (virtualenv) - packaging and signing with release-build* Possible follow up work: - release-build update to publish to PyPI (SPARK-18128) - figure out who owns the pyspark package name on prod PyPI (is it someone with in the project or should we ask PyPI or should we choose a different name to publish with like ApachePySpark?) - Windows support and or testing ( SPARK-18136 ) - investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test - consider how we want to number our dev/snapshot versions Explicitly out of scope: - Using pip installed PySpark to start a standalone cluster - Using pip installed PySpark for non-Python Spark programs *I've done some work to test release-build locally but as a non-committer I've just done local testing. ## How was this patch tested? Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration. release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites) Author: Holden Karau <holden@us.ibm.com> Author: Juliet Hougland <juliet@cloudera.com> Author: Juliet Hougland <not@myemail.com> Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
*	[SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr shell command through ↵	Jeff Zhang	2016-08-31	3	-1/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	--conf ## What changes were proposed in this pull request? Allow user to set sparkr shell command through --conf spark.r.shell.command ## How was this patch tested? Unit test is added and also verify it manually through ``` bin/sparkr --master yarn-client --conf spark.r.shell.command=/usr/local/bin/R ``` Author: Jeff Zhang <zjffdu@apache.org> Closes #14744 from zjffdu/SPARK-17178.
*	[SPARK-13081][PYSPARK][SPARK_SUBMIT] Allow set pythonExec of driver and ↵	Jeff Zhang	2016-08-11	2	-3/+19
\| \| \| \| \| \| \| \| \| \| \| \|	executor through conf… Before this PR, user have to export environment variable to specify the python of driver & executor which is not so convenient for users. This PR is trying to allow user to specify python through configuration "--pyspark-driver-python" & "--pyspark-executor-python" Manually test in local & yarn mode for pyspark-shell and pyspark batch mode. Author: Jeff Zhang <zjffdu@apache.org> Closes #13146 from zjffdu/SPARK-13081.
*	[SPARK-14702] Make environment of SparkLauncher launched process more ↵	Andrew Duffy	2016-07-19	2	-27/+145
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	configurable ## What changes were proposed in this pull request? Adds a few public methods to `SparkLauncher` to allow configuring some extra features of the `ProcessBuilder`, including the working directory, output and error stream redirection. ## How was this patch tested? Unit testing + simple Spark driver programs Author: Andrew Duffy <root@aduffy.org> Closes #14201 from andreweduffy/feature/launcher.
*	[MINOR] Fix Java Lint errors introduced by #13286 and #13280	Sandeep Singh	2016-06-08	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? revived #13464 Fix Java Lint errors introduced by #13286 and #13280 Before: ``` Using `mvn` from path: /Users/pichu/Project/spark/build/apache-maven-3.3.9/bin/mvn Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0 Checkstyle checks failed at following occurrences: [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[340,5] (whitespace) FileTabCharacter: Line contains a tab character. [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[341,5] (whitespace) FileTabCharacter: Line contains a tab character. [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[342,5] (whitespace) FileTabCharacter: Line contains a tab character. [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[343,5] (whitespace) FileTabCharacter: Line contains a tab character. [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[41,28] (naming) MethodName: Method name 'Append' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]$'. [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[52,28] (naming) MethodName: Method name 'Complete' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]$'. [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[61,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.PrimitiveType. [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[62,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.Type. ``` ## How was this patch tested? ran `dev/lint-java` locally Author: Sandeep Singh <sandeep@techaddict.me> Closes #13559 from techaddict/minor-3.
*	[SPARK-15652][LAUNCHER] Added a new State (LOST) for the listeners of ↵	Subroto Sanyal	2016-06-06	3	-1/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SparkLauncher ## What changes were proposed in this pull request? This situation can happen when the LauncherConnection gets an exception while reading through the socket and terminating silently without notifying making the client/listener think that the job is still in previous state. The fix force sends a notification to client that the job finished with unknown status and let client handle it accordingly. ## How was this patch tested? Added a unit test. Author: Subroto Sanyal <ssanyal@datameer.com> Closes #13497 from subrotosanyal/SPARK-15652-handle-spark-submit-jvm-crash.
*	[SPARK-15665][CORE] spark-submit --kill and --status are not working	Devaraj K	2016-06-03	2	-11/+29
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? --kill and --status were not considered while handling in OptionParser and due to that it was failing. Now handling the --kill and --status options as part of OptionParser.handle. ## How was this patch tested? Added a test org.apache.spark.launcher.SparkSubmitCommandBuilderSuite.testCliKillAndStatus() and also I have verified these manually by running --kill and --status commands. Author: Devaraj K <devaraj@apache.org> Closes #13407 from devaraj-kavali/SPARK-15665.
*	[MINOR] More than 100 chars in line in SparkSubmitCommandBuilderSuite	Sandeep Singh	2016-05-22	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? More than 100 chars in line. ## How was this patch tested? Author: Sandeep Singh <sandeep@techaddict.me> Closes #13249 from techaddict/fix-1.
*	[SPARK-15360][SPARK-SUBMIT] Should print spark-submit usage when no ↵	wm624@hotmail.com	2016-05-20	2	-19/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	arguments is specified (Please fill in changes proposed in this fix) In 2.0, ./bin/spark-submit doesn't print out usage, but it raises an exception. In this PR, an exception handling is added in the Main.java when the exception is thrown. In the handling code, if there is no additional argument, it prints out usage. (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Manually tested. ./bin/spark-submit Usage: spark-submit [options] <app jar \| python file> [app arguments] Usage: spark-submit --kill [submission ID] --master [spark://...] Usage: spark-submit --status [submission ID] --master [spark://...] Usage: spark-submit run-example [options] example-class [example args] Options: --master MASTER_URL spark://host:port, mesos://host:port, yarn, or local. --deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (Default: client). --class CLASS_NAME Your application's main class (for Java / Scala apps). --name NAME A name of your application. --jars JARS Comma-separated list of local jars to include on the driver and executor classpaths. --packages Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version. Author: wm624@hotmail.com <wm624@hotmail.com> Closes #13163 from wangmiao1981/submit.
*	[SPARK-11249][LAUNCHER] Throw error if app resource is not provided.	Marcelo Vanzin	2016-05-10	3	-15/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Without this, the code would build an invalid spark-submit command line, and a more cryptic error would be presented to the user. Also, expose a constant that allows users to set a dummy resource in cases where they don't need an actual resource file; for backwards compatibility, that uses the same "spark-internal" resource that Spark itself uses. Tested via unit tests, run-example, spark-shell, and running the thrift server with mixed spark and hive command line arguments. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #12909 from vanzin/SPARK-11249.
*	[SPARK-15067][YARN] YARN executors are launched with fixed perm gen size	Sean Owen	2016-05-09	2	-3/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Look for MaxPermSize arguments anywhere in an arg, to account for quoted args. See JIRA for discussion. ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #12985 from srowen/SPARK-15067.
*	[SPARK-14391][LAUNCHER] Fix launcher communication test, take 2.	Marcelo Vanzin	2016-04-29	2	-3/+2
\| \| \| \| \| \| \| \| \| \|	There's actually a race here: the state of the handler was changed before the connection was set, so the test code could be notified of the state change, wake up, and still see the connection as null, triggering the assert. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #12785 from vanzin/SPARK-14391.
*	[HOTFIX] Disable flaky LauncherServerSuite.testCommunication	Reynold Xin	2016-04-29	1	-1/+2
\|
*	[SPARK-12384] Enables spark-clients to set the min(-Xms) and max(*.memory ↵	Dhruve Ashar	2016-04-07	4	-11/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	config) j… ## What changes were proposed in this pull request? Currently Spark clients are started with the same memory setting for Xms and Xms leading to reserving unnecessary higher amounts of memory. This behavior is changed and the clients can now specify an initial heap size using the extraJavaOptions in the config for driver,executor and am individually. Note, that only -Xms can be provided through this config option, if the client wants to set the max size(-Xmx), this has to be done via the *.memory configuration knobs which are currently supported. ## How was this patch tested? Monitored executor and yarn logs in debug mode to verify the commands through which they are being launched in client and cluster mode. The driver memory was verified locally using jps -v. Setting up -Xmx parameter in the javaExtraOptions raises exception with the info provided. Author: Dhruve Ashar <dhruveashar@gmail.com> Closes #12115 from dhruve/impr/SPARK-12384.
*	[SPARK-14134][CORE] Change the package name used for shading classes.	Marcelo Vanzin	2016-04-06	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current package name uses a dash, which is a little weird but seemed to work. That is, until a new test tried to mock a class that references one of those shaded types, and then things started failing. Most changes are just noise to fix the logging configs. For reference, SPARK-8815 also raised this issue, although at the time it did not cause any issues in Spark, so it was not addressed. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11941 from vanzin/SPARK-14134.
*	[SPARK-14391][LAUNCHER] Increase test timeouts.	Marcelo Vanzin	2016-04-06	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	Most of the time tests should still pass really quickly; it's just when machines are overloaded that the tests may take a little time, but that's still preferable over just failing the test. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #12210 from vanzin/SPARK-14391.
*	[SPARK-13579][BUILD] Stop building the main Spark assembly.	Marcelo Vanzin	2016-04-04	3	-30/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and remove duplicate jars packages in the examples module). I also made some minor adjustments to dependencies to remove some test jars from the final packaging, and remove jars that conflict with each other when packaged separately (e.g. servlet api). Also note that this change restores guava in applications' classpaths, even though it's still shaded inside Spark. This is now needed for the Hadoop libraries that are packaged with Spark, which now are not processed by the shade plugin. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11796 from vanzin/SPARK-13579.
*	[SPARK-14355][BUILD] Fix typos in Exception/Testcase/Comments and static ↵	Dongjoon Hyun	2016-04-03	4	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	analysis results ## What changes were proposed in this pull request? This PR contains the following 5 types of maintenance fix over 59 files (+94 lines, -93 lines). - Fix typos(exception/log strings, testcase name, comments) in 44 lines. - Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after SPARK-14011) - Use diamond operators in 40 lines. (New codes after SPARK-13702) - Fix redundant semicolon in 5 lines. - Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in CSVInferSchemaSuite.scala. ## How was this patch tested? Manual and pass the Jenkins tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12139 from dongjoon-hyun/SPARK-14355.
*	[SPARK-13955][YARN] Also look for Spark jars in the build directory.	Marcelo Vanzin	2016-03-30	2	-22/+26
\| \| \| \| \| \| \| \| \| \| \| \|	Move the logic to find Spark jars to CommandBuilderUtils and make it available for YARN code, so that it's possible to easily launch Spark on YARN from a build directory. Tested by running SparkPi from the build directory on YARN. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11970 from vanzin/SPARK-13955.
*	[SPARK-14011][CORE][SQL] Enable `LineLength` Java checkstyle rule	Dongjoon Hyun	2016-03-21	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? [Spark Coding Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) has 100-character limit on lines, but it's disabled for Java since 11/09/15. This PR enables LineLength checkstyle again. To help that, this also introduces RedundantImport and RedundantModifier, too. The following is the diff on `checkstyle.xml`. ```xml - <!-- TODO: 11/09/15 disabled - the lengths are currently > 100 in many places --> - <!-- <module name="LineLength"> <property name="max" value="100"/> <property name="ignorePattern" value="^package.\|^import.\|a href\|href\|http://\|https://\|ftp://"/> </module> - --> <module name="NoLineWrap"/> <module name="EmptyBlock"> <property name="option" value="TEXT"/> -167,5 +164,7 </module> <module name="CommentsIndentation"/> <module name="UnusedImports"/> + <module name="RedundantImport"/> + <module name="RedundantModifier"/> ``` ## How was this patch tested? Currently, `lint-java` is disabled in Jenkins. It needs a manual test. After passing the Jenkins tests, `dev/lint-java` should passes locally. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11831 from dongjoon-hyun/SPARK-14011.
*	[SPARK-13823][HOTFIX] Increase tryAcquire timeout and assert it succeeds to ↵	Sean Owen	2016-03-16	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fix failure on slow machines ## What changes were proposed in this pull request? I'm seeing several PR builder builds fail after https://github.com/apache/spark/pull/11725/files. Example: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.4/lastFailedBuild/console ``` testCommunication(org.apache.spark.launcher.LauncherServerSuite) Time elapsed: 0.023 sec <<< FAILURE! java.lang.AssertionError: expected:<app-id> but was:<null> at org.apache.spark.launcher.LauncherServerSuite.testCommunication(LauncherServerSuite.java:93) ``` However, other builds pass this same test, including the test when run locally and on the Jenkins PR builder. The failure itself concerns a change to how the test waits on a condition, and the wait can time out; therefore I think this is due to fast/slow machine differences. This is an attempt at a hot fix; it's a little hard to verify since locally and on the PR builder, it passes anyway. The change itself should be harmless anyway. Why didn't this happen before, if the new logic was supposed to be equivalent to the old? I think this is the sequence: - First attempt to acquire semaphore for 10ms actually silently times out - The changed being waited for happens just after that, a bit too late - Assertion passes since condition became true just in time - `release()` fires from the listener - Next `tryAcquire` however immediately succeeds because the first `tryAcquire` didn't acquire anything, but its subsequent condition is not yet true; this would explain why the second one always fails Versus the original using `notifyAll()`, there's a small difference: `wait()`-ing after `notifyAll()` just results in another wait; it doesn't make it return immediately. So this was a tiny latent issue that was masked by the semantics. Now the test asserts that the event actually happened (semaphore was acquired). (The timeout is still here to prevent the test from hanging forever, and to detect really slow response.) The timeout is increased to a second to allow plenty of time anyway. ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11763 from srowen/SPARK-13823.3.
*	[SPARK-13823][SPARK-13397][SPARK-13395][CORE] More warnings, StandardCharset ↵	Sean Owen	2016-03-16	1	-23/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	follow up ## What changes were proposed in this pull request? Follow up to https://github.com/apache/spark/pull/11657 - Also update `String.getBytes("UTF-8")` to use `StandardCharsets.UTF_8` - And fix one last new Coverity warning that turned up (use of unguarded `wait()` replaced by simpler/more robust `java.util.concurrent` classes in tests) - And while we're here cleaning up Coverity warnings, just fix about 15 more build warnings ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11725 from srowen/SPARK-13823.2.
*	[SPARK-13576][BUILD] Don't create assembly for examples.	Marcelo Vanzin	2016-03-15	2	-6/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of the goal to stop creating assemblies in Spark, this change modifies the mvn and sbt builds to not create an assembly for examples. Instead, dependencies are copied to the build directory (under target/scala-xx/jars), and in the final archive, into the "examples/jars" directory. To avoid having to deal too much with Windows batch files, I made examples run through the launcher library; the spark-submit launcher now has a special mode to run examples, which adds all the necessary jars to the spark-submit command line, and replaces the bash and batch scripts that were used to run examples. The scripts are now just a thin wrapper around spark-submit; another advantage is that now all spark-submit options are supported. There are a few glitches; in the mvn build, a lot of duplicated dependencies get copied, because they are promoted to "compile" scope due to extra dependencies in the examples module (such as HBase). In the sbt build, all dependencies are copied, because there doesn't seem to be an easy way to filter things. I plan to clean some of this up when the rest of the tasks are finished. When the main assembly is replaced with jars, we can remove duplicate jars from the examples directory during packaging. Tested by running SparkPi in: maven build, sbt build, dist created by make-distribution.sh. Finally: note that running the "assembly" target in sbt doesn't build the examples anymore. You need to run "package" for that. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11452 from vanzin/SPARK-13576.
*	[SPARK-13578][CORE] Modify launch scripts to not use assemblies.	Marcelo Vanzin	2016-03-14	3	-32/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of looking for a specially-named assembly, the scripts now will blindly add all jars under the libs directory to the classpath. This libs directory is still currently the old assembly dir, so things should keep working the same way as before until we make more packaging changes. The only lost feature is the detection of multiple assemblies; I consider that a minor nicety that only really affects few developers, so it's probably ok. Tested locally by running spark-shell; also did some minor Win32 testing (just made sure spark-shell started). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11591 from vanzin/SPARK-13578.
*	[SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> ↵	Sean Owen	2016-03-13	3	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	byte[] conversions (and remaining Coverity items) ## What changes were proposed in this pull request? - Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8 - Same for `InputStreamReader` and `OutputStreamWriter` constructors - Standardizes on UTF-8 everywhere - Standardizes specifying the encoding with `StandardCharsets.UTF-8`, not the Guava constant or "UTF-8" (which means handling `UnuspportedEncodingException`) - (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit https://github.com/srowen/spark/commit/1deecd8d9ca986d8adb1a42d315890ce5349d29c ) ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11657 from srowen/SPARK-13823.
*	[SPARK-13294][PROJECT INFRA] Remove MiMa's dependency on spark-class / Spark ↵	Josh Rosen	2016-03-10	1	-22/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	assembly This patch removes the need to build a full Spark assembly before running the `dev/mima` script. - I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies. - This required me to delete two classes full of dead code that we don't use anymore - `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build. - `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly. Author: Josh Rosen <joshrosen@databricks.com> Closes #11178 from JoshRosen/remove-assembly-in-run-tests.
*	[SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance ↵	Dongjoon Hyun	2016-03-09	8	-29/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	creation in Java code. ## What changes were proposed in this pull request? In order to make `docs/examples` (and other related code) more simple/readable/user-friendly, this PR replaces existing codes like the followings by using `diamond` operator. ``` - final ArrayList<Product2<Object, Object>> dataToWrite = - new ArrayList<Product2<Object, Object>>(); + final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>(); ``` Java 7 or higher supports diamond operator which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark Java code use mixed usage of this. ## How was this patch tested? Manual. Pass the existing tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11541 from dongjoon-hyun/SPARK-13702.
*	[SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle rule	Dongjoon Hyun	2016-03-03	2	-8/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? After SPARK-6990, `dev/lint-java` keeps Java code healthy and helps PR review by saving much time. This issue aims remove unused imports from Java/Scala code and add `UnusedImports` checkstyle rule to help developers. ## How was this patch tested? ``` ./dev/lint-java ./build/sbt compile ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11438 from dongjoon-hyun/SPARK-13583.
*	[SPARK-13529][BUILD] Move network/* modules into common/network-*	Reynold Xin	2016-02-28	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top level, non-user-facing folder. ## How was this patch tested? Compilation and existing tests. We should run both SBT and Maven. Author: Reynold Xin <rxin@databricks.com> Closes #11409 from rxin/SPARK-13529.
*	[SPARK-13387][MESOS] Add support for SPARK_DAEMON_JAVA_OPTS with ↵	Timothy Chen	2016-02-25	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MesosClusterDispatcher. ## What changes were proposed in this pull request? Add support for SPARK_DAEMON_JAVA_OPTS with MesosClusterDispatcher. ## How was the this patch tested? Manual testing by launching dispatcher with SPARK_DAEMON_JAVA_OPTS Author: Timothy Chen <tnachen@gmail.com> Closes #11277 from tnachen/cluster_dispatcher_opts.
*	[SPARK-13278][CORE] Launcher fails to start with JDK 9 EA	Claes Redestad	2016-02-14	2	-3/+29
\| \| \| \| \| \| \| \|	See http://openjdk.java.net/jeps/223 for more information about the JDK 9 version string scheme. Author: Claes Redestad <claes.redestad@gmail.com> Closes #11160 from cl4es/master.
*	[SPARK-12707][SPARK SUBMIT] Remove submit python/R scripts through py…	Jeff Zhang	2016-01-13	1	-7/+6
\| \| \| \| \| \| \| \|	…spark/sparkR Author: Jeff Zhang <zjffdu@apache.org> Closes #10658 from zjffdu/SPARK-12707.
*	[SPARK-12489][CORE][SQL][MLIB] Fix minor issues found by FindBugs	Shixiong Zhu	2015-12-28	2	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Include the following changes: 1. Close `java.sql.Statement` 2. Fix incorrect `asInstanceOf`. 3. Remove unnecessary `synchronized` and `ReentrantLock`. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10440 from zsxwing/findbugs.
*	[SPARK-11808] Remove Bagel.	Reynold Xin	2015-12-19	1	-1/+1
\| \| \| \| \| \|	Author: Reynold Xin <rxin@databricks.com> Closes #10395 from rxin/SPARK-11808.
*	[SPARK-10123][DEPLOY] Support specifying deploy mode from configuration	jerryshao	2015-12-15	2	-3/+7
\| \| \| \| \| \| \| \|	Please help to review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #10195 from jerryshao/SPARK-10123.
*	[SPARK-11140][CORE] Transfer files using network lib when using NettyRpcEnv.	Marcelo Vanzin	2015-11-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change abstracts the code that serves jars / files to executors so that each RpcEnv can have its own implementation; the akka version uses the existing HTTP-based file serving mechanism, while the netty versions uses the new stream support added to the network lib, which makes file transfers benefit from the easier security configuration of the network library, and should also reduce overhead overall. The change includes a small fix to TransportChannelHandler so that it propagates user events to downstream handlers. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9530 from vanzin/SPARK-11140.
*	[SPARK-11744][LAUNCHER] Fix print version throw exception when using pyspark ↵	jerryshao	2015-11-17	1	-7/+10
\| \| \| \| \| \| \| \| \| \|	shell Exception details can be seen here (https://issues.apache.org/jira/browse/SPARK-11744). Author: jerryshao <sshao@hortonworks.com> Closes #9721 from jerryshao/SPARK-11744.
*	[SPARK-11655][CORE] Fix deadlock in handling of launcher stop().	Marcelo Vanzin	2015-11-12	2	-2/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The stop() callback was trying to close the launcher connection in the same thread that handles connection data, which ended up causing a deadlock. So avoid that by dispatching the stop() request in its own thread. On top of that, add some exception safety to a few parts of the code, and use "destroyForcibly" from Java 8 if it's available, to force kill the child process. The flip side is that "kill()" may not actually work if running Java 7. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9633 from vanzin/SPARK-11655.
*	[SPARK-11388][BUILD] Fix self closing tags.	Herman van Hovell	2015-10-29	2	-6/+6
\| \| \| \| \| \| \| \| \| \|	Java 8 javadoc does not like self closing tags: ```<p/>```, ```<br/>```, ... This PR fixes those. Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #9339 from hvanhovell/SPARK-11388.
*	[SPARK-11071] [LAUNCHER] Fix flakiness in LauncherServerSuite::timeout.	Marcelo Vanzin	2015-10-15	2	-10/+34
\| \| \| \| \| \| \| \| \| \|	The test could fail depending on scheduling of the various threads involved; the change removes some sources of races, while making the test a little more resilient by trying a few times before giving up. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9079 from vanzin/SPARK-11071.
*	[SPARK-11099] [SPARK SHELL] [SPARK SUBMIT] Default conf property file i…	Jeff Zhang	2015-10-15	3	-18/+45
\| \| \| \| \| \| \| \|	Please help review it. Thanks Author: Jeff Zhang <zjffdu@apache.org> Closes #9114 from zjffdu/SPARK-11099.
*	[SPARK-8673] [LAUNCHER] API and infrastructure for communicating with child ↵	Marcelo Vanzin	2015-10-09	16	-33/+1357
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	apps. This change adds an API that encapsulates information about an app launched using the library. It also creates a socket-based communication layer for apps that are launched as child processes; the launching application listens for connections from launched apps, and once communication is established, the channel can be used to send updates to the launching app, or to send commands to the child app. The change also includes hooks for local, standalone/client and yarn masters. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7052 from vanzin/SPARK-8673.
*	[SPARK-10916] [YARN] Set perm gen size when launching containers on YARN.	Marcelo Vanzin	2015-10-06	2	-23/+23
\| \| \| \| \| \| \| \| \| \|	This makes YARN containers behave like all other processes launched by Spark, which launch with a default perm gen size of 256m unless overridden by the user (or not needed by the vm). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8970 from vanzin/SPARK-10916.
*	[SPARK-9284] [TESTS] Allow all tests to run without an assembly.	Marcelo Vanzin	2015-08-28	2	-156/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change aims at speeding up the dev cycle a little bit, by making sure that all tests behave the same w.r.t. where the code to be tested is loaded from. Namely, that means that tests don't rely on the assembly anymore, rather loading all needed classes from the build directories. The main change is to make sure all build directories (classes and test-classes) are added to the classpath of child processes when running tests. YarnClusterSuite required some custom code since the executors are run differently (i.e. not through the launcher library, like standalone and Mesos do). I also found a couple of tests that could leak a SparkContext on failure, and added code to handle those. With this patch, it's possible to run the following command from a clean source directory and have all tests pass: mvn -Pyarn -Phadoop-2.4 -Phive-thriftserver install Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7629 from vanzin/SPARK-9284.