spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	SPARK-1929 DAGScheduler suspended by local task OOM	Zhen Peng	2014-05-26	2	-1/+19
\| \| \| \| \| \| \| \| \| \| \|	DAGScheduler does not handle local task OOM properly, and will wait for the job result forever. Author: Zhen Peng <zhenpeng01@baidu.com> Closes #883 from zhpengg/bugfix-dag-scheduler-oom and squashes the following commits: 76f7eda [Zhen Peng] remove redundant memory allocations aa63161 [Zhen Peng] SPARK-1929 DAGScheduler suspended by local task OOM
*	[SPARK-1931] Reconstruct routing tables in Graph.partitionBy	Ankur Dave	2014-05-26	3	-4/+31
\| \| \| \| \| \| \| \| \| \| \| \| \|	905173df57b90f90ebafb22e43f55164445330e6 introduced a bug in partitionBy where, after repartitioning the edges, it reuses the VertexRDD without updating the routing tables to reflect the new edge layout. Subsequent accesses of the triplets contain nulls for many vertex properties. This commit adds a test for this bug and fixes it by introducing `VertexRDD#withEdges` and calling it in `partitionBy`. Author: Ankur Dave <ankurdave@gmail.com> Closes #885 from ankurdave/SPARK-1931 and squashes the following commits: 3930cdd [Ankur Dave] Note how to set up VertexRDD for efficient joins 9bdbaa4 [Ankur Dave] [SPARK-1931] Reconstruct routing tables in Graph.partitionBy
*	SPARK-1925: Replace '&' with '&&'	zsxwing	2014-05-26	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	JIRA: https://issues.apache.org/jira/browse/SPARK-1925 Author: zsxwing <zsxwing@gmail.com> Closes #879 from zsxwing/SPARK-1925 and squashes the following commits: 5cf5a6d [zsxwing] SPARK-1925: Replace '&' with '&&'
*	Fix scalastyle warnings in yarn alpha	witgo	2014-05-26	1	-1/+2
\| \| \| \| \| \| \| \|	Author: witgo <witgo@qq.com> Closes #884 from witgo/scalastyle and squashes the following commits: 4b08ae4 [witgo] Fix scalastyle warnings in yarn alpha
*	[SPARK-1914] [SQL] Simplify CountFunction not to traverse to evaluate all child expressions.	Takuya UESHIN	2014-05-26	2	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	`CountFunction` should count up only if the child's evaluated value is not null. Because it currently traverses and evaluates all child expressions, it counts up whenever any one of them is not null, even if the child itself is null. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #861 from ueshin/issues/SPARK-1914 and squashes the following commits: 3b37315 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-1914 2afa238 [Takuya UESHIN] Simplify CountFunction not to traverse to evaluate all child expressions.
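The counting rule described above can be sketched in plain Python. This is an illustration only, not Spark's Catalyst code: `count_non_null`, `add`, and the row dictionaries are hypothetical names, and `None` stands in for SQL NULL.

```python
def count_non_null(rows, child):
    """Count the rows for which the child expression evaluates to non-null.

    `child` is a hypothetical callable standing in for an expression.
    """
    count = 0
    for row in rows:
        # Evaluate the child once and test only its result, rather than
        # inspecting each of its sub-expressions separately.
        if child(row) is not None:
            count += 1
    return count


def add(row):
    # NULL-propagating addition: the result is NULL if either operand is NULL.
    if row["a"] is None or row["b"] is None:
        return None
    return row["a"] + row["b"]


rows = [{"a": 1, "b": 2}, {"a": 1, "b": None}, {"a": None, "b": None}]
result = count_non_null(rows, add)
```

Only the first row contributes to the count here; the second row has a non-null operand `a`, but the child expression as a whole evaluates to null.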
*	HOTFIX: Add no-arg SparkContext constructor in Java	Patrick Wendell	2014-05-25	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	Self explanatory. Author: Patrick Wendell <pwendell@gmail.com> Closes #878 from pwendell/java-constructor and squashes the following commits: 2cc1605 [Patrick Wendell] HOTFIX: Add no-arg SparkContext constructor in Java
*	[SQL] Minor: Introduce SchemaRDD#aggregate() for simple aggregations	Aaron Davidson	2014-05-25	2	-2/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	`rdd.aggregate(Sum('val))` is just shorthand for `rdd.groupBy()(Sum('val))` but seems to be more natural than doing a groupBy with no grouping expressions when you really just want an aggregation over all rows. Did not add a JavaSchemaRDD or Python API, as these seem to be lacking several other methods like groupBy() already -- leaving that cleanup for future patches. Author: Aaron Davidson <aaron@databricks.com> Closes #874 from aarondav/schemardd and squashes the following commits: e9e68ee [Aaron Davidson] Add comment db6afe2 [Aaron Davidson] Introduce SchemaRDD#aggregate() for simple aggregations
*	SPARK-1903 Document Spark's network connections	Andrew Ash	2014-05-25	2	-89/+222
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-1903 Author: Andrew Ash <andrew@andrewash.com> Closes #856 from ash211/SPARK-1903 and squashes the following commits: 6e7782a [Andrew Ash] Add the technology used on each port 1d9b5d3 [Andrew Ash] Document port for history server 56193ee [Andrew Ash] spark.ui.port becomes worker.ui.port and master.ui.port a774c07 [Andrew Ash] Wording in network section 90e8237 [Andrew Ash] Use real :toc instead of the hand-written one edaa337 [Andrew Ash] Master -> Standalone Cluster Master 57e8869 [Andrew Ash] Port -> Default Port 3d4d289 [Andrew Ash] Title to title case c7d42d9 [Andrew Ash] [WIP] SPARK-1903 Add initial port listing for documentation a416ae9 [Andrew Ash] Word wrap to 100 lines
*	Fix PEP8 violations in Python mllib.	Reynold Xin	2014-05-25	8	-88/+78
\| \| \| \| \| \| \| \| \|	Author: Reynold Xin <rxin@apache.org> Closes #871 from rxin/mllib-pep8 and squashes the following commits: 848416f [Reynold Xin] Fixed a typo in the previous cleanup (c -> sc). a8db4cd [Reynold Xin] Fix PEP8 violations in Python mllib.
*	Python docstring update for sql.py.	Reynold Xin	2014-05-25	1	-61/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mostly related to the following two rules in PEP8 and PEP257: - Line length < 72 chars. - First line should be a concise description of the function/class. Author: Reynold Xin <rxin@apache.org> Closes #869 from rxin/docstring-schemardd and squashes the following commits: 7cf0cbc [Reynold Xin] Updated sql.py for pep8 docstring. 0a4aef9 [Reynold Xin] Merge branch 'master' into docstring-schemardd 6678937 [Reynold Xin] Python docstring update for sql.py.
*	Fix PEP8 violations in examples/src/main/python.	Reynold Xin	2014-05-25	6	-19/+25
\| \| \| \| \| \| \| \|	Author: Reynold Xin <rxin@apache.org> Closes #870 from rxin/examples-python-pep8 and squashes the following commits: 2829e84 [Reynold Xin] Fix PEP8 violations in examples/src/main/python.
*	Added license header for tox.ini.	Reynold Xin	2014-05-25	1	-0/+15
\| \| \| \| \|	(cherry picked from commit fa541f32c5b92e6868a9c99cbb2c87115d624d23) Signed-off-by: Reynold Xin <rxin@apache.org>
*	SPARK-1822: Some minor cleanup work on SchemaRDD.count()	Reynold Xin	2014-05-25	4	-7/+10
\| \| \| \| \| \| \| \| \| \|	Minor cleanup following #841. Author: Reynold Xin <rxin@apache.org> Closes #868 from rxin/schema-count and squashes the following commits: 5442651 [Reynold Xin] SPARK-1822: Some minor cleanup work on SchemaRDD.count()
*	Added PEP8 style configuration file.	Reynold Xin	2014-05-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	This sets the max line length to 100 as a PEP8 exception. Author: Reynold Xin <rxin@apache.org> Closes #872 from rxin/pep8 and squashes the following commits: 2f26029 [Reynold Xin] Added PEP8 style configuration file.
*	[SPARK-1822] SchemaRDD.count() should use query optimizer	Kan Zhang	2014-05-25	5	-8/+32
\| \| \| \| \| \| \| \| \| \|	Author: Kan Zhang <kzhang@apache.org> Closes #841 from kanzhang/SPARK-1822 and squashes the following commits: 2f8072a [Kan Zhang] [SPARK-1822] Minor style update cf4baa4 [Kan Zhang] [SPARK-1822] Adding Scaladoc e67c910 [Kan Zhang] [SPARK-1822] SchemaRDD.count() should use optimizer
*	spark-submit: add exec at the end of the script	Colin Patrick Mccabe	2014-05-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Add an 'exec' at the end of the spark-submit script, to avoid keeping a bash process hanging around while it runs. This makes ps look a little bit nicer. Author: Colin Patrick Mccabe <cmccabe@cloudera.com> Closes #858 from cmccabe/SPARK-1907 and squashes the following commits: 7023b64 [Colin Patrick Mccabe] spark-submit: add exec at the end of the script
*	[SPARK-1913][SQL] Bug fix: column pruning error in Parquet support	Cheng Lian	2014-05-24	4	-11/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	JIRA issue: [SPARK-1913](https://issues.apache.org/jira/browse/SPARK-1913) When scanning Parquet tables, attributes referenced only in predicates that are pushed down are not passed to the `ParquetTableScan` operator, which causes an exception. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #863 from liancheng/spark-1913 and squashes the following commits: f976b73 [Cheng Lian] Addessed the readability issue commented by @rxin f5b257d [Cheng Lian] Added back comments deleted by mistake ae60ab3 [Cheng Lian] [SPARK-1913] Attributes referenced only in predicates pushed down should remain in ParquetTableScan operator
*	[SPARK-1886] check executor id existence when executor exit	Zhen Peng	2014-05-24	1	-8/+14
\| \| \| \| \| \| \| \|	Author: Zhen Peng <zhenpeng01@baidu.com> Closes #827 from zhpengg/bugfix-executor-id-not-found and squashes the following commits: cd8bb65 [Zhen Peng] bugfix: check executor id existence when executor exit
*	SPARK-1911: Emphasize that Spark jars should be built with Java 6.	Patrick Wendell	2014-05-24	1	-21/+31
\| \| \| \| \| \| \| \| \| \| \| \| \|	This commit requires the user to manually say "yes" when building Spark without Java 6. The prompt can be bypassed with a flag (e.g. if the user is scripting around make-distribution). Author: Patrick Wendell <pwendell@gmail.com> Closes #859 from pwendell/java6 and squashes the following commits: 4921133 [Patrick Wendell] Adding Pyspark Notice fee8c9e [Patrick Wendell] SPARK-1911: Emphasize that Spark jars should be built with Java 6.
*	[SPARK-1900 / 1918] PySpark on YARN is broken	Andrew Or	2014-05-24	9	-47/+323
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If I run `bin/spark-submit sheep.py --master yarn-client` on a YARN cluster, it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file: `bin/spark-submit file:/path/to/sheep.py --master yarn-client`. However, this also fails. This time it is because python does not understand URI schemes. This PR fixes this by automatically resolving all paths passed as command line arguments to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For python, we strip the URI scheme before we actually try to run it. Much of the code was originally written by @mengxr. Tested on YARN cluster. More tests pending. Author: Andrew Or <andrewor14@gmail.com> Closes #853 from andrewor14/submit-paths and squashes the following commits: 0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH 323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell 3c36587 [Andrew Or] Improve error messages (minor) 854aa6a [Andrew Or] Guard against NPE if user gives pathological paths 6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in 3bb0359 [Andrew Or] Update more comments (minor) 2a1f8a0 [Andrew Or] Update comments (minor) 6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths a68c4d1 [Andrew Or] Handle Windows python file path correctly 427a250 [Andrew Or] Resolve paths properly for Windows a591a4a [Andrew Or] Update tests for resolving URIs 6c8621c [Andrew Or] Move resolveURIs to Utils db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths f542dce [Andrew Or] Fix outdated tests 691c4ce [Andrew Or] Ignore special primary resource names 5342ac7 [Andrew Or] Add missing space in error message 02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly
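The two steps described in this commit message (resolve every path to a URI, then strip the scheme before handing a file to python) might look roughly like the sketch below. It is not the code from the PR: `resolve_uri` and `strip_scheme` are hypothetical helpers, only POSIX-style paths are handled, and the Windows handling mentioned in the squashed commits is left out.

```python
import os
from urllib.parse import urlparse


def resolve_uri(path):
    """Return `path` as a URI, treating scheme-less paths as local files.

    Hypothetical helper; assumes POSIX-style paths.
    """
    if urlparse(path).scheme:
        # Already a URI (file:, hdfs:, ...): leave it untouched.
        return path
    return "file:" + os.path.abspath(path)


def strip_scheme(uri):
    """Drop a `file:` scheme so the python interpreter sees a plain path."""
    parsed = urlparse(uri)
    if parsed.scheme == "file":
        return parsed.path
    return uri


local = resolve_uri("/path/to/sheep.py")              # gains a file: scheme
remote = resolve_uri("hdfs://namenode/apps/sheep.py")  # left unchanged
runnable = strip_scheme(local)                         # plain path again
```

Keeping every path in URI form until the last moment is what lets file and jar paths be treated the same way across cluster modes; only the final hand-off to python needs the plain path.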
*	Update LBFGSSuite.scala	baishuo(白硕)	2014-05-23	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	the same reason as https://github.com/apache/spark/pull/588 Author: baishuo(白硕) <vc_java@hotmail.com> Closes #815 from baishuo/master and squashes the following commits: 6876c1e [baishuo(白硕)] Update LBFGSSuite.scala
*	Updated scripts for auditing releases	Tathagata Das	2014-05-22	11	-6/+547
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Added script to automatically generate change list CHANGES.txt - Added test for verifying linking against maven distributions of `spark-sql` and `spark-hive` - Added SBT projects for testing functionality of `spark-sql` and `spark-hive` - Fixed issues in existing tests that might have come up because of changes in Spark 1.0 Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #844 from tdas/update-dev-scripts and squashes the following commits: 25090ba [Tathagata Das] Added missing license e2e20b3 [Tathagata Das] Updated tests for auditing releases.
*	[SPARK-1896] Respect spark.master (and --master) before MASTER in spark-shell	Andrew Or	2014-05-22	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The hierarchy for configuring the Spark master in the shell is as follows: `MASTER > --master > spark.master (spark-defaults.conf)`. This is inconsistent with the way we run normal applications, which is: `--master > spark.master (spark-defaults.conf) > MASTER`. I was trying to run a shell locally on a standalone cluster launched through the ec2 scripts, which automatically set `MASTER` in spark-env.sh. It was surprising to me that `--master` didn't take effect, considering that this is the way we tell users to set their masters [here](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark). Author: Andrew Or <andrewor14@gmail.com> Closes #846 from andrewor14/shell-master and squashes the following commits: 2cb81c9 [Andrew Or] Respect spark.master before MASTER in REPL
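The precedence used for normal applications, which this commit applies to the shell as well, can be sketched as a first-match lookup. `resolve_master` and its arguments are hypothetical and only mirror the ordering stated above; what happens when none of the three sources is set is not covered by the commit message, so the sketch simply returns `None`.

```python
def resolve_master(cli_master=None, conf=None, env=None):
    """Pick the master URL: --master, then spark.master, then MASTER.

    Hypothetical helper; `conf` models spark-defaults.conf and `env` models
    the process environment.
    """
    conf = conf or {}
    env = env or {}
    candidates = (cli_master, conf.get("spark.master"), env.get("MASTER"))
    for candidate in candidates:
        if candidate:
            return candidate
    return None  # fallback behaviour is outside the scope of this sketch


env = {"MASTER": "spark://ec2-host:7077"}       # e.g. set by spark-env.sh
chosen = resolve_master(cli_master="local[2]", env=env)
```

With this ordering, a `MASTER` value set in spark-env.sh no longer hides an explicit `--master` flag.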
*	[SPARK-1897] Respect spark.jars (and --jars) in spark-shell	Andrew Or	2014-05-22	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	Spark shell currently overwrites `spark.jars` with `ADD_JARS`. In all modes except yarn-cluster, this means the `--jars` flag passed to `bin/spark-shell` is also discarded. However, in the [docs](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark), we explicitly tell the users to add the jars this way. Author: Andrew Or <andrewor14@gmail.com> Closes #849 from andrewor14/shell-jars and squashes the following commits: 928a7e6 [Andrew Or] ',' -> "," (minor) afc357c [Andrew Or] Handle spark.jars == "" in SparkILoop, not SparkSubmit c6da113 [Andrew Or] Do not set spark.jars to "" d8549f7 [Andrew Or] Respect spark.jars and --jars in spark-shell
*	Fix UISuite unit test that fails under Jenkins contention	Aaron Davidson	2014-05-22	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to perhaps zombie processes on Jenkins, it seems that at least 10 Spark ports are in use. It also doesn't matter that the port increases when used, it could in fact go down -- the only part that matters is that it selects a different port rather than failing to bind. Changed test to match this. Thanks to @andrewor14 for helping diagnose this. Author: Aaron Davidson <aaron@databricks.com> Closes #857 from aarondav/tiny and squashes the following commits: c199ec8 [Aaron Davidson] Fix UISuite unit test that fails under Jenkins contention
*	[SPARK-1870] Make spark-submit --jars work in yarn-cluster mode.	Xiangrui Meng	2014-05-22	3	-55/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Send secondary jars to the distributed cache of all containers and add the cached jars to the classpath before executors start. Tested on a YARN cluster (CDH-5.0). `spark-submit --jars` also works in standalone server and `yarn-client`. Thanks to @andrewor14 for testing! I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested mesos yet. CC: @dbtsai @sryza Author: Xiangrui Meng <meng@databricks.com> Closes #848 from mengxr/yarn-classpath and squashes the following commits: 23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid confliction apped $CWD/ and $CWD/* to the classpath remove unused methods a40f6ed [Xiangrui Meng] standalone -> cluster 65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client 11e5354 [Xiangrui Meng] minor changes 3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn
*	Configuration documentation updates	Reynold Xin	2014-05-21	1	-89/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Add <code> to configuration options 2. List env variables in tabular format to be consistent with other pages. 3. Moved Viewing Spark Properties section up. This is against branch-1.0, but should be cherry picked into master as well. Author: Reynold Xin <rxin@apache.org> Closes #851 from rxin/doc-config and squashes the following commits: 28ac0d3 [Reynold Xin] Add <code> to configuration options, and list env variables in a table. (cherry picked from commit 75af8bd3336d09e8c691e54ae9d2358fe1bf3723) Signed-off-by: Reynold Xin <rxin@apache.org>
*	[SPARK-1889] [SQL] Apply splitConjunctivePredicates to join condition while finding join keys.	Takuya UESHIN	2014-05-21	2	-6/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When tables are equi-joined on multiple keys, `HashJoin` should be used, but `CartesianProduct` followed by `Filter` is used instead. The join keys are paired by an `And` expression, so we need to apply `splitConjunctivePredicates` to the join condition while finding join keys. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #836 from ueshin/issues/SPARK-1889 and squashes the following commits: fe1c387 [Takuya UESHIN] Apply splitConjunctivePredicates to join condition while finding join keys.
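To illustrate why the condition has to be split, the sketch below flattens a tree of `And` nodes into its individual conjuncts, so that each equality can be inspected as a candidate join key. It is a plain-Python model, not Catalyst: the `Eq` and `And` classes and the snake_case function name are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eq:
    """Hypothetical equality predicate between two column names."""
    left: str
    right: str


@dataclass(frozen=True)
class And:
    """Hypothetical conjunction of two predicates."""
    left: object
    right: object


def split_conjunctive_predicates(condition):
    """Flatten nested And nodes into a list of their conjuncts."""
    if isinstance(condition, And):
        return (split_conjunctive_predicates(condition.left)
                + split_conjunctive_predicates(condition.right))
    return [condition]


# A two-key equi-join condition: a.k1 = b.k1 AND a.k2 = b.k2
condition = And(Eq("a.k1", "b.k1"), Eq("a.k2", "b.k2"))
conjuncts = split_conjunctive_predicates(condition)
join_keys = [(p.left, p.right) for p in conjuncts if isinstance(p, Eq)]
```

Without the split, the planner sees a single `And` node rather than equalities, which is consistent with the fallback to a Cartesian product plus filter described above.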
*	[SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark	Kan Zhang	2014-05-21	1	-2/+10
\| \| \| \| \| \| \| \|	Author: Kan Zhang <kzhang@apache.org> Closes #697 from kanzhang/SPARK-1519 and squashes the following commits: 4f8d1ed [Kan Zhang] [SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark
*	[Typo] Stoped -> Stopped	Andrew Or	2014-05-21	1	-1/+1
\| \| \| \| \| \| \| \|	Author: Andrew Or <andrewor14@gmail.com> Closes #847 from andrewor14/yarn-typo and squashes the following commits: c1906af [Andrew Or] Stoped -> Stopped
*	[Minor] Move JdbcRDDSuite to the correct package	Andrew Or	2014-05-21	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \|	It was in the wrong package Author: Andrew Or <andrewor14@gmail.com> Closes #839 from andrewor14/jdbc-suite and squashes the following commits: f948c5a [Andrew Or] cache -> cache() b215279 [Andrew Or] Move JdbcRDDSuite to the correct package
*	[Docs] Correct example of creating a new SparkConf	Andrew Or	2014-05-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The example code on the configuration page currently does not compile. Author: Andrew Or <andrewor14@gmail.com> Closes #842 from andrewor14/conf-docs and squashes the following commits: aabff57 [Andrew Or] Correct example of creating a new SparkConf
*	[SPARK-1250] Fixed misleading comments in bin/pyspark, bin/spark-class	Sumedh Mungee	2014-05-21	2	-2/+2
\| \| \| \| \| \| \| \| \| \|	Fixed a couple of misleading comments in bin/pyspark and bin/spark-class. The comments make it seem like the script is looking for the Scala installation when in fact it is looking for Spark. Author: Sumedh Mungee <smungee@gmail.com> Closes #843 from smungee/spark-1250-fix-comments and squashes the following commits: 26870f3 [Sumedh Mungee] [SPARK-1250] Fixed misleading comments in bin/pyspark and bin/spark-class
*	[Hotfix] Blacklisted flaky HiveCompatibility test	Tathagata Das	2014-05-20	1	-2/+4
\| \| \| \| \| \| \| \| \| \|	`lateral_view_outer` query sometimes returns a different set of 10 rows. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #838 from tdas/hive-test-fix2 and squashes the following commits: 9128a0d [Tathagata Das] Blacklisted flaky HiveCompatibility test.
*	[Spark 1877] ClassNotFoundException when loading RDD with serialized objects	Tathagata Das	2014-05-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Updated version of #821 Author: Tathagata Das <tathagata.das1565@gmail.com> Author: Ghidireac <bogdang@u448a5b0a73d45358d94a.ant.amazon.com> Closes #835 from tdas/SPARK-1877 and squashes the following commits: f346f71 [Tathagata Das] Addressed Patrick's comments. fee0c5d [Ghidireac] SPARK-1877: ClassNotFoundException when loading RDD with serialized objects
*	[SPARK-1874][MLLIB] Clean up MLlib sample data	Xiangrui Meng	2014-05-19	6	-2/+2138
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Added synthetic datasets for `MovieLensALS`, `LinearRegression`, `BinaryClassification`. 2. Embedded instructions in the help message of those example apps. Per discussion with Matei on the JIRA page, new example data is under `data/mllib`. Author: Xiangrui Meng <meng@databricks.com> Closes #833 from mengxr/mllib-sample-data and squashes the following commits: 59f0a18 [Xiangrui Meng] add sample binary classification data 3c2f92f [Xiangrui Meng] add linear regression data 050f1ca [Xiangrui Meng] add a sample dataset for MovieLensALS example
*	SPARK-1689: Spark application should die when removed by Master	Aaron Davidson	2014-05-19	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	scheduler.error() will mask the error if there are active tasks. Being removed is a cataclysmic event for Spark applications, and should probably be treated as such. Author: Aaron Davidson <aaron@databricks.com> Closes #832 from aarondav/i-love-u and squashes the following commits: 9f1200f [Aaron Davidson] SPARK-1689: Spark application should die when removed by Master
*	[SPARK-1875]NoClassDefFoundError: StringUtils when building with hadoop 1.x and hive	witgo	2014-05-19	2	-10/+1
\| \| \| \| \| \| \| \| \| \| \|	Author: witgo <witgo@qq.com> Closes #824 from witgo/SPARK-1875_commons-lang-2.6 and squashes the following commits: ef7231d [witgo] review commit ead3c3b [witgo] SPARK-1875:NoClassDefFoundError: StringUtils when building against Hadoop 1
*	SPARK-1879. Increase MaxPermSize since some of our builds have many classes	Matei Zaharia	2014-05-19	3	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \|	See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop2 and Hive ran out of PermGen space in spark-shell, when those things added up with the Scala compiler. Note that users can still override it by setting their own Java options with this change. Their options will come later in the command string than the -XX:MaxPermSize=128m. Author: Matei Zaharia <matei@databricks.com> Closes #823 from mateiz/spark-1879 and squashes the following commits: 6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes
*	SPARK-1878: Fix the incorrect initialization order	zsxwing	2014-05-19	2	-3/+7
\| \| \| \| \| \| \| \| \| \|	JIRA: https://issues.apache.org/jira/browse/SPARK-1878 Author: zsxwing <zsxwing@gmail.com> Closes #822 from zsxwing/SPARK-1878 and squashes the following commits: 4a47e27 [zsxwing] SPARK-1878: Fix the incorrect initialization order
*	[SPARK-1876] Windows fixes to deal with latest distribution layout changes	Matei Zaharia	2014-05-19	7	-30/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Look for JARs in the right place - Launch examples the same way as on Unix - Load datanucleus JARs if they exist - Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs - Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was) Author: Matei Zaharia <matei@databricks.com> Closes #819 from mateiz/win-fixes and squashes the following commits: d558f96 [Matei Zaharia] Fix comment 228577b [Matei Zaharia] Review comments d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly 144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout
*	[WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0	Xiangrui Meng	2014-05-18	10	-90/+153
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some improvements to MLlib guide: 1. [SPARK-1872] Update API links for unidoc. 2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display. 3. Add more Java/Python examples. Author: Xiangrui Meng <meng@databricks.com> Closes #816 from mengxr/mllib-doc and squashes the following commits: ec2e407 [Xiangrui Meng] format scala example for ALS cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types 4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles 561fdc0 [Xiangrui Meng] add a displayTitle option to global layout 195d06f [Xiangrui Meng] add Java example for summary stats and minor fix 9f1ff89 [Xiangrui Meng] update java api links in mllib-basics 7dad18e [Xiangrui Meng] update java api links in NB 3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python 35bdeb9 [Xiangrui Meng] api/mllib -> api/scala e4afaa8 [Xiangrui Meng] explicity state what might change
*	SPARK-1873: Add README.md file when making distributions	Patrick Wendell	2014-05-18	1	-0/+1
\| \| \| \| \| \| \| \|	Author: Patrick Wendell <pwendell@gmail.com> Closes #818 from pwendell/reamde and squashes the following commits: 4020b11 [Patrick Wendell] SPARK-1873: Add README.md file when making distributions
*	Fix spark-submit path in spark-shell & pyspark	Neville Li	2014-05-18	2	-5/+5
\| \| \| \| \| \| \| \| \|	Author: Neville Li <neville@spotify.com> Closes #812 from nevillelyh/neville/v1.0 and squashes the following commits: 0dc33ed [Neville Li] Fix spark-submit path in pyspark becec64 [Neville Li] Fix spark-submit path in spark-shell
*	Make deprecation warning less severe	Patrick Wendell	2014-05-16	1	-6/+6
\| \| \| \| \| \| \| \| \| \|	Just a small change. I think it's good not to scare people who are using the old options. Author: Patrick Wendell <pwendell@gmail.com> Closes #810 from pwendell/warnings and squashes the following commits: cb8a311 [Patrick Wendell] Make deprecation warning less severe
*	[SPARK-1824] Remove <master> from Python examples	Andrew Or	2014-05-16	12	-72/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A recent PR (#552) fixed this for all Scala / Java examples. We need to do it for python too. Note that this blocks on #799, which makes `bin/pyspark` go through Spark submit. With only the changes in this PR, the only way to run these examples is through Spark submit. Once #799 goes in, you can use `bin/pyspark` to run them too. For example, `bin/pyspark examples/src/main/python/pi.py 100 --master local-cluster[4,1,512]`. Author: Andrew Or <andrewor14@gmail.com> Closes #802 from andrewor14/python-examples and squashes the following commits: cf50b9f [Andrew Or] De-indent python comments (minor) 50f80b1 [Andrew Or] Remove pyFiles from SparkContext construction c362f69 [Andrew Or] Update docs to use spark-submit for python applications 7072c6a [Andrew Or] Merge branch 'master' of github.com:apache/spark into python-examples 427a5f0 [Andrew Or] Update docs d32072c [Andrew Or] Remove <master> from examples + update usages
*	[SPARK-1808] Route bin/pyspark through Spark submit	Andrew Or	2014-05-16	10	-34/+107
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem. For `bin/pyspark`, there is currently no way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`. Solution. Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent. Details. `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply pass the file directly to Spark submit and let it handle the rest. For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change was to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case. This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too. Author: Andrew Or <andrewor14@gmail.com> Closes #799 from andrewor14/pyspark-submit and squashes the following commits: bf37e36 [Andrew Or] Minor changes 01066fa [Andrew Or] bin/pyspark for Windows c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes) 1866f85 [Andrew Or] Windows is not cooperating 456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set 7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit b7ba0d8 [Andrew Or] Address a few comments (minor) 06eb138 [Andrew Or] Use shlex instead of writing our own parser 05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly 6fba412 [Andrew Or] Deal with quotes + address various comments fe4c8a7 [Andrew Or] Update --help for bin/pyspark afe47bf [Andrew Or] Fix spark shell f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a371d26 [Andrew Or] Route bin/pyspark through Spark submit
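Two of the squashed commits mention parsing `PYSPARK_SUBMIT_ARGS` with `shlex` and guarding against the variable being unset. A minimal sketch of that idea follows; `parse_submit_args` is a hypothetical helper and not the code from the PR. The guard matters because older Python versions made `shlex.split(None)` read from standard input, which looks like a hang.

```python
import shlex


def parse_submit_args(env):
    """Split PYSPARK_SUBMIT_ARGS into an argument list, honouring quotes.

    Hypothetical helper; `env` is any mapping, such as os.environ.
    """
    # Default to an empty string so shlex never receives None.
    raw = env.get("PYSPARK_SUBMIT_ARGS", "")
    return shlex.split(raw)


args = parse_submit_args(
    {"PYSPARK_SUBMIT_ARGS": '--master local[4] --name "my app"'}
)
no_args = parse_submit_args({})
```

Using `shlex` keeps a quoted application name together as a single argument, which a plain whitespace split would break apart.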
*	Version bump of spark-ec2 scripts	Patrick Wendell	2014-05-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This will allow us to change things in spark-ec2 related to the 1.0 release. Author: Patrick Wendell <pwendell@gmail.com> Closes #809 from pwendell/spark-ec2 and squashes the following commits: 59117fb [Patrick Wendell] Version bump of spark-ec2 scripts
*	SPARK-1864 Look in spark conf instead of system properties when propagating configuration to executors.	Michael Armbrust	2014-05-16	1	-4/+5
\| \| \| \| \| \| \| \| \| \|	Author: Michael Armbrust <michael@databricks.com> Closes #808 from marmbrus/confClasspath and squashes the following commits: 4c31d57 [Michael Armbrust] Look in spark conf instead of system properties when propagating configuration to executors.
*	Tweaks to Mesos docs	Matei Zaharia	2014-05-16	1	-37/+34
\| \| \| \| \| \| \| \| \| \| \| \|	- Mention Apache downloads first - Shorten some wording Author: Matei Zaharia <matei@databricks.com> Closes #806 from mateiz/doc-update and squashes the following commits: d9345cd [Matei Zaharia] typo a179f8d [Matei Zaharia] Tweaks to Mesos docs