spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	fix java.lang.ClassCastException	baishuo(白硕)	2014-06-03	1	-1/+1
\|	An exception is thrown when running: bin/run-example org.apache.spark.examples.sql.RDDRelation
\|	The exception details are:
\|	Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
\|	at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
\|	at org.apache.spark.sql.catalyst.expressions.GenericRow.getInt(Row.scala:145)
\|	at org.apache.spark.examples.sql.RDDRelation$.main(RDDRelation.scala:49)
\|	at org.apache.spark.examples.sql.RDDRelation.main(RDDRelation.scala)
\|	Changing sql("SELECT COUNT(*) FROM records").collect().head.getInt(0) to sql("SELECT COUNT(*) FROM records").collect().head.getLong(0) makes the exception go away.
\|	Author: baishuo(白硕) <vc_java@hotmail.com>
\|	Closes #949 from baishuo/master and squashes the following commits:
\|	f4b319f [baishuo(白硕)] fix java.lang.ClassCastException
\|	(cherry picked from commit aa41a522d821c989c65fa3f7f2a4d372e39bb958)
\|	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[maven-release-plugin] prepare for next development iteration	Tathagata Das	2014-05-26	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc11 (tag: v1.0.0)	Tathagata Das	2014-05-26	1	-1/+1
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc11"	Tathagata Das	2014-05-26	1	-1/+1
\| \| \| \|	This reverts commit 2f1dc868e5714882cf40d2633fb66772baf34789.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Tathagata Das	2014-05-26	1	-1/+1
\| \| \| \|	This reverts commit 832dc594e7666f1d402334f8015ce29917d9c888.
*	Fix PEP8 violations in examples/src/main/python.	Reynold Xin	2014-05-25	6	-19/+25
\|	Author: Reynold Xin <rxin@apache.org>
\|	Closes #870 from rxin/examples-python-pep8 and squashes the following commits:
\|	2829e84 [Reynold Xin] Fix PEP8 violations in examples/src/main/python.
\|	(cherry picked from commit d79c2b28e17ec0b15198aaedd2e1f403d81f717e)
\|	Signed-off-by: Reynold Xin <rxin@apache.org>
*	[maven-release-plugin] prepare for next development iteration	Tathagata Das	2014-05-25	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc11	Tathagata Das	2014-05-25	1	-1/+1
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc10"	Tathagata Das	2014-05-25	1	-1/+1
\| \| \| \|	This reverts commit d807023479ce10aec28ef3c1ab646ddefc2e663c.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Tathagata Das	2014-05-25	1	-1/+1
\| \| \| \|	This reverts commit 67dd53d2556f03ce292e6889128cf441f1aa48f8.
*	[maven-release-plugin] prepare for next development iteration	Tathagata Das	2014-05-20	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc10	Tathagata Das	2014-05-20	1	-1/+1
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc9"	Tathagata Das	2014-05-19	1	-1/+1
\| \| \| \|	This reverts commit 920f947eb5a22a679c0c3186cf69ee75f6041c75.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Tathagata Das	2014-05-19	1	-1/+1
\| \| \| \|	This reverts commit f8e611955096c5c1c7db5764b9d2851b1d295f0d.
*	[SPARK-1874][MLLIB] Clean up MLlib sample data	Xiangrui Meng	2014-05-19	3	-2/+36
\|	1. Added synthetic datasets for `MovieLensALS`, `LinearRegression`, `BinaryClassification`.
\|	2. Embedded instructions in the help message of those example apps.
\|	Per discussion with Matei on the JIRA page, new example data is under `data/mllib`.
\|	Author: Xiangrui Meng <meng@databricks.com>
\|	Closes #833 from mengxr/mllib-sample-data and squashes the following commits:
\|	59f0a18 [Xiangrui Meng] add sample binary classification data
\|	3c2f92f [Xiangrui Meng] add linear regression data
\|	050f1ca [Xiangrui Meng] add a sample dataset for MovieLensALS example
\|	(cherry picked from commit bcb9dce6f444a977c714117811bce0c54b417650)
\|	Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-17	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc9	Patrick Wendell	2014-05-17	1	-1/+1
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc8"	Patrick Wendell	2014-05-16	1	-1/+1
\| \| \| \|	This reverts commit 80eea0f111c06260ffaa780d2f3f7facd09c17bc.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-16	1	-1/+1
\| \| \| \|	This reverts commit e5436b8c1a79ce108f3af402455ac5f6dc5d1eb3.
*	[SPARK-1824] Remove <master> from Python examples	Andrew Or	2014-05-16	10	-53/+53
\|	A recent PR (#552) fixed this for all Scala / Java examples. We need to do it for Python too.
\|	Note that this blocks on #799, which makes `bin/pyspark` go through Spark submit. With only the changes in this PR, the only way to run these examples is through Spark submit. Once #799 goes in, you can use `bin/pyspark` to run them too. For example,
\|	```
\|	bin/pyspark examples/src/main/python/pi.py 100 --master local-cluster[4,1,512]
\|	```
\|	Author: Andrew Or <andrewor14@gmail.com>
\|	Closes #802 from andrewor14/python-examples and squashes the following commits:
\|	cf50b9f [Andrew Or] De-indent python comments (minor)
\|	50f80b1 [Andrew Or] Remove pyFiles from SparkContext construction
\|	c362f69 [Andrew Or] Update docs to use spark-submit for python applications
\|	7072c6a [Andrew Or] Merge branch 'master' of github.com:apache/spark into python-examples
\|	427a5f0 [Andrew Or] Update docs
\|	d32072c [Andrew Or] Remove <master> from examples + update usages
\|	(cherry picked from commit cf6cbe9f76c3b322a968c836d039fc5b70d4ce43)
\|	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-16	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc8	Patrick Wendell	2014-05-16	1	-1/+1
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc7"	Patrick Wendell	2014-05-16	1	-1/+1
\| \| \| \|	This reverts commit 9212b3e5bb5545ccfce242da8d89108e6fb1c464.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-16	1	-1/+1
\| \| \| \|	This reverts commit c4746aa6fe4aaf383e69e34353114d36d1eb9ba6.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-15	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc7	Patrick Wendell	2014-05-15	1	-1/+1
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc6"	Patrick Wendell	2014-05-14	1	-1/+1
\| \| \| \|	This reverts commit 54133abdce0246f6643a1112a5204afb2c4caa82.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-14	1	-1/+1
\| \| \| \|	This reverts commit e480bcfbd269ae1d7a6a92cfb50466cf192fe1fb.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-14	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc6	Patrick Wendell	2014-05-14	1	-1/+1
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc5"	Patrick Wendell	2014-05-14	1	-1/+1
\| \| \| \|	This reverts commit 18f062303303824139998e8fc8f4158217b0dbc3.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-14	1	-1/+1
\| \| \| \|	This reverts commit d08e9604fc9958b7c768e91715c8152db2ed6fd0.
*	Fixed streaming examples docs to use run-example instead of spark-submit	Tathagata Das	2014-05-14	17	-72/+95
\|	Pretty self-explanatory.
\|	Author: Tathagata Das <tathagata.das1565@gmail.com>
\|	Closes #722 from tdas/example-fix and squashes the following commits:
\|	7839979 [Tathagata Das] Minor changes.
\|	0673441 [Tathagata Das] Fixed java docs of java streaming example
\|	e687123 [Tathagata Das] Fixed scala style errors.
\|	9b8d112 [Tathagata Das] Fixed streaming examples docs to use run-example instead of spark-submit.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-13	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc5	Patrick Wendell	2014-05-13	1	-1/+1
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc4"	Patrick Wendell	2014-05-12	1	-1/+1
\| \| \| \|	This reverts commit 3d0a44833ab50360bf9feccc861cb5e8c44a4866.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-12	1	-1/+1
\| \| \| \|	This reverts commit 9772d85c6f3893d42044f4bab0e16f8b6287613a.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-13	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc4	Patrick Wendell	2014-05-13	1	-1/+1
\|
*	Rollback versions for 1.0.0-rc4	Patrick Wendell	2014-05-12	1	-1/+1
\|
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-12	1	-1/+1
\|
*	[maven-release-plugin] prepare release v1.0.0-rc4	Patrick Wendell	2014-05-12	1	-1/+1
\|
*	SPARK-1789. Multiple versions of Netty dependencies cause FlumeStreamSuite failure	Sean Owen	2014-05-10	1	-0/+4
\|	TL;DR: there is a bit of JAR hell trouble with Netty that can be mostly resolved, and resolving it fixes a test failure.
\|	I hit the error described at http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-startup-time-out-td1753.html while running FlumeStreamSuite, and have for a short while (is it just me?)
\|	velvia notes: "I have found a workaround. If you add akka 2.2.4 to your dependencies, then everything works, probably because akka 2.2.4 brings in newer version of Jetty."
\|	There are at least 3 versions of Netty in play in the build:
\|	- the new Flume 1.4.0 dependency brings in io.netty:netty:3.4.0.Final, and that is the immediate problem
\|	- the custom version of akka 2.2.3 depends on io.netty:netty:3.6.6.
\|	- but, Spark Core directly uses io.netty:netty-all:4.0.17.Final
\|	The POMs try to exclude other versions of netty, but are excluding org.jboss.netty:netty, when in fact older versions of io.netty:netty (not netty-all) are also an issue.
\|	The org.jboss.netty:netty excludes are largely unnecessary. I replaced many of them with io.netty:netty exclusions until everything agreed on io.netty:netty-all:4.0.17.Final.
\|	But this didn't work, since Akka 2.2.3 doesn't work with Netty 4.x. Downgrading to 3.6.6.Final across the board made some Spark code not compile.
\|	If the build keeps io.netty:netty:3.6.6.Final as well, everything seems to work. Part of the reason seems to be that Netty 3.x used the old `org.jboss.netty` packages. This is less than ideal, but is no worse than the current situation.
\|	So this PR resolves the issue and improves the JAR hell, even if it leaves the existing theoretical Netty 3-vs-4 conflict:
\|	- Remove org.jboss.netty excludes where possible, for clarity; they're not needed except with Hadoop artifacts
\|	- Add io.netty:netty excludes where needed -- except, let akka keep its io.netty:netty
\|	- Change a bit of test code that actually depended on Netty 3.x, to use 4.x equivalent
\|	- Update SBT build accordingly
\|	A better change would be to update Akka far enough such that it agrees on Netty 4.x, but I don't know if that's feasible.
\|	Author: Sean Owen <sowen@cloudera.com>
\|	Closes #723 from srowen/SPARK-1789 and squashes the following commits:
\|	43661b7 [Sean Owen] Update and add Netty excludes to prevent some JAR conflicts that cause test issues
\|	(cherry picked from commit 2b7bd29eb6ee5baf739eec143044ecfc296b9b1f)
\|	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	SPARK-1708. Add a ClassTag on Serializer and things that depend on it	Matei Zaharia	2014-05-10	1	-5/+7
\|	This pull request contains a rebased patch from @heathermiller (https://github.com/heathermiller/spark/pull/1) to add ClassTags on Serializer and types that depend on it (Broadcast and AccumulableCollection). Putting these in the public API signatures now will allow us to use Scala Pickling for serialization down the line without breaking binary compatibility.
\|	One question remaining is whether we also want them on Accumulator -- Accumulator is passed as part of a bigger Task or TaskResult object via the closure serializer so it doesn't seem super useful to add the ClassTag there. Broadcast and AccumulableCollection in contrast were being serialized directly.
\|	CC @rxin, @pwendell, @heathermiller
\|	Author: Matei Zaharia <matei@databricks.com>
\|	Closes #700 from mateiz/spark-1708 and squashes the following commits:
\|	1a3d8b0 [Matei Zaharia] Use fake ClassTag in Java
\|	3b449ed [Matei Zaharia] test fix
\|	2209a27 [Matei Zaharia] Code style fixes
\|	9d48830 [Matei Zaharia] Add a ClassTag on Serializer and things that depend on it
*	Fixing typo in als.py	Evan Sparks	2014-05-08	1	-1/+1
\|	XtY should be Xty.
\|	Author: Evan Sparks <evan.sparks@gmail.com>
\|	Closes #696 from etrain/patch-2 and squashes the following commits:
\|	634cb8d [Evan Sparks] Fixing typo in als.py
*	SPARK-1565, update examples to be used with spark-submit script.	Prashant Sharma	2014-05-08	53	-469/+389
\|	Commit for initial feedback. Basically, I am curious whether we should prompt the user to provide args, especially when they are mandatory, and whether we can skip the prompt when they are not.
\|	Also, a few other things did not work, like
\|	`bin/spark-submit examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop1.0.4.jar --class org.apache.spark.examples.SparkALS --arg 100 500 10 5 2`
\|	Not all the args get passed properly. Maybe I have messed something up; I will try to sort it out.
\|	Author: Prashant Sharma <prashant.s@imaginea.com>
\|	Closes #552 from ScrapCodes/SPARK-1565/update-examples and squashes the following commits:
\|	669dd23 [Prashant Sharma] Review comments
\|	2727e70 [Prashant Sharma] SPARK-1565, update examples to be used with spark-submit script.
\|	(cherry picked from commit 44dd57fb66bb676d753ad8d9757f9f4c03364113)
\|	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	Use numpy directly for matrix multiply.	Evan Sparks	2014-05-08	1	-8/+7
\|	Using matrix multiply to compute XtX and XtY yields a 5-20x speedup depending on problem size.
\|	For example, the following takes 19s locally after this change vs. 5m21s before the change (16x speedup):
\|	bin/pyspark examples/src/main/python/als.py local[8] 1000 1000 50 10 10
\|	Author: Evan Sparks <evan.sparks@gmail.com>
\|	Closes #687 from etrain/patch-1 and squashes the following commits:
\|	e094dbc [Evan Sparks] Touching only diaganols on update.
\|	d1ab9b6 [Evan Sparks] Use numpy directly for matrix multiply.
\|	(cherry picked from commit 6ed7e2cd01955adfbb3960e2986b6d19eaee8717)
\|	Signed-off-by: Reynold Xin <rxin@apache.org>
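The speedup described in this commit comes from replacing a Python-level accumulation loop with a single matrix product. The sketch below illustrates the idea only and is not the code from `als.py`: the function names `gram_by_loop`, `gram_by_matmul`, and `solve_regularized` are hypothetical, and the regularization term `lam * n` is an assumption modeled on a typical ALS update that touches only the diagonal.

```python
import numpy as np


def gram_by_loop(X, y):
    """Accumulate XtX and Xty one row at a time (slow Python-level loop)."""
    k = X.shape[1]
    XtX = np.zeros((k, k))
    Xty = np.zeros((k, 1))
    for i in range(X.shape[0]):
        row = X[i, :].reshape(k, 1)
        XtX += row @ row.T      # outer product of one row with itself
        Xty += row * y[i, 0]    # row scaled by its target value
    return XtX, Xty


def gram_by_matmul(X, y):
    """Compute the same two quantities with one matrix multiply each."""
    return X.T @ X, X.T @ y


def solve_regularized(XtX, Xty, lam, n):
    """Solve (XtX + lam * n * I) w = Xty, modifying only the diagonal."""
    A = XtX.copy()
    idx = np.arange(A.shape[0])
    A[idx, idx] += lam * n      # hypothetical regularization scaling
    return np.linalg.solve(A, Xty)
```

Both functions return the same matrices up to floating-point rounding; the second simply hands the whole computation to NumPy's compiled routines instead of iterating in Python.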
*	SPARK-1668: Add implicit preference as an option to examples/MovieLensALS	Sandeep	2014-05-08	1	-9/+46
\|	Add --implicitPrefs as a command-line option to the example app MovieLensALS under examples/
\|	Author: Sandeep <sandeep@techaddict.me>
\|	Closes #597 from techaddict/SPARK-1668 and squashes the following commits:
\|	8b371dc [Sandeep] Second Pass on reviews by mengxr
\|	eca9d37 [Sandeep] based on mengxr's suggestions
\|	937e54c [Sandeep] Changes
\|	5149d40 [Sandeep] Changes based on review
\|	1dd7657 [Sandeep] use mean()
\|	42444d7 [Sandeep] Based on Suggestions by mengxr
\|	e3082fa [Sandeep] SPARK-1668: Add implicit preference as an option to examples/MovieLensALS Add --implicitPrefs as a command-line option to the example app MovieLensALS under examples/
\|	(cherry picked from commit 108c4c16cc82af2e161d569d2c23849bdbf4aadb)
\|	Signed-off-by: Reynold Xin <rxin@apache.org>
*	SPARK-1544 Add support for deep decision trees.	Manish Amde	2014-05-07	1	-1/+1
\|	@etrain and I came up with a PR for arbitrarily deep decision trees at the cost of multiple passes over the data at deep tree levels.
\|	To summarize:
\|	1) We take a parameter that indicates the amount of memory users want to reserve for computation on each worker (and 2x that at the driver).
\|	2) Using that information, we calculate two things - the maximum depth to which we train as usual (which is, implicitly, the maximum number of nodes we want to train in parallel), and the size of the groups we should use in the case where we exceed this depth.
\|	cc: @atalwalkar, @hirakendu, @mengxr
\|	Author: Manish Amde <manish9ue@gmail.com>
\|	Author: manishamde <manish9ue@gmail.com>
\|	Author: Evan Sparks <sparks@cs.berkeley.edu>
\|	Closes #475 from manishamde/deep_tree and squashes the following commits:
\|	968ca9d [Manish Amde] merged master
\|	7fc9545 [Manish Amde] added docs
\|	ce004a1 [Manish Amde] minor formatting
\|	b27ad2c [Manish Amde] formatting
\|	426bb28 [Manish Amde] programming guide blurb
\|	8053fed [Manish Amde] more formatting
\|	5eca9e4 [Manish Amde] grammar
\|	4731cda [Manish Amde] formatting
\|	5e82202 [Manish Amde] added documentation, fixed off by 1 error in max level calculation
\|	cbd9f14 [Manish Amde] modified scala.math to math
\|	dad9652 [Manish Amde] removed unused imports
\|	e0426ee [Manish Amde] renamed parameter
\|	718506b [Manish Amde] added unit test
\|	1517155 [Manish Amde] updated documentation
\|	9dbdabe [Manish Amde] merge from master
\|	719d009 [Manish Amde] updating user documentation
\|	fecf89a [manishamde] Merge pull request #6 from etrain/deep_tree
\|	0287772 [Evan Sparks] Fixing scalastyle issue.
\|	2f1e093 [Manish Amde] minor: added doc for maxMemory parameter
\|	2f6072c [manishamde] Merge pull request #5 from etrain/deep_tree
\|	abc5a23 [Evan Sparks] Parameterizing max memory.
\|	50b143a [Manish Amde] adding support for very deep trees
\|	(cherry picked from commit f269b016acb17b24d106dc2b32a1be389489bb01)
\|	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
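The two quantities named in point 2 of this commit message can be illustrated with a little arithmetic. The sketch below is not MLlib's actual formula: the per-node cost model (features × bins × statistics × bytes per value) and the names `max_parallel_depth`, `groups_at_level`, `stats_per_bin`, and `bytes_per_value` are all assumptions made for illustration.

```python
def max_parallel_depth(max_memory_mb, num_features, num_bins,
                       stats_per_bin=3, bytes_per_value=8):
    """Deepest level whose nodes can all be aggregated within the budget.

    Assumed cost model: each node needs one value per
    (feature, bin, statistic) triple.
    """
    bytes_per_node = num_features * num_bins * stats_per_bin * bytes_per_value
    budget_bytes = max_memory_mb * 1024 * 1024
    nodes_that_fit = budget_bytes // bytes_per_node
    if nodes_that_fit < 1:
        raise ValueError("memory budget is too small for even one node")
    # Level d of a binary tree holds 2**d nodes, so we want the largest d
    # with 2**d <= nodes_that_fit, i.e. floor(log2(nodes_that_fit)).
    return nodes_that_fit.bit_length() - 1


def groups_at_level(level, max_depth):
    """Number of node groups (extra passes over the data) at a level."""
    if level <= max_depth:
        return 1                      # the whole level fits in one pass
    return 2 ** (level - max_depth)   # each group holds 2**max_depth nodes
```

Under these assumptions, a 128 MB budget with 100 features and 100 bins fits 559 nodes at once, which gives a maximum parallel depth of 9; training level 12 would then take 8 groups, each requiring its own pass over the data.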
*	[HOTFIX] SPARK-1637: There are some Streaming examples added after the PR #571 was last updated.	Sandeep	2014-05-06	2	-6/+4
\|	This resulted in Compilation Errors.
\|	cc @mateiz project not compiling currently.
\|	Author: Sandeep <sandeep@techaddict.me>
\|	Closes #673 from techaddict/SPARK-1637-HOTFIX and squashes the following commits:
\|	b512f4f [Sandeep] [SPARK-1637][HOTFIX] There are some Streaming examples added after the PR #571 was last updated. This resulted in Compilation Errors.
\|	(cherry picked from commit fdae095de2daa1fc3b343c05e515235756d856a4)
\|	Signed-off-by: Patrick Wendell <pwendell@gmail.com>