Commit message | Author | Age | Files | Lines
* [maven-release-plugin] prepare release v1.1.0-rc4 (tag: v1.1.0) (Patrick Wendell, 2014-09-03; 24 files, -38/+33)
* Revert "[maven-release-plugin] prepare release v1.1.0-rc3" (Patrick Wendell, 2014-09-02; 24 files, -33/+38)
    This reverts commit b2d0493b223c5f98a593bb6d7372706cc02bebad.
* Revert "[maven-release-plugin] prepare for next development iteration" (Patrick Wendell, 2014-09-02; 24 files, -25/+25)
    This reverts commit 865e6f63f63f5e881a02d1a4e3b4c5d0e86fcd8e.
* SPARK-3358: [EC2] Switch back to HVM instances for m3.X. (Patrick Wendell, 2014-09-02; 1 file, -4/+4)
    During regression tests of Spark 1.1 we discovered perf issues with PVM instances when running PySpark. This reverts a change added in #1156 which changed the default type for m3 instances to PVM.
    Author: Patrick Wendell <pwendell@gmail.com>
    Closes #2244 from pwendell/ec2-hvm and squashes the following commits:
      1342d7e [Patrick Wendell] SPARK-3358: [EC2] Switch back to HVM instances for m3.X.
* [SPARK-2823][GraphX] Fix GraphX EdgeRDD zipPartitions (luluorta, 2014-09-02; 2 files, -2/+18)
    If the user sets "spark.default.parallelism" and the value differs from the EdgeRDD partition number, GraphX jobs will throw:
      java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
    Author: luluorta <luluorta@gmail.com>
    Closes #1763 from luluorta/fix-graph-zip and squashes the following commits:
      8338961 [luluorta] fix GraphX EdgeRDD zipPartitions
    (cherry picked from commit 9b225ac3072de522b40b46aba6df1f1c231f13ef)
    Signed-off-by: Ankur Dave <ankurdave@gmail.com>
* [SPARK-1981][Streaming][Hotfix] Fixed docs related to kinesis (Tathagata Das, 2014-09-02; 5 files, -16/+17)
    - Include kinesis in the unidocs
    - Hide non-public classes from docs
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Closes #2239 from tdas/kinesis-doc-fix and squashes the following commits:
      156e20c [Tathagata Das] More fixes, based on PR comments.
      e9a6c01 [Tathagata Das] Fixed docs related to kinesis
    (cherry picked from commit e9bb12bea9fbef94332fbec88e3cd9197a27b7ad)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
* [SPARK-2981][GraphX] EdgePartition1D Int overflow (Larry Xiao, 2014-09-02; 1 file, -1/+1)
    Minor fix; details are here: https://issues.apache.org/jira/browse/SPARK-2981
    Author: Larry Xiao <xiaodi@sjtu.edu.cn>
    Closes #1902 from larryxiao/2981 and squashes the following commits:
      88059a2 [Larry Xiao] [SPARK-2981][GraphX] EdgePartition1D Int overflow
    (cherry picked from commit aa7de128c5987fd2e134736f07ae913ad1f5eb26)
    Signed-off-by: Ankur Dave <ankurdave@gmail.com>
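    The overflow pattern behind this fix can be sketched without GraphX. The following Java snippet is a minimal illustration, not the actual EdgePartition1D code; the mixing prime, vertex id, and partition count are made up for the demo. Truncating a 64-bit hash to int before reducing modulo the partition count can produce a negative "partition id", while reducing on the long first keeps the result in range.

    ```java
    public class PartitionOverflowSketch {
        public static void main(String[] args) {
            // Hypothetical values, made up for this sketch.
            long mixingPrime = 1125899906842597L;
            long vid = 123456789L;
            int numParts = 100;
            // Truncating the long product to int before the modulo may yield a
            // negative value once the hash exceeds Integer.MAX_VALUE.
            int unsafe = (int) (vid * mixingPrime) % numParts;
            // Reducing on the long first (with floorMod) stays in [0, numParts).
            int safe = (int) Math.floorMod(vid * mixingPrime, (long) numParts);
            System.out.println("unsafe=" + unsafe + " safe=" + safe);
        }
    }
    ```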
* SPARK-3328: Fixed make-distribution script --with-tachyon option. (Prudhvi Krishna, 2014-09-02; 1 file, -2/+2)
    The directory path for the dependency jar and resources changed in Tachyon 0.5.0.
    Author: Prudhvi Krishna <prudhvi953@gmail.com>
    Closes #2228 from prudhvije/SPARK-3328/make-dist-fix and squashes the following commits:
      d1d2c22 [Prudhvi Krishna] SPARK-3328 fixed make-distribution script --with-tachyon option.
    (cherry picked from commit 644e31524a6a9a22c671a368aeb3b4eaeb61cf29)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [Build] Merge changes to run-tests-jenkins from master branch (Nicholas Chammas, 2014-09-02; 1 file, -49/+134)
    Author: Nicholas Chammas <nicholas.chammas@gmail.com>
    Author: nchammas <nicholas.chammas@gmail.com>
    Closes #2237 from nchammas/branch-1.1 and squashes the following commits:
      39bdd5e [Nicholas Chammas] merge updates from master
      f5aa841 [nchammas] Merge pull request #3 from apache/branch-1.1
* [SPARK-3332] Revert spark-ec2 patch that identifies clusters using tags (Josh Rosen, 2014-09-02; 2 files, -64/+30)
    This reverts #1899 and #2163, two patches that modified `spark-ec2` so that clusters are identified using tags instead of security groups.
    The original motivation for this patch was to allow multiple clusters to run in the same security group. Unfortunately, tagging is not atomic with launching instances on EC2, so with this approach we have the possibility of `spark-ec2` launching instances and crashing before they can be tagged, effectively orphaning those instances. The orphaned instances won't belong to any cluster, so the `spark-ec2` script will be unable to clean them up.
    Since this feature may still be worth supporting, there are several alternative approaches that we might consider, including detecting orphaned instances and logging warnings, or maybe using another mechanism to group instances into clusters. For the 1.1.0 release, though, I propose that we just revert this patch.
    Author: Josh Rosen <joshrosen@apache.org>
    Closes #2225 from JoshRosen/revert-ec2-cluster-naming and squashes the following commits:
      0c18e86 [Josh Rosen] Revert "SPARK-2333 - spark_ec2 script should allow option for existing security group"
      c2ca2d4 [Josh Rosen] Revert "Spark-3213 Fixes issue with spark-ec2 not detecting slaves created with "Launch More like this""
* [MLlib] Squash bug in IndexedRowMatrix (Reza Zadeh, 2014-09-02; 1 file, -1/+1)
    Kill this bug fast before it does damage.
    Author: Reza Zadeh <rizlar@gmail.com>
    Closes #2224 from rezazadeh/indexrmbug and squashes the following commits:
      53386d6 [Reza Zadeh] Squash bug in IndexedRowMatrix
    (cherry picked from commit 0f16b23cd17002fac05f3ecc58899be1b1121b82)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-3342] Add SSDs to block device mapping (Daniel Darabos, 2014-09-01; 1 file, -1/+11)
    On `m3.2xlarge` instances the 2x80GB SSDs are inaccessible if not added to the block device mapping when the instance is created. They work when added with this patch.
    I have not tested this with other instance types, and I do not know much about this script and EC2 deployment in general. Maybe this code needs to depend on the instance type.
    The requirement for this mapping is described in the AWS docs at:
    http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#InstanceStore_UsageScenarios
    "For M3 instances, you must specify instance store volumes in the block device mapping for the instance. When you launch an M3 instance, we ignore any instance store volumes specified in the block device mapping for the AMI."
    Author: Daniel Darabos <darabos.daniel@gmail.com>
    Closes #2081 from darabos/patch-1 and squashes the following commits:
      1ceb2c8 [Daniel Darabos] Use %d string interpolation instead of {}.
      a1854d7 [Daniel Darabos] Only specify ephemeral device mapping for M3.
      e0d9e37 [Daniel Darabos] Create ephemeral device mapping based on get_num_disks().
      6b116a6 [Daniel Darabos] Add SSDs to block device mapping
* [maven-release-plugin] prepare for next development iteration (Patrick Wendell, 2014-08-30; 24 files, -25/+25)
* [maven-release-plugin] prepare release v1.1.0-rc3 (Patrick Wendell, 2014-08-30; 24 files, -38/+33)
* Revert "[maven-release-plugin] prepare release v1.1.0-rc3" (Patrick Wendell, 2014-08-30; 24 files, -33/+38)
    This reverts commit 2b2e02265f80e4c5172c1e498aa9ba2c6b91c6c9.
* Revert "[maven-release-plugin] prepare for next development iteration" (Patrick Wendell, 2014-08-30; 24 files, -25/+25)
    This reverts commit 8b5f0dbd8d32a25a4e7ba3ebe1a4c3c6310aeb85.
* BUILD: Adding back CDH4 as per user requests (Patrick Wendell, 2014-08-29; 1 file, -0/+1)
* [maven-release-plugin] prepare for next development iteration (Patrick Wendell, 2014-08-30; 24 files, -25/+25)
* [maven-release-plugin] prepare release v1.1.0-rc3 (Patrick Wendell, 2014-08-30; 24 files, -38/+33)
* Adding new CHANGES.txt (Patrick Wendell, 2014-08-29; 1 file, -0/+45)
* [SPARK-3320][SQL] Made batched in-memory column buffer building work for SchemaRDDs with empty partitions (Cheng Lian, 2014-08-29; 3 files, -34/+39)
    Author: Cheng Lian <lian.cs.zju@gmail.com>
    Closes #2213 from liancheng/spark-3320 and squashes the following commits:
      45a0139 [Cheng Lian] Fixed typo in InMemoryColumnarQuerySuite
      f67067d [Cheng Lian] Fixed SPARK-3320
    (cherry picked from commit 32b18dd52cf8920903819f23e406271ecd8ac6bb)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-3296][mllib] spark-example should be run-example in head notation of DenseKMeans and SparseNaiveBayes (wangfei, 2014-08-29; 2 files, -2/+2)
    `./bin/spark-example` should be `./bin/run-example` in DenseKMeans and SparseNaiveBayes.
    Author: wangfei <wangfei_hello@126.com>
    Closes #2193 from scwf/run-example and squashes the following commits:
      207eb3a [wangfei] spark-example should be run-example
      27a8999 [wangfei] ./bin/spark-example should be ./bin/run-example
    (cherry picked from commit 13901764f4e9ed3de03e420d88ab42bdce5d5140)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
* Revert "[maven-release-plugin] prepare release v1.1.0-rc2" (Patrick Wendell, 2014-08-29; 24 files, -33/+38)
    This reverts commit 711aebb329ca28046396af1e34395a0df92b5327.
* Revert "[maven-release-plugin] prepare for next development iteration" (Patrick Wendell, 2014-08-29; 24 files, -25/+25)
    This reverts commit a4a7a241441489a0d31365e18476ae2e1c34464d.
* [SPARK-3291][SQL] TestcaseName in createQueryTest should not contain ":" (qiping.lqp, 2014-08-29; 3 files, -1/+4)
    ":" is not allowed in a file name on Windows. If a file name contains ":", the file can't be checked out on a Windows system, and developers using Windows must be careful not to commit the deletion of such files, which is very inconvenient.
    Author: qiping.lqp <qiping.lqp@alibaba-inc.com>
    Closes #2191 from chouqin/querytest and squashes the following commits:
      0e943a1 [qiping.lqp] rename golden file
      60a863f [qiping.lqp] TestcaseName in createQueryTest should not contain ":"
    (cherry picked from commit 634d04b87c2744d645e9c26e746ba2006371d9b5)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
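    The portability constraint above can be enforced with a small guard. The following Java snippet is a hypothetical sketch, not the actual createQueryTest code, and the test names in it are made up:

    ```java
    public class TestNameCheck {
        // Hypothetical sketch: reject golden-file test names containing ':',
        // which is illegal in Windows file names.
        static void validateTestName(String name) {
            if (name.contains(":")) {
                throw new IllegalArgumentException(
                    "Test name '" + name + "' must not contain ':' (illegal on Windows)");
            }
        }

        public static void main(String[] args) {
            validateTestName("insert into table");   // portable name, accepted
            boolean rejected = false;
            try {
                validateTestName("case when: else"); // would break checkout on Windows
            } catch (IllegalArgumentException e) {
                rejected = true;
            }
            System.out.println("rejected=" + rejected);
        }
    }
    ```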
* [SPARK-3269][SQL] Decreases initial buffer size for row set to prevent OOM (Cheng Lian, 2014-08-29; 1 file, -2/+3)
    When a large batch size is specified, `SparkSQLOperationManager` OOMs even if the whole result set is much smaller than the batch size.
    Author: Cheng Lian <lian.cs.zju@gmail.com>
    Closes #2171 from liancheng/jdbc-fetch-size and squashes the following commits:
      5e1623b [Cheng Lian] Decreases initial buffer size for row set to prevent OOM
    (cherry picked from commit d94a44d7caaf3fe7559d9ad7b10872fa16cf81ca)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-3234][Build] Fixed environment variables that rely on deprecated command line options in make-distribution.sh (Cheng Lian, 2014-08-29; 1 file, -1/+11)
    Please refer to [SPARK-3234](https://issues.apache.org/jira/browse/SPARK-3234) for details.
    Author: Cheng Lian <lian.cs.zju@gmail.com>
    Closes #2208 from liancheng/spark-3234 and squashes the following commits:
      fb26de8 [Cheng Lian] Fixed SPARK-3234
    (cherry picked from commit 287c0ac7722dd4bc51b921ccc6f0e3c1625b5ff4)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [Docs] SQL doc formatting and typo fixes (Nicholas Chammas, 2014-08-29; 2 files, -59/+52)
    As [reported on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-1-0-RC2-tp8107p8131.html):
    - Code fencing with triple-backticks doesn't seem to work like it does on GitHub. Newlines are lost. Instead, use 4-space indent to format small code blocks.
    - Nested bullets need 2 leading spaces, not 1.
    - Spellcheck!
    Author: Nicholas Chammas <nicholas.chammas@gmail.com>
    Author: nchammas <nicholas.chammas@gmail.com>
    Closes #2201 from nchammas/sql-doc-fixes and squashes the following commits:
      873f889 [Nicholas Chammas] [Docs] fix skip-api flag
      5195e0c [Nicholas Chammas] [Docs] SQL doc formatting and typo fixes
      3b26c8d [nchammas] [Spark QA] Link to console output on test time out
    (cherry picked from commit 53aa8316e88980c6f46d3b9fc90d935a4738a370)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-3307] [PySpark] Fix doc string of SparkContext.broadcast() (Davies Liu, 2014-08-29; 1 file, -2/+0)
    Remove invalid docs.
    Author: Davies Liu <davies.liu@gmail.com>
    Closes #2202 from davies/keep and squashes the following commits:
      aa3b44f [Davies Liu] remove invalid docs
    (cherry picked from commit e248328b39f52073422a12fd0388208de41be1c7)
    Signed-off-by: Josh Rosen <joshrosen@apache.org>
* HOTFIX: Bump spark-ec2 version to 1.1.0 (Patrick Wendell, 2014-08-29; 1 file, -1/+1)
* [maven-release-plugin] prepare for next development iteration (Patrick Wendell, 2014-08-29; 24 files, -25/+25)
* [maven-release-plugin] prepare release v1.1.0-rc2 (Patrick Wendell, 2014-08-29; 24 files, -38/+33)
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1" (Patrick Wendell, 2014-08-28; 24 files, -33/+38)
    This reverts commit f07183249b74dd857069028bf7d570b35f265585.
* Revert "[maven-release-plugin] prepare for next development iteration" (Patrick Wendell, 2014-08-28; 24 files, -25/+25)
    This reverts commit f8f7a0c9dce764ece8acdc41d35bbf448dba7e92.
* Adding new CHANGES.txt (Patrick Wendell, 2014-08-28; 1 file, -0/+30)
* [SPARK-3277] Fix external spilling with LZ4 assertion error (Andrew Or, 2014-08-28; 5 files, -96/+144)
    **Summary of the changes**
    The bulk of this PR is comprised of tests and documentation; the actual fix is really just adding 1 line of code (see `BlockObjectWriter.scala`). We currently do not run the `External*` test suites with different compression codecs, and this would have caught the bug reported in [SPARK-3277](https://issues.apache.org/jira/browse/SPARK-3277). This PR extends the existing code to test spilling using all compression codecs known to Spark, including `LZ4`.
    **The bug itself**
    In `DiskBlockObjectWriter`, we only report the shuffle bytes written before we close the streams. With `LZ4`, all the bytes written reported by our metrics were 0 because `flush()` was not taking effect for some reason. In general, compression codecs may write additional bytes to the file after we call `close()`, and so we must also capture those bytes in our shuffle write metrics. Thanks mridulm and pwendell for help with debugging.
    Author: Andrew Or <andrewor14@gmail.com>
    Author: Patrick Wendell <pwendell@gmail.com>
    Closes #2187 from andrewor14/fix-lz4-spilling and squashes the following commits:
      1b54bdc [Andrew Or] Speed up tests by not compressing everything
      1c4624e [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-lz4-spilling
      6b2e7d1 [Andrew Or] Fix compilation error
      92e251b [Patrick Wendell] Better documentation for BlockObjectWriter.
      a1ad536 [Andrew Or] Fix tests
      089593f [Andrew Or] Actually fix SPARK-3277 (tests still fail)
      4bbcf68 [Andrew Or] Update tests to actually test all compression codecs
      b264a84 [Andrew Or] ExternalAppendOnlyMapSuite code style fixes (minor)
      1bfa743 [Andrew Or] Add more information to assert for better debugging
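    The codec behavior behind SPARK-3277 is easy to reproduce with a standard-library compressor. This Java sketch uses GZIP rather than LZ4 (which is not in the JDK), but the pattern is the same: the byte count observed at flush() understates what ends up in the stream once close() runs, so metrics captured before close() undercount.

    ```java
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.Arrays;
    import java.util.zip.GZIPOutputStream;

    public class CodecCloseBytes {
        public static void main(String[] args) throws IOException {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            GZIPOutputStream codec = new GZIPOutputStream(sink);
            byte[] data = new byte[10000];
            Arrays.fill(data, (byte) 42);
            codec.write(data);
            codec.flush();
            int bytesAtFlush = sink.size();  // metrics captured here miss the tail
            codec.close();                   // the codec emits buffered data and its trailer here
            int bytesAtClose = sink.size();
            System.out.println("at flush: " + bytesAtFlush + ", at close: " + bytesAtClose);
            // bytesAtClose > bytesAtFlush: byte counts must be captured after close()
        }
    }
    ```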
* SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist (Sandy Ryza, 2014-08-28; 2 files, -20/+2)
    Author: Sandy Ryza <sandy@cloudera.com>
    Closes #1984 from sryza/sandy-spark-3082 and squashes the following commits:
      fe08c37 [Sandy Ryza] Remove log message entirely
      85253ad [Sandy Ryza] SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist
    (cherry picked from commit 92af2314f27e80227174499f2fca505bd551cda7)
    Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3190] Avoid overflow in VertexRDD.count() (Ankur Dave, 2014-08-28; 1 file, -1/+1)
    VertexRDDs with more than 4 billion elements are counted incorrectly due to integer overflow when summing partition sizes. This PR fixes the issue by converting partition sizes to Longs before summing them. The following code previously returned -10000000. After applying this PR, it returns the correct answer of 5000000000 (5 billion).

    ```scala
    val pairs = sc.parallelize(0L until 500L).map(_ * 10000000)
      .flatMap(start => start until (start + 10000000)).map(x => (x, x))
    VertexRDD(pairs).count()
    ```

    Author: Ankur Dave <ankurdave@gmail.com>
    Closes #2106 from ankurdave/SPARK-3190 and squashes the following commits:
      641f468 [Ankur Dave] Avoid overflow in VertexRDD.count()
    (cherry picked from commit 96df92906978c5f58e0cc8ff5eebe5b35a08be3b)
    Signed-off-by: Josh Rosen <joshrosen@apache.org>
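    The fix for SPARK-3190 (widen partition sizes to 64-bit before summing) can be illustrated without Spark. This Java sketch uses made-up partition sizes; the accumulation pattern is the point:

    ```java
    public class CountOverflowSketch {
        public static void main(String[] args) {
            // Five hypothetical partitions of one billion elements each.
            int[] partitionSizes = {1000000000, 1000000000, 1000000000, 1000000000, 1000000000};
            int wrong = 0;
            long right = 0L;
            for (int size : partitionSizes) {
                wrong += size;        // int arithmetic wraps past Integer.MAX_VALUE
                right += (long) size; // widen to long first, as the fix does
            }
            System.out.println("int sum: " + wrong + ", long sum: " + right);
            // int sum: 705032704, long sum: 5000000000
        }
    }
    ```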
* [SPARK-3264] Allow users to set executor Spark home in Mesos (Andrew Or, 2014-08-28; 3 files, -8/+22)
    The executors and the driver may not share the same Spark home. There is currently one way to set the executor-side Spark home in Mesos, through setting `spark.home`. However, this is neither documented nor intuitive. This PR adds a more specific config `spark.mesos.executor.home` and exposes this to the user. liancheng tnachen
    Author: Andrew Or <andrewor14@gmail.com>
    Closes #2166 from andrewor14/mesos-spark-home and squashes the following commits:
      b87965e [Andrew Or] Merge branch 'master' of github.com:apache/spark into mesos-spark-home
      f6abb2e [Andrew Or] Document spark.mesos.executor.home
      ca7846d [Andrew Or] Add more specific configuration for executor Spark home in Mesos
    (cherry picked from commit 41dc5987d9abeca6fc0f5935c780d48f517cdf95)
    Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3150] Fix NullPointerException in Spark recovery: Add initializing default values in DriverInfo.init() (Tatiana Borisova, 2014-08-28; 1 file, -0/+13)
    The issue happens when Spark is run standalone on a cluster. When the master and driver fail simultaneously on one node in a cluster, the master tries to recover its state and restart the Spark driver. While restarting the driver, it fails with an NPE. After failing, it restarts, tries to recover its state, and restarts the Spark driver again, over and over in an infinite cycle. Namely, Spark tries to read DriverInfo state from ZooKeeper, but after reading, DriverInfo.worker happens to be null.
    https://issues.apache.org/jira/browse/SPARK-3150
    Author: Tatiana Borisova <tanyatik@yandex.ru>
    Closes #2062 from tanyatik/spark-3150 and squashes the following commits:
      9936043 [Tatiana Borisova] Add initializing default values in DriverInfo.init()
    (cherry picked from commit 70d814665baa8b8ca868d3126452105ecfa5cbff)
    Signed-off-by: Josh Rosen <joshrosen@apache.org>
* [maven-release-plugin] prepare for next development iteration (Patrick Wendell, 2014-08-28; 24 files, -25/+25)
* [maven-release-plugin] prepare release v1.1.0-rc1 (Patrick Wendell, 2014-08-28; 24 files, -38/+33)
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1" (Patrick Wendell, 2014-08-28; 24 files, -33/+38)
    This reverts commit 58b0be6a29eab817d350729710345e9f39e4c506.
* Revert "[maven-release-plugin] prepare for next development iteration" (Patrick Wendell, 2014-08-28; 24 files, -25/+25)
    This reverts commit 78e3c036eee7113b2ed144eec5061e070b479e56.
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1" (Patrick Wendell, 2014-08-28; 24 files, -25/+25)
    This reverts commit 79e86ef3e1a3ee03a7e3b166a5c7dee11c6d60d7.
* Revert "[maven-release-plugin] prepare for next development iteration" (Patrick Wendell, 2014-08-28; 24 files, -25/+25)
    This reverts commit a118ea5c59d653f5a3feda21455ba60bc722b3b1.
* Revert "Revert "[maven-release-plugin] prepare for next development iteration"" (Patrick Wendell, 2014-08-28; 24 files, -25/+25)
    This reverts commit 71ec0140f7e121bdba3d19e8219e91a5e9d1e320.
* Revert "Revert "[maven-release-plugin] prepare release v1.1.0-rc1"" (Patrick Wendell, 2014-08-28; 24 files, -25/+25)
    This reverts commit 56070f12f455bae645cba887a74c72b12f1085f8.
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1" (Patrick Wendell, 2014-08-28; 24 files, -25/+25)
    This reverts commit da4b94c86c9dd0d624b3040aa4b9449be9f60fc3.
* Revert "[maven-release-plugin] prepare for next development iteration" (Patrick Wendell, 2014-08-28; 24 files, -25/+25)
    This reverts commit 96926c5a42c5970ed74c50db5bd9c68cacf92207.