* [SPARK-2366] [SQL] Add column pruning for the right side of LeftSemi join. (Takuya UESHIN · 2014-07-05 · 1 file changed · -8/+20)

  The right side of a `LeftSemi` join needs only the columns used in the join condition.

  Author: Takuya UESHIN <ueshin@happy-camper.st>
  Closes #1301 from ueshin/issues/SPARK-2366 and squashes the following commits:
  7677a39 [Takuya UESHIN] Update comments.
  786d3a0 [Takuya UESHIN] Rename method name.
  e0957b1 [Takuya UESHIN] Add column pruning for the right side of LeftSemi join.
  (cherry picked from commit 3da8df939ec63064692ba64d9188aeea908b305c)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
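  A hedged sketch of the idea in plain Scala (toy classes, not Catalyst's actual API): the right child of a left-semi join contributes no output rows of its own, so it only needs to produce the columns the join condition references.

  ```scala
  // Toy model of LeftSemi column pruning; Relation and the column sets are
  // illustrative stand-ins for Catalyst's LogicalPlan and AttributeSet.
  case class Relation(name: String, output: Set[String])

  // Columns referenced by the join condition, e.g. left.key = right.key.
  val conditionReferences = Set("key")

  // The right side of a LeftSemi join never appears in the join's output,
  // so any column the condition does not mention can be pruned away.
  def prunedRightOutput(right: Relation): Set[String] =
    right.output.intersect(conditionReferences)

  assert(prunedRightOutput(Relation("src", Set("key", "value"))) == Set("key"))
  ```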
* [SPARK-2370][SQL] Decrease metadata retrieved for partitioned Hive queries. (Michael Armbrust · 2014-07-04 · 1 file changed · -1/+1)

  Author: Michael Armbrust <michael@databricks.com>
  Closes #1305 from marmbrus/usePrunerPartitions and squashes the following commits:
  744aa20 [Michael Armbrust] Use getAllPartitionsForPruner instead of getPartitions, which avoids retrieving auth data
  (cherry picked from commit 9d006c97371ddf357e0b821d5c6d1535d9b6fe41)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* [maven-release-plugin] prepare for next development iteration (Ubuntu · 2014-07-04 · 21 files changed · -22/+22)
* [maven-release-plugin] prepare release v1.0.1-rc2 (tag: v1.0.1) (Ubuntu · 2014-07-04 · 21 files changed · -22/+22)
* Updating CHANGES.txt file (Patrick Wendell · 2014-07-04 · 1 file changed · -0/+125)
* HOTFIX: Merge issue with cf1d46e4. (Patrick Wendell · 2014-07-04 · 1 file changed · -2/+2)

  The tests in that patch used a newer constructor for TaskInfo.
* [SPARK-2059][SQL] Add analysis checks (Reynold Xin · 2014-07-04 · 2 files changed · -0/+24)

  This replaces #1263 with a test case.

  Author: Reynold Xin <rxin@apache.org>
  Author: Michael Armbrust <michael@databricks.com>
  Closes #1265 from rxin/sql-analysis-error and squashes the following commits:
  a639e01 [Reynold Xin] Added a test case for unresolved attribute analysis.
  7371e1b [Reynold Xin] Merge pull request #1263 from marmbrus/analysisChecks
  448c088 [Michael Armbrust] Add analysis checks
  (cherry picked from commit b3e768e154bd7175db44c3ffc3d8f783f15ab776)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* Update SQLConf.scala (baishuo(白硕) · 2014-07-04 · 1 file changed · -6/+3)

  Use `java.util.concurrent.ConcurrentHashMap` instead of `java.util.Collections.synchronizedMap`.

  Author: baishuo(白硕) <vc_java@hotmail.com>
  Closes #1272 from baishuo/master and squashes the following commits:
  51ec55d, 63da043, 36b6dbd, 864faa0, 593096b, 7304d9b, 843581c, 1d3e4a2, 0740f28 [baishuo(白硕)] Update SQLConf.scala
  (cherry picked from commit 0bbe61223eda3f33bbf8992d2a8f0d47813f4873)
  Signed-off-by: Reynold Xin <rxin@apache.org>
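  A minimal sketch of the swap (the map here is a stand-in for SQLConf's internal settings field): `ConcurrentHashMap` allows concurrent reads without funneling every access through one monitor, which is what `Collections.synchronizedMap` does.

  ```scala
  import java.util.concurrent.ConcurrentHashMap

  // Before: java.util.Collections.synchronizedMap(new java.util.HashMap[String, String]())
  // After: reads proceed without acquiring a global lock.
  val settings = new ConcurrentHashMap[String, String]()
  settings.put("spark.sql.shuffle.partitions", "200")
  assert(settings.get("spark.sql.shuffle.partitions") == "200")
  ```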
* [SPARK-1199][REPL] Remove VALId and use the original import style for defined classes. (Prashant Sharma · 2014-07-04 · 3 files changed · -11/+31)

  This is an alternate solution to #1176.

  Author: Prashant Sharma <prashant.s@imaginea.com>
  Closes #1179 from ScrapCodes/SPARK-1199/repl-fix-second-approach and squashes the following commits:
  820b34b [Prashant Sharma] Here we generate two kinds of import wrappers based on whether it is a class or not.
  (cherry picked from commit d43415075b3468fe8aa56de5d2907d409bb96347)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-2059][SQL] Don't throw TreeNodeException in `execution.ExplainCommand` (Cheng Lian · 2014-07-03 · 1 file changed · -3/+6)

  This is a fix for the problem revealed by PR #1265. Currently `HiveComparisonSuite` ignores the output of `ExplainCommand`, since the Catalyst query plan is quite different from the Hive query plan. But exceptions thrown from `CheckResolution` still break test cases. This PR catches any `TreeNodeException` and reports it as part of the query explanation. After merging this PR, PR #1265 can also be merged safely.

  For a normal query:

  ```
  scala> hql("explain select key from src").foreach(println)
  ...
  [Physical execution plan:]
  [HiveTableScan [key#9], (MetastoreRelation default, src, None), None]
  ```

  For a wrong query with unresolved attribute(s):

  ```
  scala> hql("explain select kay from src").foreach(println)
  ...
  [Error occurred during query planning: ]
  [Unresolved attributes: 'kay, tree:]
  [Project ['kay]]
  [ LowerCaseSchema ]
  [ MetastoreRelation default, src, None]
  ```

  Author: Cheng Lian <lian.cs.zju@gmail.com>
  Closes #1294 from liancheng/safe-explain and squashes the following commits:
  4318911 [Cheng Lian] Don't throw TreeNodeException in `execution.ExplainCommand`
  (cherry picked from commit 544880457de556d1ad52e8cb7e1eca19da95f517)
  Signed-off-by: Reynold Xin <rxin@apache.org>
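  A hedged sketch of the catch-and-report pattern (the exception class below is a self-contained stand-in for Catalyst's real `TreeNodeException`): planning errors become lines of explain output instead of escaping the command.

  ```scala
  // Stand-in for Catalyst's TreeNodeException, defined here so the sketch
  // is self-contained.
  case class TreeNodeException(msg: String) extends Exception(msg)

  def explain(plan: () => String): Seq[String] =
    try Seq("Physical execution plan:", plan())
    catch {
      case e: TreeNodeException => Seq("Error occurred during query planning:", e.msg)
    }

  // A failing plan is reported rather than thrown:
  println(explain(() => throw TreeNodeException("Unresolved attributes: 'kay")))
  ```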
* SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark (Aaron Davidson · 2014-07-03 · 1 file changed · -0/+2)

  JIRA: https://issues.apache.org/jira/browse/SPARK-2282

  This issue is caused by a buildup of sockets in the TIME_WAIT stage of TCP, a stage that lasts for some period of time after the communication closes. This solution simply allows us to reuse sockets that are in TIME_WAIT, avoiding the buildup caused by the rapid creation of these sockets.

  Author: Aaron Davidson <aaron@databricks.com>
  Closes #1220 from aarondav/SPARK-2282 and squashes the following commits:
  2e5cab3 [Aaron Davidson] SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark
  (cherry picked from commit 97a0bfe1c0261384f09d53f9350de52fb6446d59)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
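  The fix itself lives in PySpark's Python accumulator server, but the socket option involved (`SO_REUSEADDR`) is the same everywhere; a Scala illustration with a throwaway server, not Spark's code:

  ```scala
  import java.net.{InetSocketAddress, ServerSocket}

  // SO_REUSEADDR lets a new socket bind to a local address that is still
  // occupied by a connection lingering in TIME_WAIT. It must be set before
  // bind(), hence the unbound constructor.
  val server = new ServerSocket()
  server.setReuseAddress(true)
  server.bind(new InetSocketAddress("localhost", 0))
  println(s"bound to port ${server.getLocalPort}")
  server.close()
  ```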
* [SPARK-2307][Reprise] Correctly report RDD blocks on SparkUI (Andrew Or · 2014-07-03 · 6 files changed · -23/+184)

  **Problem.** The existing code in `ExecutorPage.scala` requires a linear scan through all the blocks to filter out the uncached ones. Every refresh could be expensive if there are many blocks and many executors.

  **Solution.** The proper semantics should be the following: `StorageStatusListener` should contain only block statuses that are cached. This means as soon as a block is unpersisted by any means, its status should be removed. This is reflected in the changes made in `StorageStatusListener.scala`. Further, the `StorageTab` must stop relying on the `StorageStatusListener` changing a dropped block's status to `StorageLevel.NONE` (which no longer happens). This is reflected in the changes made in `StorageTab.scala` and `StorageUtils.scala`.

  If you have been following this chain of PRs like pwendell, you will quickly notice that this reverts the changes in #1249, which reverts the changes in #1080. In other words, we are adding back the changes from #1080, and fixing SPARK-2307 on top of those changes. Please ask questions if you are confused.

  Author: Andrew Or <andrewor14@gmail.com>
  Closes #1255 from andrewor14/storage-ui-fix-reprise and squashes the following commits:
  45416fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into storage-ui-fix-reprise
  a82ea25 [Andrew Or] Add tests for StorageStatusListener
  8773b01 [Andrew Or] Update comment / minor changes
  3afde3f [Andrew Or] Correctly report the number of blocks on SparkUI
  (cherry picked from commit 3894a49be9b532cc026d908a0f49bca850504498)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-2350] Don't NPE while launching drivers (Aaron Davidson · 2014-07-03 · 1 file changed · -1/+1)

  Prior to this change, we could throw an NPE if we launched a driver while another one was waiting, because removing an element from a collection while iterating over it is not safe.

  Author: Aaron Davidson <aaron@databricks.com>
  Closes #1289 from aarondav/master-fail and squashes the following commits:
  1cf1cf4 [Aaron Davidson] SPARK-2350: Don't NPE while launching drivers
  (cherry picked from commit 586feb5c9528042420f678f78bacb6c254a5eaf8)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
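  A hedged sketch of the failure mode and the usual remedy (the driver queue is a stand-in for the Master's actual state): iterate over a snapshot, mutate the original.

  ```scala
  import scala.collection.mutable.ArrayBuffer

  val waitingDrivers = ArrayBuffer("driver-1", "driver-2", "driver-3")

  // Removing from waitingDrivers inside a loop over waitingDrivers itself can
  // skip elements or throw, depending on the collection. Copying to an
  // immutable snapshot first makes the removal safe.
  for (driver <- waitingDrivers.toList if driver == "driver-2") {
    waitingDrivers -= driver
  }
  assert(waitingDrivers == ArrayBuffer("driver-1", "driver-3"))
  ```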
* [SPARK-1097] Workaround Hadoop conf ConcurrentModification issue (Raymond Liu · 2014-07-03 · 1 file changed · -2/+2)

  Work around the Hadoop `Configuration` ConcurrentModificationException issue.

  Author: Raymond Liu <raymond.liu@intel.com>
  Closes #1273 from colorant/hadoopRDD and squashes the following commits:
  994e98b [Raymond Liu] Address comments
  e2cda3d [Raymond Liu] Workaround Hadoop conf ConcurrentModification issue
  (cherry picked from commit 5fa0a05763ab1d527efe20e3b10539ac5ffc36de)
  Signed-off-by: Aaron Davidson <aaron@databricks.com>
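  A sketch of the workaround idea, assuming hadoop-client on the classpath (the lock object and method are illustrative, not Spark's exact code):

  ```scala
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.mapred.JobConf

  // Configuration's copy constructor iterates over the source's internal
  // properties; if another thread mutates the shared conf concurrently, that
  // iteration throws ConcurrentModificationException. Serializing the copies
  // on a single lock works around it.
  object ConfigurationLock

  def cloneConf(shared: Configuration): JobConf = ConfigurationLock.synchronized {
    new JobConf(shared)
  }
  ```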
* Streaming programming guide typos (Clément MATHIEU · 2014-07-03 · 1 file changed · -2/+2)

  Fix a bad Java code sample and a broken link in the streaming programming guide.

  Author: Clément MATHIEU <clement@unportant.info>
  Closes #1286 from cykl/streaming-programming-guide-typos and squashes the following commits:
  b0908cb [Clément MATHIEU] Fix broken URL
  9d3c535 [Clément MATHIEU] Spark streaming requires at least two working threads (scala version was OK)
  (cherry picked from commit fdc4c112e7c2ac585d108d03209a642aa8bab7c8)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
* [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work. (Prashant Sharma · 2014-07-03 · 4 files changed · -19/+18)

  Trivial fix.

  Author: Prashant Sharma <prashant.s@imaginea.com>
  Closes #1050 from ScrapCodes/SPARK-2109/pyspark-script-bug and squashes the following commits:
  77072b9 [Prashant Sharma] Changed echos to redirect to STDERR.
  13f48a0 [Prashant Sharma] [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work.
  (cherry picked from commit 731f683b1bd8abbb83030b6bae14876658bbf098)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-2342] Evaluation helper's output type doesn't conform to input type (Yijie Shen · 2014-07-03 · 1 file changed · -1/+1)

  The `cast` function doesn't conform to the intention of the comment: "Those expressions are supposed to be in the same data type, and also the return type."

  Author: Yijie Shen <henry.yijieshen@gmail.com>
  Closes #1283 from yijieshen/master and squashes the following commits:
  c7aaa4b [Yijie Shen] [SPARK-2342] Evaluation helper's output type doesn't conform to input type
  (cherry picked from commit a9b52e5623f7fc77fca96b095f9eeaef76e35d54)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK] Fix NPE for ExternalAppendOnlyMap (Andrew Or · 2014-07-03 · 2 files changed · -11/+46)

  It did not handle null keys very gracefully before.

  Author: Andrew Or <andrewor14@gmail.com>
  Closes #1288 from andrewor14/fix-external and squashes the following commits:
  312b8d8 [Andrew Or] Abstract key hash code
  ed5adf9 [Andrew Or] Fix NPE for ExternalAppendOnlyMap
  (cherry picked from commit c480537739f9329ebfd580f09c69778e6c976366)
  Signed-off-by: Aaron Davidson <aaron@databricks.com>
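  The squashed commits mention abstracting the key hash code; a minimal null-safe hashing sketch (the helper name is made up):

  ```scala
  // null.hashCode would throw a NullPointerException, so map a null key to a
  // fixed bucket instead.
  def hashKey(key: AnyRef): Int = if (key == null) 0 else key.hashCode()

  assert(hashKey(null) == 0)
  assert(hashKey("a") == "a".hashCode)
  ```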
* [SPARK-2287] [SQL] Make ScalaReflection be able to handle Generic case classes. (Takuya UESHIN · 2014-07-02 · 2 files changed · -2/+25)

  Author: Takuya UESHIN <ueshin@happy-camper.st>
  Closes #1226 from ueshin/issues/SPARK-2287 and squashes the following commits:
  32ef7c3 [Takuya UESHIN] Add execution of `SHOW TABLES` before `TestHive.reset()`.
  541dc8d [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2287
  fac5fae [Takuya UESHIN] Remove unnecessary method receiver.
  d306e60 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2287
  7de5706 [Takuya UESHIN] Make ScalaReflection be able to handle Generic case classes.
  (cherry picked from commit bc7041a42dfa84312492ea8cae6fdeaeac4f6d1c)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2328] [SQL] Add execution of `SHOW TABLES` before `TestHive.reset()`. (Takuya UESHIN · 2014-07-02 · 1 file changed · -0/+3)

  When `PruningSuite` runs first among the Hive tests, `TestHive.reset()` unfortunately breaks the test environment. To prevent this, we must run a query before calling reset the first time.

  Author: Takuya UESHIN <ueshin@happy-camper.st>
  Closes #1268 from ueshin/issues/SPARK-2328 and squashes the following commits:
  043ceac [Takuya UESHIN] Add execution of `SHOW TABLES` before `TestHive.reset()`.
  (cherry picked from commit 1e2c26c83dd2e807cf0031ceca8b338a1a57cac6)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG (Ximo Guanter Gonzalbez · 2014-07-02 · 3 files changed · -8/+44)

  **Description** This patch enables using the `.select()` function in SchemaRDD with functions such as `Sum`, `Count` and others.
  **Testing** Unit tests added.

  Author: Ximo Guanter Gonzalbez <ximo@tid.es>
  Closes #1211 from edrevo/add-expression-support-in-select and squashes the following commits:
  fe4a1e1 [Ximo Guanter Gonzalbez] Extend SQL DSL to functions
  e1d344a [Ximo Guanter Gonzalbez] SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG
  (cherry picked from commit 5c6ec94da1bacd8e65a43acb92b6721493484e7b)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
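  A hedged usage sketch based on the PR description (the `Sale` data, master URL, and exact import locations are assumptions; the DSL details varied between early Spark SQL versions):

  ```scala
  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext
  import org.apache.spark.sql.catalyst.expressions.Sum // package may differ by version

  case class Sale(region: String, amount: Int)

  object DslExample extends App {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("dsl-example"))
    val sqlContext = new SQLContext(sc)
    import sqlContext._ // brings in createSchemaRDD and the symbol-based DSL

    val sales = sc.parallelize(Seq(Sale("eu", 10), Sale("us", 20), Sale("eu", 5)))
    // With SPARK-2186, aggregate expressions can be used directly in select():
    println(sales.select(Sum('amount)).collect().mkString(", "))
    sc.stop()
  }
  ```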
* Update the comments in SqlParser (CodingCat · 2014-07-01 · 1 file changed · -1/+0)

  SqlParser has been case-insensitive since https://github.com/apache/spark/commit/dab5439a083b5f771d5d5b462d0d517fa8e9aaf2 was merged.

  Author: CodingCat <zhunansjtu@gmail.com>
  Closes #1275 from CodingCat/master and squashes the following commits:
  17931cd [CodingCat] update the comments in SqlParser
  (cherry picked from commit 6596392da0fc0fee89e22adfca239a3477dfcbab)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2322] Exception in resultHandler should NOT crash DAGScheduler and shutdown SparkContext. (Reynold Xin · 2014-06-30 · 3 files changed · -6/+78)

  This should go into 1.0.1.

  Author: Reynold Xin <rxin@apache.org>
  Closes #1264 from rxin/SPARK-2322 and squashes the following commits:
  c77c07f [Reynold Xin] Added comment to SparkDriverExecutionException and a test case for accumulator.
  5d8d920 [Reynold Xin] [SPARK-2322] Exception in resultHandler could crash DAGScheduler and shutdown SparkContext.
  (cherry picked from commit 358ae1534d01ad9e69364a21441a7ef23c2cb516)
  Signed-off-by: Reynold Xin <rxin@apache.org>
  Conflicts: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
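  A hedged sketch of the isolation pattern (`SparkDriverExecutionException` is the class the commit mentions, but the definition and surrounding code here are simplified stand-ins):

  ```scala
  // Wrap failures from user-supplied result handlers so they fail the job
  // instead of propagating out of the scheduler's event loop.
  class SparkDriverExecutionException(cause: Throwable)
    extends Exception("Execution error", cause)

  def runResultHandler(handler: () => Unit)(failJob: Throwable => Unit): Unit =
    try handler()
    catch { case e: Throwable => failJob(new SparkDriverExecutionException(e)) }

  runResultHandler(() => sys.error("boom"))(e => println(s"job failed: $e"))
  ```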
* [SPARK-1394] Remove SIGCHLD handler in worker subprocess (Matthew Farrellee · 2014-06-28 · 1 file changed · -0/+1)

  It should not be the responsibility of the worker subprocess, which does not intentionally fork, to try to clean up child processes. Doing so is complex and interferes with operations such as platform.system(). If it is desirable to have tighter control over subprocesses, then namespaces should be used, and it should be the manager's responsibility to handle cleanup.

  Author: Matthew Farrellee <matt@redhat.com>
  Closes #1247 from mattf/SPARK-1394 and squashes the following commits:
  c36f308 [Matthew Farrellee] [SPARK-1394] Remove SIGCHLD handler in worker subprocess
  (cherry picked from commit 3c104c79d24425786cec0034f269ba19cf465b31)
  Signed-off-by: Aaron Davidson <aaron@databricks.com>
* Revert "[maven-release-plugin] prepare release v1.0.1-rc1"Patrick Wendell2014-06-2721-22/+22
| | | | This reverts commit 7feeda3d729f9397aa15ee8750c01ef5aa601962.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-06-2721-22/+22
| | | | This reverts commit ea1a455a755f83f46fc8bf242410917d93d0c52c.
* [SPARK-2003] Fix python SparkContext example (Matthew Farrellee · 2014-06-27 · 1 file changed · -1/+1)

  Author: Matthew Farrellee <matt@redhat.com>
  Closes #1246 from mattf/SPARK-2003 and squashes the following commits:
  b12e7ca [Matthew Farrellee] [SPARK-2003] Fix python SparkContext example
  (cherry picked from commit 0e0686d3ef88e024fcceafe36a0cdbb953f5aeae)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-2259] Fix highly misleading docs on cluster / client deploy modes (Andrew Or · 2014-06-27 · 5 files changed · -12/+36)

  The existing docs are highly misleading. For standalone mode, for example, they encourage the user to use standalone-cluster mode, which is not officially supported. Safeguards have been added to Spark submit itself to prevent bad documentation from leading users down the wrong path in the future. This PR is prompted by countless headaches users of Spark have run into on the mailing list.

  Author: Andrew Or <andrewor14@gmail.com>
  Closes #1200 from andrewor14/submit-docs and squashes the following commits:
  5ea2460 [Andrew Or] Rephrase cluster vs client explanation
  c827f32 [Andrew Or] Clarify spark submit messages
  9f7ed8f [Andrew Or] Clarify client vs cluster deploy mode + add safeguards
  (cherry picked from commit f17510e371dfbeaada3c72b884d70c36503ea30a)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-2307] SparkUI - storage tab displays incorrect RDDs (Andrew Or · 2014-06-27 · 2 files changed · -6/+5)

  The issue here is that the `StorageTab` listens for updates from the `StorageStatusListener`, but when a block is kicked out of the cache, `StorageStatusListener` removes it from its list. Thus, there is no way for the `StorageTab` to know whether a block has been dropped. This issue was introduced in #1080, which was itself a bug fix. Here we revert that PR and offer a different fix for the original bug (SPARK-2144).

  Author: Andrew Or <andrewor14@gmail.com>
  Closes #1249 from andrewor14/storage-ui-fix and squashes the following commits:
  af019ce [Andrew Or] Fix SPARK-2307
  (cherry picked from commit 21e0f77b6321590ed86223a60cdb8ae08ea4057f)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* SPARK-2181: The keys for sorting the columns of Executor page in SparkUI are incorrect (witgo · 2014-06-26 · 3 files changed · -11/+17)

  Author: witgo <witgo@qq.com>
  Closes #1135 from witgo/SPARK-2181 and squashes the following commits:
  39dad90 [witgo] The keys for sorting the columns of Executor page in SparkUI are incorrect
  (cherry picked from commit 18f29b96c7e0948f5f504e522e5aa8a8d1ab163e)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [maven-release-plugin] prepare for next development iteration (Ubuntu · 2014-06-26 · 21 files changed · -22/+22)
* [maven-release-plugin] prepare release v1.0.1-rc1 (Ubuntu · 2014-06-26 · 21 files changed · -22/+22)
* CHANGES.txt for release 1.0.1 (Patrick Wendell · 2014-06-26 · 1 file changed · -0/+778)
* Fixing AWS instance type information based upon current EC2 data (Zichuan Ye · 2014-06-26 · 1 file changed · -5/+14)

  Fixed a problem in the previous file in which some information regarding AWS instance types was wrong. Such information was updated based upon current AWS EC2 data.

  Author: Zichuan Ye <jerry@tangentds.com>
  Closes #1156 from jerry86/master and squashes the following commits:
  ff36e95 [Zichuan Ye] Fixing AWS instance type information based upon current EC2 data
  (cherry picked from commit 62d4a0fa9947e64c1533f66ae577557bcfb271c9)
  Conflicts: ec2/spark_ec2.py
* Small error in previous commit (Patrick Wendell · 2014-06-26 · 1 file changed · -2/+2)
* Updating versions for 1.0.1 release (Patrick Wendell · 2014-06-26 · 9 files changed · -11/+11)
* [SPARK-2286][UI] Report exception/errors for failed tasks that are not ExceptionFailure (Reynold Xin · 2014-06-26 · 4 files changed · -26/+75)

  Also added inline doc for each TaskEndReason.

  Author: Reynold Xin <rxin@apache.org>
  Closes #1225 from rxin/SPARK-2286 and squashes the following commits:
  6a7959d [Reynold Xin] Fix unit test failure.
  cf9d5eb [Reynold Xin] Merge branch 'master' into SPARK-2286
  a61fae1 [Reynold Xin] Move to line above ...
  38c7391 [Reynold Xin] [SPARK-2286][UI] Report exception/errors for failed tasks that are not ExceptionFailure.
  (cherry picked from commit 6587ef7c1783961e6ef250afa387271a1bd6e277)
  Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala
* [SPARK-2295] [SQL] Make JavaBeans nullability stricter. (Takuya UESHIN · 2014-06-26 · 1 file changed · -19/+18)

  Author: Takuya UESHIN <ueshin@happy-camper.st>
  Closes #1235 from ueshin/issues/SPARK-2295 and squashes the following commits:
  201c508 [Takuya UESHIN] Make JavaBeans nullability stricter.
  (cherry picked from commit 32a1ad75313472b1b098f7ec99335686d3fe4fc3)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2251] fix concurrency issues in random sampler (branch-1.0) (Xiangrui Meng · 2014-06-26 · 2 files changed · -4/+15)

  The following code is very likely to throw an exception:

  ```
  val rdd = sc.parallelize(0 until 111, 10).sample(false, 0.1)
  rdd.zip(rdd).count()
  ```

  because the same random number generator is used to compute partitions. This fix doesn't change the type signature. @pwendell

  Author: Xiangrui Meng <meng@databricks.com>
  Closes #1234 from mengxr/fix-sample-1.0 and squashes the following commits:
  88795e2 [Xiangrui Meng] fix concurrency issues in random sampler
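  A hedged sketch of why a shared generator breaks recomputation and how per-partition seeding fixes it (the seeding scheme is illustrative, not Spark's exact one):

  ```scala
  import scala.util.Random

  // With one shared Random, each recomputation of a sampled partition (as in
  // rdd.zip(rdd)) draws a different slice of the random stream and therefore
  // selects different rows. A generator derived from (seed, partition) makes
  // every recomputation of the same partition deterministic.
  val baseSeed = 42L
  def sampleFor(partition: Int): Seq[Double] = {
    val rng = new Random(baseSeed + partition)
    Seq.fill(5)(rng.nextDouble())
  }

  assert(sampleFor(3) == sampleFor(3)) // stable across recomputations
  ```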
* Remove use of spark.worker.instances (Kay Ousterhout · 2014-06-26 · 1 file changed · -1/+1)

  spark.worker.instances was added as part of this commit: https://github.com/apache/spark/commit/1617816090e7b20124a512a43860a21232ebf511

  My understanding is that SPARK_WORKER_INSTANCES is supported for backwards compatibility, but spark.worker.instances is never used (SparkSubmit.scala sets spark.executor.instances), so it should not have been added. @sryza @pwendell @tgravescs LMK if I'm understanding this correctly.

  Author: Kay Ousterhout <kayousterhout@gmail.com>
  Closes #1214 from kayousterhout/yarn_config and squashes the following commits:
  3d7c491 [Kay Ousterhout] Remove use of spark.worker.instances
  (cherry picked from commit 48a82a827c99526b165c78d7e88faec43568a37a)
  Signed-off-by: Thomas Graves <tgraves@apache.org>
* [SPARK-2254] [SQL] ScalaReflection should mark primitive types as non-nullable. (Takuya UESHIN · 2014-06-25 · 2 files changed · -31/+165)

  Author: Takuya UESHIN <ueshin@happy-camper.st>
  Closes #1193 from ueshin/issues/SPARK-2254 and squashes the following commits:
  cfd6088 [Takuya UESHIN] Modify ScalaReflection.schemaFor method to return nullability of Scala Type.
  (cherry picked from commit e4899a253728bfa7c78709a37a4837f74b72bd61)
  Signed-off-by: Reynold Xin <rxin@apache.org>
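  A minimal sketch of the nullability rule using plain Scala reflection (Catalyst's `schemaFor` returns a full schema; this shows only the nullable bit):

  ```scala
  import scala.reflect.runtime.universe._

  // JVM primitives (Int, Double, Boolean, ...) can never hold null, so a
  // reflected schema should mark them non-nullable; reference types remain
  // nullable.
  def nullable[T: TypeTag]: Boolean = !(typeOf[T] <:< typeOf[AnyVal])

  assert(!nullable[Int])
  assert(nullable[String])
  ```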
* [SPARK-2284][UI] Mark all failed tasks as failures. (Reynold Xin · 2014-06-25 · 2 files changed · -4/+35)

  Previously, only tasks that failed with the ExceptionFailure reason were marked as failures.

  Author: Reynold Xin <rxin@apache.org>
  Closes #1224 from rxin/SPARK-2284 and squashes the following commits:
  be79dbd [Reynold Xin] [SPARK-2284][UI] Mark all failed tasks as failures.
  (cherry picked from commit 4a346e242c3f241c575f35536220df01ad724e23)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2172] PySpark cannot import mllib modules in YARN-client mode (Szul, Piotr · 2014-06-25 · 1 file changed · -0/+8)

  Include the pyspark/mllib Python sources as resources in the mllib.jar. This way they will be included in the final assembly.

  Author: Szul, Piotr <Piotr.Szul@csiro.au>
  Closes #1223 from piotrszul/branch-1.0 and squashes the following commits:
  69d5174 [Szul, Piotr] Removed unused resource directory src/main/resource from mllib pom
  f8c52a0 [Szul, Piotr] [SPARK-2172] PySpark cannot import mllib modules in YARN-client mode: include pyspark/mllib python sources as resources in the jar
* [SPARK-1749] Job cancellation when SchedulerBackend does not implement killTask (Mark Hamstra · 2014-06-25 · 2 files changed · -9/+69)

  This is a fixed-up version of #686 (cc @markhamstra @pwendell). The last commit (the only one I authored) reflects the changes I made from Mark's original patch.

  Author: Mark Hamstra <markhamstra@gmail.com>
  Author: Kay Ousterhout <kayousterhout@gmail.com>
  Closes #1219 from kayousterhout/mark-SPARK-1749 and squashes the following commits:
  42dfa7e [Kay Ousterhout] Got rid of terrible double-negative name
  80b3205 [Kay Ousterhout] Don't notify listeners of job failure if it wasn't successfully cancelled.
  d156d33 [Mark Hamstra] Do nothing in no-kill submitTasks
  9312baa [Mark Hamstra] code review update
  cc353c8 [Mark Hamstra] scalastyle
  e61f7f8 [Mark Hamstra] Catch UnsupportedOperationException when DAGScheduler tries to cancel a job on a SchedulerBackend that does not implement killTask
  (cherry picked from commit b88a59a66845b8935b22f06fc96d16841ed20c94)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
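  A hedged sketch of the catch described in commit e61f7f8 (the trait and method names are simplified stand-ins for SchedulerBackend):

  ```scala
  // A backend that cannot kill tasks signals it with
  // UnsupportedOperationException; the caller degrades gracefully instead of
  // crashing the DAGScheduler.
  trait Backend { def killTask(taskId: Long): Unit }

  object NoKillBackend extends Backend {
    def killTask(taskId: Long): Unit =
      throw new UnsupportedOperationException("killTask not supported")
  }

  def tryCancel(backend: Backend, taskId: Long): Boolean =
    try { backend.killTask(taskId); true }
    catch { case _: UnsupportedOperationException => false }

  assert(!tryCancel(NoKillBackend, 7L))
  ```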
* [SPARK-2283][SQL] Reset test environment before running PruningSuite (Cheng Lian · 2014-06-25 · 1 file changed · -0/+5)

  JIRA issue: [SPARK-2283](https://issues.apache.org/jira/browse/SPARK-2283)

  If `PruningSuite` is run right after `HiveCompatibilitySuite`, the first test case fails because the `srcpart` table is cached in-memory by `HiveCompatibilitySuite`, but column pruning is not implemented for the `InMemoryColumnarTableScan` operator yet.

  Author: Cheng Lian <lian.cs.zju@gmail.com>
  Closes #1221 from liancheng/spark-2283 and squashes the following commits:
  dc0b663 [Cheng Lian] SPARK-2283: reset test environment before running PruningSuite
  (cherry picked from commit 7f196b009d26d4aed403b3c694f8b603601718e3)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-1912] fix compress memory issue during reduce (Wenchen Fan(Cloud) · 2014-06-25 · 1 file changed · -2/+20)

  When we need to read a compressed block, we first create a compression stream instance (LZF or Snappy) and use it to wrap that block. Say a reducer task needs to read 1000 local shuffle blocks: it will prepare to read all 1000 blocks up front, which means creating 1000 compression stream instances to wrap them. But initializing a compression instance allocates some memory, and having many compression instances alive at the same time is a problem. In practice the reducer reads the shuffle blocks one by one, so we can do the compression instance initialization lazily.

  Author: Wenchen Fan(Cloud) <cloud0fan@gmail.com>
  Closes #860 from cloud-fan/fix-compress and squashes the following commits:
  0924a6b [Wenchen Fan(Cloud)] rename 'doWork' into 'getIterator'
  07f32c2 [Wenchen Fan(Cloud)] move the LazyProxyIterator to dataDeserialize
  d80c426 [Wenchen Fan(Cloud)] remove empty lines in short class
  2c8adb2 [Wenchen Fan(Cloud)] add inline comment
  8ebff77 [Wenchen Fan(Cloud)] fix compress memory issue during reduce
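  A simplified version of the lazy wrapper the squashed commits name (`LazyProxyIterator`); the thunk stands in for the code that builds the decompression stream:

  ```scala
  // Constructing a compression stream allocates buffers up front. Deferring
  // the construction until the block's iterator is first consumed means 1000
  // pending blocks no longer imply 1000 live streams.
  class LazyProxyIterator[T](makeIterator: () => Iterator[T]) extends Iterator[T] {
    private lazy val delegate = makeIterator()
    def hasNext: Boolean = delegate.hasNext
    def next(): T = delegate.next()
  }

  val iter = new LazyProxyIterator(() => Iterator(1, 2, 3)) // nothing built yet
  assert(iter.toList == List(1, 2, 3)) // stream created on first use
  ```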
* [SPARK-2204] Launch tasks on the proper executors in mesos fine-grained mode (Sebastien Rainville · 2014-06-25 · 1 file changed · -7/+6)

  The scheduler for Mesos in fine-grained mode launches tasks on the wrong executors. `MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer])` assumes that `TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer])` returns task lists in the same order as the offers it was passed, but in the current implementation `TaskSchedulerImpl.resourceOffers` shuffles the offers to avoid always assigning the tasks to the same executors. The result is that the tasks are launched on the wrong executors.

  The jobs are sometimes able to complete, but most of the time they fail. It seems that as soon as something goes wrong with a task, Spark is unable to recover, since it is mistaken about where the tasks are actually running. Also, the more the cluster is under load, the more likely the job is to fail, because there is a higher probability that Spark tries to launch a task on a slave that doesn't actually have enough resources, again because it is using the wrong offers.

  The solution is to not assume that the tasks are returned in the same order as the offers, and simply launch the tasks on the executor decided by `TaskSchedulerImpl.resourceOffers`. What I am not sure about is that I considered slaveId and executorId to be the same, which is true at least in my setup, but I don't know if that is always true.

  I tested this on top of the 1.0.0 release and it seems to work fine on our cluster.

  Author: Sebastien Rainville <sebastien@hopper.com>
  Closes #1140 from sebastienrainville/fine-grained-mode-fix-master and squashes the following commits:
  a98b0e0 [Sebastien Rainville] Use a HashMap to retrieve the offer indices
  d6ffe54 [Sebastien Rainville] Launch tasks on the proper executors in mesos fine-grained mode
  (cherry picked from commit 1132e472eca1a00c2ce10d2f84e8f0e79a5193d3)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
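  The squashed commits mention using a HashMap to retrieve the offer indices; a hedged sketch of that pairing with simplified data shapes:

  ```scala
  // Offers arrive in one order; TaskSchedulerImpl shuffles them before
  // assigning tasks. Indexing the offers by slave ID pairs each task list
  // with the offer it was actually assigned to, not the offer that happens
  // to sit at the same position.
  case class Offer(index: Int, slaveId: String)

  val offers = Seq(Offer(0, "slave-a"), Offer(1, "slave-b"), Offer(2, "slave-c"))
  val bySlaveId: Map[String, Offer] = offers.map(o => o.slaveId -> o).toMap

  // A task assigned to "slave-c" is launched against offer 2, whatever
  // position its task list came back in.
  assert(bySlaveId("slave-c").index == 2)
  ```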
* [SPARK-2270] Kryo cannot serialize results returned by asJavaIterable (and thus groupBy/cogroup are broken in Java APIs when Kryo is used) (Reynold Xin · 2014-06-25 · 2 files changed · -0/+65)

  @pwendell this should be merged into 1.0.1. Thanks @sorenmacbeth for reporting this & helping out with the fix.

  Author: Reynold Xin <rxin@apache.org>
  Closes #1206 from rxin/kryo-iterable-2270 and squashes the following commits:
  09da0aa [Reynold Xin] Updated the comment.
  009bf64 [Reynold Xin] [SPARK-2270] Kryo cannot serialize results returned by asJavaIterable (and thus groupBy/cogroup are broken in Java APIs when Kryo is used).
  (cherry picked from commit 7ff2c754f340ba4c4077b0ff6285876eb7871c7b)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
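  The commit's actual fix registers a Kryo serializer for the wrapper type; a user-level workaround sketch is to copy the wrapper into a concrete collection before it crosses a serialization boundary:

  ```scala
  import scala.collection.JavaConverters._

  // The object returned by the asJava converters is an inner wrapper class
  // that Kryo cannot instantiate on deserialization; a plain ArrayList is a
  // concrete, Kryo-friendly type.
  val wrapped: java.util.List[Int] = Seq(1, 2, 3).asJava
  val kryoFriendly = new java.util.ArrayList[Int](wrapped)
  assert(kryoFriendly.size == 3)
  ```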
* [SPARK-2258 / 2266] Fix a few worker UI bugs (Andrew Or · 2014-06-25 · 2 files changed · -3/+4)

  **SPARK-2258.** Worker UI displays zombie processes if the executor throws an exception before a process is launched. This is because we only inform the Worker of the change if the process is already launched, which in this case it isn't.

  **SPARK-2266.** We expose "Some(app-id)" on the log page. This is fairly minor.

  Author: Andrew Or <andrewor14@gmail.com>
  Closes #1213 from andrewor14/fix-worker-ui and squashes the following commits:
  c1223fe [Andrew Or] Fix worker UI bugs
  Conflicts: core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
* Replace doc reference to Shark with Spark SQL. (Reynold Xin · 2014-06-25 · 1 file changed · -3/+2)

  (cherry picked from commit ac06a85da59db8f2654cdf6601d186348da09c01)
  Signed-off-by: Reynold Xin <rxin@apache.org>