...
* [SPARK-2417][MLlib] Fix DecisionTree tests (johnnywalleye, 2014-07-09; 1 file changed, -4/+4)
  Fixes test failures introduced by https://github.com/apache/spark/pull/1316. For both the regression and classification cases, val stats is the InformationGainStats for the best tree split. stats.predict is the predicted value for the data, before the split is made. Since 600 of the 1,000 values generated by DecisionTreeSuite.generateCategoricalDataPoints() are 1.0 and the rest 0.0, the regression tree and classification tree both correctly predict a value of 0.6 for this data now, and the assertions have been changed to reflect that.
  Author: johnnywalleye <jsondag@gmail.com>
  Closes #1343 from johnnywalleye/decision-tree-tests and squashes the following commits: ef80603 [johnnywalleye] [SPARK-2417][MLlib] Fix DecisionTree tests
  (cherry picked from commit d35e3db2325931492b64890125a70579bc3b587b)
  Signed-off-by: Xiangrui Meng <meng@databricks.com>
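The 0.6 the corrected assertions expect is simply the label mean at the root: a regression (or class-probability) leaf predicts the average of the labels it sees. A tiny plain-Python illustration, not MLlib code:

```python
# Illustration only: with 600 labels of 1.0 and 400 of 0.0, the mean is 0.6,
# which is the value the corrected test assertions expect.
labels = [1.0] * 600 + [0.0] * 400
prediction = sum(labels) / len(labels)
print(prediction)
```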
* [STREAMING] SPARK-2343: Fix QueueInputDStream with oneAtATime false (Manuel Laflamme, 2014-07-09; 2 files changed, -2/+92)
  Fix QueueInputDStream, which was not removing dequeued items when used with the oneAtATime flag disabled.
  Author: Manuel Laflamme <manuel.laflamme@gmail.com>
  Closes #1285 from mlaflamm/spark-2343 and squashes the following commits: 61c9e38 [Manuel Laflamme] Unit tests for queue input stream c51d029 [Manuel Laflamme] Fix QueueInputDStream with oneAtATime false
  (cherry picked from commit 0eb11527d13083ced215e3fda44ed849198a57cb)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
* [SPARK-2152][MLlib] fix bin offset in DecisionTree node aggregations (also resolves SPARK-2160) (johnnywalleye, 2014-07-08; 1 file changed, -5/+5)
  Hi, this pull fixes (what I believe to be) a bug in DecisionTree.scala. In the extractLeftRightNodeAggregates function, the first set of rightNodeAgg values for Regression are set in line 792 as follows:
  rightNodeAgg(featureIndex)(2 * (numBins - 2)) = binData(shift + (2 * numBins - 1))
  Then there is a loop that sets the rest of the values, as in line 809:
  rightNodeAgg(featureIndex)(2 * (numBins - 2 - splitIndex)) = binData(shift + (2 * (numBins - 2 - splitIndex))) + rightNodeAgg(featureIndex)(2 * (numBins - 1 - splitIndex))
  But since splitIndex starts at 1, this ends up skipping a set of binData values. The changes here address this issue, for both the Regression and Classification cases.
  Author: johnnywalleye <jsondag@gmail.com>
  Closes #1316 from johnnywalleye/master and squashes the following commits: 73809da [johnnywalleye] fix bin offset in DecisionTree node aggregations
  (cherry picked from commit 1114207cc8e4ef94cb97bbd5a2ef3ae4d51f73fa)
  Signed-off-by: Xiangrui Meng <meng@databricks.com>
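The aggregation being fixed is, at heart, a suffix sum over bins: the right-node aggregate for split i must accumulate every bin strictly to the right of i. A heavily simplified, hypothetical Python sketch (single feature, single statistic, invented names) showing the indexing property the off-by-one violated:

```python
# right_agg[i] should hold the sum of bin_data over bins i+1 .. num_bins-1.
# Computing it as a plain suffix sum touches every bin exactly once, which is
# exactly the property that skipping a binData slot breaks.
bin_data = [3.0, 1.0, 4.0, 1.0, 5.0]
num_bins = len(bin_data)
right_agg = [0.0] * num_bins
for i in range(num_bins - 2, -1, -1):
    right_agg[i] = right_agg[i + 1] + bin_data[i + 1]
# right_agg[0] covers bins 1..4: 1.0 + 4.0 + 1.0 + 5.0 = 11.0
```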
* [SPARK-2362] Fix for newFilesOnly logic in file DStream (Gabriele Nizzoli, 2014-07-08; 1 file changed, -1/+1)
  The newFilesOnly logic should be inverted: if the flag newFilesOnly==true, then only files newer than the current time should be read. As the code is now, if newFilesOnly==true it will start to read files that are newer than 0L (that is: every file in the directory).
  Author: Gabriele Nizzoli <mail@nizzoli.net>
  Closes #1077 from gabrielenizzoli/master and squashes the following commits: 4f1d261 [Gabriele Nizzoli] Fix for newFilesOnly logic in file DStream
  (cherry picked from commit e6f7bfcfbf6aff7a9f8cd8e0a2166d0bf62b0912)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
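The fix boils down to choosing the right modification-time threshold. A minimal sketch of that decision (names are illustrative, not Spark's actual code):

```python
def mod_time_threshold(new_files_only: bool, start_time_ms: int) -> int:
    """Files modified at or before the returned threshold are ignored.
    With new_files_only set, only files newer than the stream's start time
    are read; otherwise the threshold is 0 and every file qualifies."""
    return start_time_ms if new_files_only else 0
```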
* [SPARK-2409] Make SQLConf thread safe. (Reynold Xin, 2014-07-08; 1 file changed, -5/+5)
  Author: Reynold Xin <rxin@apache.org>
  Closes #1334 from rxin/sqlConfThreadSafetuy and squashes the following commits: c1e0a5a [Reynold Xin] Fixed the duplicate comment. 7614372 [Reynold Xin] [SPARK-2409] Make SQLConf thread safe.
  (cherry picked from commit 32516f866a32d51bfaa04685ae77ba216b4202d9)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2403] Catch all errors during serialization in DAGScheduler (Daniel Darabos, 2014-07-08; 1 file changed, -0/+5)
  https://issues.apache.org/jira/browse/SPARK-2403
  Spark hangs for us whenever we forget to register a class with Kryo. This should be a simple fix for that. But let me know if you have a better suggestion. I did not write a new test for this. It would be pretty complicated and I'm not sure it's worthwhile for such a simple change. Let me know if you disagree.
  Author: Daniel Darabos <darabos.daniel@gmail.com>
  Closes #1329 from darabos/spark-2403 and squashes the following commits: 3aceaad [Daniel Darabos] Print full stack trace for miscellaneous exceptions during serialization. 52c22ba [Daniel Darabos] Only catch NonFatal exceptions. 361e962 [Daniel Darabos] Catch all errors during serialization in DAGScheduler.
  (cherry picked from commit c8a2313cdf825e0191680a423d17619b5504ff89)
  Signed-off-by: Aaron Davidson <aaron@databricks.com>
* [SPARK-2395][SQL] Optimize common LIKE patterns. (Michael Armbrust, 2014-07-08; 2 files changed, -0/+74)
  Author: Michael Armbrust <michael@databricks.com>
  Closes #1325 from marmbrus/slowLike and squashes the following commits: 023c3eb [Michael Armbrust] add comment. 8b421c2 [Michael Armbrust] Handle the case where the final % is actually escaped. d34d37e [Michael Armbrust] add periods. 3bbf35f [Michael Armbrust] Roll back changes to SparkBuild 53894b1 [Michael Armbrust] Fix grammar. 4094462 [Michael Armbrust] Fix grammar. 6d3d0a0 [Michael Armbrust] Optimize common LIKE patterns.
  (cherry picked from commit cc3e0a14daf756ff5c2d4e7916438e175046e5bb)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
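The common LIKE shapes — exact, prefix%, %suffix, %contains% — can be answered with cheap string operations instead of a regex. A Python sketch of that idea (the real implementation is Catalyst Scala and, per the commit list above, also handles an escaped trailing %, which this toy version ignores):

```python
def simple_like(value: str, pattern: str):
    """Evaluate the cheap LIKE shapes directly; return None to signal
    fallback to the general regex-based path."""
    wild = set("%_")
    if not (wild & set(pattern)):
        return value == pattern                      # exact: 'spark'
    if len(pattern) >= 2 and pattern.startswith("%") and pattern.endswith("%") \
            and not (wild & set(pattern[1:-1])):
        return pattern[1:-1] in value                # contains: '%par%'
    if pattern.endswith("%") and not (wild & set(pattern[:-1])):
        return value.startswith(pattern[:-1])        # prefix: 'sp%'
    if pattern.startswith("%") and not (wild & set(pattern[1:])):
        return value.endswith(pattern[1:])           # suffix: '%ark'
    return None
```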
* [EC2] Add default history server port to ec2 script (Andrew Or, 2014-07-08; 1 file changed, -0/+1)
  Right now I have to open it manually.
  Author: Andrew Or <andrewor14@gmail.com>
  Closes #1296 from andrewor14/hist-serv-port and squashes the following commits: 8895a1f [Andrew Or] Add default history server port to ec2 script
  (cherry picked from commit 56e009d4f05d990c60e109838fa70457f97f44aa)
  Conflicts: ec2/spark_ec2.py
* [SPARK-2391][SQL] Custom take() for LIMIT queries. (Michael Armbrust, 2014-07-08; 1 file changed, -4/+47)
  Using Spark's take can result in an entire in-memory partition being shipped in order to retrieve a single row.
  Author: Michael Armbrust <michael@databricks.com>
  Closes #1318 from marmbrus/takeLimit and squashes the following commits: 77289a5 [Michael Armbrust] Update scala doc 32f0674 [Michael Armbrust] Custom take implementation for LIMIT queries.
  (cherry picked from commit 5a4063645dd7bb4cd8bda890785235729804ab09)
  Signed-off-by: Reynold Xin <rxin@apache.org>
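The strategy behind such a custom take is to pull rows incrementally and stop the moment the limit is satisfied. In plain Python, with lists standing in for partitions (a sketch of the idea, not the SchemaRDD code):

```python
def take_limit(partitions, limit):
    """Collect rows one partition at a time, stopping at `limit` rows,
    so later partitions are never touched once the limit is reached."""
    rows = []
    for part in partitions:
        if len(rows) >= limit:
            break
        rows.extend(part[:limit - len(rows)])
    return rows
```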
* Resolve sbt warnings during build Ⅱ (witgo, 2014-07-08; 10 files changed, -94/+94)
  Author: witgo <witgo@qq.com>
  Closes #1153 from witgo/expectResult and squashes the following commits: 97541d8 [witgo] merge master ead26e7 [witgo] Resolve sbt warnings during build
  (cherry picked from commit 3cd5029be709307415f911236472a685e406e763)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2376][SQL] Selecting list values inside nested JSON objects raises java.lang.IllegalArgumentException (Yin Huai, 2014-07-07; 2 files changed, -25/+44)
  JIRA: https://issues.apache.org/jira/browse/SPARK-2376
  Author: Yin Huai <huai@cse.ohio-state.edu>
  Closes #1320 from yhuai/SPARK-2376 and squashes the following commits: 0107417 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2376 480803d [Yin Huai] Correctly handling JSON arrays in PySpark.
  (cherry picked from commit 4352a2fdaa64efee7158eabef65703460ff284ec)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2375][SQL] JSON schema inference may not resolve type conflicts correctly for a field inside an array of structs (Yin Huai, 2014-07-07; 3 files changed, -8/+12)
  For example, for
  ```
  {"array": [{"field":214748364700}, {"field":1}]}
  ```
  the type of field is resolved as IntType. While, for
  ```
  {"array": [{"field":1}, {"field":214748364700}]}
  ```
  the type of field is resolved as LongType.
  JIRA: https://issues.apache.org/jira/browse/SPARK-2375
  Author: Yin Huai <huaiyin.thu@gmail.com>
  Closes #1308 from yhuai/SPARK-2375 and squashes the following commits: 3e2e312 [Yin Huai] Update unit test. 1b2ff9f [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2375 10794eb [Yin Huai] Correctly resolve the type of a field inside an array of structs.
  (cherry picked from commit f0496ee10847db921a028a34f70385f9b740b3f3)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
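The bug described above is an order-dependent type merge. The fix's intent, sketched in Python with string stand-ins for Catalyst's IntegerType/LongType: merging must widen to the long type no matter which value is seen first.

```python
INT_MAX = 2**31 - 1  # threshold beyond which a JSON integer needs a long

def merge_number_type(a: str, b: str) -> str:
    # Widening is commutative: "long" wins regardless of argument order.
    return "long" if "long" in (a, b) else "int"

def infer(values) -> str:
    t = "int"
    for v in values:
        t = merge_number_type(t, "long" if abs(v) > INT_MAX else "int")
    return t
```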
* [SPARK-2386] [SQL] RowWriteSupport should use the exact types to cast. (Takuya UESHIN, 2014-07-07; 2 files changed, -3/+41)
  When executing `saveAsParquetFile` with a non-primitive type, `RowWriteSupport` uses the wrong type `Int` for `ByteType` and `ShortType`.
  Author: Takuya UESHIN <ueshin@happy-camper.st>
  Closes #1315 from ueshin/issues/SPARK-2386 and squashes the following commits: 20d89ec [Takuya UESHIN] Use None instead of null. bd88741 [Takuya UESHIN] Add a test. 323d1d2 [Takuya UESHIN] Modify RowWriteSupport to use the exact types to cast.
  (cherry picked from commit 4deeed17c4847f212a4fa1a8685cfe8a12179263)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2339][SQL] SQL parser in sql-core is case sensitive, but a table alias is converted to lower case when we create Subquery (Yin Huai, 2014-07-07; 6 files changed, -30/+149)
  Reported by http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Join-throws-exception-td8599.html
  After we get the table from the catalog, because the table has an alias, we will temporarily insert a Subquery. Then, we convert the table alias to lower case no matter if the parser is case sensitive or not. To see the issue ...
  ```
  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  import sqlContext.createSchemaRDD
  case class Person(name: String, age: Int)
  val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
  people.registerAsTable("people")
  sqlContext.sql("select PEOPLE.name from people PEOPLE")
  ```
  The plan is ...
  ```
  == Query Plan ==
  Project ['PEOPLE.name]
   ExistingRdd [name#0,age#1], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:176
  ```
  You can find that `PEOPLE.name` is not resolved.
  This PR introduces three changes. 1. If a table has an alias, the catalog will not lowercase the alias. If a lowercase alias is needed, the analyzer will do the work. 2. A catalog has a new val caseSensitive that indicates if this catalog is case sensitive or not. For example, a SimpleCatalog is case sensitive. 3. Corresponding unit tests.
  With this PR, case sensitivity of database names and table names is handled by the catalog. Case sensitivity of other identifiers is handled by the analyzer.
  JIRA: https://issues.apache.org/jira/browse/SPARK-2339
  Author: Yin Huai <huai@cse.ohio-state.edu>
  Closes #1317 from yhuai/SPARK-2339 and squashes the following commits: 12d8006 [Yin Huai] Handling case sensitivity correctly.
  (cherry picked from commit c0b4cf097de50eb2c4b0f0e67da53ee92efc1f77)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-1977][MLLIB] register mutable BitSet in MovieLenseALS (Neville Li, 2014-07-07; 1 file changed, -0/+3)
  Author: Neville Li <neville@spotify.com>
  Closes #1319 from nevillelyh/gh/SPARK-1977 and squashes the following commits: 1f0a355 [Neville Li] [SPARK-1977][MLLIB] register mutable BitSet in MovieLenseALS
  (cherry picked from commit f7ce1b3b48f0354434456241188c6a5d954852e2)
  Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-2327] [SQL] Fix nullabilities of Join/Generate/Aggregate. (Takuya UESHIN, 2014-07-05; 7 files changed, -21/+60)
  Fix nullabilities of `Join`/`Generate`/`Aggregate` because:
  - Output attributes of the opposite side of an `OuterJoin` should be nullable.
  - Output attributes of the generator side of `Generate` should be nullable if `join` is `true` and `outer` is `true`.
  - `AttributeReference` of `computedAggregates` of `Aggregate` should be the same as `aggregateExpression`'s.
  Author: Takuya UESHIN <ueshin@happy-camper.st>
  Closes #1266 from ueshin/issues/SPARK-2327 and squashes the following commits: 3ace83a [Takuya UESHIN] Add withNullability to Attribute and use it to change nullabilities. df1ae53 [Takuya UESHIN] Modify nullabilize to leave attribute if not resolved. 799ce56 [Takuya UESHIN] Add nullabilization to Generate of SparkPlan. a0fc9bc [Takuya UESHIN] Fix scalastyle errors. 0e31e37 [Takuya UESHIN] Fix Aggregate resultAttribute nullabilities. 09532ec [Takuya UESHIN] Fix Generate output nullabilities. f20f196 [Takuya UESHIN] Fix Join output nullabilities.
  (cherry picked from commit 9d5ecf8205b924dc8a3c13fed68beb78cc5c7553)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2366] [SQL] Add column pruning for the right side of LeftSemi join. (Takuya UESHIN, 2014-07-05; 1 file changed, -8/+20)
  The right side of a `LeftSemi` join needs only the columns used in the join condition.
  Author: Takuya UESHIN <ueshin@happy-camper.st>
  Closes #1301 from ueshin/issues/SPARK-2366 and squashes the following commits: 7677a39 [Takuya UESHIN] Update comments. 786d3a0 [Takuya UESHIN] Rename method name. e0957b1 [Takuya UESHIN] Add column pruning for the right side of LeftSemi join.
  (cherry picked from commit 3da8df939ec63064692ba64d9188aeea908b305c)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2370][SQL] Decrease metadata retrieved for partitioned hive queries. (Michael Armbrust, 2014-07-04; 1 file changed, -1/+1)
  Author: Michael Armbrust <michael@databricks.com>
  Closes #1305 from marmbrus/usePrunerPartitions and squashes the following commits: 744aa20 [Michael Armbrust] Use getAllPartitionsForPruner instead of getPartitions, which avoids retrieving auth data
  (cherry picked from commit 9d006c97371ddf357e0b821d5c6d1535d9b6fe41)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* [maven-release-plugin] prepare for next development iteration (Ubuntu, 2014-07-04; 21 files changed, -22/+22)
* [maven-release-plugin] prepare release v1.0.1-rc2 (tag: v1.0.1) (Ubuntu, 2014-07-04; 21 files changed, -22/+22)
* Updating CHANGES.txt file (Patrick Wendell, 2014-07-04; 1 file changed, -0/+125)
* HOTFIX: Merge issue with cf1d46e4. (Patrick Wendell, 2014-07-04; 1 file changed, -2/+2)
  The tests in that patch used a newer constructor for TaskInfo.
* [SPARK-2059][SQL] Add analysis checks (Reynold Xin, 2014-07-04; 2 files changed, -0/+24)
  This replaces #1263 with a test case.
  Author: Reynold Xin <rxin@apache.org>
  Author: Michael Armbrust <michael@databricks.com>
  Closes #1265 from rxin/sql-analysis-error and squashes the following commits: a639e01 [Reynold Xin] Added a test case for unresolved attribute analysis. 7371e1b [Reynold Xin] Merge pull request #1263 from marmbrus/analysisChecks 448c088 [Michael Armbrust] Add analysis checks
  (cherry picked from commit b3e768e154bd7175db44c3ffc3d8f783f15ab776)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* Update SQLConf.scala (baishuo(白硕), 2014-07-04; 1 file changed, -6/+3)
  Use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap.
  Author: baishuo(白硕) <vc_java@hotmail.com>
  Closes #1272 from baishuo/master and squashes the following commits: 51ec55d [baishuo(白硕)] Update SQLConf.scala 63da043 [baishuo(白硕)] Update SQLConf.scala 36b6dbd [baishuo(白硕)] Update SQLConf.scala 864faa0 [baishuo(白硕)] Update SQLConf.scala 593096b [baishuo(白硕)] Update SQLConf.scala 7304d9b [baishuo(白硕)] Update SQLConf.scala 843581c [baishuo(白硕)] Update SQLConf.scala 1d3e4a2 [baishuo(白硕)] Update SQLConf.scala 0740f28 [baishuo(白硕)] Update SQLConf.scala
  (cherry picked from commit 0bbe61223eda3f33bbf8992d2a8f0d47813f4873)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-1199][REPL] Remove VALId and use the original import style for defined classes. (Prashant Sharma, 2014-07-04; 3 files changed, -11/+31)
  This is an alternate solution to #1176.
  Author: Prashant Sharma <prashant.s@imaginea.com>
  Closes #1179 from ScrapCodes/SPARK-1199/repl-fix-second-approach and squashes the following commits: 820b34b [Prashant Sharma] Here we generate two kinds of import wrappers based on whether it is a class or not.
  (cherry picked from commit d43415075b3468fe8aa56de5d2907d409bb96347)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-2059][SQL] Don't throw TreeNodeException in `execution.ExplainCommand` (Cheng Lian, 2014-07-03; 1 file changed, -3/+6)
  This is a fix for the problem revealed by PR #1265. Currently `HiveComparisonSuite` ignores the output of `ExplainCommand` since the Catalyst query plan is quite different from the Hive query plan. But exceptions thrown from `CheckResolution` still break test cases. This PR catches any `TreeNodeException` and reports it as part of the query explanation. After merging this PR, PR #1265 can also be merged safely.
  For a normal query:
  ```
  scala> hql("explain select key from src").foreach(println)
  ...
  [Physical execution plan:]
  [HiveTableScan [key#9], (MetastoreRelation default, src, None), None]
  ```
  For a wrong query with unresolved attribute(s):
  ```
  scala> hql("explain select kay from src").foreach(println)
  ...
  [Error occurred during query planning: ]
  [Unresolved attributes: 'kay, tree:]
  [Project ['kay]]
  [ LowerCaseSchema ]
  [  MetastoreRelation default, src, None]
  ```
  Author: Cheng Lian <lian.cs.zju@gmail.com>
  Closes #1294 from liancheng/safe-explain and squashes the following commits: 4318911 [Cheng Lian] Don't throw TreeNodeException in `execution.ExplainCommand`
  (cherry picked from commit 544880457de556d1ad52e8cb7e1eca19da95f517)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark (Aaron Davidson, 2014-07-03; 1 file changed, -0/+2)
  JIRA: https://issues.apache.org/jira/browse/SPARK-2282
  This issue is caused by a buildup of sockets in the TIME_WAIT stage of TCP, which is a stage that lasts for some period of time after the communication closes. This solution simply allows us to reuse sockets that are in TIME_WAIT, to avoid issues with the buildup caused by the rapid creation of these sockets.
  Author: Aaron Davidson <aaron@databricks.com>
  Closes #1220 from aarondav/SPARK-2282 and squashes the following commits: 2e5cab3 [Aaron Davidson] SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark
  (cherry picked from commit 97a0bfe1c0261384f09d53f9350de52fb6446d59)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
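The mechanism behind this kind of fix is SO_REUSEADDR: with the option set before bind(), a new server socket can take over an address whose previous occupant still has connections lingering in TIME_WAIT. Shown in isolation with generic Python sockets (not the PySpark accumulator code itself):

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Must be set before bind(): allows binding an address still in TIME_WAIT.
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
reuse = server.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
server.close()
```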
* [SPARK-2307][Reprise] Correctly report RDD blocks on SparkUI (Andrew Or, 2014-07-03; 6 files changed, -23/+184)
  **Problem.** The existing code in `ExecutorPage.scala` requires a linear scan through all the blocks to filter out the uncached ones. Every refresh could be expensive if there are many blocks and many executors.
  **Solution.** The proper semantics should be the following: `StorageStatusListener` should contain only block statuses that are cached. This means as soon as a block is unpersisted by any means, its status should be removed. This is reflected in the changes made in `StorageStatusListener.scala`. Further, the `StorageTab` must stop relying on the `StorageStatusListener` changing a dropped block's status to `StorageLevel.NONE` (which no longer happens). This is reflected in the changes made in `StorageTab.scala` and `StorageUtils.scala`.
  If you have been following this chain of PRs like pwendell, you will quickly notice that this reverts the changes in #1249, which reverts the changes in #1080. In other words, we are adding back the changes from #1080, and fixing SPARK-2307 on top of those changes. Please ask questions if you are confused.
  Author: Andrew Or <andrewor14@gmail.com>
  Closes #1255 from andrewor14/storage-ui-fix-reprise and squashes the following commits: 45416fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into storage-ui-fix-reprise a82ea25 [Andrew Or] Add tests for StorageStatusListener 8773b01 [Andrew Or] Update comment / minor changes 3afde3f [Andrew Or] Correctly report the number of blocks on SparkUI
  (cherry picked from commit 3894a49be9b532cc026d908a0f49bca850504498)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-2350] Don't NPE while launching drivers (Aaron Davidson, 2014-07-03; 1 file changed, -1/+1)
  Prior to this change, we could throw an NPE if we launched a driver while another one was waiting, because removing from an iterator while iterating over it is not safe.
  Author: Aaron Davidson <aaron@databricks.com>
  Closes #1289 from aarondav/master-fail and squashes the following commits: 1cf1cf4 [Aaron Davidson] SPARK-2350: Don't NPE while launching drivers
  (cherry picked from commit 586feb5c9528042420f678f78bacb6c254a5eaf8)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
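The unsafe pattern named above — mutating a collection while iterating over it — is language-agnostic. A minimal Python illustration of the safe alternative, iterating over a snapshot (the driver names are invented):

```python
waiting = ["driver-1", "driver-2", "driver-3"]
# Safe: iterate over a copy while mutating the original. Removing directly
# from the sequence being iterated skips elements (or throws, depending on
# the collection), which is the class of bug the fix above addresses.
for d in list(waiting):
    if d == "driver-2":
        waiting.remove(d)
```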
* [SPARK-1097] Workaround Hadoop conf ConcurrentModification issue (Raymond Liu, 2014-07-03; 1 file changed, -2/+2)
  Workaround Hadoop conf ConcurrentModification issue.
  Author: Raymond Liu <raymond.liu@intel.com>
  Closes #1273 from colorant/hadoopRDD and squashes the following commits: 994e98b [Raymond Liu] Address comments e2cda3d [Raymond Liu] Workaround Hadoop conf ConcurrentModification issue
  (cherry picked from commit 5fa0a05763ab1d527efe20e3b10539ac5ffc36de)
  Signed-off-by: Aaron Davidson <aaron@databricks.com>
* Streaming programming guide typos (Clément MATHIEU, 2014-07-03; 1 file changed, -2/+2)
  Fix a bad Java code sample and a broken link in the streaming programming guide.
  Author: Clément MATHIEU <clement@unportant.info>
  Closes #1286 from cykl/streaming-programming-guide-typos and squashes the following commits: b0908cb [Clément MATHIEU] Fix broken URL 9d3c535 [Clément MATHIEU] Spark streaming requires at least two working threads (scala version was OK)
  (cherry picked from commit fdc4c112e7c2ac585d108d03209a642aa8bab7c8)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
* [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work. (Prashant Sharma, 2014-07-03; 4 files changed, -19/+18)
  Trivial fix.
  Author: Prashant Sharma <prashant.s@imaginea.com>
  Closes #1050 from ScrapCodes/SPARK-2109/pyspark-script-bug and squashes the following commits: 77072b9 [Prashant Sharma] Changed echos to redirect to STDERR. 13f48a0 [Prashant Sharma] [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work.
  (cherry picked from commit 731f683b1bd8abbb83030b6bae14876658bbf098)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-2342] Evaluation helper's output type doesn't conform to input type (Yijie Shen, 2014-07-03; 1 file changed, -1/+1)
  The function cast doesn't conform to the intention of the "Those expressions are supposed to be in the same data type, and also the return type." comment.
  Author: Yijie Shen <henry.yijieshen@gmail.com>
  Closes #1283 from yijieshen/master and squashes the following commits: c7aaa4b [Yijie Shen] [SPARK-2342] Evaluation helper's output type doesn't conform to input type
  (cherry picked from commit a9b52e5623f7fc77fca96b095f9eeaef76e35d54)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK] Fix NPE for ExternalAppendOnlyMap (Andrew Or, 2014-07-03; 2 files changed, -11/+46)
  It did not handle null keys very gracefully before.
  Author: Andrew Or <andrewor14@gmail.com>
  Closes #1288 from andrewor14/fix-external and squashes the following commits: 312b8d8 [Andrew Or] Abstract key hash code ed5adf9 [Andrew Or] Fix NPE for ExternalAppendOnlyMap
  (cherry picked from commit c480537739f9329ebfd580f09c69778e6c976366)
  Signed-off-by: Aaron Davidson <aaron@databricks.com>
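The "Abstract key hash code" commit in the list above suggests the shape of the guard: route all key hashing through one null-safe helper so a null key gets a fixed bucket instead of a NullPointerException. A Python sketch of that idea (the actual change is in Scala's ExternalAppendOnlyMap):

```python
def key_hash(key):
    """Give the null (None) key a fixed bucket instead of blowing up
    when the hash of a missing key is requested."""
    return 0 if key is None else hash(key)
```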
* [SPARK-2287] [SQL] Make ScalaReflection be able to handle Generic case classes. (Takuya UESHIN, 2014-07-02; 2 files changed, -2/+25)
  Author: Takuya UESHIN <ueshin@happy-camper.st>
  Closes #1226 from ueshin/issues/SPARK-2287 and squashes the following commits: 32ef7c3 [Takuya UESHIN] Add execution of `SHOW TABLES` before `TestHive.reset()`. 541dc8d [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2287 fac5fae [Takuya UESHIN] Remove unnecessary method receiver. d306e60 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2287 7de5706 [Takuya UESHIN] Make ScalaReflection be able to handle Generic case classes.
  (cherry picked from commit bc7041a42dfa84312492ea8cae6fdeaeac4f6d1c)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2328] [SQL] Add execution of `SHOW TABLES` before `TestHive.reset()`. (Takuya UESHIN, 2014-07-02; 1 file changed, -0/+3)
  `PruningSuite` is executed first among the Hive tests and, unfortunately, `TestHive.reset()` breaks the test environment. To prevent this, we must run a query before calling reset the first time.
  Author: Takuya UESHIN <ueshin@happy-camper.st>
  Closes #1268 from ueshin/issues/SPARK-2328 and squashes the following commits: 043ceac [Takuya UESHIN] Add execution of `SHOW TABLES` before `TestHive.reset()`.
  (cherry picked from commit 1e2c26c83dd2e807cf0031ceca8b338a1a57cac6)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG (Ximo Guanter Gonzalbez, 2014-07-02; 3 files changed, -8/+44)
  **Description** This patch enables using the `.select()` function in SchemaRDD with functions such as `Sum`, `Count` and others.
  **Testing** Unit tests added.
  Author: Ximo Guanter Gonzalbez <ximo@tid.es>
  Closes #1211 from edrevo/add-expression-support-in-select and squashes the following commits: fe4a1e1 [Ximo Guanter Gonzalbez] Extend SQL DSL to functions e1d344a [Ximo Guanter Gonzalbez] SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG
  (cherry picked from commit 5c6ec94da1bacd8e65a43acb92b6721493484e7b)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* update the comments in SqlParser (CodingCat, 2014-07-01; 1 file changed, -1/+0)
  SqlParser has been case-insensitive since https://github.com/apache/spark/commit/dab5439a083b5f771d5d5b462d0d517fa8e9aaf2 was merged.
  Author: CodingCat <zhunansjtu@gmail.com>
  Closes #1275 from CodingCat/master and squashes the following commits: 17931cd [CodingCat] update the comments in SqlParser
  (cherry picked from commit 6596392da0fc0fee89e22adfca239a3477dfcbab)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2322] Exception in resultHandler should NOT crash DAGScheduler and shutdown SparkContext. (Reynold Xin, 2014-06-30; 3 files changed, -6/+78)
  This should go into 1.0.1.
  Author: Reynold Xin <rxin@apache.org>
  Closes #1264 from rxin/SPARK-2322 and squashes the following commits: c77c07f [Reynold Xin] Added comment to SparkDriverExecutionException and a test case for accumulator. 5d8d920 [Reynold Xin] [SPARK-2322] Exception in resultHandler could crash DAGScheduler and shutdown SparkContext.
  (cherry picked from commit 358ae1534d01ad9e69364a21441a7ef23c2cb516)
  Signed-off-by: Reynold Xin <rxin@apache.org>
  Conflicts: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
* [SPARK-1394] Remove SIGCHLD handler in worker subprocess (Matthew Farrellee, 2014-06-28; 1 file changed, -0/+1)
  It should not be the responsibility of the worker subprocess, which does not intentionally fork, to try and clean up child processes. Doing so is complex and interferes with operations such as platform.system(). If it is desirable to have tighter control over subprocesses, then namespaces should be used and it should be the manager's responsibility to handle cleanup.
  Author: Matthew Farrellee <matt@redhat.com>
  Closes #1247 from mattf/SPARK-1394 and squashes the following commits: c36f308 [Matthew Farrellee] [SPARK-1394] Remove SIGCHLD handler in worker subprocess
  (cherry picked from commit 3c104c79d24425786cec0034f269ba19cf465b31)
  Signed-off-by: Aaron Davidson <aaron@databricks.com>
* Revert "[maven-release-plugin] prepare release v1.0.1-rc1" (Patrick Wendell, 2014-06-27; 21 files changed, -22/+22)
  This reverts commit 7feeda3d729f9397aa15ee8750c01ef5aa601962.
* Revert "[maven-release-plugin] prepare for next development iteration" (Patrick Wendell, 2014-06-27; 21 files changed, -22/+22)
  This reverts commit ea1a455a755f83f46fc8bf242410917d93d0c52c.
* [SPARK-2003] Fix python SparkContext example (Matthew Farrellee, 2014-06-27; 1 file changed, -1/+1)
  Author: Matthew Farrellee <matt@redhat.com>
  Closes #1246 from mattf/SPARK-2003 and squashes the following commits: b12e7ca [Matthew Farrellee] [SPARK-2003] Fix python SparkContext example
  (cherry picked from commit 0e0686d3ef88e024fcceafe36a0cdbb953f5aeae)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-2259] Fix highly misleading docs on cluster / client deploy modes (Andrew Or, 2014-06-27; 5 files changed, -12/+36)
  The existing docs are highly misleading. For standalone mode, for example, they encourage the user to use standalone-cluster mode, which is not officially supported. Safeguards have been added to Spark submit itself to prevent bad documentation from leading users down the wrong path in the future. This PR is prompted by countless headaches users of Spark have run into on the mailing list.
  Author: Andrew Or <andrewor14@gmail.com>
  Closes #1200 from andrewor14/submit-docs and squashes the following commits: 5ea2460 [Andrew Or] Rephrase cluster vs client explanation c827f32 [Andrew Or] Clarify spark submit messages 9f7ed8f [Andrew Or] Clarify client vs cluster deploy mode + add safeguards
  (cherry picked from commit f17510e371dfbeaada3c72b884d70c36503ea30a)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-2307] SparkUI - storage tab displays incorrect RDDs (Andrew Or, 2014-06-27; 2 files changed, -6/+5)
  The issue here is that the `StorageTab` listens for updates from the `StorageStatusListener`, but when a block is kicked out of the cache, `StorageStatusListener` removes it from its list. Thus, there is no way for the `StorageTab` to know whether a block has been dropped. This issue was introduced in #1080, which was itself a bug fix. Here we revert that PR and offer a different fix for the original bug (SPARK-2144).
  Author: Andrew Or <andrewor14@gmail.com>
  Closes #1249 from andrewor14/storage-ui-fix and squashes the following commits: af019ce [Andrew Or] Fix SPARK-2307
  (cherry picked from commit 21e0f77b6321590ed86223a60cdb8ae08ea4057f)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* SPARK-2181: The keys for sorting the columns of Executor page in SparkUI are incorrect (witgo, 2014-06-26; 3 files changed, -11/+17)
  Author: witgo <witgo@qq.com>
  Closes #1135 from witgo/SPARK-2181 and squashes the following commits: 39dad90 [witgo] The keys for sorting the columns of Executor page in SparkUI are incorrect
  (cherry picked from commit 18f29b96c7e0948f5f504e522e5aa8a8d1ab163e)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [maven-release-plugin] prepare for next development iteration (Ubuntu, 2014-06-26; 21 files changed, -22/+22)
* [maven-release-plugin] prepare release v1.0.1-rc1 (Ubuntu, 2014-06-26; 21 files changed, -22/+22)
* CHANGES.txt for release 1.0.1 (Patrick Wendell, 2014-06-26; 1 file changed, -0/+778)
* Fixing AWS instance type information based upon current EC2 data (Zichuan Ye, 2014-06-26; 1 file changed, -5/+14)
  Fixed a problem in the previous file in which some information regarding AWS instance types was wrong. Such information was updated based upon current AWS EC2 data.
  Author: Zichuan Ye <jerry@tangentds.com>
  Closes #1156 from jerry86/master and squashes the following commits: ff36e95 [Zichuan Ye] Fixing AWS instance type information based upon current EC2 data
  (cherry picked from commit 62d4a0fa9947e64c1533f66ae577557bcfb271c9)
  Conflicts: ec2/spark_ec2.py