* [SPARK-11131][CORE] Fix race in worker registration protocol. (Marcelo Vanzin, 2015-10-19; 6 files, -56/+86)
  Because the registration RPC was not really an RPC, but a bunch of disconnected messages, it was possible for other messages to be sent before the reply to the registration arrived, and that would confuse the Worker. Especially in local-cluster mode, the worker was susceptible to receiving an executor request before it received a message from the master saying registration succeeded. On top of the above, the change also fixes a ClassCastException when the registration fails, which also affects the executor registration protocol. Because the `ask` is issued with a specific return type, if the error message (of a different type) was returned instead, the code would just die with an exception. This is fixed by having a common base trait for these reply messages.
  Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9138 from vanzin/SPARK-11131.
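  A minimal sketch of the "common base trait" idea from the commit above; the type names below are illustrative stand-ins, not the actual Spark deploy messages. With a shared parent type, the caller can ask for the base trait and pattern-match on the reply instead of hitting a ClassCastException when a failure message of a different class comes back.
  ```scala
  object RegisterReplySketch {
    sealed trait RegisterResponse
    case class Registered(masterUrl: String) extends RegisterResponse
    case class RegisterFailed(reason: String) extends RegisterResponse

    // Both success and failure arrive as RegisterResponse, so no cast can fail here.
    def handle(reply: RegisterResponse): Unit = reply match {
      case Registered(url)       => println(s"registered with master at $url")
      case RegisterFailed(cause) => println(s"registration failed: $cause")
    }

    def main(args: Array[String]): Unit = handle(RegisterFailed("duplicate worker id"))
  }
  ```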
* [SPARK-11063] [STREAMING] Change preferredLocations of Receiver's RDD to hosts rather than hostports (zsxwing, 2015-10-19; 3 files, -2/+29)
  The format of an RDD's preferredLocations must be a hostname, but the format of the Streaming Receiver's scheduled executors is host:port, so the two don't match. This PR converts `schedulerExecutors` to `hosts` before creating the Receiver's RDD.
  Author: zsxwing <zsxwing@gmail.com> Closes #9075 from zsxwing/SPARK-11063.
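  A small plain-Scala illustration of the conversion described above; the helper name is hypothetical, and simple host:port strings are assumed. RDD preferred locations expect bare hostnames, while the receiver scheduler tracks executors as host:port strings, so the port has to be stripped first.
  ```scala
  object HostPortSketch {
    // Strip the port from a "host:port" string (hypothetical helper; IPv4-style hosts assumed).
    def toHost(hostPort: String): String = hostPort.split(":")(0)

    def main(args: Array[String]): Unit = {
      val scheduledExecutors = Seq("node-1:34567", "node-2:41235")
      println(scheduledExecutors.map(toHost).distinct)  // List(node-1, node-2)
    }
  }
  ```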
* [SPARK-11180][SQL] Support BooleanType in DataFrame.na.fill (Rishabh Bhardwaj, 2015-10-19; 2 files, -16/+27)
  Added support for boolean types in the fill and replace methods.
  Author: Rishabh Bhardwaj <rbnext29@gmail.com> Closes #9166 from rishabhbhardwaj/master.
* [SPARK-11119] [SQL] cleanup for unsafe array and map (Wenchen Fan, 2015-10-19; 10 files, -192/+174)
  The purpose of this PR is to keep the unsafe format details only inside the unsafe classes themselves, so that when we use them (for example, an unsafe array inside an unsafe map, or unsafe arrays and maps in the columnar cache) we don't need to understand the format before using them. Change list:
  - The unsafe array's 4-byte numElements header is now required (it was optional) and becomes part of the unsafe array format.
  - As a result of the previous change, the `sizeInBytes` of an unsafe array now counts the 4-byte header.
  - The unsafe map's format used to be `[numElements] [key array numBytes] [key array content (without numElements header)] [value array content (without numElements header)]`, which was a little hacky because it made the unsafe array's header optional. Saving 4 bytes is not a big deal, so the format is now: `[key array numBytes] [unsafe key array] [unsafe value array]`.
  - As a result of the previous change, the `sizeInBytes` of an unsafe map now counts both the map's header and the arrays' headers (see the size-accounting sketch below).
  Author: Wenchen Fan <wenchen@databricks.com> Closes #9131 from cloud-fan/unsafe.
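  Illustrative size accounting only, under the assumptions stated in the comments (the width of the map's numBytes field is assumed here, and the real layout code lives in the unsafe classes themselves).
  ```scala
  object UnsafeSizeSketch {
    // Assumed for illustration: a 4-byte numElements header per array and an 8-byte
    // "key array numBytes" field per map, following the commit message above.
    val arrayHeaderBytes = 4
    val mapNumBytesField = 8

    def arraySizeInBytes(elementRegionBytes: Int): Int =
      arrayHeaderBytes + elementRegionBytes               // the header is now part of the format

    def mapSizeInBytes(keyArrayBytes: Int, valueArrayBytes: Int): Int =
      mapNumBytesField + keyArrayBytes + valueArrayBytes  // both embedded arrays keep their own headers

    def main(args: Array[String]): Unit =
      println(mapSizeInBytes(arraySizeInBytes(40), arraySizeInBytes(80)))  // 136
  }
  ```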
* [SPARK-10668] [ML] Use WeightedLeastSquares in LinearRegression with L2 regularization if the number of features is small (lewuathe, 2015-10-19; 10 files, -494/+640)
  Author: lewuathe <lewuathe@me.com> Author: Lewuathe <sasaki@treasure-data.com> Author: Kai Sasaki <sasaki@treasure-data.com> Author: Lewuathe <lewuathe@me.com> Closes #8884 from Lewuathe/SPARK-10668.
* [SPARK-9643] Upgrade pyrolite to 4.9 (Alex Angelini, 2015-10-19; 1 file, -1/+1)
  Includes https://github.com/irmen/Pyrolite/pull/23, which fixes datetimes with timezones. JoshRosen https://issues.apache.org/jira/browse/SPARK-9643
  Author: Alex Angelini <alex.louis.angelini@gmail.com> Closes #7950 from angelini/upgrade_pyrolite_up.
* [SPARK-10921][YARN] Completely remove the use of SparkContext.preferredNodeLocationData (Jacek Laskowski, 2015-10-19; 4 files, -19/+9)
  Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8976 from jaceklaskowski/SPARK-10921.
* [SPARK-11126][SQL] Fix the potential flaky test (zsxwing, 2015-10-19; 1 file, -0/+2)
  The unit test added in #9132 is flaky. This is a follow-up PR to add `listenerBus.waitUntilEmpty` to fix it.
  Author: zsxwing <zsxwing@gmail.com> Closes #9163 from zsxwing/SPARK-11126-follow-up.
* [SPARK-7018][BUILD] Refactor dev/run-tests-jenkins into Python (Brennon York, 2015-10-18; 8 files, -269/+285)
  This commit refactors the `run-tests-jenkins` script into Python. The refactoring was done by brennonyork in #7401; this PR contains a few minor edits from joshrosen in order to bring it up to date with other recent changes. From the original PR description (by brennonyork): Currently a few things are left out that could, and I think should, be smaller JIRAs after this.
  1. There are still a few areas where we use environment variables where we don't need to (like `CURRENT_BLOCK`). I might get around to fixing this one in lieu of everything else, but wanted to point that out.
  2. The PR tests are still written in bash. I opted not to change those and just rewrite the runner in Python. This is a great follow-on JIRA IMO.
  3. All of the linting scripts are still in bash as well and would likely be best added as follow-on JIRAs too.
  Closes #7401. Author: Brennon York <brennon.york@capitalone.com> Closes #9161 from JoshRosen/run-tests-jenkins-refactoring.
* [SPARK-11126][SQL] Fix a memory leak in SQLListener._stageIdToStageMetrics (zsxwing, 2015-10-18; 2 files, -3/+23)
  SQLListener adds all stage infos to `_stageIdToStageMetrics`, but only removes stage infos belonging to SQL executions. This PR fixes it by ignoring stages that don't belong to SQL executions. Reported by Terry Hoo in https://www.mail-archive.com/userspark.apache.org/msg38810.html
  Author: zsxwing <zsxwing@gmail.com> Closes #9132 from zsxwing/SPARK-11126.
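  A minimal sketch of the fix's idea with simplified types (not the real SQLListener): only record metrics for stages submitted as part of a tracked SQL execution, so entries for unrelated stages can never pile up.
  ```scala
  import scala.collection.mutable

  object SqlStageTrackingSketch {
    private val sqlStages = mutable.Set[Int]()               // stages owned by SQL executions
    private val stageIdToMetrics = mutable.Map[Int, Long]()

    def onExecutionStart(stageIds: Seq[Int]): Unit = sqlStages ++= stageIds

    def onStageSubmitted(stageId: Int): Unit =
      if (sqlStages.contains(stageId)) stageIdToMetrics(stageId) = 0L  // non-SQL stages are ignored

    def main(args: Array[String]): Unit = {
      onExecutionStart(Seq(1, 2))
      onStageSubmitted(1); onStageSubmitted(99)
      println(stageIdToMetrics.keySet)                       // Set(1)
    }
  }
  ```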
* [SPARK-11158][SQL] Modified _verify_type() to be more informative on Errors by presenting the Object (Mahmoud Lababidi, 2015-10-18; 1 file, -3/+3)
  The _verify_type() function raised errors on type-conversion issues but left out the object in question. The object is now included in the error, to spare the user from debugging through the code to figure out which object failed the type conversion. The use case for me was a Pandas DataFrame that contained 'nan' as values for columns of strings.
  Author: Mahmoud Lababidi <mahmoud@thehumangeo.com> Author: Mahmoud Lababidi <lababidi@gmail.com> Closes #9149 from lababidi/master.
* MAINTENANCE: Automated closing of pull requests. (Patrick Wendell, 2015-10-18; 0 files, -0/+0)
  This commit exists to close the following pull requests on Github:
  Closes #8737 (close requested by 'srowen')
  Closes #5323 (close requested by 'JoshRosen')
  Closes #6148 (close requested by 'JoshRosen')
  Closes #7557 (close requested by 'JoshRosen')
  Closes #7047 (close requested by 'srowen')
  Closes #8713 (close requested by 'marmbrus')
  Closes #5834 (close requested by 'srowen')
  Closes #7467 (close requested by 'tdas')
  Closes #8943 (close requested by 'xiaowen147')
  Closes #4434 (close requested by 'JoshRosen')
  Closes #8949 (close requested by 'srowen')
  Closes #5368 (close requested by 'JoshRosen')
  Closes #8186 (close requested by 'marmbrus')
  Closes #5147 (close requested by 'JoshRosen')
* [SPARK-11169] Remove the extra spaces in merge script (Reynold Xin, 2015-10-18; 1 file, -8/+8)
  Our merge script now turns
  ```
  [SPARK-1234][SPARK-1235][SPARK-1236][SQL] description
  ```
  into
  ```
  [SPARK-1234] [SPARK-1235] [SPARK-1236] [SQL] description
  ```
  The extra spaces are more annoying in git since the first line of a git commit is supposed to be very short. Doctest passes with the following command:
  ```
  python -m doctest merge_spark_pr.py
  ```
  Author: Reynold Xin <rxin@databricks.com> Closes #9156 from rxin/SPARK-11169.
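  The merge script itself is Python; this short Scala stand-in only illustrates the normalization direction the commit wants, keeping consecutive tags compact rather than space-separated.
  ```scala
  object TitleSpacingSketch {
    // Collapse "] [" back to "][" so consecutive JIRA/component tags stay compact.
    def standardize(title: String): String = title.replaceAll("""\]\s+\[""", "][")

    def main(args: Array[String]): Unit =
      println(standardize("[SPARK-1234] [SPARK-1235] [SPARK-1236] [SQL] description"))
      // [SPARK-1234][SPARK-1235][SPARK-1236][SQL] description
  }
  ```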
* [SPARK-11174] [DOCS] Fix typo in the GraphX programming guide (Lukasz Piepiora, 2015-10-18; 1 file, -1/+1)
  This patch fixes a small typo in the GraphX programming guide.
  Author: Lukasz Piepiora <lpiepiora@gmail.com> Closes #9160 from lpiepiora/11174-fix-typo-in-graphx-programming-guide.
* [SPARK-11172] Close JsonParser/Generator in test (tedyu, 2015-10-18; 1 file, -6/+8)
  Author: tedyu <yuzhihong@gmail.com> Closes #9157 from tedyu/master.
* [SPARK-11000] [YARN] Load `metadata.Hive` class only when `hive.metastore.uris` was set to avoid booting the database twice (huangzhaowei, 2015-10-17; 1 file, -4/+4)
  Author: huangzhaowei <carlmartinmax@gmail.com> Closes #9026 from SaintBacchus/SPARK-11000.
* [SPARK-11129] [MESOS] Link Spark WebUI from Mesos WebUI (ph, 2015-10-17; 2 files, -2/+12)
  Mesos has a feature for linking to frameworks running on top of Mesos from the Mesos WebUI. This commit enables Spark to make use of this feature so one can directly visit the running Spark WebUIs from the Mesos WebUI.
  Author: ph <ph@plista.com> Closes #9135 from philipphoffmann/SPARK-11129.
* [SPARK-10185] [SQL] Feat sql comma separated paths (Koert Kuipers, 2015-10-17; 5 files, -11/+81)
  Make sure comma-separated paths get processed correctly in ResolvedDataSource for a HadoopFsRelationProvider.
  Author: Koert Kuipers <koert@tresata.com> Closes #8416 from koertkuipers/feat-sql-comma-separated-paths.
* [SPARK-11165] Logging trait should be private - not DeveloperApi. (Reynold Xin, 2015-10-17; 1 file, -3/+2)
  Its classdoc actually says: "NOTE: DO NOT USE this class outside of Spark. It is intended as an internal utility."
  Author: Reynold Xin <rxin@databricks.com> Closes #9155 from rxin/private-logging-trait.
* [SPARK-9963] [ML] RandomForest cleanup: replace predictNodeIndex with predictImpl (Luvsandondov Lkhamsuren, 2015-10-17; 2 files, -43/+38)
  predictNodeIndex is moved to LearningNode and renamed predictImpl for consistency with Node.predictImpl.
  Author: Luvsandondov Lkhamsuren <lkhamsurenl@gmail.com> Closes #8609 from lkhamsurenl/SPARK-9963.
* [SPARK-11029] [ML] Add computeCost to KMeansModel in spark.ml (Yuhao Yang, 2015-10-17; 2 files, -0/+13)
  JIRA: https://issues.apache.org/jira/browse/SPARK-11029 We should add a method analogous to spark.mllib.clustering.KMeansModel.computeCost to spark.ml.clustering.KMeansModel. This will be a temporary fix until we have proper evaluators defined for clustering.
  Author: Yuhao Yang <hhbyyh@gmail.com> Author: yuhaoyang <yuhao@zhanglipings-iMac.local> Closes #9073 from hhbyyh/computeCost.
* [SPARK-11084] [ML] [PYTHON] Check if index can contain non-zero value before binary search (zero323, 2015-10-16; 2 files, -2/+12)
  At the moment `SparseVector.__getitem__` executes `np.searchsorted` first and checks whether the result is in the expected range afterwards. It is possible to check whether the index can contain a non-zero value before executing `np.searchsorted`.
  Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9098 from zero323/sparse_vector_getitem_improved.
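  A plain-Scala sketch of the optimization described above (the actual change is in PySpark's SparseVector): indices outside the stored range cannot hold a non-zero value, so a cheap bounds check can skip the binary search entirely.
  ```scala
  import java.util.Arrays

  object SparseLookupSketch {
    def value(indices: Array[Int], values: Array[Double], i: Int): Double = {
      if (indices.isEmpty || i < indices.head || i > indices.last) return 0.0  // no search needed
      val pos = Arrays.binarySearch(indices, i)
      if (pos >= 0) values(pos) else 0.0
    }

    def main(args: Array[String]): Unit = {
      val (idx, vals) = (Array(1, 4, 7), Array(2.0, 3.0, 5.0))
      println(value(idx, vals, 4))    // 3.0
      println(value(idx, vals, 100))  // 0.0, binary search skipped
    }
  }
  ```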
* [SPARK-10599] [MLLIB] Lower communication for block matrix multiplication (Burak Yavuz, 2015-10-16; 2 files, -22/+76)
  This PR aims to decrease communication costs in BlockMatrix multiplication in two ways:
  - Simulate the multiplication on the driver, and figure out which blocks actually need to be shuffled (see the sketch below)
  - Send each block only once to a partition, and join inside the partition rather than sending multiple copies to the same partition
  **NOTE**: One important note is that right now, the old behavior of checking for multiple blocks with the same index is lost. This is not hard to add, but is a little more expensive than how it was. Initial benchmarking showed promising results (see below); however, I did hit some `FileNotFound` exceptions with the new implementation after the shuffle.
  Size A: 1e5 x 1e5, Size B: 1e5 x 1e5, Block Sizes: 1024 x 1024, Sparsity: 0.01. Old implementation: 1m 13s. New implementation: 9s.
  cc avulanov Would you be interested in helping me benchmark this? I used your code from the mailing list (which you sent about 3 months ago?), and the old implementation didn't even run, but the new implementation completed in 268s on a 120 GB / 16 core cluster.
  Author: Burak Yavuz <brkyvz@gmail.com> Closes #8757 from brkyvz/opt-bmm.
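  A rough sketch of the "simulate the multiplication on the driver" idea in plain Scala; the real BlockMatrix logic is more involved, and the names here are illustrative. Block coordinates alone determine which result blocks each input block contributes to, so destinations can be computed before any data is shuffled.
  ```scala
  object SimulateMultiplySketch {
    // For C = A * B, block (i, p) of A is only needed for result blocks (i, j),
    // so its destinations depend only on i and on how many block-columns B has.
    def destinationsOfABlock(i: Int, p: Int, colBlocksOfB: Int): Seq[(Int, Int)] =
      (0 until colBlocksOfB).map(j => (i, j))

    def main(args: Array[String]): Unit =
      println(destinationsOfABlock(i = 1, p = 0, colBlocksOfB = 3))  // Vector((1,0), (1,1), (1,2))
  }
  ```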
* [SPARK-11050] [MLLIB] PySpark SparseVector can return wrong index in error message (Bhargav Mangipudi, 2015-10-16; 1 file, -2/+3)
  For negative indices in the SparseVector, we update the index value. If we have an incorrect index at this point, the error message reports the incorrect *updated* index instead of the original one. This change contains the fix for the same.
  Author: Bhargav Mangipudi <bhargav.mangipudi@gmail.com> Closes #9069 from bhargav/spark-10759.
* [SPARK-11109] [CORE] Move FsHistoryProvider off deprecated AccessControlException (gweidner, 2015-10-16; 1 file, -1/+1)
  Switched from the deprecated org.apache.hadoop.fs.permission.AccessControlException to org.apache.hadoop.security.AccessControlException.
  Author: gweidner <gweidner@us.ibm.com> Closes #9144 from gweidner/SPARK-11109.
* [SPARK-11104] [STREAMING] Fix a deadlock in StreamingContext.stop (zsxwing, 2015-10-16; 1 file, -24/+31)
  The following deadlock may happen if shutdownHook and StreamingContext.stop are running at the same time.
  ```
  Java stack information for the threads listed above:
  ===================================================
  "Thread-2":
          at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:699)
          - waiting to lock <0x00000005405a1680> (a org.apache.spark.streaming.StreamingContext)
          at org.apache.spark.streaming.StreamingContext.org$apache$spark$streaming$StreamingContext$$stopOnShutdown(StreamingContext.scala:729)
          at org.apache.spark.streaming.StreamingContext$$anonfun$start$1.apply$mcV$sp(StreamingContext.scala:625)
          at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:266)
          at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:236)
          at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:236)
          at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:236)
          at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1697)
          at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:236)
          at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:236)
          at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:236)
          at scala.util.Try$.apply(Try.scala:161)
          at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:236)
          - locked <0x00000005405b6a00> (a org.apache.spark.util.SparkShutdownHookManager)
          at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
          at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
  "main":
          at org.apache.spark.util.SparkShutdownHookManager.remove(ShutdownHookManager.scala:248)
          - waiting to lock <0x00000005405b6a00> (a org.apache.spark.util.SparkShutdownHookManager)
          at org.apache.spark.util.ShutdownHookManager$.removeShutdownHook(ShutdownHookManager.scala:199)
          at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:712)
          - locked <0x00000005405a1680> (a org.apache.spark.streaming.StreamingContext)
          at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:684)
          - locked <0x00000005405a1680> (a org.apache.spark.streaming.StreamingContext)
          at org.apache.spark.streaming.SessionByKeyBenchmark$.main(SessionByKeyBenchmark.scala:108)
          at org.apache.spark.streaming.SessionByKeyBenchmark.main(SessionByKeyBenchmark.scala)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:497)
          at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:680)
          at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
          at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
          at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
          at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
  ```
  This PR just moves `ShutdownHookManager.removeShutdownHook` out of `synchronized` to avoid the deadlock.
  Author: zsxwing <zsxwing@gmail.com> Closes #9116 from zsxwing/stop-deadlock.
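  A minimal sketch of the lock-ordering pattern behind the fix; class and method names are simplified stand-ins, not the real Spark classes. Mutate your own state under your monitor, but make the call into the other lock-protected component only after leaving the synchronized block.
  ```scala
  class ShutdownHooksSketch {
    def remove(hook: AnyRef): Unit = synchronized { /* drop the hook from an internal set */ }
  }

  class StoppableContextSketch(hooks: ShutdownHooksSketch) {
    private var hookRef: AnyRef = new Object

    def stop(): Unit = {
      val toRemove = synchronized {       // only this object's state is touched under its own lock
        val h = hookRef; hookRef = null; h
      }
      if (toRemove != null) hooks.remove(toRemove)  // cross-component call made without holding our lock
    }
  }

  object DeadlockAvoidanceSketch {
    def main(args: Array[String]): Unit = new StoppableContextSketch(new ShutdownHooksSketch).stop()
  }
  ```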
* [SPARK-10974] [STREAMING] Add progress bar for output operation column and use red dots for failed batches (zsxwing, 2015-10-16; 14 files, -201/+258)
  Screenshot: https://cloud.githubusercontent.com/assets/1000778/10342571/385d9340-6d4c-11e5-8e79-1fa4c3c98f81.png
  Also fixed the description and duration for output operations that don't have Spark jobs: https://cloud.githubusercontent.com/assets/1000778/10342775/4bd52a0e-6d4d-11e5-99bc-26265a9fc792.png
  Author: zsxwing <zsxwing@gmail.com> Closes #9010 from zsxwing/output-op-progress-bar.
* [SPARK-10581] [DOCS] Groups are not resolved in scaladoc in sql classes (Pravin Gadakh, 2015-10-16; 3 files, -6/+6)
  Groups are not resolved properly in scaladoc in the following classes:
  sql/core/src/main/scala/org/apache/spark/sql/Column.scala
  sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
  sql/core/src/main/scala/org/apache/spark/sql/functions.scala
  Author: Pravin Gadakh <pravingadakh177@gmail.com> Closes #9148 from pravingadakh/master.
* [SPARK-11124] JsonParser/Generator should be closed for resource recycle (navis.ryu, 2015-10-16; 4 files, -52/+57)
  Some JSON parsers are not closed; the parser in JacksonParser#parseJson, for example.
  Author: navis.ryu <navis@apache.org> Closes #9130 from navis/SPARK-11124.
* [SPARK-11122] [BUILD] [WARN] Add tag to fatal warnings (Jakob Odersky, 2015-10-16; 1 file, -2/+9)
  Shows that an error is actually due to a fatal warning.
  Author: Jakob Odersky <jodersky@gmail.com> Closes #9128 from jodersky/fatalwarnings.
* [SPARK-11094] Strip extra strings from Java version in test runner (Jakob Odersky, 2015-10-16; 1 file, -9/+6)
  Removes any extra strings from the Java version, fixing the subsequent integer parsing. This is required since some OpenJDK versions (specifically in Debian testing) append an extra "-internal" string to the version field.
  Author: Jakob Odersky <jodersky@gmail.com> Closes #9111 from jodersky/fixtestrunner.
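  The affected script is the Python test runner; this small Scala stand-in just shows the cleanup step, dropping a vendor suffix such as "-internal" before the numeric parse.
  ```scala
  object JavaVersionSketch {
    // e.g. "1.8.0_66-internal" -> 66; anything after '-' is discarded before parsing.
    def updateNumber(javaVersion: String): Int =
      javaVersion.split("-")(0).split("_").last.toInt

    def main(args: Array[String]): Unit = {
      println(updateNumber("1.8.0_66-internal"))  // 66
      println(updateNumber("1.8.0_40"))           // 40
    }
  }
  ```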
* [SPARK-11092] [DOCS] Add source links to scaladoc generation (Jakob Odersky, 2015-10-16; 1 file, -2/+19)
  Modify the SBT build script to include GitHub source links for generated Scaladocs, on releases only (no snapshots).
  Author: Jakob Odersky <jodersky@gmail.com> Closes #9110 from jodersky/unidoc.
* [SPARK-11060] [STREAMING] Fix some potential NPE in DStream transformation (jerryshao, 2015-10-16; 5 files, -9/+83)
  This patch fixes the following:
  1. Guards against NPEs in `TransformedDStream` when a parent DStream returns None instead of an empty RDD (see the sketch below).
  2. Verifies some input streams which will potentially return None.
  3. Adds a unit test to verify the behavior when an input stream returns None.
  cc tdas, please help to review, thanks a lot :).
  Author: jerryshao <sshao@hortonworks.com> Closes #9070 from jerryshao/SPARK-11060.
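  A small sketch of the guarding pattern from item 1 (simplified, not the actual DStream code): when a parent has no data for a batch it may yield None, so substitute an explicit empty result instead of dereferencing it.
  ```scala
  object ComputeGuardSketch {
    // Treat an absent parent result as empty data rather than risking a NullPointerException.
    def compute[T](parentOutput: Option[Seq[T]]): Seq[T] = parentOutput.getOrElse(Seq.empty[T])

    def main(args: Array[String]): Unit = {
      println(compute[Int](None))          // List()
      println(compute(Some(Seq(1, 2, 3)))) // List(1, 2, 3)
    }
  }
  ```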
* [SPARK-11135] [SQL] Exchange incorrectly skips sorts when existing ordering is non-empty subset of required ordering (Josh Rosen, 2015-10-15; 2 files, -2/+52)
  In Spark SQL, the Exchange planner tries to avoid unnecessary sorts in cases where the data has already been sorted by a superset of the requested sorting columns. For instance, let's say that a query calls for an operator's input to be sorted by `a.asc` and the input happens to already be sorted by `[a.asc, b.asc]`. In this case, we do not need to re-sort the input. The converse, however, is not true: if the query calls for `[a.asc, b.asc]`, then `a.asc` alone will not satisfy the ordering requirements, requiring an additional sort to be planned by Exchange. However, the current Exchange code gets this wrong and incorrectly skips sorting when the existing output ordering is a subset of the required ordering. This is simple to fix, however. This bug was introduced in https://github.com/apache/spark/pull/7458, so it affects 1.5.0+. This patch fixes the bug and significantly improves the unit test coverage of Exchange's sort-planning logic.
  Author: Josh Rosen <joshrosen@databricks.com> Closes #9140 from JoshRosen/SPARK-11135.
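  A plain-Scala stand-in for the corrected check (not the planner's actual types): an existing sort satisfies a requirement only when the required ordering is a prefix of the existing one; a mere subset is not enough.
  ```scala
  object OrderingCheckSketch {
    // An empty requirement is trivially satisfied; otherwise the requirement must be a prefix.
    def satisfies(existing: Seq[String], required: Seq[String]): Boolean =
      required.isEmpty || existing.take(required.length) == required

    def main(args: Array[String]): Unit = {
      println(satisfies(Seq("a.asc", "b.asc"), Seq("a.asc")))  // true: no extra sort needed
      println(satisfies(Seq("a.asc"), Seq("a.asc", "b.asc")))  // false: Exchange must plan a sort
    }
  }
  ```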
* [SPARK-10412] [SQL] report memory usage for tungsten sql physical operator (Wenchen Fan, 2015-10-15; 10 files, -43/+116)
  https://issues.apache.org/jira/browse/SPARK-10412 Some screenshots:
  Aggregate: https://cloud.githubusercontent.com/assets/3182036/10439534/618320a4-70ef-11e5-94d8-62ea7f2d1531.png
  Join: https://cloud.githubusercontent.com/assets/3182036/10439537/6724797c-70ef-11e5-8f75-0cf5cbd42048.png
  Author: Wenchen Fan <wenchen@databricks.com> Author: Wenchen Fan <cloud0fan@163.com> Closes #8931 from cloud-fan/viz.
* [SPARK-11078] Ensure spilling tests actually spill (Andrew Or, 2015-10-15; 8 files, -581/+534)
  #9084 uncovered that many tests that test spilling don't actually spill. This is a follow-up patch to fix that, to ensure our unit tests actually catch potential bugs in spilling. The size of this patch is inflated by the refactoring of `ExternalSorterSuite`, which had a lot of duplicate code and logic.
  Author: Andrew Or <andrew@databricks.com> Closes #9124 from andrewor14/spilling-tests.
* [SPARK-10515] When killing executor, the pending replacement executors should not be lost (KaiXinXiaoLei, 2015-10-15; 2 files, -0/+37)
  If the heartbeat receiver kills executors (and new ones are not registered to replace them), the idle timeout for the old executors is lost (which then changes the total number of executors requested by the driver), so new executors will not be asked to replace them. For example, suppose executorsPendingToRemove=Set(1), and executor 2 times out as idle before a new executor is asked to replace executor 1. The driver then kills executor 2 and sends RequestExecutors to the AM, but executorsPendingToRemove=Set(1,2), so the AM doesn't allocate an executor to replace 1. See: https://github.com/apache/spark/pull/8668
  Author: KaiXinXiaoLei <huleilei1@huawei.com> Author: huleilei <huleilei1@huawei.com> Closes #8945 from KaiXinXiaoLei/pendingexecutor.
* fix typo bellow -> below (Britta Weber, 2015-10-15; 2 files, -3/+3)
  Author: Britta Weber <britta.weber@elasticsearch.com> Closes #9136 from brwe/typo-bellow.
* [SPARK-11071] [LAUNCHER] Fix flakiness in LauncherServerSuite::timeout. (Marcelo Vanzin, 2015-10-15; 2 files, -10/+34)
  The test could fail depending on scheduling of the various threads involved; the change removes some sources of races, while making the test a little more resilient by trying a few times before giving up.
  Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9079 from vanzin/SPARK-11071.
* [SPARK-11039][Documentation][Web UI] Document additional ui configurations (Nick Pritchard, 2015-10-15; 1 file, -0/+14)
  Add documentation for the configurations:
  - spark.sql.ui.retainedExecutions
  - spark.streaming.ui.retainedBatches
  Author: Nick Pritchard <nicholas.pritchard@falkonry.com> Closes #9052 from pnpritchard/SPARK-11039.
* [SPARK-11047] Internal accumulators miss the internal flag when replaying events in the history server (Carson Wang, 2015-10-15; 3 files, -32/+79)
  Internal accumulators don't write the internal flag to the event log, so on the history server web UI all accumulators appear as non-internal. This causes incorrect peak execution memory and an unwanted accumulator table to be displayed on the stage page. To fix it, I add the "internal" property of AccumulableInfo when writing the event log.
  Author: Carson Wang <carson.wang@intel.com> Closes #9061 from carsonwang/accumulableBug.
* [SPARK-11066] Update DAGScheduler's "misbehaved ResultHandler" (shellberg, 2015-10-15; 1 file, -2/+11)
  Restrict the job to a single task to ensure that the exception asserted for job failure is the deliberately thrown DAGSchedulerSuiteDummyException, and not an UnsupportedOperationException that a second or subsequent task could propagate from a race condition during code execution.
  Author: shellberg <sah@zepler.org> Closes #9076 from shellberg/shellberg-DAGSchedulerSuite-misbehavedResultHandlerTest-patch-1.
* [SPARK-11099] [SPARK SHELL] [SPARK SUBMIT] Default conf property file i… (Jeff Zhang, 2015-10-15; 3 files, -18/+45)
  Please help review it. Thanks.
  Author: Jeff Zhang <zjffdu@apache.org> Closes #9114 from zjffdu/SPARK-11099.
* [SPARK-11093] [CORE] ChildFirstURLClassLoader#getResources should return all found resources, not just those in the child classloader (Adam Lewandowski, 2015-10-15; 2 files, -9/+44)
  Author: Adam Lewandowski <alewandowski@ipcoop.com> Closes #9106 from alewando/childFirstFix.
* [SPARK-11076] [SQL] Add decimal support for floor and ceil (Cheng Hao, 2015-10-14; 4 files, -13/+91)
  Currently none of the `UnaryMathExpression`s support Decimal; follow-ups will be created for supporting it there. This is the first PR, which should be good for reviewing the approach I am taking.
  Author: Cheng Hao <hao.cheng@intel.com> Closes #9086 from chenghao-intel/ceiling.
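  A sketch of the underlying arithmetic using java.math.BigDecimal directly rather than Spark's Decimal type: floor and ceil on a decimal value reduce to setScale(0, ...) with the matching rounding mode.
  ```scala
  import java.math.{BigDecimal => JBigDecimal, RoundingMode}

  object DecimalFloorCeilSketch {
    def floor(d: JBigDecimal): JBigDecimal = d.setScale(0, RoundingMode.FLOOR)
    def ceil(d: JBigDecimal): JBigDecimal  = d.setScale(0, RoundingMode.CEILING)

    def main(args: Array[String]): Unit = {
      println(floor(new JBigDecimal("3.7")))   // 3
      println(ceil(new JBigDecimal("-3.7")))   // -3
    }
  }
  ```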
* [SPARK-11017] [SQL] Support ImperativeAggregates in TungstenAggregate (Josh Rosen, 2015-10-14; 9 files, -260/+457)
  This patch extends TungstenAggregate to support ImperativeAggregate functions. The existing TungstenAggregate operator only supported DeclarativeAggregate functions, which are defined in terms of Catalyst expressions and can be evaluated via generated projections. ImperativeAggregate functions, on the other hand, are evaluated by calling their `initialize`, `update`, `merge`, and `eval` methods. The basic strategy here is similar to how SortBasedAggregate evaluates both types of aggregate functions: use a generated projection to evaluate the expression-based declarative aggregates with dummy placeholder expressions inserted in place of the imperative aggregate function output, then invoke the imperative aggregate functions and target them against the aggregation buffer. The bulk of the diff here consists of code that was copied and adapted from SortBasedAggregate, with some key changes to handle TungstenAggregate's sort fallback path.
  Author: Josh Rosen <joshrosen@databricks.com> Closes #9038 from JoshRosen/support-interpreted-in-tungsten-agg-final.
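  A highly simplified sketch of the imperative-aggregate lifecycle mentioned above; this is not Spark's actual ImperativeAggregate API, just an illustration of how an operator drives such a function through initialize/update per input row, merge for partial buffers, and eval for the final value.
  ```scala
  object ImperativeAggSketch {
    class SumAgg {
      private var buffer = 0L                     // stand-in for a slot in the aggregation buffer
      def initialize(): Unit = buffer = 0L
      def update(input: Long): Unit = buffer += input
      def merge(other: SumAgg): Unit = buffer += other.buffer
      def eval(): Long = buffer
    }

    def main(args: Array[String]): Unit = {
      val agg = new SumAgg
      agg.initialize()
      Seq(1L, 2L, 3L).foreach(agg.update)
      println(agg.eval())                         // 6
    }
  }
  ```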
* [SPARK-10829] [SQL] Filter combine partition key and attribute doesn't work in DataSource scan (Cheng Hao, 2015-10-14; 2 files, -12/+39)
  ```scala
  withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true") {
    withTempPath { dir =>
      val path = s"${dir.getCanonicalPath}/part=1"
      (1 to 3).map(i => (i, i.toString)).toDF("a", "b").write.parquet(path)

      // If the "part = 1" filter gets pushed down, this query will throw an exception since
      // "part" is not a valid column in the actual Parquet file
      checkAnswer(
        sqlContext.read.parquet(path).filter("a > 0 and (part = 0 or a > 1)"),
        (2 to 3).map(i => Row(i, i.toString, 1)))
    }
  }
  ```
  We expect the result to be:
  ```
  2,1
  3,1
  ```
  But got:
  ```
  1,1
  2,1
  3,1
  ```
  Author: Cheng Hao <hao.cheng@intel.com> Closes #8916 from chenghao-intel/partition_filter.
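  An illustration of why the answer was wrong (a simplified stand-in, not the actual pushdown code): a predicate that references a partition column, such as `part = 0 or a > 1`, cannot be evaluated inside the Parquet files because `part` is not a column there, so only predicates over data columns alone are safe to push down.
  ```scala
  object PushdownSplitSketch {
    // Safe to push to the file format only if no partition column is referenced.
    def pushable(referencedColumns: Set[String], partitionColumns: Set[String]): Boolean =
      referencedColumns.intersect(partitionColumns).isEmpty

    def main(args: Array[String]): Unit = {
      val partitionCols = Set("part")
      println(pushable(Set("a"), partitionCols))          // true:  "a > 0" may be pushed down
      println(pushable(Set("part", "a"), partitionCols))  // false: "part = 0 or a > 1" must stay above the scan
    }
  }
  ```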
* [SPARK-11113] [SQL] Remove DeveloperApi annotation from private classes. (Reynold Xin, 2015-10-14; 29 files, -153/+22)
  o.a.s.sql.catalyst and o.a.s.sql.execution are supposed to be private.
  Author: Reynold Xin <rxin@databricks.com> Closes #9121 from rxin/SPARK-11113.
* [SPARK-10104] [SQL] Consolidate different forms of table identifiers (Wenchen Fan, 2015-10-14; 32 files, -327/+212)
  Right now, we have QualifiedTableName, TableIdentifier, and Seq[String] to represent table identifiers. We should only have one form, and TableIdentifier is the best one because it provides methods to get the table name, get the database name, return an unquoted string, and return a quoted string.
  Author: Wenchen Fan <wenchen@databricks.com> Author: Wenchen Fan <cloud0fan@163.com> Closes #8453 from cloud-fan/table-name.
* [SPARK-11068] [SQL] [FOLLOW-UP] move execution listener to util (Wenchen Fan, 2015-10-14; 3 files, -2/+4)
  Author: Wenchen Fan <wenchen@databricks.com> Closes #9119 from cloud-fan/callback.