Each entry below gives the commit subject followed by (author, date, files changed, lines removed/added), then the commit message.
* [SPARK-10885] [STREAMING] Display the failed output op in Streaming UI (zsxwing, 2015-10-06, 6 files, -27/+143)
  This PR implements the following features for both `master` and `branch-1.5`:
  1. Display the failed output op count in the batch list.
  2. Display the failure reason of the output op in the batch detail page.
  Screenshots:
  https://cloud.githubusercontent.com/assets/1000778/10198387/5b2b97ec-67ce-11e5-81c2-f818b9d2f3ad.png
  https://cloud.githubusercontent.com/assets/1000778/10198388/5b76ac14-67ce-11e5-8c8b-de2683c5b485.png
  There are still two remaining problems in the UI:
  1. If an output operation doesn't run any Spark job, we cannot get its duration, since for now it is computed as the sum of all jobs' durations.
  2. If an output operation doesn't run any Spark job, we cannot get the description, since it is taken from the latest job's call site.
  We need to add new `StreamingListenerEvent`s about output operations to fix these, so I'd like to fix them only for `master` in another PR.
  Author: zsxwing <zsxwing@gmail.com> Closes #8950 from zsxwing/batch-failure.
* [SPARK-10957] [ML] setParams changes quantileProbabilities unexpectedly in PySpark's AFTSurvivalRegression (Xiangrui Meng, 2015-10-06, 1 file, -5/+1)
  If the user doesn't specify `quantileProbs` in `setParams`, it gets reset to the default value. We don't need special handling here. cc vectorijk yanboliang
  Author: Xiangrui Meng <meng@databricks.com> Closes #9001 from mengxr/SPARK-10957.
* [SPARK-10688] [ML] [PYSPARK] Python API for AFTSurvivalRegression (vectorijk, 2015-10-06, 1 file, -2/+169)
  Implement the Python API for AFTSurvivalRegression.
  Author: vectorijk <jiangkai@gmail.com> Closes #8926 from vectorijk/spark-10688.
* [SPARK-10901] [YARN] spark.yarn.user.classpath.first doesn't work (Thomas Graves, 2015-10-06, 1 file, -12/+27)
  This should go into 1.5.2 also. The issue is we were no longer adding `__app__.jar` to the system classpath.
  Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Author: Tom Graves <tgraves@yahoo-inc.com> Closes #8959 from tgravescs/SPARK-10901.
* [SPARK-10916] [YARN] Set perm gen size when launching containers on YARN (Marcelo Vanzin, 2015-10-06, 6 files, -27/+48)
  This makes YARN containers behave like all other processes launched by Spark, which launch with a default perm gen size of 256m unless overridden by the user (or not needed by the VM).
  Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8970 from vanzin/SPARK-10916.
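  If a user does need a different perm gen size, the usual route is the JVM-options configs; a minimal sketch, assuming (not verified against this patch) that `extraJavaOptions` values take precedence over the 256m default in YARN containers too:
  ```scala
  import org.apache.spark.SparkConf

  // Sketch: overriding the 256m default described above. That these options
  // flow through to YARN containers is an assumption based on the commit text.
  val conf = new SparkConf()
    .set("spark.driver.extraJavaOptions", "-XX:MaxPermSize=512m")
    .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=512m")
  ```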
* [SPARK-10938] [SQL] remove typeId in columnar cache (Davies Liu, 2015-10-06, 13 files, -151/+63)
  This PR removes the typeId in the columnar cache, which is not needed anymore; it also removes DATE and TIMESTAMP (using INT/LONG instead).
  Author: Davies Liu <davies@databricks.com> Closes #8989 from davies/refactor_cache.
* [SPARK-10585] [SQL] [FOLLOW-UP] remove no-longer-necessary code for unsafe generation (Wenchen Fan, 2015-10-05, 3 files, -808/+0)
  This code was left in place to produce a clear diff for https://github.com/apache/spark/pull/8747.
  Author: Wenchen Fan <cloud0fan@163.com> Closes #8991 from cloud-fan/clean.
* [SPARK-10900] [STREAMING] Add output operation events to StreamingListener (zsxwing, 2015-10-05, 7 files, -9/+125)
  Add output operation events to StreamingListener so as to implement the following UI features:
  1. Progress bar of a batch in the batch list.
  2. Be able to display output operation `description` and `duration` when there is no Spark job in a Streaming job.
  Author: zsxwing <zsxwing@gmail.com> Closes #8958 from zsxwing/output-operation-events.
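  For context, a listener consuming the new events would look roughly like the sketch below; the event and callback names are an assumption based on this PR's description, so check the merged code for the exact API.
  ```scala
  import org.apache.spark.streaming.scheduler._

  // Sketch only: event/callback names assumed from the PR description.
  class OutputOpLogger extends StreamingListener {
    override def onOutputOperationStarted(
        started: StreamingListenerOutputOperationStarted): Unit = {
      println(s"output op started: ${started.outputOperationInfo.description}")
    }
    override def onOutputOperationCompleted(
        completed: StreamingListenerOutputOperationCompleted): Unit = {
      println(s"output op finished: ${completed.outputOperationInfo.name}")
    }
  }
  // Registered via ssc.addStreamingListener(new OutputOpLogger)
  ```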
* [SPARK-10934] [SQL] handle hashCode of unsafe array correctly (Wenchen Fan, 2015-10-05, 2 files, -1/+12)
  `Murmur3_x86_32.hashUnsafeWords` only accepts word-aligned byte lengths, but unsafe arrays are not word-aligned.
  Author: Wenchen Fan <cloud0fan@163.com> Closes #8987 from cloud-fan/hash.
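  The distinction is between the word-granular and byte-granular hashers in Spark's unsafe module; a sketch, assuming the usual static signatures:
  ```scala
  import org.apache.spark.unsafe.Platform
  import org.apache.spark.unsafe.hash.Murmur3_x86_32

  // hashUnsafeWords requires lengthInBytes to be a multiple of 8; for a
  // payload of arbitrary length (such as unsafe array bytes), the
  // byte-granular variant is the safe choice.
  val bytes = Array[Byte](1, 2, 3, 4, 5) // 5 bytes: not word-aligned
  val hash = Murmur3_x86_32.hashUnsafeBytes(
    bytes, Platform.BYTE_ARRAY_OFFSET, bytes.length, 42)
  ```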
* [SPARK-10585] [SQL] only copy data once when generating unsafe projections (Wenchen Fan, 2015-10-05, 12 files, -84/+950)
  This PR is a complete rewrite of GenerateUnsafeProjection, to accomplish the goal of copying data only once. The old GenerateUnsafeProjection code is still there to reduce review difficulty. Instead of creating unsafe conversion code for struct, array and map, we generate code that writes the content to the global row buffer.
  Author: Wenchen Fan <cloud0fan@163.com> Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8747 from cloud-fan/copy-once.
* [SPARK-10889] [STREAMING] Bump KCL to add MillisBehindLatest metric (Avrohom Katz, 2015-10-04, 1 file, -1/+1)
  I don't believe the API changed at all.
  Author: Avrohom Katz <iambpentameter@gmail.com> Closes #8957 from akatz/kcl-upgrade.
* [SPARK-9570] [DOCS] Consistent recommendation for submitting Spark apps to YARN: --master yarn --deploy-mode x vs. --master yarn-x (Sean Owen, 2015-10-04, 4 files, -27/+34)
  Recommend `--master yarn --deploy-mode {cluster,client}` consistently in the docs. Follow-on to https://github.com/apache/spark/pull/8385. CC nssalian
  Author: Sean Owen <sowen@cloudera.com> Closes #8968 from srowen/SPARK-9570.
* [SPARK-10904] [SPARKR] Fix to support `select(df, c("col1", "col2"))` (felixcheung, 2015-10-03, 2 files, -6/+21)
  The fix is to coerce `c("a", "b")` into a list so that it can be serialized for the call into the JVM.
  Author: felixcheung <felixcheung_m@hotmail.com> Closes #8961 from felixcheung/rselect.
* Remove TODO in ShuffleMemoryManager. (Reynold Xin, 2015-10-03, 1 file, -1/+0)
* FIX: rememberDuration reassignment error message (Guillaume Poulin, 2015-10-03, 1 file, -11/+5)
  I was reading through the scheduler and found this small mistake.
  Author: Guillaume Poulin <guillaume@hopper.com> Closes #8966 from gpoulin/remember_duration_typo.
* [SPARK-6028] [CORE] Remerge #6457: new RPC implementation, and also pick #8905 (zsxwing, 2015-10-03, 31 files, -71/+1715)
  This PR reverts https://github.com/apache/spark/commit/02144d6745ec0a6d8877d969feb82139bd22437f to remerge #6457, and also includes the commits in #8905.
  Author: zsxwing <zsxwing@gmail.com> Closes #8944 from zsxwing/SPARK-6028.
* [SPARK-7275] [SQL] Make LogicalRelation public (gweidner, 2015-10-03, 1 file, -1/+1)
  Given that LogicalRelation (and other classes) were moved from the sources package to the execution.sources package, removed private[sql] to make LogicalRelation public and facilitate access for data sources.
  Author: gweidner <gweidner@us.ibm.com> Closes #8965 from gweidner/SPARK-7275.
* [SPARK-10317] [CORE] Compatibility between history server script and functionality (Joshi, 2015-10-02, 3 files, -22/+96)
  The history server has its argument parsing class in HistoryServerArguments. However, this doesn't get involved in the start-history-server.sh codepath, where the $0 arg is assigned to spark.history.fs.logDirectory and all other arguments are discarded (e.g. --properties-file). This prevents the other options from being usable via this script.
  Author: Joshi <rekhajoshm@gmail.com> Author: Rekha Joshi <rekhajoshm@gmail.com> Closes #8758 from rekhajoshm/SPARK-10317.
* [HOT-FIX] Fix style. (Yin Huai, 2015-10-02, 1 file, -2/+2)
  https://github.com/apache/spark/pull/8882 broke our build.
  Author: Yin Huai <yhuai@databricks.com> Closes #8964 from yhuai/fixStyle.
* [SPARK-6530] [ML] Add chi-square selector for ml package (Xusen Yin, 2015-10-02, 3 files, -0/+213)
  See the JIRA here: https://issues.apache.org/jira/browse/SPARK-6530
  Author: Xusen Yin <yinxusen@gmail.com> Closes #5742 from yinxusen/SPARK-6530.
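  A minimal sketch of how the new spark.ml selector would be used, assuming it mirrors the mllib ChiSqSelector parameters (the setter names are an assumption, not verified against the merged code):
  ```scala
  import org.apache.spark.ml.feature.ChiSqSelector

  // Keep the 50 features most correlated with the label by chi-squared test.
  val selector = new ChiSqSelector()
    .setNumTopFeatures(50)
    .setFeaturesCol("features")
    .setLabelCol("label")
    .setOutputCol("selectedFeatures")
  // val selected = selector.fit(df).transform(df) // df: a DataFrame in scope
  ```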
* [SPARK-5890] [ML] Add feature discretizer (Xusen Yin, 2015-10-02, 2 files, -0/+274)
  JIRA issue here: https://issues.apache.org/jira/browse/SPARK-5890. I borrowed the `findSplits` code from `RandomForest`; I don't think it's good to call it on `RandomForest` directly.
  Author: Xusen Yin <yinxusen@gmail.com> Closes #5779 from yinxusen/SPARK-5890.
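  A sketch of how such a discretizer is typically used, assuming the class added here is the `QuantileDiscretizer` estimator tracked in the JIRA (the exact class and setter names are an assumption):
  ```scala
  import org.apache.spark.ml.feature.QuantileDiscretizer

  // Bin a continuous column into 3 buckets using approximate quantile splits.
  val discretizer = new QuantileDiscretizer()
    .setInputCol("hour")
    .setOutputCol("hourBucket")
    .setNumBuckets(3)
  // val bucketed = discretizer.fit(df).transform(df) // df: a DataFrame in scope
  ```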
* [SPARK-9798] [ML] CrossValidatorModel Documentation Improvements (Rerngvit Yanggratoke, 2015-10-02, 1 file, -0/+4)
  Document the CrossValidatorModel members bestModel and avgMetrics.
  Author: Rerngvit Yanggratoke <rerngvit@kth.se> Closes #8882 from rerngvit/Spark-9798.
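  For reference, the two documented members are read after fitting; a minimal sketch, assuming `cv` is a configured `CrossValidator` and `training` a DataFrame:
  ```scala
  // cv (a configured CrossValidator) and training (a DataFrame) are
  // assumed to be in scope.
  val cvModel = cv.fit(training)
  val best = cvModel.bestModel     // model fitted with the best-performing param map
  val metrics = cvModel.avgMetrics // average cross-validation metric per param map
  ```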
* [SPARK-9867] [SQL] Move utilities for binary data into ByteArray (Takeshi YAMAMURO, 2015-10-01, 3 files, -51/+52)
  Utilities for binary data, such as Substring#substringBinarySQL and BinaryPrefixComparator#computePrefix, are gathered in ByteArray for readability.
  Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #8122 from maropu/CleanUpForBinaryType.
* [SPARK-10400] [SQL] Renames SQLConf.PARQUET_FOLLOW_PARQUET_FORMAT_SPEC (Cheng Lian, 2015-10-01, 6 files, -148/+231)
  We introduced the SQL option `spark.sql.parquet.followParquetFormatSpec` while implementing the Parquet backwards-compatibility rules in SPARK-6777. It indicates whether we should use the legacy Parquet format adopted by Spark 1.4 and prior versions, or the standard format defined in the parquet-format spec, when writing Parquet files. This option defaults to `false` and is marked as a non-public option (`isPublic = false`) because we haven't finished refactoring the Parquet write path. The problem is that the name of this option is somewhat confusing, because it's not super intuitive why we shouldn't follow the spec. It would be nice to rename it to `spark.sql.parquet.writeLegacyFormat` and invert its default value (the two option names have opposite meanings). Although this option is private in 1.5, we'll make it public in 1.6 after refactoring the Parquet write path, so that users can decide whether to write Parquet files in standard format or legacy format.
  Author: Cheng Lian <lian@databricks.com> Closes #8566 from liancheng/spark-10400/deprecate-follow-parquet-format-spec.
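  Under the renamed option, opting into the legacy layout would look like the sketch below (the option name comes from this commit; which release exposes it publicly is an assumption):
  ```scala
  // Sketch: write Parquet in the legacy (Spark 1.4-era) layout.
  // sqlContext and df are assumed to be in scope.
  sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
  df.write.parquet("/tmp/legacy-parquet")
  ```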
* [SPARK-10671] [SQL] Throws an analysis exception if we cannot find Hive UDFs (Wenchen Fan, 2015-10-01, 2 files, -23/+104)
  Takes over https://github.com/apache/spark/pull/8800.
  Author: Wenchen Fan <cloud0fan@163.com> Closes #8941 from cloud-fan/hive-udf.
* [SPARK-10865] [SPARK-10866] [SQL] Fix bug of ceil/floor, which should return long instead of the Double type (Cheng Hao, 2015-10-01, 3 files, -11/+31)
  The floor and ceiling functions should return the Long type rather than Double. Verified against MySQL and Hive.
  Author: Cheng Hao <hao.cheng@intel.com> Closes #8933 from chenghao-intel/ceiling.
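  A quick illustration of the fixed behavior (a sketch assuming a `sqlContext` in scope):
  ```scala
  // After the fix, both results are longs (2 and 3), matching MySQL and
  // Hive, instead of the doubles 2.0 and 3.0.
  sqlContext.sql("SELECT ceil(1.2), floor(3.7)").show()
  ```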
* [SPARK-10058] [CORE] [TESTS] Fix the flaky tests in HeartbeatReceiverSuite (zsxwing, 2015-10-01, 2 files, -16/+60)
  Fixed the test failure here: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/
  The failure occurs because `HeartbeatReceiverSuite.heartbeatReceiver` may receive a `SparkListenerExecutorAdded("driver")` sent from [LocalBackend](https://github.com/apache/spark/blob/8fb3a65cbb714120d612e58ef9d12b0521a83260/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala#L121). There are other race conditions in `HeartbeatReceiverSuite` because `HeartbeatReceiver.onExecutorAdded` and `HeartbeatReceiver.onExecutorRemoved` are asynchronous; this PR also fixes them.
  Author: zsxwing <zsxwing@gmail.com> Closes #8946 from zsxwing/SPARK-10058.
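  The standard remedy for asserting on asynchronous listener callbacks in ScalaTest is to poll with `eventually`; a sketch (names like `receivedEvents` are stand-ins, not from the suite):
  ```scala
  import org.scalatest.concurrent.Eventually._
  import org.scalatest.time.SpanSugar._

  // Poll the assertion until it holds instead of asserting immediately,
  // since listener callbacks arrive asynchronously.
  eventually(timeout(10.seconds), interval(100.millis)) {
    assert(receivedEvents.nonEmpty) // receivedEvents: a hypothetical event buffer
  }
  ```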
* [SPARK-10807] [SPARKR] Added as.data.frame as a synonym for collect (Oscar D. Lara Yejas, 2015-09-30, 4 files, -1/+39)
  Created the method as.data.frame as a synonym for collect().
  Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Author: olarayej <oscar.lara.yejas@us.ibm.com> Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com> Closes #8908 from olarayej/SPARK-10807.
* [SPARK-9617] [SQL] Implement json_tuple (Nathan Howell, 2015-09-30, 4 files, -4/+316)
  This is an implementation of Hive's `json_tuple` function using Jackson Streaming.
  Author: Nathan Howell <nhowell@godaddy.com> Closes #7946 from NathanHowell/SPARK-9617.
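  `json_tuple` extracts several top-level fields from a JSON string in one pass; a usage sketch, assuming Hive-compatible semantics and a `sqlContext` in scope:
  ```scala
  // json_tuple behaves as a generator: each requested key becomes one
  // output column, so this query yields the single row (1, x).
  sqlContext.sql(
    """SELECT json_tuple('{"a": 1, "b": "x"}', 'a', 'b')""").show()
  ```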
* [SPARK-10770] [SQL] SparkPlan.executeCollect/executeTake should return InternalRow rather than external Row (Reynold Xin, 2015-09-30, 9 files, -39/+39)
  Author: Reynold Xin <rxin@databricks.com> Closes #8900 from rxin/SPARK-10770-1.
* [SPARK-10851] [SPARKR] Exception not failing R applications (in yarn cluster mode) (Sun Rui, 2015-09-30, 1 file, -3/+7)
  The YARN backend doesn't like it when user code calls System.exit, since it cannot know the exit status and thus cannot set an appropriate final status for the application. This PR removes the use of System.exit in RRunner. Instead, when the R process running a SparkR script returns a non-zero exit code, RRunner throws SparkUserAppException, which is caught by the ApplicationMaster so it knows the application failed; for other failures, it throws SparkException.
  Author: Sun Rui <rui.sun@intel.com> Closes #8938 from sun-rui/SPARK-10851.
* [SPARK-9741] [SQL] Approximate Count Distinct using the new UDAF interface (Herman van Hovell, 2015-09-30, 3 files, -0/+554)
  This PR implements a HyperLogLog-based Approximate Count Distinct function using the new UDAF interface. The implementation is inspired by the ClearSpring HyperLogLog implementation and should produce the same results. There is still some documentation and testing left to do. cc yhuai
  Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #8362 from hvanhovell/SPARK-9741.
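  For readers unfamiliar with the technique, the sketch below is a toy HyperLogLog in plain Scala illustrating the register/rank idea the UDAF builds on; it is purely illustrative and is not Spark's implementation.
  ```scala
  import scala.util.hashing.MurmurHash3

  // Toy HyperLogLog: hash each value, use the top p bits to pick a register,
  // and record the longest run of leading zeros seen in the remaining bits.
  class ToyHLL(p: Int = 14) {
    private val m = 1 << p
    private val registers = new Array[Int](m)
    private val alpha = 0.7213 / (1 + 1.079 / m) // bias constant, valid for m >= 128

    def add(value: String): Unit = {
      val h = MurmurHash3.stringHash(value)
      val idx = h >>> (32 - p)                   // top p bits select the register
      val rank = Integer.numberOfLeadingZeros(h << p) + 1
      registers(idx) = math.max(registers(idx), math.min(rank, 32 - p + 1))
    }

    // Raw estimate (no small/large-range corrections): alpha * m^2 / sum(2^-M_j)
    def estimate: Double =
      alpha * m * m / registers.map(r => math.pow(2.0, -r)).sum
  }
  ```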
* [SPARK-10736] [ML] Use 1 for all ratings if $(ratingCol) = "" (Yanbo Liang, 2015-09-29, 1 file, -2/+2)
  For some implicit datasets, ratings may not exist in the training data. In this case, we can assume all observed pairs to be positive and treat their ratings as 1. This should happen when users set `ratingCol` to an empty string.
  Author: Yanbo Liang <ybliang8@gmail.com> Closes #8937 from yanboliang/spark-10736.
* [SPARK-10811] [SQL] Eliminates unnecessary byte array copying (Cheng Lian, 2015-09-29, 3 files, -10/+28)
  When reading Parquet strings and binary-backed decimal values, Parquet's `Binary.getBytes` always returns a copied byte array, which is unnecessary. Since the underlying implementation of the `Binary` values there is guaranteed to be `ByteArraySliceBackedBinary`, and Parquet itself never reuses the underlying byte arrays, we can use `Binary.toByteBuffer.array()` to steal the underlying byte arrays without copying them. This brings performance benefits when scanning Parquet string and binary-backed decimal columns. Note that this trick doesn't cover binary-backed decimals with precision greater than 18. My micro-benchmark result is that this brings a ~15% performance boost for scanning the TPC-DS `store_sales` table (scale factor 15). Another minor optimization done in this PR is that we now directly construct a Java `BigDecimal` in `Decimal.toJavaBigDecimal` without constructing a Scala `BigDecimal` first, which brings another ~5% performance gain.
  Author: Cheng Lian <lian@databricks.com> Closes #8907 from liancheng/spark-10811/eliminate-array-copying.
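  The copy-vs-steal distinction in a sketch (stealing is only safe because Parquet does not reuse these arrays here; that precondition comes from the commit message, not from Parquet's API contract):
  ```scala
  import org.apache.parquet.io.api.Binary

  // getBytes returns a fresh copy of the backing bytes.
  def copied(b: Binary): Array[Byte] = b.getBytes
  // Stealing the heap buffer's backing array avoids the copy; safe only
  // under the no-reuse guarantee stated above.
  def stolen(b: Binary): Array[Byte] = b.toByteBuffer.array()
  ```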
* [SPARK-10782] [PYTHON] Update dropDuplicates documentation (asokadiggs, 2015-09-29, 1 file, -0/+2)
  The documentation for dropDuplicates() and drop_duplicates() is one and the same. Resolved the error in the example for drop_duplicates by using the same approach used for groupby and groupBy: indicating that dropDuplicates and drop_duplicates are aliases.
  Author: asokadiggs <asoka.diggs@intel.com> Closes #8930 from asokadiggs/jira-10782.
* [SPARK-6919] [PYSPARK] Add asDict method to StatCounter (Erik Shilts, 2015-09-29, 2 files, -0/+42)
  Add a method to easily convert a StatCounter instance into a Python dict. https://issues.apache.org/jira/browse/SPARK-6919
  Note: This is my original work and the existing Spark license applies.
  Author: Erik Shilts <erik.shilts@opower.com> Closes #5516 from eshilts/statcounter-asdict.
* [SPARK-10415] [PYSPARK] [MLLIB] [DOCS] Enhance Navigation Sidebar in PySpark API (noelsmith, 2015-09-29, 4 files, -2/+197)
  These are CSS/JavaScript changes to make navigation in the PySpark API a bit simpler by adding the following to the sidebar:
  * Classes
  * Functions
  * Tags to highlight experimental features
  ![screen shot 2015-09-02 at 08 50 12](https://cloud.githubusercontent.com/assets/11915197/9634781/301f853a-518b-11e5-8d5c-fda202f6202f.png)
  Online example here: https://dl.dropboxusercontent.com/u/20821334/pyspark-api-nav-enhance/pyspark.mllib.html
  (The contribution is my original work and I license the work to the project under the project's open source license.)
  Author: noelsmith <mail@noelsmith.com> Closes #8571 from noel-smith/pyspark-api-nav-enhance.
* [SPARK-10871] include number of executor failures in error msg (Ryan Williams, 2015-09-29, 1 file, -1/+1)
  Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #8939 from ryan-williams/errmsg.
* [SPARK-10825] [CORE] [TESTS] Fix race conditions in StandaloneDynamicAllocationSuite (zsxwing, 2015-09-29, 1 file, -113/+192)
  Fix the following issues in StandaloneDynamicAllocationSuite:
  1. It should not assume master and workers start in order.
  2. It should not assume master and workers get ready at once.
  3. It should not assume the application is already registered with the master after creating the SparkContext.
  4. It should not access Master.app and idToApp, which are not thread safe.
  The changes include:
  * Use `eventually` to wait until master and workers are ready, fixing 1 and 2.
  * Use `eventually` to wait until the application is registered with the master, fixing 3.
  * Use `askWithRetry[MasterStateResponse](RequestMasterState)` to get the application info, fixing 4.
  Author: zsxwing <zsxwing@gmail.com> Closes #8914 from zsxwing/fix-StandaloneDynamicAllocationSuite.
* [SPARK-10670] [ML] [Doc] add api reference for ml doc (Yuhao Yang, 2015-09-28, 1 file, -64/+195)
  JIRA: https://issues.apache.org/jira/browse/SPARK-10670. In the Markdown docs for the spark.ml Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "Word2Vec" section in https://github.com/apache/spark/blob/64743870f23bffb8d96dcc8a0181c1452782a151/docs/ml-features.md. This JIRA is just for spark.ml, not spark.mllib.
  Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8901 from hhbyyh/docAPI.
* [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE (Sean Owen, 2015-09-28, 40 files, -679/+1153)
  In the course of https://issues.apache.org/jira/browse/LEGAL-226 it came to light that the guidance at http://www.apache.org/dev/licensing-howto.html#permissive-deps on permissively-licensed dependencies has a different interpretation than we (er, I) had been operating under: "pointer ... to the license within the source tree" specifically means a copy of the license within Spark's distribution, whereas at the moment Spark's LICENSE has a pointer to the project's license in the other project's source tree. The remedy is simply to inline all such license references (i.e. BSD/MIT licenses), or include their text in a "licenses" subdirectory and point to that. Along the way, we can also treat other BSD/MIT licenses, whose text has been inlined into LICENSE, in the same way. The LICENSE file can continue to provide a helpful list of BSD/MIT-licensed projects and a pointer to their sites. This would be over and above including license text in the distro, which is the essential thing.
  Author: Sean Owen <sowen@cloudera.com> Closes #8919 from srowen/SPARK-10833.
* [SPARK-10859] [SQL] fix stats of StringType in columnar cache (Davies Liu, 2015-09-28, 2 files, -2/+9)
  A UTF8String may come from an UnsafeRow, in which case its underlying buffer is not copied, so we should clone it in order to hold it in Stats. cc yhuai
  Author: Davies Liu <davies@databricks.com> Closes #8929 from davies/pushdown_string.
* [SPARK-10395] [SQL] Simplifies CatalystReadSupport (Cheng Lian, 2015-09-28, 1 file, -47/+45)
  Please refer to [SPARK-10395] [1] for details.
  [1]: https://issues.apache.org/jira/browse/SPARK-10395
  Author: Cheng Lian <lian@databricks.com> Closes #8553 from liancheng/spark-10395/simplify-parquet-read-support.
* [SPARK-10790] [YARN] Fix initial executor number not set issue and consolidate the code (jerryshao, 2015-09-28, 4 files, -40/+27)
  This bug was introduced in [SPARK-9092](https://issues.apache.org/jira/browse/SPARK-9092): `targetExecutorNumber` should use `minExecutors` if `initialExecutors` is not set. Using 0 instead hits the problem described in [SPARK-10790](https://issues.apache.org/jira/browse/SPARK-10790). Also consolidate and simplify some similar code snippets to keep the semantics consistent.
  Author: jerryshao <sshao@hortonworks.com> Closes #8910 from jerryshao/SPARK-10790.
* [SPARK-10812] [YARN] Spark hadoop util support switching to yarn (Holden Karau, 2015-09-28, 4 files, -16/+34)
  While this is likely not a huge issue for real production systems, for test systems that set up a SparkContext, tear it down, and stand up another SparkContext with a different master (e.g. some local-mode and some yarn-mode tests), this can be an issue. Discovered during work on spark-testing-base on Spark 1.4.1, but the logic that triggers it seems to be present in master (see the SparkHadoopUtil object). A valid workaround for users encountering this issue is to fork a different JVM; however, this can be heavyweight.
  ```
  [info] SampleMiniClusterTest:
  [info] Exception encountered when attempting to run a suite with class name: com.holdenkarau.spark.testing.SampleMiniClusterTest *** ABORTED ***
  [info] java.lang.ClassCastException: org.apache.spark.deploy.SparkHadoopUtil cannot be cast to org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
  [info] at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.get(YarnSparkHadoopUtil.scala:163)
  [info] at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:257)
  [info] at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:561)
  [info] at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:115)
  [info] at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
  [info] at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
  [info] at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
  [info] at com.holdenkarau.spark.testing.SharedMiniCluster$class.setup(SharedMiniCluster.scala:186)
  [info] at com.holdenkarau.spark.testing.SampleMiniClusterTest.setup(SampleMiniClusterTest.scala:26)
  [info] at com.holdenkarau.spark.testing.SharedMiniCluster$class.beforeAll(SharedMiniCluster.scala:103)
  ```
  Author: Holden Karau <holden@pigscanfly.ca> Closes #8911 from holdenk/SPARK-10812-spark-hadoop-util-support-switching-to-yarn.
* Fix two mistakes in programming-guide page (David Martin, 2015-09-28, 1 file, -2/+2)
  seperate -> separate; sees -> see
  Author: David Martin <dmartinpro@users.noreply.github.com> Closes #8928 from dmartinpro/patch-1.
* add doc for spark.streaming.stopGracefullyOnShutdown (Bin Wang, 2015-09-27, 1 file, -0/+8)
  Author: Bin Wang <wbin00@gmail.com> Closes #8898 from wb14123/doc.
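  For context, the newly documented flag is set like any other Spark conf; a sketch (the graceful-shutdown behavior described in the comment is per the added doc text):
  ```scala
  import org.apache.spark.SparkConf

  // Ask Spark Streaming to stop the StreamingContext gracefully from the
  // JVM shutdown hook instead of killing it immediately.
  val conf = new SparkConf()
    .set("spark.streaming.stopGracefullyOnShutdown", "true")
  ```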
* [SPARK-10720] [SQL] [JAVA] Add a java wrapper to create a dataframe from a local list of java beans (Holden Karau, 2015-09-27, 2 files, -17/+56)
  Similar to SPARK-10630, it would be nice if Java users didn't have to parallelize their data explicitly (something Scala users can already skip). The issue came up in http://stackoverflow.com/questions/32613413/apache-spark-machine-learning-cant-get-estimator-example-to-work
  Author: Holden Karau <holden@pigscanfly.ca> Closes #8879 from holdenk/SPARK-10720-add-a-java-wrapper-to-create-a-dataframe-from-a-local-list-of-java-beans.
* [SPARK-10741] [SQL] Hive Query Having/OrderBy against Parquet table is not working (Wenchen Fan, 2015-09-27, 15 files, -86/+103)
  https://issues.apache.org/jira/browse/SPARK-10741. I chose the second approach: do not change output exprIds when converting MetastoreRelation to LogicalRelation.
  Author: Wenchen Fan <cloud0fan@163.com> Closes #8889 from cloud-fan/hot-bug.
* [SPARK-10778] [MLLIB] Implement toString for AssociationRules.Rule (y-shimizu, 2015-09-27, 1 file, -0/+5)
  I implemented toString for AssociationRules.Rule, with a format like `[x, y] => {z}: 1.0`.
  Author: y-shimizu <y.shimizu0429@gmail.com> Closes #8904 from y-shimizu/master.