...
* [SPARK-9990] [SQL] Create local hash join operator (zsxwing, 2015-09-10, 16 files, -24/+455)
  This PR includes the following changes:
  - Add SQLConf to LocalNode
  - Add HashJoinNode
  - Add ConvertToUnsafeNode and ConvertToSafeNode.scala to test unsafe hash join.
  Author: zsxwing <zsxwing@gmail.com>
  Closes #8535 from zsxwing/SPARK-9990.
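  For orientation, a minimal standalone sketch of the hash-join technique the new node implements: build a table on one side, probe with the other. Plain Scala, illustrative only; this is not the LocalNode API.

  ```scala
  // Hash join sketch: build a hash table on one input, probe it with the other.
  object HashJoinSketch {
    def hashJoin[K, L, R](left: Seq[(K, L)], right: Seq[(K, R)]): Seq[(K, L, R)] = {
      // Build phase: group the right side by join key.
      val buildTable: Map[K, Seq[R]] =
        right.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2)) }
      // Probe phase: stream the left side and emit every match.
      left.flatMap { case (k, l) => buildTable.getOrElse(k, Seq.empty).map(r => (k, l, r)) }
    }

    def main(args: Array[String]): Unit = {
      val joined = hashJoin(Seq(1 -> "a", 2 -> "b"), Seq(1 -> "x", 1 -> "y"))
      println(joined) // List((1,a,x), (1,a,y))
    }
  }
  ```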
* [SPARK-10514] [MESOS] Waiting for min no of total cores acquired by Spark by implementing the sufficientResourcesRegistered method (Akash Mishra, 2015-09-10, 2 files, -2/+7)
  The spark.scheduler.minRegisteredResourcesRatio configuration parameter works in YARN mode but not in Mesos coarse-grained mode: if the parameter is not specified, the default value of 0 is set for spark.scheduler.minRegisteredResourcesRatio in the base class, and this method always returns true. There are no existing tests for YARN mode either, hence no test is added here.
  Author: Akash Mishra <akash.mishra20@gmail.com>
  Closes #8672 from SleepyThread/master.
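  For context, a hedged sketch of how this ratio is typically configured (the values are illustrative):

  ```scala
  import org.apache.spark.SparkConf

  // Require 80% of requested cores to register before scheduling starts,
  // and cap the wait at 30 seconds. Values here are made up for illustration.
  val conf = new SparkConf()
    .set("spark.scheduler.minRegisteredResourcesRatio", "0.8")
    .set("spark.scheduler.maxRegisteredResourcesWaitingTime", "30s")
  ```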
* [SPARK-6350] [MESOS] Fine-grained mode scheduler respects mesosExecutor.cores (Iulian Dragos, 2015-09-10, 2 files, -3/+33)
  This is a regression introduced in #4960; this commit fixes it and adds a test. tnachen andrewor14 please review, this should be an easy one.
  Author: Iulian Dragos <jaguarul@gmail.com>
  Closes #8653 from dragos/issue/mesos/fine-grained-maxExecutorCores.
* [SPARK-8167] Make tasks that fail from YARN preemption not fail job (mcheah, 2015-09-10, 17 files, -79/+261)
  The architecture is that, in YARN mode, if the driver detects that an executor has disconnected, it asks the ApplicationMaster why the executor died. If the ApplicationMaster is aware that the executor died because of preemption, all tasks associated with that executor are not marked as failed. The executor is still removed from the driver's list of available executors, however.
  There are a few open questions:
  1. Should standalone mode have a similar "get executor loss reason" as well? I localized this change as much as possible to affect only YARN, but there could be a valid case to differentiate executor losses in standalone mode as well.
  2. I make a pretty strong assumption in YarnAllocator that getExecutorLossReason(executorId) will only be called once per executor id; I do this so that I can remove the metadata from the in-memory map to avoid object accumulation. It's not clear if I'm being overly zealous to save space, however.
  cc vanzin specifically for review because it collided with some earlier YARN scheduling work. cc JoshRosen because it's similar to the output commit coordination we did in the past. cc andrewor14 for our discussion on how to get executor exit codes and loss reasons.
  Author: mcheah <mcheah@palantir.com>
  Closes #8007 from mccheah/feature/preemption-handling.
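  A purely hypothetical sketch of the control flow described above. Every name here (LossReason, Preempted, the handler shape) is illustrative and not the patch's actual classes:

  ```scala
  // Hypothetical sketch: route executor loss by its reason. Illustrative names only.
  sealed trait LossReason
  case object Preempted extends LossReason
  final case class ExitedAbnormally(exitCode: Int) extends LossReason

  def handleExecutorLoss(executorId: String, reason: LossReason,
                         failTasks: String => Unit,
                         requeueTasks: String => Unit): Unit = reason match {
    case Preempted =>
      // Preemption is not the tasks' fault: re-queue them without counting failures.
      requeueTasks(executorId)
    case ExitedAbnormally(_) =>
      // A genuine crash still counts against the job's failure budget.
      failTasks(executorId)
  }
  ```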
* [SPARK-10469] [DOC] Try and document the three options (Holden Karau, 2015-09-10, 1 file, -3/+6)
  From JIRA: Add documentation for tungsten-sort. From the mailing list: "I saw a new "spark.shuffle.manager=tungsten-sort" implemented in https://issues.apache.org/jira/browse/SPARK-7081, but its corresponding description can't be found in http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html (currently there are only the two options 'sort' and 'hash')."
  Author: Holden Karau <holden@pigscanfly.ca>
  Closes #8638 from holdenk/SPARK-10469-document-tungsten-sort.
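  The three shuffle managers being documented can be selected like this (a hedged sketch; "tungsten-sort" is a distinct option in the Spark 1.5 era):

  ```scala
  import org.apache.spark.SparkConf

  // Pick one of the three documented shuffle implementations.
  val conf = new SparkConf()
    .set("spark.shuffle.manager", "tungsten-sort") // or "sort" (default) or "hash"
  ```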
* [SPARK-10466] [SQL] UnsafeRow SerDe exception with data spill (Cheng Hao, 2015-09-10, 3 files, -5/+67)
  Data spill with UnsafeRow causes an assert failure.
  ```
  java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:165)
    at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.writeKey(UnsafeRowSerializer.scala:75)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:180)
    at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2$$anonfun$apply$1.apply(ExternalSorter.scala:688)
    at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2$$anonfun$apply$1.apply(ExternalSorter.scala:687)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:687)
    at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:683)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:683)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:80)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  ```
  To reproduce (thanks andrewor14):
  ```scala
  bin/spark-shell --master local --conf spark.shuffle.memoryFraction=0.005 --conf spark.shuffle.sort.bypassMergeThreshold=0

  sc.parallelize(1 to 2 * 1000 * 1000, 10)
    .map { i => (i, i) }.toDF("a", "b").groupBy("b").avg().count()
  ```
  Author: Cheng Hao <hao.cheng@intel.com>
  Closes #8635 from chenghao-intel/unsafe_spill.
* [SPARK-10301] [SPARK-10428] [SQL] Addresses comments of PR #8583 and #8509 for master (Cheng Lian, 2015-09-10, 4 files, -45/+522)
  Author: Cheng Lian <lian@databricks.com>
  Closes #8670 from liancheng/spark-10301/address-pr-comments.
* [SPARK-7142] [SQL] Minor enhancement to BooleanSimplification Optimizer rule (Yash Datta, 2015-09-10, 2 files, -0/+25)
  Use these in the optimizer as well:
  - A and (not(A) or B)  =>  A and B
  - not(A and B)         =>  not(A) or not(B)
  - not(A or B)          =>  not(A) and not(B)
  Author: Yash Datta <Yash.Datta@guavus.com>
  Closes #5700 from saucam/bool_simp.
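  A quick sanity check of the three identities, written as plain Scala rather than as the Catalyst rule itself:

  ```scala
  // Exhaustively verify the three rewrite rules over every boolean assignment.
  object BoolSimpCheck extends App {
    val bools = Seq(true, false)
    for (a <- bools; b <- bools) {
      assert((a && (!a || b)) == (a && b)) // A and (not(A) or B)  =>  A and B
      assert((!(a && b)) == (!a || !b))    // not(A and B)  =>  not(A) or not(B)
      assert((!(a || b)) == (!a && !b))    // not(A or B)   =>  not(A) and not(B)
    }
    println("all three identities hold")
  }
  ```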
* [SPARK-10065] [SQL] avoid the extra copy when generate unsafe array (Wenchen Fan, 2015-09-10, 1 file, -60/+24)
  The reason for this extra copy is that we iterate the array twice: once to calculate the elements' data size and once to copy the elements to the array buffer. A simple solution is to follow `createCodeForStruct`: we can dynamically grow the buffer when needed and thus don't need to know the data size ahead of time. This PR also includes some typo and style fixes, and some minor refactoring to make sure `input.primitive` is always a variable name, not code, when generating unsafe code.
  Author: Wenchen Fan <cloud0fan@outlook.com>
  Closes #8496 from cloud-fan/avoid-copy.
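  The grow-as-needed idea in isolation, as a hedged standalone sketch (not the actual generated code):

  ```scala
  import java.util.Arrays

  // One-pass writing: instead of pre-computing the total size (which forced a
  // second iteration plus an extra copy), double the buffer whenever it is full.
  final class GrowableBuffer(initialSize: Int = 64) {
    private var buf = new Array[Byte](initialSize)
    private var cursor = 0

    def write(bytes: Array[Byte]): Unit = {
      if (cursor + bytes.length > buf.length) {
        var newSize = buf.length
        while (cursor + bytes.length > newSize) newSize *= 2
        buf = Arrays.copyOf(buf, newSize)
      }
      System.arraycopy(bytes, 0, buf, cursor, bytes.length)
      cursor += bytes.length
    }

    def result(): Array[Byte] = Arrays.copyOf(buf, cursor)
  }
  ```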
* [SPARK-10497] [BUILD] [TRIVIAL] Handle both locations for JIRAError with python-jira (Holden Karau, 2015-09-10, 1 file, -1/+5)
  The location of JIRAError has moved between old and new versions of the python-jira package. Longer term it probably makes sense to pin to specific versions (as mentioned in https://issues.apache.org/jira/browse/SPARK-10498), but for now this makes the release tools work with both new and old versions of python-jira.
  Author: Holden Karau <holden@pigscanfly.ca>
  Closes #8661 from holdenk/SPARK-10497-release-utils-does-not-work-with-new-jira-python.
* [MINOR] [MLLIB] [ML] [DOC] fixed typo: label for negative result should be 0.0 (original: 1.0) (Sean Paradiso, 2015-09-09, 1 file, -1/+1)
  Small typo in the example for `LabeledPoint` in the MLlib docs.
  Author: Sean Paradiso <seanparadiso@gmail.com>
  Closes #8680 from sparadiso/docs_mllib_smalltypo.
* [SPARK-9772] [PYSPARK] [ML] Add Python API for ml.feature.VectorSlicer (Yanbo Liang, 2015-09-09, 1 file, -5/+90)
  Add Python API for ml.feature.VectorSlicer.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #8102 from yanboliang/SPARK-9772.
* [SPARK-9730] [SQL] Add Full Outer Join support for SortMergeJoin (Liang-Chi Hsieh, 2015-09-09, 5 files, -34/+259)
  This PR is based on #8383, thanks to viirya. JIRA: https://issues.apache.org/jira/browse/SPARK-9730
  This patch adds Full Outer Join support for SortMergeJoin. A new class SortMergeFullJoinScanner is added to scan rows from the left and right iterators. FullOuterIterator is simply a wrapper of type RowIterator to consume joined rows from SortMergeFullJoinScanner.
  Closes #8383
  Author: Liang-Chi Hsieh <viirya@appier.com>
  Author: Davies Liu <davies@databricks.com>
  Closes #8579 from davies/smj_fullouter.
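  The scanning idea behind SortMergeFullJoinScanner, reduced to a sketch over pre-sorted inputs (illustrative; duplicate join keys, which the real scanner buffers into match groups, are omitted for brevity):

  ```scala
  // Full outer merge join over two inputs sorted by key, with distinct keys.
  // None marks the unmatched side of an output row.
  def fullOuterMergeJoin[V, W](left: Seq[(Int, V)], right: Seq[(Int, W)])
      : Seq[(Int, Option[V], Option[W])] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[(Int, Option[V], Option[W])]
    var i = 0
    var j = 0
    while (i < left.length && j < right.length) {
      val (lk, lv) = left(i)
      val (rk, rv) = right(j)
      if (lk == rk)     { out += ((lk, Some(lv), Some(rv))); i += 1; j += 1 }
      else if (lk < rk) { out += ((lk, Some(lv), None)); i += 1 } // right has no match
      else              { out += ((rk, None, Some(rv))); j += 1 } // left has no match
    }
    while (i < left.length)  { out += ((left(i)._1, Some(left(i)._2), None)); i += 1 }
    while (j < right.length) { out += ((right(j)._1, None, Some(right(j)._2))); j += 1 }
    out.toSeq
  }
  ```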
* [SPARK-10461] [SQL] make sure `input.primitive` is always variable name not code at `GenerateUnsafeProjection` (Wenchen Fan, 2015-09-09, 5 files, -67/+75)
  When we generate unsafe code inside `createCodeForXXX`, we always assign `input.primitive` to a temp variable in case `input.primitive` is expression code. This PR does some refactoring to make sure `input.primitive` is always a variable name, plus some other typo and style fixes.
  Author: Wenchen Fan <cloud0fan@outlook.com>
  Closes #8613 from cloud-fan/minor.
* [SPARK-10481] [YARN] SPARK_PREPEND_CLASSES make spark-yarn related jar could n… (Jeff Zhang, 2015-09-09, 1 file, -1/+4)
  Throw a more readable exception. Please help review. Thanks
  Author: Jeff Zhang <zjffdu@apache.org>
  Closes #8649 from zjffdu/SPARK-10481.
* [SPARK-10117] [MLLIB] Implement SQL data source API for reading LIBSVM data (lewuathe, 2015-09-09, 4 files, -0/+256)
  It is convenient to implement a data source API for the LIBSVM format to have better integration with DataFrames and the ML pipeline API. Two options are implemented:
  * `numFeatures`: Specify the dimension of the features vector
  * `featuresType`: Specify the type of the output vector. `sparse` is the default.
  Author: lewuathe <lewuathe@me.com>
  Closes #8537 from Lewuathe/SPARK-10117 and squashes the following commits:
  986999d [lewuathe] Change unit test phrase
  11d513f [lewuathe] Fix some reviews
  21600a4 [lewuathe] Merge branch 'master' into SPARK-10117
  9ce63c7 [lewuathe] Rewrite service loader file
  1fdd2df [lewuathe] Merge branch 'SPARK-10117' of github.com:Lewuathe/spark into SPARK-10117
  ba3657c [lewuathe] Merge branch 'master' into SPARK-10117
  0ea1c1c [lewuathe] LibSVMRelation is registered into META-INF
  4f40891 [lewuathe] Improve test suites
  5ab62ab [lewuathe] Merge branch 'master' into SPARK-10117
  8660d0e [lewuathe] Fix Java unit test
  b56a948 [lewuathe] Merge branch 'master' into SPARK-10117
  2c12894 [lewuathe] Remove unnecessary tag
  7d693c2 [lewuathe] Resolve conflict
  62010af [lewuathe] Merge branch 'master' into SPARK-10117
  a97ee97 [lewuathe] Fix some points
  aef9564 [lewuathe] Fix
  70ee4dd [lewuathe] Add Java test
  3fd8dce [lewuathe] [SPARK-10117] Implement SQL data source API for reading LIBSVM data
  40d3027 [lewuathe] Add Java test
  7056d4a [lewuathe] Merge branch 'master' into SPARK-10117
  99accaa [lewuathe] [SPARK-10117] Implement SQL data source API for reading LIBSVM data
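  A hedged usage sketch in a spark-shell session, built from the two options named above (the data source short name and the file path are assumptions):

  ```scala
  // Illustrative read of LIBSVM data through the new data source API.
  // Option names come from the commit message; values and path are made up.
  val df = sqlContext.read
    .format("libsvm")
    .option("numFeatures", "780")      // dimension of the features vector
    .option("featuresType", "sparse")  // output vector type; "sparse" is the default
    .load("data/mllib/sample_libsvm_data.txt")
  df.select("label", "features").show()
  ```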
* [SPARK-10227] fatal warnings with sbt on Scala 2.11 (Luc Bourlier, 2015-09-09, 60 files, -151/+158)
  The bulk of the changes concern the `transient` annotation on class parameters. Often the compiler doesn't generate a field for these parameters, so the transient annotation would be unnecessary. But if the class parameters are used in methods, then fields are created, so it is safer to keep the annotations. The remainder are some potential bugs and deprecated syntax.
  Author: Luc Bourlier <luc.bourlier@typesafe.com>
  Closes #8433 from skyluc/issue/sbt-2.11.
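  An illustrative example of the field-generation subtlety described above (class names are made up):

  ```scala
  // The parameter is only read during construction: no field is generated,
  // so @transient has nothing to act on and Scala 2.11 warns about it.
  class OnlyCtor(@transient conf: Map[String, String]) extends Serializable {
    val size: Int = conf.size
  }

  // The parameter is referenced from a method: the compiler must retain it as a
  // field, and @transient genuinely excludes it from serialization.
  class UsedInMethod(@transient conf: Map[String, String]) extends Serializable {
    // After deserialization conf is null, hence the Option guard.
    def lookup(key: String): Option[String] = Option(conf).flatMap(_.get(key))
  }
  ```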
* [SPARK-10249] [ML] [DOC] Add Python Code Example to StopWordsRemover User Guide (Yuhao Yang, 2015-09-08, 1 file, -0/+19)
  jira: https://issues.apache.org/jira/browse/SPARK-10249
  Update the user guide, since Python support has been added.
  Author: Yuhao Yang <hhbyyh@gmail.com>
  Closes #8620 from hhbyyh/swPyDocExample.
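  For reference, the Scala counterpart that the guide's new Python snippet mirrors (a hedged sketch; the data and column names are illustrative):

  ```scala
  import org.apache.spark.ml.feature.StopWordsRemover

  // Remove English stop words from a tokenized text column.
  val remover = new StopWordsRemover()
    .setInputCol("raw")
    .setOutputCol("filtered")
  val df = sqlContext.createDataFrame(Seq(
    (0, Seq("I", "saw", "the", "red", "balloon")),
    (1, Seq("Mary", "had", "a", "little", "lamb"))
  )).toDF("id", "raw")
  remover.transform(df).show()
  ```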
* [SPARK-9654] [ML] [PYSPARK] Add IndexToString to PySpark (Holden Karau, 2015-09-08, 3 files, -6/+73)
  Adds IndexToString to PySpark.
  Author: Holden Karau <holden@pigscanfly.ca>
  Closes #7976 from holdenk/SPARK-9654-add-string-indexer-inverse-in-pyspark.
* [SPARK-10094] Pyspark ML Feature transformers marked as experimental (noelsmith, 2015-09-08, 1 file, -0/+52)
  Modified class-level docstrings to mark all feature transformers in pyspark.ml as experimental.
  Author: noelsmith <mail@noelsmith.com>
  Closes #8623 from noel-smith/SPARK-10094-mark-pyspark-ml-trans-exp.
* [SPARK-10373] [PYSPARK] move @since into pyspark from sql (Davies Liu, 2015-09-08, 9 files, -25/+23)
  cc mengxr
  Author: Davies Liu <davies@databricks.com>
  Closes #8657 from davies/move_since.
* [SPARK-10464] [MLLIB] Add WeibullGenerator for RandomDataGenerator (Yanbo Liang, 2015-09-08, 2 files, -3/+40)
  Add WeibullGenerator for RandomDataGenerator. #8611 needs WeibullGenerator to generate random data based on the Weibull distribution.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #8622 from yanboliang/spark-10464.
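  A hedged usage sketch of the new generator; the (shape, scale) constructor arguments and the setSeed/nextValue calls inherited from RandomDataGenerator are assumptions, not verified against the patch:

  ```scala
  import org.apache.spark.mllib.random.WeibullGenerator

  // Illustrative: draw a few samples from an assumed Weibull(shape = 2.0, scale = 1.0).
  val gen = new WeibullGenerator(2.0, 1.0)
  gen.setSeed(42L)
  val samples = Array.fill(5)(gen.nextValue())
  println(samples.mkString(", "))
  ```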
* [SPARK-9834] [MLLIB] implement weighted least squares via normal equation (Xiangrui Meng, 2015-09-08, 4 files, -1/+438)
  The goal of this PR is to have a weighted least squares implementation that takes the normal equation approach, and hence to be able to provide R-like summary statistics and support IRLS (used by GLMs). The tests match R's lm and glmnet.
  There are a couple of TODOs that can be addressed in future PRs:
  * consolidate summary statistics aggregators
  * move `dspr` to `BLAS`
  * etc.
  It would be nice to have this merged first because it blocks a couple of other features. dbtsai
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #8588 from mengxr/SPARK-9834.
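  For reference, the textbook form of the normal-equation approach (standard notation, not copied from the patch): minimizing the weighted squared error reduces to a single linear solve.

  ```latex
  \min_{\beta} \sum_{i=1}^{n} w_i \left( x_i^\top \beta - y_i \right)^2
  \;\Longrightarrow\;
  \left( X^\top W X \right) \beta = X^\top W y,
  \qquad W = \operatorname{diag}(w_1, \dots, w_n)
  ```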
* [SPARK-10071] [STREAMING] Output a warning when writing QueueInputDStream and throw a better exception when reading QueueInputDStream (zsxwing, 2015-09-08, 3 files, -13/+30)
  Output a warning when serializing QueueInputDStream, rather than throwing an exception, to allow unit tests to use it. Moreover, this PR throws a better exception when deserializing QueueInputDStream, so the user can find the problem easily. The previous exception was hard to understand: https://issues.apache.org/jira/browse/SPARK-8553
  Author: zsxwing <zsxwing@gmail.com>
  Closes #8624 from zsxwing/SPARK-10071 and squashes the following commits:
  847cfa8 [zsxwing] Output a warning when writing QueueInputDStream and throw a better exception when reading QueueInputDStream
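  For context, the DStream in question is the one produced by StreamingContext.queueStream, a common test fixture; a minimal sketch (assumes sc is an existing SparkContext):

  ```scala
  import scala.collection.mutable
  import org.apache.spark.rdd.RDD
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // Feed a stream from an in-memory queue of RDDs, as unit tests commonly do.
  val ssc = new StreamingContext(sc, Seconds(1))
  val queue = mutable.Queue.empty[RDD[Int]]
  val stream = ssc.queueStream(queue) // serializing this now warns instead of failing
  queue += sc.makeRDD(1 to 10)
  stream.print()
  ```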
* [RELEASE] Add more contributors & only show names in release notes. (Reynold Xin, 2015-09-08, 2 files, -8/+39)
  Author: Reynold Xin <rxin@databricks.com>
  Closes #8660 from rxin/contrib.
* [HOTFIX] Fix build break caused by #8494 (Michael Armbrust, 2015-09-08, 1 file, -2/+2)
  Author: Michael Armbrust <michael@databricks.com>
  Closes #8659 from marmbrus/testBuildBreak.
* [SPARK-10327] [SQL] Cache Table is not working while subquery has alias in its project list (Cheng Hao, 2015-09-08, 2 files, -3/+28)
  ```scala
  import org.apache.spark.sql.hive.execution.HiveTableScan
  sql("select key, value, key + 1 from src").registerTempTable("abc")
  cacheTable("abc")
  val sparkPlan = sql(
    """select a.key, b.key, c.key from
      |abc a join abc b on a.key=b.key
      |join abc c on a.key=c.key""".stripMargin).queryExecution.sparkPlan

  assert(sparkPlan.collect { case e: InMemoryColumnarTableScan => e }.size === 3) // failed
  assert(sparkPlan.collect { case e: HiveTableScan => e }.size === 0) // failed
  ```
  The actual plan is:
  ```
  == Parsed Logical Plan ==
  'Project [unresolvedalias('a.key),unresolvedalias('b.key),unresolvedalias('c.key)]
   'Join Inner, Some(('a.key = 'c.key))
    'Join Inner, Some(('a.key = 'b.key))
     'UnresolvedRelation [abc], Some(a)
     'UnresolvedRelation [abc], Some(b)
    'UnresolvedRelation [abc], Some(c)

  == Analyzed Logical Plan ==
  key: int, key: int, key: int
  Project [key#14,key#61,key#66]
   Join Inner, Some((key#14 = key#66))
    Join Inner, Some((key#14 = key#61))
     Subquery a
      Subquery abc
       Project [key#14,value#15,(key#14 + 1) AS _c2#16]
        MetastoreRelation default, src, None
     Subquery b
      Subquery abc
       Project [key#61,value#62,(key#61 + 1) AS _c2#58]
        MetastoreRelation default, src, None
    Subquery c
     Subquery abc
      Project [key#66,value#67,(key#66 + 1) AS _c2#63]
       MetastoreRelation default, src, None

  == Optimized Logical Plan ==
  Project [key#14,key#61,key#66]
   Join Inner, Some((key#14 = key#66))
    Project [key#14,key#61]
     Join Inner, Some((key#14 = key#61))
      Project [key#14]
       InMemoryRelation [key#14,value#15,_c2#16], true, 10000, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc)
      Project [key#61]
       MetastoreRelation default, src, None
    Project [key#66]
     MetastoreRelation default, src, None

  == Physical Plan ==
  TungstenProject [key#14,key#61,key#66]
   BroadcastHashJoin [key#14], [key#66], BuildRight
    TungstenProject [key#14,key#61]
     BroadcastHashJoin [key#14], [key#61], BuildRight
      ConvertToUnsafe
       InMemoryColumnarTableScan [key#14], (InMemoryRelation [key#14,value#15,_c2#16], true, 10000, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc))
      ConvertToUnsafe
       HiveTableScan [key#61], (MetastoreRelation default, src, None)
    ConvertToUnsafe
     HiveTableScan [key#66], (MetastoreRelation default, src, None)
  ```
  Author: Cheng Hao <hao.cheng@intel.com>
  Closes #8494 from chenghao-intel/weird_cache.
* [SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure (Tathagata Das, 2015-09-08, 2 files, -1/+25)
  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #8656 from tdas/SPARK-10492 and squashes the following commits:
  986cdd6 [Tathagata Das] Added information on backpressure
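  The knobs the updated docs cover, as a hedged configuration sketch (the values are illustrative):

  ```scala
  import org.apache.spark.SparkConf

  // Streaming rate control: let Spark adapt the ingestion rate dynamically,
  // with a hard per-receiver cap as a safety net.
  val conf = new SparkConf()
    .set("spark.streaming.backpressure.enabled", "true")
    .set("spark.streaming.receiver.maxRate", "10000") // records per second per receiver
  ```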
* [SPARK-10468] [MLLIB] Verify schema before Dataframe select API call (Vinod K C, 2015-09-08, 2 files, -5/+2)
  Loader.checkSchema was called to verify the schema after dataframe.select(...). Schema verification should be done before dataframe.select(...).
  Author: Vinod K C <vinod.kc@huawei.com>
  Closes #8636 from vinodkc/fix_GaussianMixtureModel_load_verification.
* [SPARK-10441] [SQL] Save data correctly to json. (Yin Huai, 2015-09-08, 9 files, -8/+205)
  https://issues.apache.org/jira/browse/SPARK-10441
  Author: Yin Huai <yhuai@databricks.com>
  Closes #8597 from yhuai/timestampJson.
* [SPARK-10470] [ML] ml.IsotonicRegressionModel.copy should set parent (Yanbo Liang, 2015-09-08, 2 files, -1/+6)
  A copied model must have the same parent, but ml.IsotonicRegressionModel.copy did not set the parent. This fixes it and adds a test case.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #8637 from yanboliang/spark-10470.
* [SPARK-10316] [SQL] respect nondeterministic expressions in PhysicalOperation (Wenchen Fan, 2015-09-08, 2 files, -30/+20)
  We did a lot of special handling for non-deterministic expressions in `Optimizer`. However, `PhysicalOperation` just collects all Projects and Filters and messes up the ordering. We should respect the operator order imposed by non-deterministic expressions in `PhysicalOperation`.
  Author: Wenchen Fan <cloud0fan@outlook.com>
  Closes #8486 from cloud-fan/fix.
* [SPARK-10480] [ML] Fix ML.LinearRegressionModel.copy() (Yanbo Liang, 2015-09-08, 2 files, -2/+4)
  This PR fixes two model ```copy()``` related issues:
  [SPARK-10480](https://issues.apache.org/jira/browse/SPARK-10480): ```ML.LinearRegressionModel.copy()``` ignored the argument ```extra```, so it did not take effect when users set this parameter.
  [SPARK-10479](https://issues.apache.org/jira/browse/SPARK-10479): ```ML.LogisticRegressionModel.copy()``` should copy the model summary if available.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #8641 from yanboliang/linear-regression-copy.
* [SPARK-9170] [SQL] Use OrcStructInspector to be case preserving when writing ORC files (Liang-Chi Hsieh, 2015-09-08, 2 files, -21/+40)
  JIRA: https://issues.apache.org/jira/browse/SPARK-9170
  `StandardStructObjectInspector` will implicitly lowercase column names, but the ORC format doesn't have such a requirement. In fact, there is an `OrcStructInspector` specified for the ORC format. We should use it when serializing rows to ORC files, so writing ORC files can be case preserving.
  Author: Liang-Chi Hsieh <viirya@appier.com>
  Closes #7520 from viirya/use_orcstruct.
* Docs small fixes (Jacek Laskowski, 2015-09-08, 2 files, -19/+19)
  Author: Jacek Laskowski <jacek@japila.pl>
  Closes #8629 from jaceklaskowski/docs-fixes.
* [DOC] Added R to the list of languages with "high-level API" support in the main README. (Stephen Hopper, 2015-09-08, 2 files, -10/+10)
  Author: Stephen Hopper <shopper@shopper-osx.local>
  Closes #8646 from enragedginger/master.
* [SPARK-9767] Remove ConnectionManager. (Reynold Xin, 2015-09-07, 21 files, -3855/+651)
  We introduced the Netty network module for shuffle in Spark 1.2, and it has been on by default for 3 releases. The old ConnectionManager is difficult to maintain. If we merge this patch now, by the time it is released ConnectionManager will have been off by default for a year. It's time to remove it.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #8161 from rxin/SPARK-9767.
* [SPARK-10013] [ML] [JAVA] [TEST] remove java assert from java unit tests (Holden Karau, 2015-09-05, 4 files, -52/+54)
  From JIRA: We should use assertTrue, etc. instead to make sure the asserts are not ignored in tests.
  Author: Holden Karau <holden@pigscanfly.ca>
  Closes #8607 from holdenk/SPARK-10013-remove-java-assert-from-java-unit-tests.
* [SPARK-10434] [SQL] Fixes Parquet schema of arrays that may contain null (Cheng Lian, 2015-09-05, 2 files, -9/+10)
  To keep full compatibility of the Parquet write path with Spark 1.4, we should rename the innermost field name of arrays that may contain null from "array_element" to "array". Please refer to [SPARK-10434] [1] for more details.
  [1]: https://issues.apache.org/jira/browse/SPARK-10434
  Author: Cheng Lian <lian@databricks.com>
  Closes #8586 from liancheng/spark-10434/fix-parquet-array-type.
* [SPARK-10440] [STREAMING] [DOCS] Update python API stuff in the programming guides and python docs (Tathagata Das, 2015-09-04, 4 files, -12/+33)
  - Fixed information around Python API tags in streaming programming guides
  - Added missing stuff in python docs
  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #8595 from tdas/SPARK-10440.
* [HOTFIX] [SQL] Fixes compilation error (Cheng Lian, 2015-09-04, 1 file, -1/+1)
  Jenkins master builders are currently broken by a merge conflict between PR #8584 and PR #8155.
  Author: Cheng Lian <lian@databricks.com>
  Closes #8614 from liancheng/hotfix/fix-pr-8155-8584-conflict.
* [SPARK-9925] [SQL] [TESTS] Set SQLConf.SHUFFLE_PARTITIONS.key correctly for tests (Yin Huai, 2015-09-04, 7 files, -21/+90)
  This PR fixes the failed test and the conflict for #8155. https://issues.apache.org/jira/browse/SPARK-9925
  Closes #8155
  Author: Yin Huai <yhuai@databricks.com>
  Author: Davies Liu <davies@databricks.com>
  Closes #8602 from davies/shuffle_partitions.
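  The setting in question, as a hedged sketch of how it is commonly pinned in tests (the value is illustrative):

  ```scala
  // Fewer shuffle partitions than the default of 200 keeps test runs fast
  // and deterministic; assumes an in-scope SQLContext named sqlContext.
  sqlContext.setConf("spark.sql.shuffle.partitions", "5")
  ```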
* [SPARK-10402] [DOCS] [ML] Add defaults to the scaladoc for params in ml/ (Holden Karau, 2015-09-04, 10 files, -2/+16)
  We should make sure the scaladoc for params includes their default values throughout the models in ml/.
  Author: Holden Karau <holden@pigscanfly.ca>
  Closes #8591 from holdenk/SPARK-10402-add-scaladoc-for-default-values-of-params-in-ml.
* [SPARK-10311] [STREAMING] Reload appId and attemptId when app starts with checkpoint file in cluster mode (xutingjun, 2015-09-04, 1 file, -0/+2)
  Author: xutingjun <xutingjun@huawei.com>
  Closes #8477 from XuTingjun/streaming-attempt.
* [SPARK-10454] [SPARK CORE] wait for empty event queue (robbins, 2015-09-04, 1 file, -0/+1)
  Author: robbins <robbins@uk.ibm.com>
  Closes #8605 from robbinspg/DAGSchedulerSuite-fix.
* [SPARK-9669] [MESOS] Support PySpark on Mesos cluster mode. (Timothy Chen, 2015-09-04, 3 files, -16/+41)
  Support running pyspark with cluster mode on Mesos! This doesn't upload any scripts, so running against a remote Mesos cluster requires the user to specify the script via an available URI.
  Author: Timothy Chen <tnachen@gmail.com>
  Closes #8349 from tnachen/mesos_python.
* [SPARK-10450] [SQL] Minor improvements to readability / style / typos etc. (Andrew Or, 2015-09-04, 5 files, -15/+15)
  Author: Andrew Or <andrew@databricks.com>
  Closes #8603 from andrewor14/minor-sql-changes.
* [SPARK-10176] [SQL] Show partially analyzed plans when checkAnswer fails to analyze (Wenchen Fan, 2015-09-04, 90 files, -999/+908)
  This PR takes over https://github.com/apache/spark/pull/8389.
  This PR improves `checkAnswer` to print the partially analyzed plan in addition to the user friendly error message, in order to aid debugging failing tests.
  In doing so, I ran into a conflict with the various ways that we bring a SQLContext into the tests. Depending on the trait we refer to the current context as `sqlContext`, `_sqlContext`, `ctx` or `hiveContext` with access modifiers `public`, `protected` and `private` depending on the defining class. I propose we refactor as follows:
  1. All tests should only refer to a `protected sqlContext` when testing general features, and `protected hiveContext` when it is a method that only exists on a `HiveContext`.
  2. All tests should only import `testImplicits._` (i.e., don't import `TestHive.implicits._`)
  Author: Wenchen Fan <cloud0fan@outlook.com>
  Closes #8584 from cloud-fan/cleanupTests.
* MAINTENANCE: Automated closing of pull requests. (Michael Armbrust, 2015-09-04, 0 files, -0/+0)
  This commit exists to close the following pull requests on Github:
  Closes #1890 (requested by andrewor14, JoshRosen)
  Closes #3558 (requested by JoshRosen, marmbrus)
  Closes #3890 (requested by marmbrus)
  Closes #3895 (requested by andrewor14, marmbrus)
  Closes #4055 (requested by andrewor14)
  Closes #4105 (requested by andrewor14)
  Closes #4812 (requested by marmbrus)
  Closes #5109 (requested by andrewor14)
  Closes #5178 (requested by andrewor14)
  Closes #5298 (requested by marmbrus)
  Closes #5393 (requested by marmbrus)
  Closes #5449 (requested by andrewor14)
  Closes #5468 (requested by marmbrus)
  Closes #5715 (requested by marmbrus)
  Closes #6192 (requested by marmbrus)
  Closes #6319 (requested by marmbrus)
  Closes #6326 (requested by marmbrus)
  Closes #6349 (requested by marmbrus)
  Closes #6380 (requested by andrewor14)
  Closes #6554 (requested by marmbrus)
  Closes #6696 (requested by marmbrus)
  Closes #6868 (requested by marmbrus)
  Closes #6951 (requested by marmbrus)
  Closes #7129 (requested by marmbrus)
  Closes #7188 (requested by marmbrus)
  Closes #7358 (requested by marmbrus)
  Closes #7379 (requested by marmbrus)
  Closes #7628 (requested by marmbrus)
  Closes #7715 (requested by marmbrus)
  Closes #7782 (requested by marmbrus)
  Closes #7914 (requested by andrewor14)
  Closes #8051 (requested by andrewor14)
  Closes #8269 (requested by andrewor14)
  Closes #8448 (requested by andrewor14)
  Closes #8576 (requested by andrewor14)
* [MINOR] Minor style fix in SparkR (Shivaram Venkataraman, 2015-09-04, 1 file, -1/+1)
  `dev/lint-r` passes on my machine now.
  Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
  Closes #8601 from shivaram/sparkr-style-fix.