aboutsummaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
...
* [SPARK-12742][SQL] org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure ↵wangfei2016-01-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | due to Table already exists exception ``` [info] Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 milliseconds) [info] org.apache.spark.sql.AnalysisException: Table `t1` already exists.; [info] at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296) [info] at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285) [info] at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33) [info] at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187) [info] at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23) [info] at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253) [info] at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:296) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:286) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [info] at java.lang.Thread.run(Thread.java:745) ``` /cc liancheng Author: wangfei <wangfei_hello@126.com> Closes #10682 from scwf/fix-test.
* [SPARK-12576][SQL] Enable expression parsing in CatalystQlHerman van Hovell2016-01-119-56/+217
| | | | | | | | | | | | The PR allows us to use the new SQL parser to parse SQL expressions such as: ```1 + sin(x*x)``` We enable this functionality in this PR, but we will not start using this actively yet. This will be done as soon as we have reached grammar parity with the existing parser stack. cc rxin Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #10649 from hvanhovell/SPARK-12576.
* [SPARK-10809][MLLIB] Single-document topicDistributions method for LocalLDAModelYuhao Yang2016-01-112-3/+38
| | | | | | | | | | | | jira: https://issues.apache.org/jira/browse/SPARK-10809 We could provide a single-document topicDistributions method for LocalLDAModel to allow for quick queries which avoid RDD operations. Currently, the user must use an RDD of documents. add some missing assert too. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #9484 from hhbyyh/ldaTopicPre.
* [SPARK-12685][MLLIB] word2vec trainWordsCount gets overflowYuhao Yang2016-01-111-4/+4
| | | | | | | | | | | | | | jira: https://issues.apache.org/jira/browse/SPARK-12685 the log of `word2vec` reports trainWordsCount = -785727483 during computation over a large dataset. Update the priority as it will affect the computation process. `alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))` Author: Yuhao Yang <hhbyyh@gmail.com> Closes #10627 from hhbyyh/w2voverflow.
* [SPARK-12603][MLLIB] PySpark MLlib GaussianMixtureModel should support ↵Yanbo Liang2016-01-115-14/+37
| | | | | | | | | | single instance predict/predictSoft PySpark MLlib ```GaussianMixtureModel``` should support single instance ```predict/predictSoft``` just like Scala do. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10552 from yanboliang/spark-12603.
* [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType ↵Brandon Bradley2016-01-111-0/+5
| | | | | | | | | | casting Warning users about casting changes. Author: Brandon Bradley <bradleytastic@gmail.com> Closes #10708 from blbradley/spark-12758.
* [SPARK-12734][HOTFIX] Build changes must trigger all tests; clean after ↵Josh Rosen2016-01-112-2/+2
| | | | | | | | | | | | | | | | install in dep tests This patch fixes a build/test issue caused by the combination of #10672 and a latent issue in the original `dev/test-dependencies` script. First, changes which _only_ touched build files were not triggering full Jenkins runs, making it possible for a build change to be merged even though it could cause failures in other tests. The `root` build module now depends on `build`, so all tests will now be run whenever a build-related file is changed. I also added a `clean` step to the Maven install step in `dev/test-dependencies` in order to address an issue where the dummy JARs stuck around and caused "multiple assembly JARs found" errors in tests. /cc zsxwing Author: Josh Rosen <joshrosen@databricks.com> Closes #10704 from JoshRosen/fix-build-test-problems.
* [STREAMING][MINOR] Typo fixesJacek Laskowski2016-01-112-2/+2
| | | | | | Author: Jacek Laskowski <jacek@japila.pl> Closes #10698 from jaceklaskowski/streaming-kafka-typo-fixes.
* [SPARK-12744][SQL] Change parsing JSON integers to timestamps to treat ↵Anatoliy Plastinin2016-01-113-3/+20
| | | | | | | | | | | | integers as number of seconds JIRA: https://issues.apache.org/jira/browse/SPARK-12744 This PR makes parsing JSON integers to timestamps consistent with casting behavior. Author: Anatoliy Plastinin <anatoliy.plastinin@gmail.com> Closes #10687 from antlypls/fix-json-timestamp-parsing.
* [SPARK-12269][STREAMING][KINESIS] Update aws-java-sdk versionBrianLondon2016-01-115-19/+19
| | | | | | | | The current Spark Streaming kinesis connector references a quite old version 1.9.40 of the AWS Java SDK (1.10.40 is current). Numerous AWS features including Kinesis Firehose are unavailable in 1.9. Those two versions of the AWS SDK in turn require conflicting versions of Jackson (2.4.4 and 2.5.3 respectively) such that one cannot include the current AWS SDK in a project that also uses the Spark Streaming Kinesis ASL. Author: BrianLondon <brian@seatgeek.com> Closes #10256 from BrianLondon/master.
* removed lambda from sortByKey()Udo Klein2016-01-111-1/+1
| | | | | | | | According to the documentation the sortByKey method does not take a lambda as an argument, thus the example is flawed. Removed the argument completely as this will default to ascending sort. Author: Udo Klein <git@blinkenlight.net> Closes #10640 from udoklein/patch-1.
* [SPARK-12539][FOLLOW-UP] always sort in partitioning writerWenchen Fan2016-01-112-147/+48
| | | | | | | | | | | address comments in #10498 , especially https://github.com/apache/spark/pull/10498#discussion_r49021259 Author: Wenchen Fan <wenchen@databricks.com> This patch had conflicts when merged, resolved by Committer: Reynold Xin <rxin@databricks.com> Closes #10638 from cloud-fan/bucket-write.
* [SPARK-12734][HOTFIX][TEST-MAVEN] Fix bug in Netty exclusionsJosh Rosen2016-01-116-47/+11
| | | | | | | | | | This is a hotfix for a build bug introduced by the Netty exclusion changes in #10672. We can't exclude `io.netty:netty` because Akka depends on it. There's not a direct conflict between `io.netty:netty` and `io.netty:netty-all`, because the former puts classes in the `org.jboss.netty` namespace while the latter uses the `io.netty` namespace. However, there still is a conflict between `org.jboss.netty:netty` and `io.netty:netty`, so we need to continue to exclude the JBoss version of that artifact. While the diff here looks somewhat large, note that this is only a revert of a some of the changes from #10672. You can see the net changes in pom.xml at https://github.com/apache/spark/compare/3119206b7188c23055621dfeaf6874f21c711a82...5211ab8#diff-600376dffeb79835ede4a0b285078036 Author: Josh Rosen <joshrosen@databricks.com> Closes #10693 from JoshRosen/netty-hotfix.
* [SPARK-4628][BUILD] Add a resolver to MiMaBuild.scala for mqttv3(1.0.1).Kousuke Saruta2016-01-101-1/+6
| | | | | | | | | | | #10659 removed the repository `https://repo.eclipse.org/content/repositories/paho-releases` but it's needed by MiMa because `spark-streaming-mqtt(1.6.0)` depends on `mqttv3(1.0.1)` and it is provided by the removed repository and maven-central provide only `mqttv3(1.0.2)` for now. Otherwise, if `mqttv3(1.0.1)` is absent from the local repository, dev/mima should fail. JoshRosen Do you have any other better idea? Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10688 from sarutak/SPARK-4628-followup.
* [SPARK-3873][BUILD] Enable import ordering error checking.Marcelo Vanzin2016-01-1036-64/+62
| | | | | | | | | | | | | Turn import ordering violations into build errors, plus a few adjustments to account for how the checker behaves. I'm a little on the fence about whether the existing code is right, but it's easier to appease the checker than to discuss what's the more correct order here. Plus a few fixes to imports that cropped in since my recent cleanups. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10612 from vanzin/SPARK-3873-enable.
* [SPARK-12734][BUILD] Fix Netty exclusion and use Maven Enforcer to prevent ↵Josh Rosen2016-01-107-18/+64
| | | | | | | | | | | | | | | future bugs Netty classes are published under multiple artifacts with different names, so our build needs to exclude the `io.netty:netty` and `org.jboss.netty:netty` versions of the Netty artifact. However, our existing exclusions were incomplete, leading to situations where duplicate Netty classes would wind up on the classpath and cause compile errors (or worse). This patch fixes the exclusion issue by adding more exclusions and uses Maven Enforcer's [banned dependencies](https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html) rule to prevent these classes from accidentally being reintroduced. I also updated `dev/test-dependencies.sh` to run `mvn validate` so that the enforcer rules can run as part of pull request builds. /cc rxin srowen pwendell. I'd like to backport at least the exclusion portion of this fix to `branch-1.5` in order to fix the documentation publishing job, which fails nondeterministically due to incompatible versions of Netty classes taking precedence on the compile-time classpath. Author: Josh Rosen <rosenville@gmail.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #10672 from JoshRosen/enforce-netty-exclusions.
* [SPARK-12692][BUILD][GRAPHX] Scala style: Fix the style violation (Space ↵Kousuke Saruta2016-01-105-9/+8
| | | | | | | | | | | before "," or ":") Fix the style violation (space before `,` and `:`). This PR is a followup for #10643. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10683 from sarutak/SPARK-12692-followup-graphx.
* [SPARK-12692][BUILD][MLLIB] Scala style: Fix the style violation (Space ↵Kousuke Saruta2016-01-1017-22/+22
| | | | | | | | | | | before "," or ":") Fix the style violation (space before , and :). This PR is a followup for #10643. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10684 from sarutak/SPARK-12692-followup-mllib.
* [SPARK-12736][CORE][DEPLOY] Standalone Master cannot be started due t…Jacek Laskowski2016-01-101-0/+1
| | | | | | | | | | …o NoClassDefFoundError: org/spark-project/guava/collect/Maps /cc srowen rxin Author: Jacek Laskowski <jacek@japila.pl> Closes #10674 from jaceklaskowski/SPARK-12736.
* [SPARK-12735] Consolidate & move spark-ec2 to AMPLab managed repository.Reynold Xin2016-01-0914-1808/+3
| | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #10673 from rxin/SPARK-12735.
* Close #10665Reynold Xin2016-01-090-0/+0
|
* [SPARK-12340] Fix overflow in various take functions.Reynold Xin2016-01-096-22/+19
| | | | | | | | This is a follow-up for the original patch #10562. Author: Reynold Xin <rxin@databricks.com> Closes #10670 from rxin/SPARK-12340.
* [SPARK-12645][SPARKR] SparkR support hash functionYanbo Liang2016-01-094-1/+26
| | | | | | | | Add ```hash``` function for SparkR ```DataFrame```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10597 from yanboliang/spark-12645.
* [SPARK-12577] [SQL] Better support of parentheses in partition by and order ↵Liang-Chi Hsieh2016-01-082-11/+32
| | | | | | | | | | by clause of window function's over clause JIRA: https://issues.apache.org/jira/browse/SPARK-12577 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #10620 from viirya/fix-parentheses.
* [SPARK-4628][BUILD] Remove all non-Maven-Central repositories from buildJosh Rosen2016-01-084-95/+7
| | | | | | | | | | | | | | | | | | | | | | This patch removes all non-Maven-central repositories from Spark's build, thereby avoiding any risk of future build-breaks due to us accidentally depending on an artifact which is not present in an immutable public Maven repository. I tested this by running ``` build/mvn \ -Phive \ -Phive-thriftserver \ -Pkinesis-asl \ -Pspark-ganglia-lgpl \ -Pyarn \ dependency:go-offline ``` inside of a fresh Ubuntu Docker container with no Ivy or Maven caches (I did a similar test for SBT). Author: Josh Rosen <joshrosen@databricks.com> Closes #10659 from JoshRosen/SPARK-4628.
* [SPARK-12730][TESTS] De-duplicate some test code in BlockManagerSuiteJosh Rosen2016-01-081-63/+25
| | | | | | | | This patch deduplicates some test code in BlockManagerSuite. I'm splitting this change off from a larger PR in order to make things easier to review. Author: Josh Rosen <joshrosen@databricks.com> Closes #10667 from JoshRosen/block-mgr-tests-cleanup.
* [SPARK-12593][SQL] Converts resolved logical plan back to SQLCheng Lian2016-01-0847-146/+1087
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This PR tries to enable Spark SQL to convert resolved logical plans back to SQL query strings. For now, the major use case is to canonicalize Spark SQL native view support. The major entry point is `SQLBuilder.toSQL`, which returns an `Option[String]` if the logical plan is recognized. The current version is still in WIP status, and is quite limited. Known limitations include: 1. The logical plan must be analyzed but not optimized The optimizer erases `Subquery` operators, which contain necessary scope information for SQL generation. Future versions should be able to recover erased scope information by inserting subqueries when necessary. 1. The logical plan must be created using HiveQL query string Query plans generated by composing arbitrary DataFrame API combinations are not supported yet. Operators within these query plans need to be rearranged into a canonical form that is more suitable for direct SQL generation. For example, the following query plan ``` Filter (a#1 < 10) +- MetastoreRelation default, src, None ``` need to be canonicalized into the following form before SQL generation: ``` Project [a#1, b#2, c#3] +- Filter (a#1 < 10) +- MetastoreRelation default, src, None ``` Otherwise, the SQL generation process will have to handle a large number of special cases. 1. Only a fraction of expressions and basic logical plan operators are supported in this PR Currently, 95.7% (1720 out of 1798) query plans in `HiveCompatibilitySuite` can be successfully converted to SQL query strings. Known unsupported components are: - Expressions - Part of math expressions - Part of string expressions (buggy?) - Null expressions - Calendar interval literal - Part of date time expressions - Complex type creators - Special `NOT` expressions, e.g. `NOT LIKE` and `NOT IN` - Logical plan operators/patterns - Cube, rollup, and grouping set - Script transformation - Generator - Distinct aggregation patterns that fit `DistinctAggregationRewriter` analysis rule - Window functions Support for window functions, generators, and cubes etc. will be added in follow-up PRs. This PR leverages `HiveCompatibilitySuite` for testing SQL generation in a "round-trip" manner: * For all select queries, we try to convert it back to SQL * If the query plan is convertible, we parse the generated SQL into a new logical plan * Run the new logical plan instead of the original one If the query plan is inconvertible, the test case simply falls back to the original logic. TODO - [x] Fix failed test cases - [x] Support for more basic expressions and logical plan operators (e.g. distinct aggregation etc.) - [x] Comments and documentation Author: Cheng Lian <lian@databricks.com> Closes #10541 from liancheng/sql-generation.
* [SPARK-4819] Remove Guava's "Optional" from public APISean Owen2016-01-0819-85/+333
| | | | | | | | | | Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`) See also https://github.com/apache/spark/pull/10512 Author: Sean Owen <sowen@cloudera.com> Closes #10513 from srowen/SPARK-4819.
* [SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fail…Thomas Graves2016-01-081-1/+8
| | | | | | | | | | | | | | | …s on secure Hadoop https://issues.apache.org/jira/browse/SPARK-12654 So the bug here is that WholeTextFileRDD.getPartitions has: val conf = getConf in getConf if the cloneConf=true it creates a new Hadoop Configuration. Then it uses that to create a new newJobContext. The newJobContext will copy credentials around, but credentials are only present in a JobConf not in a Hadoop Configuration. So basically when it is cloning the hadoop configuration its changing it from a JobConf to Configuration and dropping the credentials that were there. NewHadoopRDD just uses the conf passed in for the getPartitions (not getConf) which is why it works. Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Closes #10651 from tgravescs/SPARK-12654.
* fixed numVertices in transitive closure exampleUdo Klein2016-01-081-2/+2
| | | | | | Author: Udo Klein <git@blinkenlight.net> Closes #10642 from udoklein/patch-2.
* [DOCUMENTATION] doc fix of job schedulingJeff Zhang2016-01-081-1/+1
| | | | | | | | spark.shuffle.service.enabled is spark application related configuration, it is not necessary to set it in yarn-site.xml Author: Jeff Zhang <zjffdu@apache.org> Closes #10657 from zjffdu/doc-fix.
* [SPARK-12701][CORE] FileAppender should use join to ensure writing thread ↵Bryan Cutler2016-01-081-10/+1
| | | | | | | | | | completion Changed Logging FileAppender to use join in `awaitTermination` to ensure that thread is properly finished before returning. Author: Bryan Cutler <cutlerb@gmail.com> Closes #10654 from BryanCutler/fileAppender-join-thread-SPARK-12701.
* [SPARK-12687] [SQL] Support from clause surrounded by `()`.Liang-Chi Hsieh2016-01-083-2/+25
| | | | | | | | | | JIRA: https://issues.apache.org/jira/browse/SPARK-12687 Some queries such as `(select 1 as a) union (select 2 as a)` can't work. This patch fixes it. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #10660 from viirya/fix-union.
* [SPARK-12618][CORE][STREAMING][SQL] Clean up build warnings: 2.0.0 editionSean Owen2016-01-0824-137/+123
| | | | | | | | Fix most build warnings: mostly deprecated API usages. I'll annotate some of the changes below. CC rxin who is leading the charge to remove the deprecated APIs. Author: Sean Owen <sowen@cloudera.com> Closes #10570 from srowen/SPARK-12618.
* [SPARK-12692][BUILD] Scala style: check no white space before comma and colonKousuke Saruta2016-01-081-0/+6
| | | | | | | | | | We should not put a white space before `,` and `:` so let's check it. Because there are lots of style violations, first, I'd like to add a checker, enable and let the level `warning`. Then, I'd like to fix the style step by step. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10643 from sarutak/SPARK-12692.
* Fix indentation for the previous patch.Reynold Xin2016-01-071-10/+8
|
* [SPARK-12317][SQL] Support units (m,k,g) in SQLConfKevin Yu2016-01-072-1/+60
| | | | | | | | | | | | | | This PR is continue from previous closed PR 10314. In this PR, SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE will be taken memory string conventions as input. For example, the user can now specify 10g for SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE in SQLConf file. marmbrus srowen : Can you help review this code changes ? Thanks. Author: Kevin Yu <qyu@us.ibm.com> Closes #10629 from kevinyu98/spark-12317.
* [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for KryoShixiong Zhu2016-01-075-41/+174
| | | | | | | | The default serializer in Kryo is FieldSerializer and it ignores transient fields and never calls `writeObject` or `readObject`. So we should register OpenHashMapBasedStateMap using `DefaultSerializer` to make it work with Kryo. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10609 from zsxwing/SPARK-12591.
* [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and ↵Shixiong Zhu2016-01-072-7/+23
| | | | | | | | | | allowBatching configurations for Streaming /cc tdas brkyvz Author: Shixiong Zhu <shixiong@databricks.com> Closes #10453 from zsxwing/streaming-conf.
* [SPARK-12604][CORE] Addendum - use casting vs mapValues for countBy{Key,Value}Sean Owen2016-01-072-2/+2
| | | | | | | | Per rxin, let's use the casting for countByKey and countByValue as well. Let's see if this passes. Author: Sean Owen <sowen@cloudera.com> Closes #10641 from srowen/SPARK-12604.2.
* [SPARK-12510][STREAMING] Refactor ActorReceiver to support JavaShixiong Zhu2016-01-076-20/+202
| | | | | | | | | | | | | This PR includes the following changes: 1. Rename `ActorReceiver` to `ActorReceiverSupervisor` 2. Remove `ActorHelper` 3. Add a new `ActorReceiver` for Scala and `JavaActorReceiver` for Java 4. Add `JavaActorWordCount` example Author: Shixiong Zhu <shixiong@databricks.com> Closes #10457 from zsxwing/java-actor-stream.
* [SPARK-12580][SQL] Remove string concatenations from usage and extended in ↵Kazuaki Ishizaki2016-01-072-25/+25
| | | | | | | | | | | | | | @ExpressionDescription Use multi-line string literals for ExpressionDescription with ``// scalastyle:off line.size.limit`` and ``// scalastyle:on line.size.limit`` The policy is here, as describe at https://github.com/apache/spark/pull/10488 Let's use multi-line string literals. If we have to have a line with more than 100 characters, let's use ``// scalastyle:off line.size.limit`` and ``// scalastyle:on line.size.limit`` to just bypass the line number requirement. Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #10524 from kiszk/SPARK-12580.
* [SPARK-12598][CORE] bug in setMinPartitionsDarek Blasiak2016-01-071-3/+2
| | | | | | | | There is a bug in the calculation of ```maxSplitSize```. The ```totalLen``` should be divided by ```minPartitions``` and not by ```files.size```. Author: Darek Blasiak <darek.blasiak@640labs.com> Closes #10546 from datafarmer/setminpartitionsbug.
* [STREAMING][MINOR] More contextual information in logs + minor code i…Jacek Laskowski2016-01-0714-74/+69
| | | | | | | | | | …mprovements Please review and merge at your convenience. Thanks! Author: Jacek Laskowski <jacek@japila.pl> Closes #10595 from jaceklaskowski/streaming-minor-fixes.
* [MINOR] Fix for BUILD FAILURE for Scala 2.11Jacek Laskowski2016-01-071-18/+1
| | | | | | | | | | It was introduced in 917d3fc069fb9ea1c1487119c9c12b373f4f9b77 /cc cloud-fan rxin Author: Jacek Laskowski <jacek@japila.pl> Closes #10636 from jaceklaskowski/fix-for-build-failure-2.11.
* [SPARK-12662][SQL] Fix DataFrame.randomSplit to avoid creating overlapping ↵Sameer Agarwal2016-01-072-1/+28
| | | | | | | | | | | | splits https://issues.apache.org/jira/browse/SPARK-12662 cc yhuai Author: Sameer Agarwal <sameer@databricks.com> Closes #10626 from sameeragarwal/randomsplit.
* [SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not Nonezero3232016-01-072-1/+13
| | | | | | | | If initial model passed to GMM is not empty it causes net.razorvine.pickle.PickleException. It can be fixed by converting initialModel.weights to list. Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #10644 from zero323/SPARK-12006.
* [STREAMING][DOCS][EXAMPLES] Minor fixesJacek Laskowski2016-01-072-10/+8
| | | | | | Author: Jacek Laskowski <jacek@japila.pl> Closes #10603 from jaceklaskowski/streaming-actor-custom-receiver.
* [SPARK-12542][SQL] support except/intersect in HiveQlDavies Liu2016-01-065-5/+65
| | | | | | | | Parse the SQL query with except/intersect in FROM clause for HivQL. Author: Davies Liu <davies@databricks.com> Closes #10622 from davies/intersect.
* [SPARK-12295] [SQL] external spilling for window functionsDavies Liu2016-01-066-94/+276
| | | | | | | | | | This PR manage the memory used by window functions (buffered rows), also enable external spilling. After this PR, we can run window functions on a partition with hundreds of millions of rows with only 1G. Author: Davies Liu <davies@databricks.com> Closes #10605 from davies/unsafe_window.