Commit message (author, date, files changed, lines -/+)
...
* SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG (Ximo Guanter Gonzalbez, 2014-07-02, 3 files, -8/+44)
    **Description** This patch enables using the `.select()` function in SchemaRDD with functions such as `Sum`, `Count`, and others. **Testing** Unit tests added. Author: Ximo Guanter Gonzalbez <ximo@tid.es> Closes #1211 from edrevo/add-expression-support-in-select and squashes the following commits: fe4a1e1 [Ximo Guanter Gonzalbez] Extend SQL DSL to functions e1d344a [Ximo Guanter Gonzalbez] SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG (cherry picked from commit 5c6ec94da1bacd8e65a43acb92b6721493484e7b) Signed-off-by: Michael Armbrust <michael@databricks.com>
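    A rough sketch of the DSL usage this patch enables. The exact import paths and the implicit Symbol-to-attribute conversion pulled in through `import sqlContext._` are assumptions based on the 1.0-era API, not copied from the patch:
    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.catalyst.expressions.{Count, Sum}

    case class Record(key: Int, value: Int)

    object DslAggregationSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("dsl-agg"))
        val sqlContext = new SQLContext(sc)
        import sqlContext._ // createSchemaRDD + the Symbol DSL (assumed import)

        val records = sc.parallelize(1 to 10).map(i => Record(i, i * 2))
        // Aggregate straight through .select() instead of writing SQL text:
        records.select(Sum('value), Count('key)).collect().foreach(println)
        sc.stop()
      }
    }
    ```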
* update the comments in SqlParser (CodingCat, 2014-07-01, 1 file, -1/+0)
    SqlParser has been case-insensitive since https://github.com/apache/spark/commit/dab5439a083b5f771d5d5b462d0d517fa8e9aaf2 was merged. Author: CodingCat <zhunansjtu@gmail.com> Closes #1275 from CodingCat/master and squashes the following commits: 17931cd [CodingCat] update the comments in SqlParser (cherry picked from commit 6596392da0fc0fee89e22adfca239a3477dfcbab) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2322] Exception in resultHandler should NOT crash DAGScheduler and shutdown SparkContext (Reynold Xin, 2014-06-30, 3 files, -6/+78)
    This should go into 1.0.1. Author: Reynold Xin <rxin@apache.org> Closes #1264 from rxin/SPARK-2322 and squashes the following commits: c77c07f [Reynold Xin] Added comment to SparkDriverExecutionException and a test case for accumulator. 5d8d920 [Reynold Xin] [SPARK-2322] Exception in resultHandler could crash DAGScheduler and shutdown SparkContext. (cherry picked from commit 358ae1534d01ad9e69364a21441a7ef23c2cb516) Signed-off-by: Reynold Xin <rxin@apache.org> Conflicts: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
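    The guard pattern this commit describes fits in a few lines. Aside from SparkDriverExecutionException, which the commit message itself names, the identifiers below are illustrative, not the actual DAGScheduler code:
    ```scala
    // Wrap exceptions thrown by the user-supplied resultHandler so they
    // fail only the offending job instead of escaping the scheduler's
    // event loop (which previously shut down the whole SparkContext).
    class SparkDriverExecutionException(cause: Throwable)
      extends Exception("Execution error", cause)

    def handleTaskCompletion[T](result: T)(resultHandler: T => Unit): Unit =
      try {
        resultHandler(result)
      } catch {
        case e: Exception =>
          // Caught further up and translated into a job failure only.
          throw new SparkDriverExecutionException(e)
      }
    ```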
* [SPARK-1394] Remove SIGCHLD handler in worker subprocess (Matthew Farrellee, 2014-06-28, 1 file, -0/+1)
    It should not be the responsibility of the worker subprocess, which does not intentionally fork, to try and clean up child processes. Doing so is complex and interferes with operations such as platform.system(). If tighter control over subprocesses is desirable, then namespaces should be used, and cleanup should be the manager's responsibility. Author: Matthew Farrellee <matt@redhat.com> Closes #1247 from mattf/SPARK-1394 and squashes the following commits: c36f308 [Matthew Farrellee] [SPARK-1394] Remove SIGCHLD handler in worker subprocess (cherry picked from commit 3c104c79d24425786cec0034f269ba19cf465b31) Signed-off-by: Aaron Davidson <aaron@databricks.com>
* Revert "[maven-release-plugin] prepare release v1.0.1-rc1"Patrick Wendell2014-06-2721-22/+22
| | | | This reverts commit 7feeda3d729f9397aa15ee8750c01ef5aa601962.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-06-2721-22/+22
| | | | This reverts commit ea1a455a755f83f46fc8bf242410917d93d0c52c.
* [SPARK-2003] Fix python SparkContext example (Matthew Farrellee, 2014-06-27, 1 file, -1/+1)
    Author: Matthew Farrellee <matt@redhat.com> Closes #1246 from mattf/SPARK-2003 and squashes the following commits: b12e7ca [Matthew Farrellee] [SPARK-2003] Fix python SparkContext example (cherry picked from commit 0e0686d3ef88e024fcceafe36a0cdbb953f5aeae) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-2259] Fix highly misleading docs on cluster / client deploy modes (Andrew Or, 2014-06-27, 5 files, -12/+36)
    The existing docs are highly misleading. For standalone mode, for example, they encourage the user to use standalone-cluster mode, which is not officially supported. Safeguards have been added to spark-submit itself to prevent bad documentation from leading users down the wrong path in the future. This PR is prompted by countless headaches users of Spark have run into on the mailing list. Author: Andrew Or <andrewor14@gmail.com> Closes #1200 from andrewor14/submit-docs and squashes the following commits: 5ea2460 [Andrew Or] Rephrase cluster vs client explanation c827f32 [Andrew Or] Clarify spark submit messages 9f7ed8f [Andrew Or] Clarify client vs cluster deploy mode + add safeguards (cherry picked from commit f17510e371dfbeaada3c72b884d70c36503ea30a) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-2307] SparkUI - storage tab displays incorrect RDDs (Andrew Or, 2014-06-27, 2 files, -6/+5)
    The issue here is that the `StorageTab` listens for updates from the `StorageStatusListener`, but when a block is kicked out of the cache, `StorageStatusListener` removes it from its list. Thus, there is no way for the `StorageTab` to know whether a block has been dropped. This issue was introduced in #1080, which was itself a bug fix. Here we revert that PR and offer a different fix for the original bug (SPARK-2144). Author: Andrew Or <andrewor14@gmail.com> Closes #1249 from andrewor14/storage-ui-fix and squashes the following commits: af019ce [Andrew Or] Fix SPARK-2307 (cherry picked from commit 21e0f77b6321590ed86223a60cdb8ae08ea4057f) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* SPARK-2181: The keys for sorting the columns of Executor page in SparkUI are incorrect (witgo, 2014-06-26, 3 files, -11/+17)
    Author: witgo <witgo@qq.com> Closes #1135 from witgo/SPARK-2181 and squashes the following commits: 39dad90 [witgo] The keys for sorting the columns of Executor page in SparkUI are incorrect (cherry picked from commit 18f29b96c7e0948f5f504e522e5aa8a8d1ab163e) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [maven-release-plugin] prepare for next development iteration (Ubuntu, 2014-06-26, 21 files, -22/+22)
* [maven-release-plugin] prepare release v1.0.1-rc1 (Ubuntu, 2014-06-26, 21 files, -22/+22)
* CHANGES.txt for release 1.0.1 (Patrick Wendell, 2014-06-26, 1 file, -0/+778)
* Fixing AWS instance type information based upon current EC2 data (Zichuan Ye, 2014-06-26, 1 file, -5/+14)
    Fixed a problem in the previous file in which some information regarding AWS instance types was wrong. That information was updated based upon current AWS EC2 data. Author: Zichuan Ye <jerry@tangentds.com> Closes #1156 from jerry86/master and squashes the following commits: ff36e95 [Zichuan Ye] Fixing AWS instance type information based upon current EC2 data (cherry picked from commit 62d4a0fa9947e64c1533f66ae577557bcfb271c9) Conflicts: ec2/spark_ec2.py
* Small error in previous commit (Patrick Wendell, 2014-06-26, 1 file, -2/+2)
* Updating versions for 1.0.1 release (Patrick Wendell, 2014-06-26, 9 files, -11/+11)
* [SPARK-2286][UI] Report exception/errors for failed tasks that are not ExceptionFailure (Reynold Xin, 2014-06-26, 4 files, -26/+75)
    Also added inline doc for each TaskEndReason. Author: Reynold Xin <rxin@apache.org> Closes #1225 from rxin/SPARK-2286 and squashes the following commits: 6a7959d [Reynold Xin] Fix unit test failure. cf9d5eb [Reynold Xin] Merge branch 'master' into SPARK-2286 a61fae1 [Reynold Xin] Move to line above ... 38c7391 [Reynold Xin] [SPARK-2286][UI] Report exception/errors for failed tasks that are not ExceptionFailure. (cherry picked from commit 6587ef7c1783961e6ef250afa387271a1bd6e277) Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala
* [SPARK-2295] [SQL] Make JavaBeans nullability stricter. (Takuya UESHIN, 2014-06-26, 1 file, -19/+18)
    Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1235 from ueshin/issues/SPARK-2295 and squashes the following commits: 201c508 [Takuya UESHIN] Make JavaBeans nullability stricter. (cherry picked from commit 32a1ad75313472b1b098f7ec99335686d3fe4fc3) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2251] fix concurrency issues in random sampler (branch-1.0) (Xiangrui Meng, 2014-06-26, 2 files, -4/+15)
    The following code is very likely to throw an exception:
    ~~~
    val rdd = sc.parallelize(0 until 111, 10).sample(false, 0.1)
    rdd.zip(rdd).count()
    ~~~
    because the same random number generator is used when computing partitions. This fix doesn't change the type signature. @pwendell Author: Xiangrui Meng <meng@databricks.com> Closes #1234 from mengxr/fix-sample-1.0 and squashes the following commits: 88795e2 [Xiangrui Meng] fix concurrency issues in random sampler
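    The standard fix for this class of bug, sketched here under the assumption of a simple Bernoulli sampler (not the actual RandomSampler code): give every partition its own RNG, seeded deterministically from the partition index, so recomputed or zipped copies of the RDD draw identical samples.
    ```scala
    import scala.util.Random

    // One independent, reproducible RNG per partition; sharing a single
    // Random across partitions is what made rdd.zip(rdd).count() fail.
    def samplePartition[T](seed: Long, partitionIndex: Int,
                           fraction: Double, items: Iterator[T]): Iterator[T] = {
      val rng = new Random(seed + partitionIndex)
      items.filter(_ => rng.nextDouble() < fraction)
    }
    ```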
* Remove use of spark.worker.instances (Kay Ousterhout, 2014-06-26, 1 file, -1/+1)
    spark.worker.instances was added as part of this commit: https://github.com/apache/spark/commit/1617816090e7b20124a512a43860a21232ebf511 My understanding is that SPARK_WORKER_INSTANCES is supported for backwards compatibility, but spark.worker.instances is never used (SparkSubmit.scala sets spark.executor.instances), so it should not have been added. @sryza @pwendell @tgravescs LMK if I'm understanding this correctly Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #1214 from kayousterhout/yarn_config and squashes the following commits: 3d7c491 [Kay Ousterhout] Remove use of spark.worker.instances (cherry picked from commit 48a82a827c99526b165c78d7e88faec43568a37a) Signed-off-by: Thomas Graves <tgraves@apache.org>
* [SPARK-2254] [SQL] ScalaReflection should mark primitive types as non-nullable. (Takuya UESHIN, 2014-06-25, 2 files, -31/+165)
    Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1193 from ueshin/issues/SPARK-2254 and squashes the following commits: cfd6088 [Takuya UESHIN] Modify ScalaReflection.schemaFor method to return nullability of Scala Type. (cherry picked from commit e4899a253728bfa7c78709a37a4837f74b72bd61) Signed-off-by: Reynold Xin <rxin@apache.org>
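    The underlying rule is easy to state: JVM value types can never hold null, so schema inference may safely mark them non-nullable. A toy sketch of that check using plain Scala reflection (illustrative, not the actual ScalaReflection code):
    ```scala
    import scala.reflect.runtime.universe._

    // Value types (Int, Long, Double, Boolean, ...) cannot be null on
    // the JVM; everything else stays nullable.
    def inferredNullability[T: TypeTag]: Boolean = typeOf[T] match {
      case t if t <:< typeOf[AnyVal] => false
      case _                         => true
    }

    assert(!inferredNullability[Int])   // primitive: non-nullable
    assert(inferredNullability[String]) // reference type: nullable
    ```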
* [SPARK-2284][UI] Mark all failed tasks as failures. (Reynold Xin, 2014-06-25, 2 files, -4/+35)
    Previously, only tasks that failed with the ExceptionFailure reason were marked as failures. Author: Reynold Xin <rxin@apache.org> Closes #1224 from rxin/SPARK-2284 and squashes the following commits: be79dbd [Reynold Xin] [SPARK-2284][UI] Mark all failed tasks as failures. (cherry picked from commit 4a346e242c3f241c575f35536220df01ad724e23) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2172] PySpark cannot import mllib modules in YARN-client mode (Szul, Piotr, 2014-06-25, 1 file, -0/+8)
    Include the pyspark/mllib Python sources as resources in the mllib.jar so that they are included in the final assembly. Author: Szul, Piotr <Piotr.Szul@csiro.au> Closes #1223 from piotrszul/branch-1.0 and squashes the following commits: 69d5174 [Szul, Piotr] Removed unused resource directory src/main/resource from mllib pom f8c52a0 [Szul, Piotr] [SPARK-2172] PySpark cannot import mllib modules in YARN-client mode Include pyspark/mllib python sources as resources in the jar
* [SPARK-1749] Job cancellation when SchedulerBackend does not implement killTask (Mark Hamstra, 2014-06-25, 2 files, -9/+69)
    This is a fixed up version of #686 (cc @markhamstra @pwendell). The last commit (the only one I authored) reflects the changes I made from Mark's original patch. Author: Mark Hamstra <markhamstra@gmail.com> Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #1219 from kayousterhout/mark-SPARK-1749 and squashes the following commits: 42dfa7e [Kay Ousterhout] Got rid of terrible double-negative name 80b3205 [Kay Ousterhout] Don't notify listeners of job failure if it wasn't successfully cancelled. d156d33 [Mark Hamstra] Do nothing in no-kill submitTasks 9312baa [Mark Hamstra] code review update cc353c8 [Mark Hamstra] scalastyle e61f7f8 [Mark Hamstra] Catch UnsupportedOperationException when DAGScheduler tries to cancel a job on a SchedulerBackend that does not implement killTask (cherry picked from commit b88a59a66845b8935b22f06fc96d16841ed20c94) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
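    The defensive shape of the fix, sketched with illustrative names (the real code lives in DAGScheduler and the SchedulerBackend implementations):
    ```scala
    trait KillableBackend { def killTask(taskId: Long): Unit }

    // Try the kill, but treat a backend without kill support as
    // "nothing was cancelled" rather than letting the exception abort
    // the whole cancellation path.
    def tryKillTask(backend: KillableBackend, taskId: Long): Boolean =
      try {
        backend.killTask(taskId)
        true
      } catch {
        case _: UnsupportedOperationException =>
          false // don't notify listeners of a cancellation that never happened
      }
    ```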
* [SPARK-2283][SQL] Reset test environment before running PruningSuite (Cheng Lian, 2014-06-25, 1 file, -0/+5)
    JIRA issue: [SPARK-2283](https://issues.apache.org/jira/browse/SPARK-2283) If `PruningSuite` is run right after `HiveCompatibilitySuite`, the first test case fails because `srcpart` table is cached in-memory by `HiveCompatibilitySuite`, but column pruning is not implemented for `InMemoryColumnarTableScan` operator yet. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1221 from liancheng/spark-2283 and squashes the following commits: dc0b663 [Cheng Lian] SPARK-2283: reset test environment before running PruningSuite (cherry picked from commit 7f196b009d26d4aed403b3c694f8b603601718e3) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-1912] fix compress memory issue during reduce (Wenchen Fan(Cloud), 2014-06-25, 1 file, -2/+20)
    When we need to read a compressed block, we first create a compression stream instance (LZF or Snappy) and use it to wrap that block. Say a reducer task needs to read 1000 local shuffle blocks: it will prepare to read all 1000 blocks, which means creating 1000 compression stream instances to wrap them. The initialization of each compression instance allocates some memory, so having many compression instances alive at the same time is a problem. In practice the reducer reads the shuffle blocks one by one, so we can do the compression instance initialization lazily. Author: Wenchen Fan(Cloud) <cloud0fan@gmail.com> Closes #860 from cloud-fan/fix-compress and squashes the following commits: 0924a6b [Wenchen Fan(Cloud)] rename 'doWork' into 'getIterator' 07f32c2 [Wenchen Fan(Cloud)] move the LazyProxyIterator to dataDeserialize d80c426 [Wenchen Fan(Cloud)] remove empty lines in short class 2c8adb2 [Wenchen Fan(Cloud)] add inline comment 8ebff77 [Wenchen Fan(Cloud)] fix compress memory issue during reduce
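    The `LazyProxyIterator` named in the squashed commits can be sketched generically; this is an assumption-level reconstruction, not the patch itself:
    ```scala
    // Defer building the wrapped (compression) stream until the block
    // is actually consumed, so 1000 pending blocks cost no buffer
    // memory up front.
    class LazyProxyIterator[T](makeIterator: () => Iterator[T]) extends Iterator[T] {
      private lazy val underlying = makeIterator() // created on first access
      def hasNext: Boolean = underlying.hasNext
      def next(): T = underlying.next()
    }
    ```
    A reducer would then wrap each block's decompress-and-deserialize closure in one of these instead of opening the stream eagerly.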
* [SPARK-2204] Launch tasks on the proper executors in mesos fine-grained mode (Sebastien Rainville, 2014-06-25, 1 file, -7/+6)
    The scheduler for Mesos in fine-grained mode launches tasks on the wrong executors. `MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer])` assumes that `TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer])` returns task lists in the same order as the offers it was passed, but in the current implementation `TaskSchedulerImpl.resourceOffers` shuffles the offers to avoid always assigning the tasks to the same executors. The result is that the tasks are launched on the wrong executors. The jobs are sometimes able to complete, but most of the time they fail. It seems that as soon as something goes wrong with a task, Spark cannot recover, since it is mistaken about where the tasks are actually running. Also, the more the cluster is under load, the more likely the job is to fail, because there is a higher probability that Spark is trying to launch a task on a slave that doesn't actually have enough resources, again because it's using the wrong offers. The solution is to not assume that the tasks are returned in the same order as the offers, and simply launch each task on the executor decided by `TaskSchedulerImpl.resourceOffers`. What I am not sure about is that I considered slaveId and executorId to be the same, which is true at least in my setup, but I don't know if that is always true. I tested this on top of the 1.0.0 release and it seems to work fine on our cluster. Author: Sebastien Rainville <sebastien@hopper.com> Closes #1140 from sebastienrainville/fine-grained-mode-fix-master and squashes the following commits: a98b0e0 [Sebastien Rainville] Use a HashMap to retrieve the offer indices d6ffe54 [Sebastien Rainville] Launch tasks on the proper executors in mesos fine-grained mode (cherry picked from commit 1132e472eca1a00c2ce10d2f84e8f0e79a5193d3) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
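    In sketch form, the fix replaces positional matching with a lookup keyed by slave id (toy types here; the real code works with Mesos Offer and TaskInfo objects):
    ```scala
    case class Offer(slaveId: String)
    case class TaskDesc(executorId: String) // per the PR, executorId doubles as slaveId

    // Launch each task on the offer the scheduler actually picked for
    // it, instead of assuming task lists come back in offer order.
    def pairTasksWithOffers(offers: Seq[Offer],
                            tasks: Seq[TaskDesc]): Seq[(Offer, TaskDesc)] = {
      val offersBySlave = offers.map(o => o.slaveId -> o).toMap
      tasks.map(t => offersBySlave(t.executorId) -> t)
    }
    ```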
* [SPARK-2270] Kryo cannot serialize results returned by asJavaIterable (and thus groupBy/cogroup are broken in Java APIs when Kryo is used) (Reynold Xin, 2014-06-25, 2 files, -0/+65)
    @pwendell this should be merged into 1.0.1. Thanks @sorenmacbeth for reporting this & helping out with the fix. Author: Reynold Xin <rxin@apache.org> Closes #1206 from rxin/kryo-iterable-2270 and squashes the following commits: 09da0aa [Reynold Xin] Updated the comment. 009bf64 [Reynold Xin] [SPARK-2270] Kryo cannot serialize results returned by asJavaIterable (and thus groupBy/cogroup are broken in Java APIs when Kryo is used). (cherry picked from commit 7ff2c754f340ba4c4077b0ff6285876eb7871c7b) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
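    A workaround-style sketch of the problem (illustrative, not the serializer registration the patch actually adds): the wrapper returned by `asJavaIterable` is a private class Kryo cannot round-trip, so copying into a concrete Java collection before serialization avoids it.
    ```scala
    // Materialize a Scala Iterable into a plain java.util.ArrayList,
    // a concrete type Kryo serializes without surprises.
    def toKryoFriendlyIterable[T](it: Iterable[T]): java.lang.Iterable[T] = {
      val list = new java.util.ArrayList[T](it.size)
      it.foreach(list.add)
      list
    }
    ```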
* [SPARK-2258 / 2266] Fix a few worker UI bugs (Andrew Or, 2014-06-25, 2 files, -3/+4)
    **SPARK-2258.** Worker UI displays zombie processes if the executor throws an exception before a process is launched. This is because we only inform the Worker of the change if the process is already launched, which in this case it isn't. **SPARK-2266.** We expose "Some(app-id)" on the log page. This is fairly minor. Author: Andrew Or <andrewor14@gmail.com> Closes #1213 from andrewor14/fix-worker-ui and squashes the following commits: c1223fe [Andrew Or] Fix worker UI bugs Conflicts: core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
* Replace doc reference to Shark with Spark SQL. (Reynold Xin, 2014-06-25, 1 file, -3/+2)
    (cherry picked from commit ac06a85da59db8f2654cdf6601d186348da09c01) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2267] Log exception when TaskResultGetter fails to fetch/deserialize task result (Reynold Xin, 2014-06-25, 1 file, -1/+2)
    Note that this is only for branch-1.0, because master has already been fixed. Author: Reynold Xin <rxin@apache.org> Closes #1202 from rxin/SPARK-2267 and squashes the following commits: ce1b19b [Reynold Xin] [SPARK-2267] Log exception when TaskResultGetter fails to fetch/deserialize task result
* [BUGFIX][SQL] Should match java.math.BigDecimal when unwrapping Hive output (Cheng Lian, 2014-06-25, 1 file, -4/+4)
    The `BigDecimal` branch in `unwrap` matches `scala.math.BigDecimal` rather than `java.math.BigDecimal`. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1199 from liancheng/javaBigDecimal and squashes the following commits: e9bb481 [Cheng Lian] Should match java.math.BigDecimal when unwrapping Hive output (cherry picked from commit 22036aeb1b2cac7f48cd60afea925b42a5318631) Signed-off-by: Reynold Xin <rxin@apache.org>
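    This class of bug is worth spelling out: in Scala source, a bare `BigDecimal` pattern means `scala.math.BigDecimal`, while Hive hands back `java.math.BigDecimal`. A minimal sketch of the corrected match (simplified, not the actual `unwrap`):
    ```scala
    def unwrapDecimal(value: Any): BigDecimal = value match {
      case d: java.math.BigDecimal  => BigDecimal(d) // what Hive actually returns
      case d: scala.math.BigDecimal => d             // already the Scala wrapper
      case other                    => sys.error(s"not a decimal: $other")
    }
    ```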
* [SPARK-2263][SQL] Support inserting MAP<K, V> to Hive tables (Cheng Lian, 2014-06-25, 3 files, -6/+20)
    JIRA issue: [SPARK-2263](https://issues.apache.org/jira/browse/SPARK-2263) Map objects were not converted to Hive types before being inserted into Hive tables. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1205 from liancheng/spark-2263 and squashes the following commits: c7a4373 [Cheng Lian] Addressed @concretevitamin's comment 784940b [Cheng Lian] SPARK-2263: support inserting MAP<K, V> to Hive tables (cherry picked from commit 8fade8973e5fc97f781de5344beb66b90bd6e524) Signed-off-by: Reynold Xin <rxin@apache.org>
* Fix possible null pointer in accumulator toString (Michael Armbrust, 2014-06-24, 1 file, -1/+1)
    Author: Michael Armbrust <michael@databricks.com> Closes #1204 from marmbrus/nullPointerToString and squashes the following commits: 35b5fce [Michael Armbrust] Fix possible null pointer in accumulator toString (cherry picked from commit 2714968e1b40221739c5dfba7ca4c0c06953dbe2) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2264][SQL] Fix failing CachedTableSuite (Michael Armbrust, 2014-06-24, 3 files, -24/+25)
    Author: Michael Armbrust <michael@databricks.com> Closes #1201 from marmbrus/fixCacheTests and squashes the following commits: 9d87ed1 [Michael Armbrust] Use analyzer (which runs to fixed point) instead of manually removing analysis operators. Conflicts: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
* [SQL] Add base row updating methods for JoinedRow (Cheng Hao, 2014-06-24, 1 file, -0/+17)
    This will be helpful in join operators. Author: Cheng Hao <hao.cheng@intel.com> Closes #1187 from chenghao-intel/joinedRow and squashes the following commits: 87c19e3 [Cheng Hao] Add base row set methods for JoinedRow (cherry picked from commit 133495d82672c3f34d40a6298cc80c31f91faf5c) Signed-off-by: Michael Armbrust <michael@databricks.com>
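    A toy sketch of the idea, with a simplified Row as Seq[Any] and assumed method names (the real JoinedRow works on catalyst Rows): updating the two halves in place lets a join operator reuse one wrapper per output row instead of allocating.
    ```scala
    class JoinedRow(private var left: Seq[Any], private var right: Seq[Any]) {
      // Update methods: rebind a side without allocating a new wrapper.
      def withLeft(l: Seq[Any]): JoinedRow = { left = l; this }
      def withRight(r: Seq[Any]): JoinedRow = { right = r; this }
      // Read straight through to whichever half owns ordinal i.
      def apply(i: Int): Any =
        if (i < left.length) left(i) else right(i - left.length)
    }
    ```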
* [SPARK-2252] Fix MathJax for HTTPS. (Reynold Xin, 2014-06-23, 1 file, -13/+23)
    Found out about this from the Hacker News link to GraphX, which was using HTTPS. @mengxr Author: Reynold Xin <rxin@apache.org> Closes #1189 from rxin/mllib-doc and squashes the following commits: 5328be0 [Reynold Xin] [SPARK-2252] Fix MathJax for HTTPS. (cherry picked from commit 420c1c3e1beea03453e0eb9dc06f226c80496d68) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2227] Support dfs command in SQL. (Reynold Xin, 2014-06-23, 1 file, -8/+6)
    Note that nothing gets printed to the console because we don't properly maintain session state right now. I will have a follow-up PR that fixes it. Author: Reynold Xin <rxin@apache.org> Closes #1167 from rxin/commands and squashes the following commits: 56f04f8 [Reynold Xin] [SPARK-2227] Support dfs command in SQL. (cherry picked from commit 51c8168377a89d20d0b2d7b9a28af58593a0fe0c) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-1669][SQL] Made cacheTable idempotent (Cheng Lian, 2014-06-23, 2 files, -4/+29)
    JIRA issue: [SPARK-1669](https://issues.apache.org/jira/browse/SPARK-1669) Caching the same table multiple times should end up with only 1 in-memory columnar representation of this table.
    Before:
    ```
    scala> loadTestTable("src")
    ...
    scala> cacheTable("src")
    ...
    scala> cacheTable("src")
    ...
    scala> table("src")
    ...
    == Query Plan ==
    InMemoryColumnarTableScan [key#2,value#3], (InMemoryRelation [key#2,value#3], false, (InMemoryColumnarTableScan [key#2,value#3], (InMemoryRelation [key#2,value#3], false, (HiveTableScan [key#2,value#3], (MetastoreRelation default, src, None), None))))
    ```
    After:
    ```
    scala> loadTestTable("src")
    ...
    scala> cacheTable("src")
    ...
    scala> cacheTable("src")
    ...
    scala> table("src")
    ...
    == Query Plan ==
    InMemoryColumnarTableScan [key#2,value#3], (InMemoryRelation [key#2,value#3], false, (HiveTableScan [key#2,value#3], (MetastoreRelation default, src, None), None))
    ```
    Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1183 from liancheng/spark-1669 and squashes the following commits: 68f8a20 [Cheng Lian] Removed an unused import 51bae90 [Cheng Lian] Made cacheTable idempotent (cherry picked from commit a4bc442ca2c35444de8a33376b6f27c6c2a9003d) Signed-off-by: Michael Armbrust <michael@databricks.com>
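    The idempotence check reduces to one pattern match; a toy sketch with stand-in plan nodes (not real Catalyst classes):
    ```scala
    sealed trait Plan
    case class HiveTableScan(table: String) extends Plan
    case class InMemoryRelation(child: Plan) extends Plan

    // Caching an already-cached plan is a no-op, so repeated cacheTable
    // calls keep exactly one columnar copy.
    def cachePlan(plan: Plan): Plan = plan match {
      case cached: InMemoryRelation => cached
      case other                    => InMemoryRelation(other)
    }
    ```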
* Fix mvn detection (Matthew Farrellee, 2014-06-23, 1 file, -2/+2)
    When mvn is not detected (not in executor's path), 'set -e' causes the detection to terminate the script before the helpful error message can be displayed. Author: Matthew Farrellee <matt@redhat.com> Closes #1181 from mattf/master-0 and squashes the following commits: 506549f [Matthew Farrellee] Fix mvn detection (cherry picked from commit 853a2b951d4c7f6c6c37f53b465b3c7b77691b7c) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* Fixed small running-on-YARN docs typo (Vlad, 2014-06-23, 1 file, -1/+1)
    The backslash is needed for a multiline command. Author: Vlad <frolvlad@gmail.com> Closes #1158 from frol/patch-1 and squashes the following commits: e258044 [Vlad] Fixed small running on YARN docs typo (cherry picked from commit b88238faeed8ba723986cf78d64f84965facb236) Signed-off-by: Thomas Graves <tgraves@apache.org>
* SPARK-2241: quote command line args in ec2 script (Ori Kremer, 2014-06-22, 1 file, -1/+1)
    To preserve quoted command line args (in case options have spaces in them). Author: Ori Kremer <ori.kremer@gmail.com> Closes #1169 from orikremer/quote_cmd_line_args and squashes the following commits: 67e2aa1 [Ori Kremer] quote command line args (cherry picked from commit 9fc373e3a9a8ba7bea9df0950775f48918f63a8a) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* [SPARK-1112, 2156] (1.0 edition) Use correct akka frame size and overhead amounts. (Patrick Wendell, 2014-06-22, 7 files, -18/+33)
    SPARK-1112: This is a more conservative version of #1132 that doesn't change around the actor system initialization on the executor. Instead we just directly read the current frame size limit from the ActorSystem. SPARK-2156: This uses the same fix as in #1132. Author: Patrick Wendell <pwendell@gmail.com> Closes #1172 from pwendell/akka-10-fix and squashes the following commits: d56297e [Patrick Wendell] Set limit in LocalBackend to preserve test expectations 9f5ed19 [Patrick Wendell] [SPARK-1112, 2156] (1.0 edition) Use correct akka frame size and overhead amounts.
* SPARK-2034. KafkaInputDStream doesn't close resources and may prevent JVM shutdown (Sean Owen, 2014-06-22, 1 file, -22/+33)
    Tobias noted today on the mailing list:
    ========
    I am trying to use Spark Streaming with Kafka, which works like a charm – except for shutdown. When I run my program with "sbt run-main", sbt will never exit, because there are two non-daemon threads left that don't die. I created a minimal example at <https://gist.github.com/tgpfeiffer/b1e765064e983449c6b6#file-kafkadoesntshutdown-scala>. It starts a StreamingContext and does nothing more than connecting to a Kafka server and printing what it receives. Using the `future { ... }` construct, I shut down the StreamingContext after some seconds and then print the difference between the threads at start time and at end time. The output can be found at <https://gist.github.com/tgpfeiffer/b1e765064e983449c6b6#file-output1>. There are a number of threads remaining that will prevent sbt from exiting. When I replace `KafkaUtils.createStream(...)` with a call that does exactly the same, except that it calls `consumerConnector.shutdown()` in `KafkaReceiver.onStop()` (which it should, IMO), the output is as shown at <https://gist.github.com/tgpfeiffer/b1e765064e983449c6b6#file-output2>. Does anyone have any idea what is going on here and why the program doesn't shut down properly? The behavior is the same with both kafka 0.8.0 and 0.8.1.1, by the way.
    ========
    Something similar was noted last year: http://mail-archives.apache.org/mod_mbox/spark-dev/201309.mbox/%3C1380220041.2428.YahooMailNeo@web160804.mail.bf1.yahoo.com%3E KafkaInputDStream doesn't close `ConsumerConnector` in `onStop()`, and does not close the `Executor` it creates. The latter leaves non-daemon threads and can prevent the JVM from shutting down even if streaming is closed properly. Author: Sean Owen <sowen@cloudera.com> Closes #980 from srowen/SPARK-2034 and squashes the following commits: 9f31a8d [Sean Owen] Restore ClassTag to private class because MIMA flags it; is the shadowing intended? 2d579a8 [Sean Owen] Close ConsumerConnector in onStop; shutdown() the local Executor that is created so that its threads stop when done; close the Zookeeper client even on exception; fix a few typos; log exceptions that otherwise vanish (cherry picked from commit 476581e8c8ca03a5940c404fee8a06361ff94cb5) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
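    The cleanup pattern the commit adds can be sketched as follows (a simplified receiver; Connector stands in for Kafka's ConsumerConnector):
    ```scala
    import java.util.concurrent.{ExecutorService, Executors}

    trait Connector { def shutdown(): Unit } // stand-in for ConsumerConnector

    class KafkaReceiverSketch(connector: Connector) {
      // Message-handling threads are non-daemon; left running, they
      // keep the JVM alive after streaming stops.
      private val pool: ExecutorService = Executors.newFixedThreadPool(2)

      def onStop(): Unit = {
        connector.shutdown() // close the Kafka consumer connection
        pool.shutdown()      // drain and stop the local worker threads
      }
    }
    ```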
* [SQL] Break hiveOperators.scala into multiple files. (Reynold Xin, 2014-06-21, 6 files, -529/+610)
    The single file was getting very long (500+ loc). Author: Reynold Xin <rxin@apache.org> Closes #1166 from rxin/hiveOperators and squashes the following commits: 5b43068 [Reynold Xin] [SQL] Break hiveOperators.scala into multiple files. (cherry picked from commit ec935abce13b60f353236566da149c0c87bb1002) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SQL] Pass SQLContext instead of SparkContext into physical operators. (Reynold Xin, 2014-06-20, 7 files, -44/+51)
    This makes it easier to use config options in operators. Author: Reynold Xin <rxin@apache.org> Closes #1164 from rxin/sqlcontext and squashes the following commits: 797b2fd [Reynold Xin] Pass SQLContext instead of SparkContext into physical operators. (cherry picked from commit ca5d8b5904dc6dd5b691af506d3a842e508b3673) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SQL] Use hive.SessionState, not the thread local SessionState (Aaron Davidson, 2014-06-20, 1 file, -1/+1)
    Note that this is simply mimicking lookupRelation(). I do not have a concrete notion of why this solution is necessarily right-er than SessionState.get, but SessionState.get is returning null, which is bad. Author: Aaron Davidson <aaron@databricks.com> Closes #1148 from aarondav/createtable and squashes the following commits: 37c3e7c [Aaron Davidson] [SQL] Use hive.SessionState, not the thread local SessionState (cherry picked from commit 2044784915554a890ca6f8450d8403495b2ee4f3) Signed-off-by: Reynold Xin <rxin@apache.org>
* Move ScriptTransformation into the appropriate place. (Reynold Xin, 2014-06-20, 1 file, -0/+0)
    Author: Reynold Xin <rxin@apache.org> Closes #1162 from rxin/script and squashes the following commits: 2c836b9 [Reynold Xin] Move ScriptTransformation into the appropriate place. (cherry picked from commit d4c7572dba1be49e55ceb38713652e5bcf485be8) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2225] Turn HAVING without GROUP BY into WHERE. (Reynold Xin, 2014-06-20, 2 files, -23/+11)
    @willb Author: Reynold Xin <rxin@apache.org> Closes #1161 from rxin/having-filter and squashes the following commits: fa8359a [Reynold Xin] [SPARK-2225] Turn HAVING without GROUP BY into WHERE. (cherry picked from commit 0ac71d1284cd4f011d5763181cba9ecb49337b66) Signed-off-by: Reynold Xin <rxin@apache.org>
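    The rewrite is a single pattern over the logical plan; sketched with toy nodes (illustrative, not Catalyst classes):
    ```scala
    sealed trait Plan
    case class Relation(name: String) extends Plan
    case class Aggregate(groupingExprs: Seq[String], child: Plan) extends Plan
    case class Filter(condition: String, child: Plan) extends Plan

    // With no GROUP BY, a HAVING predicate filters ordinary rows, which
    // is exactly a WHERE; otherwise it stays above the aggregation.
    def planHaving(cond: String, child: Plan): Plan = child match {
      case Aggregate(groups, grandChild) if groups.isEmpty => Filter(cond, grandChild)
      case other                                           => Filter(cond, other)
    }
    ```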
* SPARK-2180: support HAVING clauses in Hive queries (William Benton, 2014-06-20, 2 files, -6/+53)
    This PR extends Spark's HiveQL support to handle HAVING clauses in aggregations. The HAVING test from the Hive compatibility suite doesn't appear to be runnable from within Spark, so I added a simple comparable test to `HiveQuerySuite`. Author: William Benton <willb@redhat.com> Closes #1136 from willb/SPARK-2180 and squashes the following commits: 3bbaf26 [William Benton] Added casts to HAVING expressions 83f1340 [William Benton] scalastyle fixes 18387f1 [William Benton] Add test for HAVING without GROUP BY b880bef [William Benton] Added semantic error for HAVING without GROUP BY 942428e [William Benton] Added test coverage for SPARK-2180. 56084cc [William Benton] Add support for HAVING clauses in Hive queries. (cherry picked from commit 171ebb3a824a577d69443ec68a3543b27914cf6d) Signed-off-by: Reynold Xin <rxin@apache.org>