Commit message (Author, Date, Files changed, Lines -/+)
...
* Whitelist Hive Tests (Michael Armbrust, 2014-05-03, 87 files, -4/+328)
  This is ready when Jenkins is.
  Author: Michael Armbrust <michael@databricks.com>
  Closes #596 from marmbrus/moreTests and squashes the following commits:
  85be703 [Michael Armbrust] Blacklist MR required tests.
  35bc311 [Michael Armbrust] Add hive golden answers.
  ede98fd [Michael Armbrust] More hive gitignore
  da096ea [Michael Armbrust] update whitelist
* [SQL] Better logging when applying rules. (Michael Armbrust, 2014-05-03, 1 file, -4/+24)
  Author: Michael Armbrust <michael@databricks.com>
  Closes #616 from marmbrus/ruleLogging and squashes the following commits:
  39c09fe [Michael Armbrust] Fix off by one error.
  5af3537 [Michael Armbrust] Better logging when applying rules.
* EC2 configurable workers (Allan Douglas R. de Oliveira, 2014-05-03, 2 files, -2/+12)
  Added an option to configure the number of worker instances and to set SPARK_MASTER_OPTS.
  Depends on: https://github.com/mesos/spark-ec2/pull/46
  Author: Allan Douglas R. de Oliveira <allan@chaordicsystems.com>
  Closes #612 from douglaz/ec2_configurable_workers and squashes the following commits:
  d6c5d65 [Allan Douglas R. de Oliveira] Added master opts parameter
  6c34671 [Allan Douglas R. de Oliveira] Use number of worker instances as string on template
  ba528b9 [Allan Douglas R. de Oliveira] Added SPARK_WORKER_INSTANCES parameter
* SPARK-1689 AppClient should indicate app is dead() when removed (Aaron Davidson, 2014-05-03, 4 files, -14/+12)
  Previously, we indicated disconnected(), which keeps the application in a limbo state where it has no executors but thinks it will get them soon. This is a bug fix that hopefully can be included in 1.0.
  Author: Aaron Davidson <aaron@databricks.com>
  Closes #605 from aarondav/appremoved and squashes the following commits:
  bea02a2 [Aaron Davidson] SPARK-1689 AppClient should indicate app is dead() when removed
* [Bugfix] Tachyon file cleanup logical error (Cheng Lian, 2014-05-03, 1 file, -5/+5)
  Should look up `shutdownDeleteTachyonPaths` instead of `shutdownDeletePaths`. Together with a minor style cleanup: `find {...}.isDefined` to `exists {...}`.
  Author: Cheng Lian <lian.cs.zju@gmail.com>
  Closes #575 from liancheng/tachyonFix and squashes the following commits:
  deb8f31 [Cheng Lian] Fixed logical error when cleaning up Tachyon files, plus minor style cleanup
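For readers skimming the log, the style cleanup is just this Scala idiom (illustrative values, not the patched code):

```scala
val paths = Seq("/tmp/spark-a", "/tmp/spark-b")

// Before: builds an Option just to test for presence.
val found1 = paths.find(_.startsWith("/tmp")).isDefined
// After: `exists` states the intent directly and short-circuits the same way.
val found2 = paths.exists(_.startsWith("/tmp"))
```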
* SPARK-1663. Corrections for several compile errors in streaming code examples, and updates to follow API changes (Sean Owen, 2014-05-03, 1 file, -26/+36)
  I gave the Streaming code examples, both Scala and Java, a test run today. I turned up a number of small errors, mostly compile errors in the Java examples. There were a few typos in the Scala too. I also took the liberty of adding things like imports, since in several cases they are not obvious. Feel free to push back on some changes.
  There's one thing I haven't quite addressed in the changes. `JavaPairDStream` uses the Java API version of `Function2` in almost all cases, as `JFunction2`. However it uses `scala.Function2` in:
  ```
  def reduceByKeyAndWindow(reduceFunc: Function2[V, V, V], windowDuration: Duration)
    :JavaPairDStream[K, V] = {
    dstream.reduceByKeyAndWindow(reduceFunc, windowDuration)
  }
  ```
  Is that a typo?
  Also, in Scala, I could not get this to compile:
  ```
  val windowedWordCounts = pairs.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
  error: missing parameter type for expanded function ((x$1, x$2) => x$1.$plus(x$2))
  ```
  You can see my fix below but am I missing something? Otherwise I can say these all worked for me!
  Author: Sean Owen <sowen@cloudera.com>
  Closes #589 from srowen/SPARK-1663 and squashes the following commits:
  65a906b [Sean Owen] Corrections for several compile errors in streaming code examples, and updates to follow API changes
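The fix the commit refers to is the explicit-parameter-type form; a hedged, compilable 1.0-era sketch (the host, port, and stream wiring are illustrative):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object WindowedCounts {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("WindowedCounts"), Seconds(1))
    val pairs = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" ")).map(word => (word, 1))
    // Annotating the parameter types lets overload resolution succeed
    // where the shorthand `_ + _` does not.
    val windowedWordCounts =
      pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
    windowedWordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```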
* [WIP] SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak (Thomas Graves, 2014-05-03, 8 files, -46/+69)
  Move the doAs in Executor higher up so that we only have one UGI and aren't leaking filesystems. Fix Spark on YARN to work when the cluster is running as user "yarn" but the clients are launched as the user and want to read/write to HDFS as the user.
  Note this hasn't been fully tested yet. Need to test in standalone mode.
  Putting this up for people to look at and possibly test. I don't have access to a mesos cluster.
  This is an alternative to https://github.com/apache/spark/pull/607
  Author: Thomas Graves <tgraves@apache.org>
  Closes #621 from tgravescs/SPARK-1676 and squashes the following commits:
  244d55a [Thomas Graves] fix line length
  44163d4 [Thomas Graves] Rework
  9398853 [Thomas Graves] change to have doAs in executor higher up.
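The pattern being hoisted, as a minimal sketch assuming the standard Hadoop UGI API (the helper and user handling are illustrative, not the patch):

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

object RunAs {
  // Create one UGI for the submitting user and run all the work inside a
  // single doAs, instead of building a fresh UGI (and a fresh FileSystem
  // cache entry) per task.
  def runAs[T](user: String)(body: => T): T = {
    val ugi = UserGroupInformation.createRemoteUser(user)
    ugi.doAs(new PrivilegedExceptionAction[T] {
      override def run(): T = body
    })
  }
}
```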
* Update SchemaRDD.scala (ArcherShao, 2014-05-03, 1 file, -4/+4)
  Fix spelling errors.
  Author: ArcherShao <ArcherShao@users.noreply.github.com>
  Closes #619 from ArcherShao/patch-1 and squashes the following commits:
  2957195 [ArcherShao] Update SchemaRDD.scala
* SPARK-1700: Close socket file descriptors on task completion (Aaron Davidson, 2014-05-02, 1 file, -1/+10)
  This will ensure that sockets do not build up over the course of a job, and that cancellation successfully cleans up sockets.
  Tested in standalone mode. More file descriptors are spawned than expected (around 1000ish rather than the expected 8ish), but they do not pile up between runs, or go as high as before (where they went up to around 5k).
  Author: Aaron Davidson <aaron@databricks.com>
  Closes #623 from aarondav/pyspark2 and squashes the following commits:
  0ca13bb [Aaron Davidson] SPARK-1700: Close socket file descriptors on task completion
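A sketch of the cleanup pattern, assuming the 1.0-era addOnCompleteCallback hook on TaskContext (the helper is hypothetical; later Spark versions renamed the hook):

```scala
import java.net.Socket

import org.apache.spark.TaskContext

object TaskScopedSocket {
  // Tie the socket's lifetime to the task: both normal completion and
  // cancellation run the callback, so the descriptor is always released.
  def open(context: TaskContext, host: String, port: Int): Socket = {
    val socket = new Socket(host, port)
    context.addOnCompleteCallback(() => socket.close())
    socket
  }
}
```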
* SPARK-1492. Update Spark YARN docs to use spark-submit (Sandy Ryza, 2014-05-02, 2 files, -94/+38)
  Author: Sandy Ryza <sandy@cloudera.com>
  Closes #601 from sryza/sandy-spark-1492 and squashes the following commits:
  5df1634 [Sandy Ryza] Address additional comments from Patrick.
  be46d1f [Sandy Ryza] Address feedback from Marcelo and Patrick
  867a3ea [Sandy Ryza] SPARK-1492. Update Spark YARN docs to use spark-submit
* Delete unused var (wangfei, 2014-05-02, 1 file, -2/+0)
  Author: wangfei <wangfei_hello@126.com>
  Closes #613 from scwf/masterIndex and squashes the following commits:
  1463056 [wangfei] delete unused var: masterIndex
* SPARK-1695: java8-tests compiler error: package com.google.common.collections does not exist (witgo, 2014-05-02, 1 file, -1/+1)
  Author: witgo <witgo@qq.com>
  Closes #611 from witgo/SPARK-1695 and squashes the following commits:
  d77a887 [witgo] Fix SPARK-1695: java8-tests compiler error: package com.google.common.collections does not exist
* Add tests for FileLogger, EventLoggingListener, and ReplayListenerBus (Andrew Or, 2014-05-01, 8 files, -36/+791)
  Modifications to Spark core are limited to exposing functionality to test files + minor style fixes. (728 / 769 lines are from tests)
  Author: Andrew Or <andrewor14@gmail.com>
  Closes #591 from andrewor14/event-log-tests and squashes the following commits:
  2883837 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-tests
  c3afcea [Andrew Or] Compromise
  2d5daf8 [Andrew Or] Use temp directory provided by the OS rather than /tmp
  2b52151 [Andrew Or] Remove unnecessary file delete + add a comment
  62010fd [Andrew Or] More cleanup (renaming variables, updating comments etc)
  ad2beff [Andrew Or] Clean up EventLoggingListenerSuite + modify a few comments
  862e752 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-tests
  e0ba2f8 [Andrew Or] Fix test failures caused by race condition in processing/mutating events
  b990453 [Andrew Or] ReplayListenerBus suite - tests do not all pass yet
  ab66a84 [Andrew Or] Tests for FileLogger + delete file after tests
  187bb25 [Andrew Or] Formatting and renaming variables
  769336f [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-tests
  5d38ffe [Andrew Or] Clean up EventLoggingListenerSuite + add comments
  e12f4b1 [Andrew Or] Preliminary tests for EventLoggingListener (need major cleanup)
* SPARK-1659: improvements to spark-submit usage (witgo, 2014-05-01, 1 file, -2/+0)
  Author: witgo <witgo@qq.com>
  Closes #581 from witgo/SPARK-1659 and squashes the following commits:
  0b2cf98 [witgo] Delete obsolete spark-submit usage: "--arg ARG"
* Fix the spelling mistake (wangfei, 2014-05-01, 1 file, -1/+1)
  Author: wangfei <wangfei_hello@126.com>
  Closes #614 from scwf/pxcw and squashes the following commits:
  d1016ba [wangfei] fix spelling mistake
* [SQL] SPARK-1661 - Fix regex_serde test (Michael Armbrust, 2014-05-01, 15 files, -1/+97)
  The JIRA in question is actually reporting a bug with Shark, but I wanted to make sure Spark SQL did not have similar problems. This fixes a bug in our parsing code that was preventing the test from executing, but it looks like the RegexSerDe is working in Spark SQL.
  Author: Michael Armbrust <michael@databricks.com>
  Closes #595 from marmbrus/fixRegexSerdeTest and squashes the following commits:
  a4dc612 [Michael Armbrust] Add files created by hive to gitignore.
  efa6402 [Michael Armbrust] Fix Hive serde_regex test.
* SPARK-1691: Support quoted arguments inside of spark-submit. (Patrick Wendell, 2014-05-01, 1 file, -2/+2)
  This is a fairly straightforward fix. The bug was reported by @vanzin and the fix was proposed by @deanwampler and myself. Please take a look!
  Author: Patrick Wendell <pwendell@gmail.com>
  Closes #609 from pwendell/quotes and squashes the following commits:
  8bed767 [Patrick Wendell] SPARK-1691: Support quoted arguments inside of spark-submit.
* Fix SPARK-1629: Spark should inline use of commons-lang `SystemUtils.IS_OS_WINDOWS` (witgo, 2014-04-30, 1 file, -4/+9)
  Author: witgo <witgo@qq.com>
  Closes #569 from witgo/SPARK-1629 and squashes the following commits:
  31520eb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1629
  fcaafd7 [witgo] merge master
  49e248e [witgo] Fix SPARK-1629: Spark should inline use of commons-lang `SystemUtils.IS_OS_WINDOWS`
* SPARK-1004. PySpark on YARN (Sandy Ryza, 2014-04-29, 11 files, -19/+85)
  This reopens https://github.com/apache/incubator-spark/pull/640 against the new repo.
  Author: Sandy Ryza <sandy@cloudera.com>
  Closes #30 from sryza/sandy-spark-1004 and squashes the following commits:
  89889d4 [Sandy Ryza] Move unzipping py4j to the generate-resources phase so that it gets included in the jar the first time
  5165a02 [Sandy Ryza] Fix docs
  fd0df79 [Sandy Ryza] PySpark on YARN
* Handle the vals that are never used (WangTao, 2014-04-29, 7 files, -8/+2)
  In XORShiftRandom.scala, use the val "million" instead of the constant "1e6.toInt". Delete vals that are never used in other files.
  Author: WangTao <barneystinson@aliyun.com>
  Closes #565 from WangTaoTheTonic/master and squashes the following commits:
  17cacfc [WangTao] Handle the unused assignment, method parameters and symbol inspected by Intellij IDEA
  37b4090 [WangTao] Handle the vals that are never used
* Args for worker rather than master (Chen Chao, 2014-04-29, 1 file, -1/+1)
  Author: Chen Chao <crazyjvm@gmail.com>
  Closes #587 from CrazyJvm/patch-6 and squashes the following commits:
  b54b89f [Chen Chao] Args for worker rather than master
* [SPARK-1646] Micro-optimisation of ALS (Tor Myklebust, 2014-04-29, 1 file, -5/+17)
  This change replaces some Scala `for` and `foreach` constructs with `while` constructs. There may be a slight performance gain on the order of 1-2% when training an ALS model.
  I trained an ALS model on the Movielens 10M-rating dataset repeatedly both with and without these changes. All 7 runs in both columns were done in a Scala `for` loop like this:

      for (iter <- 0 to 10) {
        val before = System.currentTimeMillis()
        val model = ALS.train(rats, 20, 10)
        val after = System.currentTimeMillis()
        println("%d ms".format(after - before))
        println("rmse %g".format(computeRmse(model, rats, numRatings)))
      }

  The timings were done on a multiuser machine, and I stopped one set of timings after 7 had been completed. It would be nice if somebody with dedicated hardware could confirm my timings.

      After       Before
      121980 ms   122041 ms
      117069 ms   117127 ms
      115332 ms   117523 ms
      115381 ms   117402 ms
      114635 ms   116550 ms
      114140 ms   114076 ms
      112993 ms   117200 ms

  Ratios are about 1.0005, 1.0005, 1.019, 1.0175, 1.01671, 0.99944, and 1.03723. I therefore suspect these changes make for a slight performance gain on the order of 1-2%.
  Author: Tor Myklebust <tmyklebu@gmail.com>
  Closes #568 from tmyklebu/alsopt and squashes the following commits:
  5ded80f [Tor Myklebust] Fix style.
  79595ff [Tor Myklebust] Fix style error.
  4ef0313 [Tor Myklebust] Merge branch 'master' of github.com:apache/spark into alsopt
  114fb74 [Tor Myklebust] Turn some 'for' loops into 'while' loops.
  dcf583a [Tor Myklebust] Remove the partitioner member variable; instead, thread that needle everywhere it needs to go.
  23d6f91 [Tor Myklebust] Stop making the partitioner configurable.
  495784f [Tor Myklebust] Merge branch 'master' of https://github.com/apache/spark
  674933a [Tor Myklebust] Fix style.
  40edc23 [Tor Myklebust] Fix missing space.
  f841345 [Tor Myklebust] Fix daft bug creating 'pairs', also for -> foreach.
  5ec9e6c [Tor Myklebust] Clean a couple of things up using 'map'.
  36a0f43 [Tor Myklebust] Make the partitioner private.
  d872b09 [Tor Myklebust] Add negative id ALS test.
  df27697 [Tor Myklebust] Support custom partitioners. Currently we use the same partitioner for users and products.
  c90b6d8 [Tor Myklebust] Scramble user and product ids before bucketing.
  c774d7d [Tor Myklebust] Make the partitioner a member variable and use it instead of modding directly.
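The shape of the rewrite, as a standalone sketch (illustrative functions, not the ALS source): a Range `for` desugars into a closure invocation per element, which a hand-rolled `while` avoids.

```scala
object LoopShapes {
  // Before: idiomatic, but each iteration goes through Range.foreach.
  def sumFor(xs: Array[Double]): Double = {
    var total = 0.0
    for (i <- 0 until xs.length) total += xs(i)
    total
  }

  // After: the while form the commit switches to in the hot paths.
  def sumWhile(xs: Array[Double]): Double = {
    var total = 0.0
    var i = 0
    while (i < xs.length) {
      total += xs(i)
      i += 1
    }
    total
  }
}
```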
* [SPARK-1674] fix interrupted system call error in pyspark's RDD.pipe (Xiangrui Meng, 2014-04-29, 1 file, -3/+3)
  `RDD.pipe`'s doctest throws an interrupted system call exception on Mac. It can be fixed by wrapping `pipe.stdout.readline` in an iterator.
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #594 from mengxr/pyspark-pipe and squashes the following commits:
  cc32ac9 [Xiangrui Meng] fix interrupted system call error in pyspark's RDD.pipe
* SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN. (Sandy Ryza, 2014-04-29, 2 files, -5/+15)
  Author: Sandy Ryza <sandy@cloudera.com>
  Closes #586 from sryza/sandy-spark-1588 and squashes the following commits:
  35eb38e [Sandy Ryza] Scalify
  b361684 [Sandy Ryza] SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.
* SPARK-1509: add zipWithIndex and zipWithUniqueId methods to the Java API (witgo, 2014-04-29, 2 files, -8/+45)
  Author: witgo <witgo@qq.com>
  Closes #423 from witgo/zipWithIndex and squashes the following commits:
  039ec04 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
  24d74c9 [witgo] review commit
  763a5e4 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
  59747d1 [witgo] review commit
  7bf4d06 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
  daa8f84 [witgo] review commit
  4070613 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
  18e6c97 [witgo] java api zipWithIndex test
  11e2e7f [witgo] add zipWithIndex zipWithUniqueId methods to java api
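The semantics being exposed to Java, sketched against the Scala RDD API (the local master and sample data are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ZipDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("zip"))
    val rdd = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)

    // Consecutive 0-based indices, ordered by partition; may run an extra
    // job to count per-partition sizes.
    rdd.zipWithIndex().collect()    // Array((a,0), (b,1), (c,2), (d,3))

    // Unique but not necessarily consecutive ids (index * numPartitions +
    // partition), computed without the extra pass.
    rdd.zipWithUniqueId().collect() // Array((a,0), (b,2), (c,1), (d,3))

    sc.stop()
  }
}
```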
* SPARK-1557 Set permissions on event log files/directories (Thomas Graves, 2014-04-29, 3 files, -6/+24)
  This adds minimal setting of event log directory/file permissions. To have a secure environment, the user must manually create the top-level event log directory and set its permissions. We can add logic to do that automatically later if we want.
  Author: Thomas Graves <tgraves@apache.org>
  Closes #538 from tgravescs/SPARK-1557 and squashes the following commits:
  e471d8e [Thomas Graves] rework
  d8b6620 [Thomas Graves] update use of octal
  3ca9b79 [Thomas Graves] Updated based on comments
  5a09709 [Thomas Graves] add in missing import
  3150ed6 [Thomas Graves] SPARK-1557 Set permissions on event log files/directories
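A hedged sketch of the kind of permission setting involved, using Hadoop's FsPermission API (the helper and the 770 mode are illustrative, not the patch):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

object EventLogPermissions {
  // Restrict an event log directory to owner and group (rwxrwx---),
  // expressing the mode in octal rather than a magic decimal.
  def secure(fs: FileSystem, dir: Path): Unit = {
    val mode = new FsPermission(Integer.parseInt("770", 8).toShort)
    fs.mkdirs(dir)
    fs.setPermission(dir, mode)
  }
}
```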
* HOTFIX: minor change to release script (Patrick Wendell, 2014-04-29, 1 file, -1/+1)
* HOTFIX: minor change to release script (Patrick Wendell, 2014-04-29, 1 file, -2/+4)
* [SPARK-1636][MLLIB] Move main methods to examples (Xiangrui Meng, 2014-04-29, 19 files, -321/+795)
  * `NaiveBayes` -> `SparseNaiveBayes`
  * `KMeans` -> `DenseKMeans`
  * `SVMWithSGD` and `LogisticRegressionWithSGD` -> `BinaryClassification`
  * `ALS` -> `MovieLensALS`
  * `LinearRegressionWithSGD`, `LassoWithSGD`, and `RidgeRegressionWithSGD` -> `LinearRegression`
  * `DecisionTree` -> `DecisionTreeRunner`
  `scopt` is used for parsing command-line parameters. `scopt` has an MIT license and only depends on `scala-library`. Example help message:
  ~~~
  BinaryClassification: an example app for binary classification.
  Usage: BinaryClassification [options] <input>
    --numIterations <value>  number of iterations
    --stepSize <value>       initial step size, default: 1.0
    --algorithm <value>      algorithm (SVM,LR), default: LR
    --regType <value>        regularization type (L1,L2), default: L2
    --regParam <value>       regularization parameter, default: 0.1
    <input>                  input paths to labeled examples in LIBSVM format
  ~~~
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #584 from mengxr/mllib-main and squashes the following commits:
  7b58c60 [Xiangrui Meng] minor
  6e35d7e [Xiangrui Meng] make imports explicit and fix code style
  c6178c9 [Xiangrui Meng] update TS PCA/SVD to use new spark-submit
  6acff75 [Xiangrui Meng] use scopt for DecisionTreeRunner
  be86069 [Xiangrui Meng] use main instead of extending App
  b3edf68 [Xiangrui Meng] move DecisionTree's main method to examples
  8bfaa5a [Xiangrui Meng] change NaiveBayesParams to Params
  fe23dcb [Xiangrui Meng] remove main from KMeans and add DenseKMeans as an example
  67f4448 [Xiangrui Meng] remove main methods from linear regression algorithms and add LinearRegression example
  b066bbc [Xiangrui Meng] remove main from ALS and add MovieLensALS example
  b040f3b [Xiangrui Meng] change BinaryClassificationParams to Params
  577945b [Xiangrui Meng] remove unused imports from NB
  3d299bc [Xiangrui Meng] remove main from LR/SVM and add an example app for binary classification
  f70878e [Xiangrui Meng] remove main from NaiveBayes and add an example NaiveBayes app
  01ec2cd [Xiangrui Meng] Merge branch 'master' into mllib-main
  9420692 [Xiangrui Meng] add scopt to examples dependencies
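A scopt parser producing a help message like the one above might look as follows (a hedged sketch against the scopt 3.x API; the Params fields and run() are stand-ins, not the example source):

```scala
import scopt.OptionParser

object BinaryClassificationSketch {
  case class Params(input: String = null, numIterations: Int = 100, stepSize: Double = 1.0)

  // Stand-in for the real example body.
  def run(params: Params): Unit = println(params)

  def main(args: Array[String]): Unit = {
    val parser = new OptionParser[Params]("BinaryClassification") {
      head("BinaryClassification: an example app for binary classification.")
      opt[Int]("numIterations").text("number of iterations")
        .action((x, c) => c.copy(numIterations = x))
      opt[Double]("stepSize").text("initial step size, default: 1.0")
        .action((x, c) => c.copy(stepSize = x))
      arg[String]("<input>").text("input paths to labeled examples in LIBSVM format")
        .action((x, c) => c.copy(input = x))
    }
    parser.parse(args, Params()) match {
      case Some(params) => run(params)
      case None => sys.exit(1)
    }
  }
}
```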
* Minor fix to python table caching API. (Michael Armbrust, 2014-04-29, 1 file, -2/+2)
  Author: Michael Armbrust <michael@databricks.com>
  Closes #585 from marmbrus/pythonCacheTable and squashes the following commits:
  7ec1f91 [Michael Armbrust] Minor fix to python table caching API.
* HOTFIX: Bug in release script (Patrick Wendell, 2014-04-29, 1 file, -0/+1)
* Improved build configuration (witgo, 2014-04-28, 23 files, -466/+295)
  1. Fix SPARK-1441: compile spark core error with hadoop 0.23.x
  2. Fix SPARK-1491: maven hadoop-provided profile fails to build
  3. Fix inconsistent versions of the org.scala-lang:* and org.apache.avro:* dependencies
  4. Reformat sql/catalyst/pom.xml, sql/hive/pom.xml, and sql/core/pom.xml (four-space indentation changed to two spaces)
  Author: witgo <witgo@qq.com>
  Closes #480 from witgo/format_pom and squashes the following commits:
  03f652f [witgo] review commit
  b452680 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
  bee920d [witgo] revert fix SPARK-1629: Spark Core missing commons-lang dependence
  7382a07 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
  6902c91 [witgo] fix SPARK-1629: Spark Core missing commons-lang dependence
  0da4bc3 [witgo] merge master
  d1718ed [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
  e345919 [witgo] add avro dependency to yarn-alpha
  77fad08 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
  62d0862 [witgo] Fix org.scala-lang: * inconsistent versions dependency
  1a162d7 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
  934f24d [witgo] review commit
  cf46edc [witgo] exclude jruby
  06e7328 [witgo] Merge branch 'SparkBuild' into format_pom
  99464d2 [witgo] fix maven hadoop-provided profile fails to build
  0c6c1fc [witgo] Fix compile spark core error with hadoop 0.23.x
  6851bec [witgo] Maintain consistent SparkBuild.scala, pom.xml
* SPARK-1652: Remove incorrect deprecation warning in spark-submit (Patrick Wendell, 2014-04-28, 4 files, -6/+14)
  This is a straightforward fix.
  Author: Patrick Wendell <pwendell@gmail.com>
  This patch had conflicts when merged, resolved by
  Committer: Patrick Wendell <pwendell@gmail.com>
  Closes #578 from pwendell/spark-submit-yarn and squashes the following commits:
  96027c7 [Patrick Wendell] Test fixes
  b5be173 [Patrick Wendell] Review feedback
  4ac9cac [Patrick Wendell] SPARK-1652: spark-submit for yarn prints warnings even though calling as expected
* SPARK-1654 and SPARK-1653: Fixes in spark-submit. (Patrick Wendell, 2014-04-28, 5 files, -15/+17)
  Deals with two issues:
  1. Spark shell didn't correctly pass quoted arguments to spark-submit: ```./bin/spark-shell --driver-java-options "-Dfoo=f -Dbar=b"```
  2. Spark submit used deprecated environment variables (SPARK_CLASSPATH) which triggered warnings. Now we use new, more narrowly scoped, variables.
  Author: Patrick Wendell <pwendell@gmail.com>
  Closes #576 from pwendell/spark-submit and squashes the following commits:
  67004c9 [Patrick Wendell] SPARK-1654 and SPARK-1653: Fixes in spark-submit.
* SPARK-1652: Spark submit should fail gracefully if YARN not enabled (Patrick Wendell, 2014-04-28, 2 files, -0/+16)
  Author: Patrick Wendell <pwendell@gmail.com>
  Closes #579 from pwendell/spark-submit-yarn-2 and squashes the following commits:
  05e1b11 [Patrick Wendell] Small fix
  d2a40ad [Patrick Wendell] SPARK-1652: Spark submit should fail gracefully if YARN support not enabled
* Changes to dev release script (Patrick Wendell, 2014-04-28, 1 file, -27/+32)
* [SPARK-1633][Streaming] Java API unit test and example for custom streaming receiver in Java (Tathagata Das, 2014-04-28, 10 files, -35/+476)
  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #558 from tdas/more-fixes and squashes the following commits:
  c0c84e6 [Tathagata Das] Removing extra println()
  d8a8cf4 [Tathagata Das] More tweaks to make unit test work in Jenkins.
  b7caa98 [Tathagata Das] More tweaks.
  d337367 [Tathagata Das] More tweaks
  22d6f2d [Tathagata Das] Merge remote-tracking branch 'apache/master' into more-fixes
  40a961b [Tathagata Das] Modified java test to reduce flakiness.
  9410ca6 [Tathagata Das] Merge remote-tracking branch 'apache/master' into more-fixes
  86d9147 [Tathagata Das] scala style fix
  2f3d7b1 [Tathagata Das] Added Scala custom receiver example.
  d677611 [Tathagata Das] Merge remote-tracking branch 'apache/master' into more-fixes
  bec3fc2 [Tathagata Das] Added license.
  51d6514 [Tathagata Das] Fixed docs on receiver.
  81aafa0 [Tathagata Das] Added Java test for Receiver API, and added JavaCustomReceiver example.
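The Receiver API these tests and examples exercise, as a skeletal Scala sketch closely following the 1.0-era docs (the socket plumbing is illustrative):

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class CustomReceiver(host: String, port: Int)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // Start a thread that connects and hands records to Spark via store().
    new Thread("Socket Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {} // the receive thread checks isStopped() and exits

  private def receive(): Unit = {
    val socket = new Socket(host, port)
    val reader = new BufferedReader(new InputStreamReader(socket.getInputStream, "UTF-8"))
    var line = reader.readLine()
    while (!isStopped() && line != null) {
      store(line)
      line = reader.readLine()
    }
    reader.close()
    socket.close()
  }
}
```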
* [SQL] Append some missing types for HiveUDF (Cheng Hao, 2014-04-27, 1 file, -10/+48)
  Add the missing types.
  Author: Cheng Hao <hao.cheng@intel.com>
  Closes #459 from chenghao-intel/missing_types and squashes the following commits:
  21cba2e [Cheng Hao] Append some missing types for HiveUDF
* Update the import package name for TestHive in sbt shell (Cheng Hao, 2014-04-27, 1 file, -1/+1)
  sbt/sbt hive/console will fail as TestHive changed its package from "org.apache.spark.sql.hive" to "org.apache.spark.sql.hive.test".
  Author: Cheng Hao <hao.cheng@intel.com>
  Closes #574 from chenghao-intel/hive_console and squashes the following commits:
  de14035 [Cheng Hao] Update the import package name for TestHive in sbt shell
* Fix SPARK-1609: Executor fails to start when Command.extraJavaOptions contains multiple Java options (witgo, 2014-04-27, 1 file, -4/+5)
  Author: witgo <witgo@qq.com>
  Closes #547 from witgo/SPARK-1609 and squashes the following commits:
  deb6a4c [witgo] review commit
  91da0bb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
  0640852 [witgo] review commit
  8f90b22 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
  bcf36cb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
  1185605 [witgo] fix extraJavaOptions split
  f7c0ab7 [witgo] bugfix
  86fc4bb [witgo] bugfix
  8a265b7 [witgo] Fix SPARK-1609: Executor fails to start when use spark-submit
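The failure mode, sketched with illustrative values (a bare whitespace split; the real fix must also respect quoted options, which this does not):

```scala
object ExtraJavaOptions {
  def main(args: Array[String]): Unit = {
    val extraJavaOptions = "-XX:+UseCompressedOops -Dfoo=bar"

    // Broken: the whole string becomes a single malformed JVM argument.
    val asOneArg = Seq(extraJavaOptions)

    // Working for simple cases: one argv entry per option.
    val split = extraJavaOptions.split("\\s+").toSeq

    println(asOneArg) // List(-XX:+UseCompressedOops -Dfoo=bar)
    println(split)    // List(-XX:+UseCompressedOops, -Dfoo=bar)
  }
}
```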
* SPARK-1145: Memory mapping with many small blocks can cause JVM allocation failures (Patrick Wendell, 2014-04-27, 6 files, -20/+91)
  This includes some minor code clean-up as well. The main change is that small files are not memory mapped. There is a nicer way to write that code block using Scala's `Try`, but to make it easy to back-port and as simple as possible, I opted for the more explicit but less pretty format.
  Author: Patrick Wendell <pwendell@gmail.com>
  Closes #43 from pwendell/block-iter-logging and squashes the following commits:
  1cff512 [Patrick Wendell] Small issue from merge.
  49f6c269 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into block-iter-logging
  4943351 [Patrick Wendell] Added a test and feedback on Matei's review
  a637a18 [Patrick Wendell] Review feedback and adding rewind() when reading byte buffers.
  b76b95f [Patrick Wendell] Review feedback
  4e1514e [Patrick Wendell] Don't memory map for small files
  d238b88 [Patrick Wendell] Some logging and clean-up
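The core idea as a hedged sketch (the helper and the 2 MB threshold are illustrative, not the patched block-store code):

```scala
import java.io.{EOFException, File, RandomAccessFile}
import java.nio.ByteBuffer
import java.nio.channels.FileChannel.MapMode

object BlockReader {
  // Below the threshold, read into a heap buffer; above it, memory-map.
  def readBlock(file: File, minMemoryMapBytes: Long = 2L * 1024 * 1024): ByteBuffer = {
    val channel = new RandomAccessFile(file, "r").getChannel
    try {
      if (file.length < minMemoryMapBytes) {
        val buf = ByteBuffer.allocate(file.length.toInt)
        while (buf.hasRemaining) {
          if (channel.read(buf) == -1) throw new EOFException(file.toString)
        }
        buf.rewind() // rewind after reading, as the squash log mentions
        buf
      } else {
        channel.map(MapMode.READ_ONLY, 0, file.length)
      }
    } finally {
      channel.close()
    }
  }
}
```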
* HOTFIX: Minor patch to merge script. (Patrick Wendell, 2014-04-27, 1 file, -1/+1)
* SPARK-1651: Delete existing deployment directory (Rahul Singhal, 2014-04-27, 1 file, -0/+1)
  Small bug fix to make sure the "spark contents" are copied to the deployment directory correctly.
  Author: Rahul Singhal <rahul.singhal@guavus.com>
  Closes #573 from rahulsinghaliitd/SPARK-1651 and squashes the following commits:
  402c999 [Rahul Singhal] SPARK-1651: Delete existing deployment directory
* SPARK-1648 Support closing JIRA's as part of merge script. (Patrick Wendell, 2014-04-27, 1 file, -9/+105)
  Adds an automated hook in the merge script that can close the JIRA, set the fix versions, and leave a comment on the JIRA indicating the PR in which it was resolved. This ensures that (a) we always close JIRA's when issues are merged and (b) there is a link to the pull request in every JIRA.
  This requires a python library called `jira-client`. We could look at embedding this library in our project, but it seemed simple enough to just gracefully disable this feature if it is not installed. It can be installed with `pip install jira-client`.
  Author: Patrick Wendell <pwendell@gmail.com>
  Closes #570 from pwendell/jira-pr-merge and squashes the following commits:
  3022b96 [Patrick Wendell] SPARK-1648 Support closing JIRA's as part of merge script.
* SPARK-1650: Correctly identify maven project version (Rahul Singhal, 2014-04-27, 1 file, -1/+1)
  Better account for various side-effect outputs while executing "mvn help:evaluate -Dexpression=project.version".
  Author: Rahul Singhal <rahul.singhal@guavus.com>
  Closes #572 from rahulsinghaliitd/SPARK-1650 and squashes the following commits:
  fd6a611 [Rahul Singhal] SPARK-1650: Correctly identify maven project version
* SPARK-1606: Infer user application arguments instead of requiring --arg. (Patrick Wendell, 2014-04-26, 6 files, -162/+181)
  This modifies spark-submit to do something more like the Hadoop `jar` command. Now we have the following syntax:

      ./bin/spark-submit [options] user.jar [user options]

  Author: Patrick Wendell <pwendell@gmail.com>
  Closes #563 from pwendell/spark-submit and squashes the following commits:
  32241fc [Patrick Wendell] Review feedback
  3adfb69 [Patrick Wendell] Small fix
  bc48139 [Patrick Wendell] SPARK-1606: Infer user application arguments instead of requiring --arg.
* SPARK-1467: Make StorageLevel.apply() factory methods Developer APIs (Sandeep, 2014-04-26, 1 file, -4/+22)
  We may want to evolve these in the future to add things like SSDs, so let's mark them as experimental for now. Long-term the right solution might be some kind of builder. The stable API should be the existing StorageLevel constants.
  Author: Sandeep <sandeep@techaddict.me>
  Closes #551 from techaddict/SPARK-1467 and squashes the following commits:
  6bdda24 [Sandeep] SPARK-1467: Make StorageLevel.apply() factory methods as Developer Api's
* [SPARK-1608] [SQL] Fix Cast.nullable when cast from StringType to NumericType/TimestampType. (Takuya UESHIN, 2014-04-26, 2 files, -1/+17)
  `Cast.nullable` should be `true` when cast from `StringType` to `NumericType` or `TimestampType`, because if a `StringType` expression has an illegal number string or illegal timestamp string, the casted value becomes `null`.
  Author: Takuya UESHIN <ueshin@happy-camper.st>
  Closes #532 from ueshin/issues/SPARK-1608 and squashes the following commits:
  065d37c [Takuya UESHIN] Add tests to check nullabilities of cast expressions.
  f278ed7 [Takuya UESHIN] Revert test to keep it readable and concise.
  9fc9380 [Takuya UESHIN] Fix Cast.nullable when cast from StringType to NumericType/TimestampType.
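The rule the fix encodes, reduced to a standalone sketch with simplified stand-in types (not the Catalyst source):

```scala
object CastNullability {
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType // stands in for NumericType/TimestampType

  // A cast out of StringType can yield null on unparseable input,
  // so it must report nullable = true even if the child never is.
  def castNullable(from: DataType, to: DataType, childNullable: Boolean): Boolean =
    childNullable || (from == StringType && to != StringType)

  def main(args: Array[String]): Unit = {
    // e.g. CAST('not a number' AS INT) evaluates to NULL:
    assert(castNullable(StringType, IntegerType, childNullable = false))
  }
}
```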
* Add a note on how to support tables with more than 22 fields (wangfei, 2014-04-26, 1 file, -0/+2)
  Author: wangfei <wangfei1@huawei.com>
  Closes #564 from scwf/patch-6 and squashes the following commits:
  a331876 [wangfei] Update sql-programming-guide.md
  685135b [wangfei] Update sql-programming-guide.md
  10b3dc0 [wangfei] Update sql-programming-guide.md
  1c40480 [wangfei] add note of how to support table with 22 fields
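Context for the note: Scala 2.10 case classes are limited to 22 fields, so wider tables need a class implementing `Product` by hand. A tiny sketch with two fields standing in for 23+ (names illustrative):

```scala
class Record(val field1: String, val field2: Int /* ...more fields as needed */)
    extends Product with Serializable {

  def canEqual(that: Any): Boolean = that.isInstanceOf[Record]
  def productArity: Int = 2
  def productElement(n: Int): Any = n match {
    case 0 => field1
    case 1 => field2
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }
}
```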
* [Spark-1382] Fix NPE in DStream.slice (updated version of #365) (zsxwing, 2014-04-25, 2 files, -11/+23)
  @zsxwing I cherry-picked your changes and merged the master. #365 had some conflicts once again!
  Author: zsxwing <zsxwing@gmail.com>
  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #562 from tdas/SPARK-1382 and squashes the following commits:
  e2962c1 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-1382
  20968d9 [zsxwing] Replace Exception with SparkException in DStream
  e476651 [zsxwing] Merge remote-tracking branch 'origin/master' into SPARK-1382
  35ba56a [zsxwing] SPARK-1382: Fix NPE in DStream.slice