Commit message | Author | Age | Files | Lines
* Handle the vals that never used | WangTao | 2014-04-29 | 7 | -8/+2

  In XORShiftRandom.scala, use the val "million" instead of the constant "1e6.toInt". Delete vals that are never used in other files.

  Author: WangTao <barneystinson@aliyun.com>

  Closes #565 from WangTaoTheTonic/master and squashes the following commits:

  17cacfc [WangTao] Handle the unused assignment, method parameters and symbol inspected by Intellij IDEA
  37b4090 [WangTao] Handle the vals that never used
* Args for worker rather than master | Chen Chao | 2014-04-29 | 1 | -1/+1

  Args for worker rather than master

  Author: Chen Chao <crazyjvm@gmail.com>

  Closes #587 from CrazyJvm/patch-6 and squashes the following commits:

  b54b89f [Chen Chao] Args for worker rather than master
* [SPARK-1646] Micro-optimisation of ALS | Tor Myklebust | 2014-04-29 | 1 | -5/+17

  This change replaces some Scala `for` and `foreach` constructs with `while` constructs. There may be a slight performance gain on the order of 1-2% when training an ALS model.

  I trained an ALS model on the Movielens 10M-rating dataset repeatedly both with and without these changes. All 7 runs in both columns were done in a Scala `for` loop like this:

      for (iter <- 0 to 10) {
        val before = System.currentTimeMillis()
        val model = ALS.train(rats, 20, 10)
        val after = System.currentTimeMillis()
        println("%d ms".format(after-before))
        println("rmse %g".format(computeRmse(model, rats, numRatings)))
      }

  The timings were done on a multiuser machine, and I stopped one set of timings after 7 had been completed. It would be nice if somebody with dedicated hardware could confirm my timings.

      After        Before
      121980 ms    122041 ms
      117069 ms    117127 ms
      115332 ms    117523 ms
      115381 ms    117402 ms
      114635 ms    116550 ms
      114140 ms    114076 ms
      112993 ms    117200 ms

  Ratios are about 1.0005, 1.0005, 1.019, 1.0175, 1.01671, 0.99944, and 1.03723. I therefore suspect these changes make for a slight performance gain on the order of 1-2%.

  Author: Tor Myklebust <tmyklebu@gmail.com>

  Closes #568 from tmyklebu/alsopt and squashes the following commits:

  5ded80f [Tor Myklebust] Fix style.
  79595ff [Tor Myklebust] Fix style error.
  4ef0313 [Tor Myklebust] Merge branch 'master' of github.com:apache/spark into alsopt
  114fb74 [Tor Myklebust] Turn some 'for' loops into 'while' loops.
  dcf583a [Tor Myklebust] Remove the partitioner member variable; instead, thread that needle everywhere it needs to go.
  23d6f91 [Tor Myklebust] Stop making the partitioner configurable.
  495784f [Tor Myklebust] Merge branch 'master' of https://github.com/apache/spark
  674933a [Tor Myklebust] Fix style.
  40edc23 [Tor Myklebust] Fix missing space.
  f841345 [Tor Myklebust] Fix daft bug creating 'pairs', also for -> foreach.
  5ec9e6c [Tor Myklebust] Clean a couple of things up using 'map'.
  36a0f43 [Tor Myklebust] Make the partitioner private.
  d872b09 [Tor Myklebust] Add negative id ALS test.
  df27697 [Tor Myklebust] Support custom partitioners. Currently we use the same partitioner for users and products.
  c90b6d8 [Tor Myklebust] Scramble user and product ids before bucketing.
  c774d7d [Tor Myklebust] Make the partitioner a member variable and use it instead of modding directly.
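The ratios quoted in the SPARK-1646 message can be re-derived from its timing table as Before divided by After for each run. A minimal sketch in Python; the variable names are illustrative and not part of the patch:

```python
# Timings in milliseconds, copied from the table in the commit message.
after = [121980, 117069, 115332, 115381, 114635, 114140, 112993]
before = [122041, 117127, 117523, 117402, 116550, 114076, 117200]

# A ratio above 1 means the run with the change ("After") was faster.
ratios = [b / a for a, b in zip(after, before)]
mean_ratio = sum(ratios) / len(ratios)

for r in ratios:
    print(f"{r:.5f}")
print(f"mean: {mean_ratio:.4f}")
```

The mean works out to roughly 1.013, which is consistent with the "1-2%" estimate, although seven runs on a shared machine are a small sample.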
* [SPARK-1674] fix interrupted system call error in pyspark's RDD.pipe | Xiangrui Meng | 2014-04-29 | 1 | -3/+3

  The doctest for `RDD.pipe` throws an interrupted system call exception on Mac. It can be fixed by wrapping `pipe.stdout.readline` in an iterator.

  Author: Xiangrui Meng <meng@databricks.com>

  Closes #594 from mengxr/pyspark-pipe and squashes the following commits:

  cc32ac9 [Xiangrui Meng] fix interrupted system call error in pyspark's RDD.pipe
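One common way to wrap `readline` in an iterator is the two-argument form of Python's built-in `iter`. The sketch below shows that idiom on a plain subprocess; it is not the PySpark code itself, and the helper name `pipe_lines` and the upper-casing child process are assumptions made for illustration.

```python
import subprocess
import sys

def pipe_lines(command, lines):
    """Feed `lines` to `command` on stdin and yield its stdout lines."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # Writing everything up front is fine for small inputs only;
    # large inputs would need a separate writer thread.
    proc.stdin.write("".join(line + "\n" for line in lines).encode())
    proc.stdin.close()
    # iter(callable, sentinel) calls readline() until it returns b"" (EOF),
    # instead of iterating over the file object directly.
    for raw in iter(proc.stdout.readline, b""):
        yield raw.decode().rstrip()
    proc.wait()

# Hypothetical child process that echoes stdin in upper case.
upper = [sys.executable, "-c",
         "import sys\nfor l in sys.stdin: sys.stdout.write(l.upper())"]
result = list(pipe_lines(upper, ["a", "b"]))
print(result)
```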
* SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN. | Sandy Ryza | 2014-04-29 | 2 | -5/+15

  Author: Sandy Ryza <sandy@cloudera.com>

  Closes #586 from sryza/sandy-spark-1588 and squashes the following commits:

  35eb38e [Sandy Ryza] Scalify
  b361684 [Sandy Ryza] SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.
* SPARK-1509: add zipWithIndex zipWithUniqueId methods to java api | witgo | 2014-04-29 | 2 | -8/+45

  Author: witgo <witgo@qq.com>

  Closes #423 from witgo/zipWithIndex and squashes the following commits:

  039ec04 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
  24d74c9 [witgo] review commit
  763a5e4 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
  59747d1 [witgo] review commit
  7bf4d06 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
  daa8f84 [witgo] review commit
  4070613 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
  18e6c97 [witgo] java api zipWithIndex test
  11e2e7f [witgo] add zipWithIndex zipWithUniqueId methods to java api
* SPARK-1557 Set permissions on event log files/directories | Thomas Graves | 2014-04-29 | 3 | -6/+24

  This adds minimal setting of permissions on the event log directory and files. To have a secure environment, the user must manually create the top-level event log directory and set its permissions. We can add logic to do that automatically later if we want.

  Author: Thomas Graves <tgraves@apache.org>

  Closes #538 from tgravescs/SPARK-1557 and squashes the following commits:

  e471d8e [Thomas Graves] rework
  d8b6620 [Thomas Graves] update use of octal
  3ca9b79 [Thomas Graves] Updated based on comments
  5a09709 [Thomas Graves] add in missing import
  3150ed6 [Thomas Graves] SPARK-1557 Set permissions on event log files/directories
* HOTFIX: minor change to release script | Patrick Wendell | 2014-04-29 | 1 | -1/+1
* HOTFIX: minor change to release script | Patrick Wendell | 2014-04-29 | 1 | -2/+4
* [SPARK-1636][MLLIB] Move main methods to examples | Xiangrui Meng | 2014-04-29 | 19 | -321/+795

  * `NaiveBayes` -> `SparseNaiveBayes`
  * `KMeans` -> `DenseKMeans`
  * `SVMWithSGD` and `LogisticRegressionWithSGD` -> `BinaryClassification`
  * `ALS` -> `MovieLensALS`
  * `LinearRegressionWithSGD`, `LassoWithSGD`, and `RidgeRegressionWithSGD` -> `LinearRegression`
  * `DecisionTree` -> `DecisionTreeRunner`

  `scopt` is used for parsing command-line parameters. `scopt` is MIT-licensed and depends only on `scala-library`.

  Example help message:

  ~~~
  BinaryClassification: an example app for binary classification.
  Usage: BinaryClassification [options] <input>

    --numIterations <value>
          number of iterations
    --stepSize <value>
          initial step size, default: 1.0
    --algorithm <value>
          algorithm (SVM,LR), default: LR
    --regType <value>
          regularization type (L1,L2), default: L2
    --regParam <value>
          regularization parameter, default: 0.1
    <input>
          input paths to labeled examples in LIBSVM format
  ~~~

  Author: Xiangrui Meng <meng@databricks.com>

  Closes #584 from mengxr/mllib-main and squashes the following commits:

  7b58c60 [Xiangrui Meng] minor
  6e35d7e [Xiangrui Meng] make imports explicit and fix code style
  c6178c9 [Xiangrui Meng] update TS PCA/SVD to use new spark-submit
  6acff75 [Xiangrui Meng] use scopt for DecisionTreeRunner
  be86069 [Xiangrui Meng] use main instead of extending App
  b3edf68 [Xiangrui Meng] move DecisionTree's main method to examples
  8bfaa5a [Xiangrui Meng] change NaiveBayesParams to Params
  fe23dcb [Xiangrui Meng] remove main from KMeans and add DenseKMeans as an example
  67f4448 [Xiangrui Meng] remove main methods from linear regression algorithms and add LinearRegression example
  b066bbc [Xiangrui Meng] remove main from ALS and add MovieLensALS example
  b040f3b [Xiangrui Meng] change BinaryClassificationParams to Params
  577945b [Xiangrui Meng] remove unused imports from NB
  3d299bc [Xiangrui Meng] remove main from LR/SVM and add an example app for binary classification
  f70878e [Xiangrui Meng] remove main from NaiveBayes and add an example NaiveBayes app
  01ec2cd [Xiangrui Meng] Merge branch 'master' into mllib-main
  9420692 [Xiangrui Meng] add scopt to examples dependencies
* Minor fix to python table caching API. | Michael Armbrust | 2014-04-29 | 1 | -2/+2

  Author: Michael Armbrust <michael@databricks.com>

  Closes #585 from marmbrus/pythonCacheTable and squashes the following commits:

  7ec1f91 [Michael Armbrust] Minor fix to python table caching API.
* HOTFIX: Bug in release script | Patrick Wendell | 2014-04-29 | 1 | -0/+1
* Improved build configuration | witgo | 2014-04-28 | 23 | -466/+295

  1. Fix SPARK-1441: compile spark core error with hadoop 0.23.x
  2. Fix SPARK-1491: maven hadoop-provided profile fails to build
  3. Fix inconsistent dependency versions for org.scala-lang:* and org.apache.avro:*
  4. Reformat sql/catalyst/pom.xml, sql/hive/pom.xml and sql/core/pom.xml (four-space indentation changed to two spaces)

  Author: witgo <witgo@qq.com>

  Closes #480 from witgo/format_pom and squashes the following commits:

  03f652f [witgo] review commit
  b452680 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
  bee920d [witgo] revert fix SPARK-1629: Spark Core missing commons-lang dependence
  7382a07 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
  6902c91 [witgo] fix SPARK-1629: Spark Core missing commons-lang dependence
  0da4bc3 [witgo] merge master
  d1718ed [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
  e345919 [witgo] add avro dependency to yarn-alpha
  77fad08 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
  62d0862 [witgo] Fix org.scala-lang: * inconsistent versions dependency
  1a162d7 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
  934f24d [witgo] review commit
  cf46edc [witgo] exclude jruby
  06e7328 [witgo] Merge branch 'SparkBuild' into format_pom
  99464d2 [witgo] fix maven hadoop-provided profile fails to build
  0c6c1fc [witgo] Fix compile spark core error with hadoop 0.23.x
  6851bec [witgo] Maintain consistent SparkBuild.scala, pom.xml
* SPARK-1652: Remove incorrect deprecation warning in spark-submit | Patrick Wendell | 2014-04-28 | 4 | -6/+14

  This is a straightforward fix.

  Author: Patrick Wendell <pwendell@gmail.com>

  This patch had conflicts when merged, resolved by
  Committer: Patrick Wendell <pwendell@gmail.com>

  Closes #578 from pwendell/spark-submit-yarn and squashes the following commits:

  96027c7 [Patrick Wendell] Test fixes
  b5be173 [Patrick Wendell] Review feedback
  4ac9cac [Patrick Wendell] SPARK-1652: spark-submit for yarn prints warnings even though calling as expected
* SPARK-1654 and SPARK-1653: Fixes in spark-submit. | Patrick Wendell | 2014-04-28 | 5 | -15/+17

  Deals with two issues:

  1. Spark shell didn't correctly pass quoted arguments to spark-submit, for example `./bin/spark-shell --driver-java-options "-Dfoo=f -Dbar=b"`.
  2. Spark submit used deprecated environment variables (SPARK_CLASSPATH), which triggered warnings. Now we use new, more narrowly scoped variables.

  Author: Patrick Wendell <pwendell@gmail.com>

  Closes #576 from pwendell/spark-submit and squashes the following commits:

  67004c9 [Patrick Wendell] SPARK-1654 and SPARK-1653: Fixes in spark-submit.
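The quoting problem in the first item can be shown without Spark. The sketch below only demonstrates the failure mode (a quoted value being split on whitespace) using Python's `shlex`; the actual fix was made in Spark's launcher scripts and is not reproduced here.

```python
import shlex

command_line = './bin/spark-shell --driver-java-options "-Dfoo=f -Dbar=b"'

# Naive whitespace splitting breaks the quoted value into two arguments.
naive = command_line.split()

# Shell-style splitting keeps the quoted value as a single argument.
quoted = shlex.split(command_line)

print(naive)
print(quoted)
```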
* SPARK-1652: Spark submit should fail gracefully if YARN not enabled | Patrick Wendell | 2014-04-28 | 2 | -0/+16

  Author: Patrick Wendell <pwendell@gmail.com>

  Closes #579 from pwendell/spark-submit-yarn-2 and squashes the following commits:

  05e1b11 [Patrick Wendell] Small fix
  d2a40ad [Patrick Wendell] SPARK-1652: Spark submit should fail gracefully if YARN support not enabled
* Changes to dev release script | Patrick Wendell | 2014-04-28 | 1 | -27/+32
* [SPARK-1633][Streaming] Java API unit test and example for custom streaming receiver in Java | Tathagata Das | 2014-04-28 | 10 | -35/+476

  Author: Tathagata Das <tathagata.das1565@gmail.com>

  Closes #558 from tdas/more-fixes and squashes the following commits:

  c0c84e6 [Tathagata Das] Removing extra println()
  d8a8cf4 [Tathagata Das] More tweaks to make unit test work in Jenkins.
  b7caa98 [Tathagata Das] More tweaks.
  d337367 [Tathagata Das] More tweaks
  22d6f2d [Tathagata Das] Merge remote-tracking branch 'apache/master' into more-fixes
  40a961b [Tathagata Das] Modified java test to reduce flakiness.
  9410ca6 [Tathagata Das] Merge remote-tracking branch 'apache/master' into more-fixes
  86d9147 [Tathagata Das] scala style fix
  2f3d7b1 [Tathagata Das] Added Scala custom receiver example.
  d677611 [Tathagata Das] Merge remote-tracking branch 'apache/master' into more-fixes
  bec3fc2 [Tathagata Das] Added license.
  51d6514 [Tathagata Das] Fixed docs on receiver.
  81aafa0 [Tathagata Das] Added Java test for Receiver API, and added JavaCustomReceiver example.
* [SQL]Append some missing types for HiveUDF | Cheng Hao | 2014-04-27 | 1 | -10/+48

  Add the missing types

  Author: Cheng Hao <hao.cheng@intel.com>

  Closes #459 from chenghao-intel/missing_types and squashes the following commits:

  21cba2e [Cheng Hao] Append some missing types for HiveUDF
* Update the import package name for TestHive in sbt shell | Cheng Hao | 2014-04-27 | 1 | -1/+1

  `sbt/sbt hive/console` fails because TestHive changed its package from "org.apache.spark.sql.hive" to "org.apache.spark.sql.hive.test".

  Author: Cheng Hao <hao.cheng@intel.com>

  Closes #574 from chenghao-intel/hive_console and squashes the following commits:

  de14035 [Cheng Hao] Update the import package name for TestHive in sbt shell
* Fix SPARK-1609: Executor fails to start when Command.extraJavaOptions contains multiple Java options | witgo | 2014-04-27 | 1 | -4/+5

  Author: witgo <witgo@qq.com>

  Closes #547 from witgo/SPARK-1609 and squashes the following commits:

  deb6a4c [witgo] review commit
  91da0bb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
  0640852 [witgo] review commit
  8f90b22 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
  bcf36cb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
  1185605 [witgo] fix extraJavaOptions split
  f7c0ab7 [witgo] bugfix
  86fc4bb [witgo] bugfix
  8a265b7 [witgo] Fix SPARK-1609: Executor fails to start when use spark-submit
* SPARK-1145: Memory mapping with many small blocks can cause JVM allocation failures | Patrick Wendell | 2014-04-27 | 6 | -20/+91

  This includes some minor code clean-up as well. The main change is that small files are not memory mapped. There is a nicer way to write that code block using Scala's `Try`, but to make it easy to backport and as simple as possible, I opted for the more explicit but less pretty format.

  Author: Patrick Wendell <pwendell@gmail.com>

  Closes #43 from pwendell/block-iter-logging and squashes the following commits:

  1cff512 [Patrick Wendell] Small issue from merge.
  49f6c269 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into block-iter-logging
  4943351 [Patrick Wendell] Added a test and feedback on mateis review
  a637a18 [Patrick Wendell] Review feedback and adding rewind() when reading byte buffers.
  b76b95f [Patrick Wendell] Review feedback
  4e1514e [Patrick Wendell] Don't memory map for small files
  d238b88 [Patrick Wendell] Some logging and clean-up
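The idea of reading small segments directly and memory mapping only large ones can be sketched as follows. This is a Python illustration of the technique, not Spark's Scala implementation; the function name `read_segment` and the 8 KiB threshold are assumptions, since the commit message does not state the value Spark uses.

```python
import mmap
import os
import tempfile

# Hypothetical threshold; the commit message does not give Spark's value.
MIN_MEMORY_MAP_BYTES = 8 * 1024

def read_segment(path, offset, length):
    """Return `length` bytes from `path` starting at `offset`."""
    with open(path, "rb") as f:
        if length < MIN_MEMORY_MAP_BYTES:
            # Small read: copy into an ordinary buffer, no mapping created.
            f.seek(offset)
            return f.read(length)
        # Large read: map the file read-only and slice out the range.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]

# Usage: a 16 KiB temporary file read through both paths.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 64)
small = read_segment(tmp.name, 10, 4)
large = read_segment(tmp.name, 0, 16 * 1024)
os.remove(tmp.name)
```

Each mapping has a fixed overhead, so many tiny mappings cost more than plain reads; that trade-off is what the threshold encodes.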
* HOTFIX: Minor patch to merge script. | Patrick Wendell | 2014-04-27 | 1 | -1/+1
* SPARK-1651: Delete existing deployment directory | Rahul Singhal | 2014-04-27 | 1 | -0/+1

  Small bug fix to make sure the "spark contents" are copied to the deployment directory correctly.

  Author: Rahul Singhal <rahul.singhal@guavus.com>

  Closes #573 from rahulsinghaliitd/SPARK-1651 and squashes the following commits:

  402c999 [Rahul Singhal] SPARK-1651: Delete existing deployment directory
* SPARK-1648 Support closing JIRA's as part of merge script. | Patrick Wendell | 2014-04-27 | 1 | -9/+105

  Adds an automated hook in the merge script that can close the JIRA, set the fix versions, and leave a comment on the JIRA indicating the PR in which it was resolved. This ensures that (a) we always close JIRA's when issues are merged and (b) there is a link to the pull request in every JIRA.

  This requires a python library called `jira-client`. We could look at embedding this library in our project, but it seemed simple enough to just gracefully disable this feature if it is not installed. It can be installed with `pip install jira-client`.

  Author: Patrick Wendell <pwendell@gmail.com>

  Closes #570 from pwendell/jira-pr-merge and squashes the following commits:

  3022b96 [Patrick Wendell] SPARK-1648 Support closing JIRA's as part of merge script.
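"Gracefully disabling" a feature when an optional library is missing is usually done with a guarded import. The sketch below shows that pattern only; the module name `jira.client` is an assumption about what `pip install jira-client` provides, and `maybe_resolve_issue` is a hypothetical stand-in that never contacts JIRA.

```python
import importlib

def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

# Assumed module name for the jira-client package.
jira_client = optional_import("jira.client")
JIRA_ENABLED = jira_client is not None

def maybe_resolve_issue(issue_id, comment):
    """Hypothetical hook: reports what would happen, makes no JIRA call."""
    if not JIRA_ENABLED:
        return f"jira-client not installed; skipping {issue_id}"
    return f"would resolve {issue_id} with comment: {comment}"

print(maybe_resolve_issue("SPARK-1648", "Issue resolved by pull request 570"))
```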
* SPARK-1650: Correctly identify maven project version | Rahul Singhal | 2014-04-27 | 1 | -1/+1

  Better account for various side-effect outputs while executing "mvn help:evaluate -Dexpression=project.version"

  Author: Rahul Singhal <rahul.singhal@guavus.com>

  Closes #572 from rahulsinghaliitd/SPARK-1650 and squashes the following commits:

  fd6a611 [Rahul Singhal] SPARK-1650: Correctly identify maven project version
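One way to tolerate side-effect output is to filter out log and download lines and keep the last remaining line. The sketch below illustrates that approach in Python; it is not the change made in this commit, and both the filter rules and the sample output are hypothetical.

```python
def extract_project_version(mvn_output):
    """Pick the version out of `mvn help:evaluate` output with extra lines."""
    candidates = []
    for line in mvn_output.splitlines():
        stripped = line.strip()
        # Skip blanks, log lines such as "[INFO] ...", and download progress.
        if not stripped or stripped.startswith("[") or stripped.startswith("Download"):
            continue
        candidates.append(stripped)
    # The evaluated expression is assumed to be the last remaining line.
    return candidates[-1] if candidates else None

# Hypothetical sample output, for illustration only.
sample = "\n".join([
    "[INFO] Scanning for projects...",
    "Downloading: https://repo.example.invalid/some-plugin.pom",
    "[INFO] --- maven-help-plugin:evaluate ---",
    "1.0.0-SNAPSHOT",
    "[INFO] BUILD SUCCESS",
])
version = extract_project_version(sample)
print(version)
```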
* SPARK-1606: Infer user application arguments instead of requiring --arg. | Patrick Wendell | 2014-04-26 | 6 | -162/+181

  This modifies spark-submit to do something more like the Hadoop `jar` command. Now we have the following syntax:

      ./bin/spark-submit [options] user.jar [user options]

  Author: Patrick Wendell <pwendell@gmail.com>

  Closes #563 from pwendell/spark-submit and squashes the following commits:

  32241fc [Patrick Wendell] Review feedback
  3adfb69 [Patrick Wendell] Small fix
  bc48139 [Patrick Wendell] SPARK-1606: Infer user application arguments instead of requiring --arg.
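The `[options] user.jar [user options]` layout implies that parsing stops at the first non-option token and everything after it belongs to the application. A minimal Python sketch of that rule follows; the function and the set of value-taking options passed to it are illustrative, not spark-submit's actual parser.

```python
def split_submit_args(argv, options_with_value):
    """Split argv into (launcher options, application jar, user arguments)."""
    launcher_options = []
    i = 0
    while i < len(argv) and argv[i].startswith("--"):
        launcher_options.append(argv[i])
        # Options that take a value consume the following token as well.
        if argv[i] in options_with_value and i + 1 < len(argv):
            launcher_options.append(argv[i + 1])
            i += 1
        i += 1
    # The first token that is not an option is treated as the application jar.
    jar = argv[i] if i < len(argv) else None
    return launcher_options, jar, argv[i + 1:]

options, jar, user_args = split_submit_args(
    ["--master", "local", "--verbose", "user.jar", "--input", "data.txt"],
    options_with_value={"--master"},
)
print(options, jar, user_args)
```

Because parsing stops at the jar, user options such as `--input` are passed through untouched even though they look like launcher flags.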
* SPARK-1467: Make StorageLevel.apply() factory methods Developer APIs | Sandeep | 2014-04-26 | 1 | -4/+22

  We may want to evolve these in the future to add things like SSDs, so let's mark them as experimental for now. Long-term the right solution might be some kind of builder. The stable API should be the existing StorageLevel constants.

  Author: Sandeep <sandeep@techaddict.me>

  Closes #551 from techaddict/SPARK-1467 and squashes the following commits:

  6bdda24 [Sandeep] SPARK-1467: Make StorageLevel.apply() factory methods as Developer Api's
* [SPARK-1608] [SQL] Fix Cast.nullable when cast from StringType to NumericType/TimestampType. | Takuya UESHIN | 2014-04-26 | 2 | -1/+17

  `Cast.nullable` should be `true` when casting from `StringType` to `NumericType` or `TimestampType`, because if the `StringType` expression holds an illegal number string or an illegal timestamp string, the cast value becomes `null`.

  Author: Takuya UESHIN <ueshin@happy-camper.st>

  Closes #532 from ueshin/issues/SPARK-1608 and squashes the following commits:

  065d37c [Takuya UESHIN] Add tests to check nullabilities of cast expressions.
  f278ed7 [Takuya UESHIN] Revert test to keep it readable and concise.
  9fc9380 [Takuya UESHIN] Fix Cast.nullable when cast from StringType to NumericType/TimestampType.
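The reasoning is that a cast from string can produce null even when its input is never null. A small Python sketch of the same behaviour, with `None` standing in for SQL null; the function name is illustrative and this is not Catalyst code:

```python
def cast_string_to_int(value):
    """Return the parsed integer, or None (null) if `value` is not a number."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None

non_null_inputs = ["42", "abc", "7"]
casted = [cast_string_to_int(s) for s in non_null_inputs]

# No input is null, yet the output contains a null, so the cast
# expression has to be reported as nullable regardless of its child.
output_is_nullable = any(v is None for v in casted)
print(casted, output_is_nullable)
```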
* add note of how to support table with more than 22 fields | wangfei | 2014-04-26 | 1 | -0/+2

  Author: wangfei <wangfei1@huawei.com>

  Closes #564 from scwf/patch-6 and squashes the following commits:

  a331876 [wangfei] Update sql-programming-guide.md
  685135b [wangfei] Update sql-programming-guide.md
  10b3dc0 [wangfei] Update sql-programming-guide.md
  1c40480 [wangfei] add note of how to support table with 22 fields
* [Spark-1382] Fix NPE in DStream.slice (updated version of #365) | zsxwing | 2014-04-25 | 2 | -11/+23

  @zsxwing I cherry-picked your changes and merged the master. #365 had some conflicts once again!

  Author: zsxwing <zsxwing@gmail.com>
  Author: Tathagata Das <tathagata.das1565@gmail.com>

  Closes #562 from tdas/SPARK-1382 and squashes the following commits:

  e2962c1 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-1382
  20968d9 [zsxwing] Replace Exception with SparkException in DStream
  e476651 [zsxwing] Merge remote-tracking branch 'origin/master' into SPARK-1382
  35ba56a [zsxwing] SPARK-1382: Fix NPE in DStream.slice
* SPARK-1632. Remove unnecessary boxing in compares in ExternalAppendOnlyMap | Sandy Ryza | 2014-04-25 | 1 | -3/+5

  Author: Sandy Ryza <sandy@cloudera.com>

  Closes #559 from sryza/sandy-spark-1632 and squashes the following commits:

  a6cd352 [Sandy Ryza] Only compute hashes once
  04e3884 [Sandy Ryza] SPARK-1632. Remove unnecessary boxing in compares in ExternalAppendOnlyMap
* SPARK-1235: manage the DAGScheduler EventProcessActor with supervisor and refactor the DAGScheduler with Akka | CodingCat | 2014-04-25 | 7 | -221/+290

  https://spark-project.atlassian.net/browse/SPARK-1235

  In the current implementation, a running job hangs if the DAGScheduler crashes for some reason (eventProcessActor throws an exception in receive()).

  The reason is that the actor restarts automatically when an exception thrown while it is running is not caught properly (Akka behaviour), while the JobWaiters keep waiting for the tasks to complete.

  In this patch, I refactored the DAGScheduler with Akka and manage the eventProcessActor with a supervisor, so that when an eventProcessActor fails, the supervisor terminates the EventProcessActor and closes the SparkContext.

  Thanks to @kayousterhout and @markhamstra for the hints in JIRA.

  Author: CodingCat <zhunansjtu@gmail.com>
  Author: Xiangrui Meng <meng@databricks.com>
  Author: Nan Zhu <CodingCat@users.noreply.github.com>

  Closes #186 from CodingCat/SPARK-1235 and squashes the following commits:

  a7fb0ee [CodingCat] throw Exception on failure of creating DAG
  124d82d [CodingCat] blocking the constructor until event actor is ready
  baf2d38 [CodingCat] fix the issue brought by non-blocking actorOf
  35c886a [CodingCat] fix bug
  82d08b3 [CodingCat] calling actorOf on system to ensure it is blocking
  310a579 [CodingCat] style fix
  cd02d9a [Nan Zhu] small fix
  561cfbc [CodingCat] recover doCheckpoint
  c048d0e [CodingCat] call submitWaitingStages for every event
  a9eea039 [CodingCat] address Matei's comments
  ac878ab [CodingCat] typo fix
  5d1636a [CodingCat] re-trigger the test.....
  9dfb033 [CodingCat] remove unnecessary changes
  a7a2a97 [CodingCat] add StageCancelled message
  fdf3b17 [CodingCat] just to retrigger the test......
  089bc2f [CodingCat] address andrew's comments
  228f4b0 [CodingCat] address comments from Mark
  b68c1c7 [CodingCat] refactor DAGScheduler with Akka
  810efd8 [Xiangrui Meng] akka solution
* SPARK-1607. HOTFIX: Fix syntax adapting Int result to Short | Sean Owen | 2014-04-25 | 1 | -2/+2

  Sorry folks. This should make the change for SPARK-1607 compile again. Verified this time with the yarn build enabled.

  Author: Sean Owen <sowen@cloudera.com>

  Closes #556 from srowen/SPARK-1607.2 and squashes the following commits:

  e3fe7a3 [Sean Owen] Fix syntax adapting Int result to Short
* Update KafkaWordCount.scala | baishuo(白硕) | 2014-04-25 | 1 | -1/+1

  Modify the required number of args.

  Author: baishuo(白硕) <vc_java@hotmail.com>

  Closes #523 from baishuo/master and squashes the following commits:

  0368ba9 [baishuo(白硕)] Update KafkaWordCount.scala
* Delete the val that never used | WangTao | 2014-04-25 | 1 | -4/+0

  It seems that the vals "startTime" and "endTime" are never used, so delete them.

  Author: WangTao <barneystinson@aliyun.com>

  Closes #553 from WangTaoTheTonic/master and squashes the following commits:

  4fcb639 [WangTao] Delete the val that never used
* SPARK-1621 Upgrade Chill to 0.3.6 | Matei Zaharia | 2014-04-25 | 3 | -11/+9

  It registers more Scala classes, including things like Ranges that we had to register manually before. See https://github.com/twitter/chill/releases for Chill's change log.

  Author: Matei Zaharia <matei@databricks.com>

  Closes #543 from mateiz/chill-0.3.6 and squashes the following commits:

  a1dc5e0 [Matei Zaharia] Upgrade Chill to 0.3.6 and remove our special registration of Ranges
* SPARK-1619 Launch spark-shell with spark-submit | Patrick Wendell | 2014-04-24 | 11 | -189/+39

  This simplifies the shell a bunch and passes all arguments through to spark-submit.

  There is a tiny incompatibility with 0.9.1: you can no longer use either `-c` _or_ `--cores`, only `--cores`. However, spark-submit gives a good error message in this case, I don't think many people used this, and it's a trivial change for users.

  Author: Patrick Wendell <pwendell@gmail.com>

  Closes #542 from pwendell/spark-shell and squashes the following commits:

  9eb3e6f [Patrick Wendell] Updating Spark docs
  b552459 [Patrick Wendell] Andrew's feedback
  97720fa [Patrick Wendell] Review feedback
  aa2900b [Patrick Wendell] SPARK-1619 Launch spark-shell with spark-submit
* SPARK-1607. Replace octal literals, removed in Scala 2.11, with hex literals | Sean Owen | 2014-04-24 | 1 | -2/+4

  Octal literals like "0700" are deprecated in Scala 2.10, generating a warning. They have been removed entirely in 2.11. See https://issues.scala-lang.org/browse/SI-7618

  This change simply replaces two uses of octals with hex literals, which seemed the next-best representation since they express a bit mask (a file permission in particular).

  Author: Sean Owen <sowen@cloudera.com>

  Closes #529 from srowen/SPARK-1607 and squashes the following commits:

  1ee0e67 [Sean Owen] Use Integer.parseInt(...,8) for octal literal instead of hex equivalent
  0102f3d [Sean Owen] Replace octal literals, removed in Scala 2.11, with hex literals
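The description mentions hex literals while the later squashed commit mentions `Integer.parseInt(...,8)`; both denote the same number. The arithmetic for the "0700" example can be checked in Python, where `int("700", 8)` plays the role of the JVM's `Integer.parseInt("700", 8)`:

```python
# The octal permission 0700 (rwx for the owner only), written three ways.
from_octal_string = int("700", 8)  # analogous to Integer.parseInt("700", 8)
hex_literal = 0x1C0                # 1 * 256 + 12 * 16
decimal_literal = 448              # 7 * 64

# The same mask built from the individual owner permission bits.
owner_rwx = 0o400 | 0o200 | 0o100  # read, write, execute

print(from_octal_string, hex_literal, decimal_literal, owner_rwx)
```

Parsing the octal string keeps the familiar permission digits visible in the source, whereas the hex form hides them.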
* Call correct stop(). | Aaron Davidson | 2014-04-24 | 1 | -1/+2

  Oopsie in #504.

  Author: Aaron Davidson <aaron@databricks.com>

  Closes #527 from aarondav/stop and squashes the following commits:

  8d1446a [Aaron Davidson] Call correct stop().
* SPARK-1242 Add aggregate to python rdd | Holden Karau | 2014-04-24 | 1 | -2/+29

  Author: Holden Karau <holden@pigscanfly.ca>

  Closes #139 from holdenk/add_aggregate_to_python_api and squashes the following commits:

  0f39ae3 [Holden Karau] Merge in master
  4879c75 [Holden Karau] CR feedback, fix issue with empty RDDs in aggregate
  70b4724 [Holden Karau] Style fixes from code review
  96b047b [Holden Karau] Add aggregate to python rdd
* Fix [SPARK-1078]: Remove the Unnecessary lift-json dependency | Sandeep | 2014-04-24 | 1 | -16/+2

  Remove the Unnecessary lift-json dependency from pom.xml

  Author: Sandeep <sandeep@techaddict.me>

  Closes #536 from techaddict/FIX-SPARK-1078 and squashes the following commits:

  bd0fd1d [Sandeep] Fix [SPARK-1078]: Replace lift-json with json4s-jackson. Remove the Unnecessary lift-json dependency from pom.xml
* [Typo] In the maven docs: chd -> cdh | Andrew Or | 2014-04-24 | 1 | -1/+1

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #548 from andrewor14/doc-typo and squashes the following commits:

  3eaf4c4 [Andrew Or] chd -> cdh
* Generalize pattern for planning hash joins. | Michael Armbrust | 2014-04-24 | 3 | -48/+82

  This will be helpful for [SPARK-1495](https://issues.apache.org/jira/browse/SPARK-1495) and other cases where we want to have custom hash join implementations but don't want to repeat the logic for finding the join keys.

  Author: Michael Armbrust <michael@databricks.com>

  Closes #418 from marmbrus/hashFilter and squashes the following commits:

  d5cc79b [Michael Armbrust] Address @rxin 's comments.
  366b6d9 [Michael Armbrust] style fixes
  14560eb [Michael Armbrust] Generalize pattern for planning hash joins.
  f4809c1 [Michael Armbrust] Move common functions to PredicateHelper.
* [SPARK-1617] and [SPARK-1618] Improvements to streaming ui and bug fix to socket receiver | Tathagata Das | 2014-04-24 | 15 | -103/+217

  1617: These changes expose the receiver state (active or inactive) and the last error in the UI.

  1618: If the socket receiver cannot connect on the first attempt, it should try to restart after a delay. That was broken, because the thread that restarts (and hence stops) the receiver waited on Thread.join on itself!

  Author: Tathagata Das <tathagata.das1565@gmail.com>

  Closes #540 from tdas/streaming-ui-fix and squashes the following commits:

  e469434 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-ui-fix
  dbddf75 [Tathagata Das] Style fix.
  66df1a5 [Tathagata Das] Merge remote-tracking branch 'apache/master' into streaming-ui-fix
  ad98bc9 [Tathagata Das] Refactored streaming listener to use ReceiverInfo.
  d7f849c [Tathagata Das] Revert "Moved BatchInfo from streaming.scheduler to streaming.ui"
  5c80919 [Tathagata Das] Moved BatchInfo from streaming.scheduler to streaming.ui
  da244f6 [Tathagata Das] Fixed socket receiver as well as made receiver state and error visible in the streamign UI.
* SPARK-1586 Windows build fixes | Mridul Muralidharan | 2014-04-24 | 21 | -116/+185

  Unfortunately, this is not exhaustive - particularly hive tests still fail due to path issues.

  Author: Mridul Muralidharan <mridulm80@apache.org>

  This patch had conflicts when merged, resolved by
  Committer: Matei Zaharia <matei@databricks.com>

  Closes #505 from mridulm/windows_fixes and squashes the following commits:

  ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy appparently
  cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch
  3267f4b [Mridul Muralidharan] Fix build failures
  35b277a [Mridul Muralidharan] Fix Scalastyle failures
  bc69d14 [Mridul Muralidharan] Change from hardcoded path separator
  10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes
  1337abd [Mridul Muralidharan] fix classpath while running in windows
* SPARK-1584: Upgrade Flume dependency to 1.4.0 | tmalaska | 2014-04-24 | 2 | -2/+7

  Updated the Flume dependency in the maven pom file and the scala build file.

  Author: tmalaska <ted.malaska@cloudera.com>

  Closes #507 from tmalaska/master and squashes the following commits:

  79492c8 [tmalaska] excluded all thrift
  159c3f1 [tmalaska] fixed the flume pom file issues
  5bf56a7 [tmalaska] Upgrade flume version
* [SPARK-986]: Job cancelation for PySpark | Ahir Reddy | 2014-04-24 | 3 | -4/+86

  * Additions to the PySpark API to cancel jobs
  * Monitor Thread in PythonRDD to kill Python workers if a task is interrupted

  Author: Ahir Reddy <ahirreddy@gmail.com>

  Closes #541 from ahirreddy/python-cancel and squashes the following commits:

  dfdf447 [Ahir Reddy] Changed success -> completed and made logging message clearer
  6c860ab [Ahir Reddy] PR Comments
  4b4100a [Ahir Reddy] Success flag
  adba6ed [Ahir Reddy] Destroy python workers
  27a2f8f [Ahir Reddy] Start the writer thread...
  d422f7b [Ahir Reddy] Remove unnecesssary vals
  adda337 [Ahir Reddy] Busy wait on the ocntext.interrupted flag, and then kill the python worker
  d9e472f [Ahir Reddy] Revert "removed unnecessary vals"
  5b9cae5 [Ahir Reddy] removed unnecessary vals
  07b54d9 [Ahir Reddy] Fix canceling unit test
  8ae9681 [Ahir Reddy] Don't interrupt worker
  7722342 [Ahir Reddy] Monitor Thread for python workers
  db04e16 [Ahir Reddy] Added canceling api to PySpark
* [SPARK-1615] Synchronize accesses to the LiveListenerBus' event queue | Andrew Or | 2014-04-24 | 2 | -12/+31

  Original poster is @zsxwing, who reported this bug in #516.

  Much of SparkListenerSuite relies on LiveListenerBus's `waitUntilEmpty()` method. As the name suggests, this waits until the event queue is empty. However, the following race condition could happen:

  (1) We dequeue an event
  (2) The queue is empty, we return true (even though the event has not been processed)
  (3) The test asserts something assuming that all listeners have finished executing (and fails)
  (4) The listeners receive and process the event

  This PR makes (1) and (4) atomic by synchronizing around it. To do that, however, we must avoid using `eventQueue.take`, which is blocking and will cause a deadlock if we synchronize around it. As a workaround, we use the non-blocking `eventQueue.poll` + a semaphore to provide the same semantics.

  This has been a possible race condition for a long time, but for some reason we've never run into it.

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #544 from andrewor14/stage-info-test-fix and squashes the following commits:

  3cbe40c [Andrew Or] Merge github.com:apache/spark into stage-info-test-fix
  56dbbcb [Andrew Or] Check if event is actually added before releasing semaphore
  eb486ae [Andrew Or] Synchronize accesses to the LiveListenerBus' event queue
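The combination described above (block on a semaphore outside the lock, then dequeue without blocking and process inside the lock) can be sketched in Python. This is an illustration of the technique, not the LiveListenerBus code; the class and method names are assumptions, and `get_nowait` stands in for the non-blocking `poll`.

```python
import queue
import threading
import time

class ListenerBus:
    def __init__(self):
        self._queue = queue.Queue()
        self._event_available = threading.Semaphore(0)
        self._lock = threading.Lock()
        self.processed = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post(self, event):
        self._queue.put(event)
        self._event_available.release()  # one permit per queued event

    def _run(self):
        while True:
            # Block here, outside the lock, so the lock is never held
            # while waiting for an event.
            self._event_available.acquire()
            with self._lock:
                # Dequeue and process as one atomic step.
                event = self._queue.get_nowait()
                if event is None:  # stop marker
                    return
                self.processed.append(event)

    def is_empty(self):
        # Same lock: cannot observe the gap between dequeue and processing.
        with self._lock:
            return self._queue.empty()

    def wait_until_empty(self, timeout_s):
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if self.is_empty():
                return True
            time.sleep(0.001)
        return False

    def stop(self):
        self.post(None)
        self._thread.join()

bus = ListenerBus()
for i in range(100):
    bus.post(i)
drained = bus.wait_until_empty(5.0)
processed_when_drained = len(bus.processed)
bus.stop()
```

If the consumer blocked on the queue while holding the lock, `is_empty` could never acquire it, which is the deadlock the commit message describes.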
* [SPARK-1510] Spark Streaming metrics source for metrics system | jerryshao | 2014-04-24 | 3 | -1/+79

  This pulls in changes made by @jerryshao in https://github.com/apache/spark/pull/424 and merges with the master.

  Author: jerryshao <saisai.shao@intel.com>
  Author: Tathagata Das <tathagata.das1565@gmail.com>

  Closes #545 from tdas/streaming-metrics and squashes the following commits:

  034b443 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-metrics
  fb3b0a5 [jerryshao] Modify according master update
  21939f5 [jerryshao] Style changes according to style check error
  976116b [jerryshao] Add StreamSource in StreamingContext for better monitoring through metrics system