* Revert "[maven-release-plugin] prepare release v1.0.0-rc10"Tathagata Das2014-05-2521-24/+24
| | | | This reverts commit d807023479ce10aec28ef3c1ab646ddefc2e663c.
* Revert "[maven-release-plugin] prepare for next development iteration"Tathagata Das2014-05-2521-22/+22
| | | | This reverts commit 67dd53d2556f03ce292e6889128cf441f1aa48f8.
* Updated CHANGES.txt (Tathagata Das, 2014-05-25; 1 file, -1/+86)

* SPARK-1911: Emphasize that Spark jars should be built with Java 6. (Patrick Wendell, 2014-05-24; 1 file, -21/+31)

  This commit requires the user to manually say "yes" when building Spark
  without Java 6. The prompt can be bypassed with a flag (e.g. if the user is
  scripting around make-distribution).

  Author: Patrick Wendell <pwendell@gmail.com>

  Closes #859 from pwendell/java6 and squashes the following commits:
  4921133 [Patrick Wendell] Adding Pyspark Notice
  fee8c9e [Patrick Wendell] SPARK-1911: Emphasize that Spark jars should be built with Java 6.

  (cherry picked from commit 75a03277704f8618a0f1c41aecfb1ebd24a8ac1a)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

* [SPARK-1900 / 1918] PySpark on YARN is broken (Andrew Or, 2014-05-24; 9 files, -47/+323)

  If I run the following on a YARN cluster

  ```
  bin/spark-submit sheep.py --master yarn-client
  ```

  it fails because of a mismatch in paths: `spark-submit` thinks that
  `sheep.py` resides on HDFS, and balks when it can't find the file there. A
  natural workaround is to add the `file:` prefix to the file:

  ```
  bin/spark-submit file:/path/to/sheep.py --master yarn-client
  ```

  However, this also fails. This time it is because python does not
  understand URI schemes. This PR fixes this by automatically resolving all
  paths passed as command line arguments to `spark-submit` properly. This has
  the added benefit of keeping file and jar paths consistent across different
  cluster modes. For python, we strip the URI scheme before we actually try
  to run it.

  Much of the code is originally written by @mengxr. Tested on YARN cluster.
  More tests pending.

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #853 from andrewor14/submit-paths and squashes the following commits:
  0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH
  323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell
  3c36587 [Andrew Or] Improve error messages (minor)
  854aa6a [Andrew Or] Guard against NPE if user gives pathological paths
  6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in
  3bb0359 [Andrew Or] Update more comments (minor)
  2a1f8a0 [Andrew Or] Update comments (minor)
  6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
  a68c4d1 [Andrew Or] Handle Windows python file path correctly
  427a250 [Andrew Or] Resolve paths properly for Windows
  a591a4a [Andrew Or] Update tests for resolving URIs
  6c8621c [Andrew Or] Move resolveURIs to Utils
  db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
  f542dce [Andrew Or] Fix outdated tests
  691c4ce [Andrew Or] Ignore special primary resource names
  5342ac7 [Andrew Or] Add missing space in error message
  02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly

  (cherry picked from commit 5081a0a9d47ca31900ea4de570de2cbb0e063105)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

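  A minimal sketch of the normalization idea (hypothetical helper; per the
  squash list, the actual change moved a `resolveURIs` utility into `Utils`):

  ```scala
  import java.net.URI

  // Sketch: give a bare local path an explicit file: scheme so that
  // spark-submit and the cluster backend agree on where the resource lives.
  def resolveURI(path: String): URI = {
    val uri = new URI(path)
    if (uri.getScheme != null) {
      uri // already qualified (file:, hdfs:, http:, ...)
    } else {
      // No scheme: treat it as a local file and make the path absolute.
      new java.io.File(path).getAbsoluteFile.toURI
    }
  }
  ```
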
* Update LBFGSSuite.scala (baishuo(白硕), 2014-05-23; 1 file, -2/+2)

  Same reason as https://github.com/apache/spark/pull/588.

  Author: baishuo(白硕) <vc_java@hotmail.com>

  Closes #815 from baishuo/master and squashes the following commits:
  6876c1e [baishuo(白硕)] Update LBFGSSuite.scala

  (cherry picked from commit a08262d8769808dd3a8ee1b1e80fbf6ac13a557c)
  Signed-off-by: Reynold Xin <rxin@apache.org>

* Updated scripts for auditing releases (Tathagata Das, 2014-05-22; 11 files, -6/+547)

  - Added script to automatically generate the change list CHANGES.txt
  - Added test for verifying linking against maven distributions of `spark-sql` and `spark-hive`
  - Added SBT projects for testing functionality of `spark-sql` and `spark-hive`
  - Fixed issues in existing tests that might have come up because of changes in Spark 1.0

  Author: Tathagata Das <tathagata.das1565@gmail.com>

  Closes #844 from tdas/update-dev-scripts and squashes the following commits:
  25090ba [Tathagata Das] Added missing license
  e2e20b3 [Tathagata Das] Updated tests for auditing releases.

  (cherry picked from commit b2bdd0e505f1ae3d39c46139f17bd43779ece635)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

* [SPARK-1896] Respect spark.master (and --master) before MASTER in spark-shell (Andrew Or, 2014-05-22; 1 file, -3/+2)

  The hierarchy for configuring the Spark master in the shell is as follows:

  ```
  MASTER > --master > spark.master (spark-defaults.conf)
  ```

  This is inconsistent with the way we run normal applications, which is:

  ```
  --master > spark.master (spark-defaults.conf) > MASTER
  ```

  I was trying to run a shell locally on a standalone cluster launched
  through the ec2 scripts, which automatically set `MASTER` in spark-env.sh.
  It was surprising to me that `--master` didn't take effect, considering
  that this is the way we tell users to set their masters
  [here](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark).

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #846 from andrewor14/shell-master and squashes the following commits:
  2cb81c9 [Andrew Or] Respect spark.master before MASTER in REPL

  (cherry picked from commit cce77457e00aa5f1f4db3d50454cf257efb156ed)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

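  A minimal sketch of the corrected precedence (parameter names and the
  local[*] fallback are illustrative, not the exact shell code):

  ```scala
  // --master > spark.master (spark-defaults.conf) > MASTER env var.
  def chooseMaster(cmdLineMaster: Option[String],
                   confMaster: Option[String]): String = {
    cmdLineMaster
      .orElse(confMaster)            // spark.master from spark-defaults.conf
      .orElse(sys.env.get("MASTER")) // legacy env var now has lowest priority
      .getOrElse("local[*]")         // fall back to local mode
  }
  ```
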
* [SPARK-1897] Respect spark.jars (and --jars) in spark-shell (Andrew Or, 2014-05-22; 1 file, -1/+7)

  Spark shell currently overwrites `spark.jars` with `ADD_JARS`. In all modes
  except yarn-cluster, this means the `--jar` flag passed to `bin/spark-shell`
  is also discarded. However, in the
  [docs](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark),
  we explicitly tell the users to add the jars this way.

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #849 from andrewor14/shell-jars and squashes the following commits:
  928a7e6 [Andrew Or] ',' -> "," (minor)
  afc357c [Andrew Or] Handle spark.jars == "" in SparkILoop, not SparkSubmit
  c6da113 [Andrew Or] Do not set spark.jars to ""
  d8549f7 [Andrew Or] Respect spark.jars and --jars in spark-shell

  (cherry picked from commit 8edbee7d1b4afc192d97ba192a5526affc464205)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

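  One plausible way to honor both sources instead of overwriting one with the
  other (a sketch under that assumption, not the exact patch):

  ```scala
  // Merge jars from the spark.jars property (set by --jars or
  // spark-defaults.conf) with the legacy ADD_JARS environment variable.
  def shellJars(sparkJars: Option[String]): Seq[String] = {
    val fromConf = sparkJars.toSeq.flatMap(_.split(",")).filter(_.nonEmpty)
    val fromEnv  = sys.env.get("ADD_JARS").toSeq.flatMap(_.split(",")).filter(_.nonEmpty)
    (fromConf ++ fromEnv).distinct
  }
  ```
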
* Fix UISuite unit test that fails under Jenkins contention (Aaron Davidson, 2014-05-22; 1 file, -3/+4)

  Due to what are perhaps zombie processes on Jenkins, it seems that at least
  10 Spark ports are in use. It also doesn't matter that the port increases
  when used; it could in fact go down. The only part that matters is that it
  selects a different port rather than failing to bind. Changed the test to
  match this.

  Thanks to @andrewor14 for helping diagnose this.

  Author: Aaron Davidson <aaron@databricks.com>

  Closes #857 from aarondav/tiny and squashes the following commits:
  c199ec8 [Aaron Davidson] Fix UISuite unit test that fails under Jenkins contention

  (cherry picked from commit f9f5fd5f4e81828a3e0c391892e0f28751568843)
  Signed-off-by: Reynold Xin <rxin@apache.org>

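  A sketch of the bind-with-fallback behavior the test now asserts (the
  helper below is illustrative, not the UISuite code):

  ```scala
  import java.net.ServerSocket
  import scala.util.{Success, Try}

  // Try a range of ports; all that matters is that we end up bound to
  // *some* port, not specifically startPort or startPort + 1.
  def bindToFreePort(startPort: Int, maxRetries: Int = 16): ServerSocket = {
    (0 to maxRetries).iterator
      .map(offset => Try(new ServerSocket(startPort + offset)))
      .collectFirst { case Success(socket) => socket }
      .getOrElse(throw new java.net.BindException(
        s"No free port found starting from $startPort"))
  }

  val first  = bindToFreePort(4040)
  val second = bindToFreePort(4040) // 4040 is taken, so this binds elsewhere
  assert(second.getLocalPort != first.getLocalPort)
  ```
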
* [SPARK-1870] Make spark-submit --jars work in yarn-cluster mode (Xiangrui Meng, 2014-05-22; 3 files, -55/+19)

  Send secondary jars to the distributed cache of all containers and add the
  cached jars to the classpath before executors start. Tested on a YARN
  cluster (CDH-5.0).

  `spark-submit --jars` also works in standalone server and `yarn-client`.
  Thanks to @andrewor14 for testing! I removed "Doesn't work for drivers in
  standalone mode with "cluster" deploy mode." from `spark-submit`'s help
  message, though we haven't tested mesos yet.

  CC: @dbtsai @sryza

  Author: Xiangrui Meng <meng@databricks.com>

  Closes #848 from mengxr/yarn-classpath and squashes the following commits:
  23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to
          __app__.jar to avoid conflicts; append $CWD/ and $CWD/* to the
          classpath; remove unused methods
  a40f6ed [Xiangrui Meng] standalone -> cluster
  65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client
  11e5354 [Xiangrui Meng] minor changes
  3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf
  dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn

  (cherry picked from commit dba314029b4c9d72d7e48a2093b39edd01931f57)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

* Configuration documentation updates (Reynold Xin, 2014-05-21; 1 file, -89/+105)

  1. Add <code> to configuration options.
  2. List env variables in tabular format to be consistent with other pages.
  3. Moved the Viewing Spark Properties section up.

  This is against branch-1.0, but should be cherry picked into master as well.

  Author: Reynold Xin <rxin@apache.org>

  Closes #851 from rxin/doc-config and squashes the following commits:
  28ac0d3 [Reynold Xin] Add <code> to configuration options, and list env variables in a table.

* [SPARK-1889] [SQL] Apply splitConjunctivePredicates to join condition while finding join keys (Takuya UESHIN, 2014-05-21; 2 files, -6/+24)

  When tables are equi-joined by multiple keys, `HashJoin` should be used,
  but `CartesianProduct` followed by `Filter` is used instead. The join keys
  are paired by an `And` expression, so we need to apply
  `splitConjunctivePredicates` to the join condition while finding join keys.

  Author: Takuya UESHIN <ueshin@happy-camper.st>

  Closes #836 from ueshin/issues/SPARK-1889 and squashes the following commits:
  fe1c387 [Takuya UESHIN] Apply splitConjunctivePredicates to join condition while finding join keys.

  (cherry picked from commit bb88875ad52e8209c25e8350af1fe4b7159086ae)
  Signed-off-by: Reynold Xin <rxin@apache.org>

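  The splitting step itself is a small recursion over `And` nodes; a
  simplified sketch with toy expression types (the real Catalyst classes
  carry much more):

  ```scala
  sealed trait Expr
  case class And(left: Expr, right: Expr) extends Expr
  case class EqualTo(left: String, right: String) extends Expr

  // a = b AND c = d splits into Seq(a = b, c = d), so each equality can be
  // examined as a candidate join key.
  def splitConjunctivePredicates(condition: Expr): Seq[Expr] = condition match {
    case And(left, right) =>
      splitConjunctivePredicates(left) ++ splitConjunctivePredicates(right)
    case other => other :: Nil
  }

  val cond = And(EqualTo("a", "b"), EqualTo("c", "d"))
  println(splitConjunctivePredicates(cond)) // List(EqualTo(a,b), EqualTo(c,d))
  ```
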
* [SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark (Kan Zhang, 2014-05-21; 1 file, -2/+10)

  Author: Kan Zhang <kzhang@apache.org>

  Closes #697 from kanzhang/SPARK-1519 and squashes the following commits:
  4f8d1ed [Kan Zhang] [SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark

  (cherry picked from commit f18fd05b513b136363c94adb3e5b841f8bf48134)
  Signed-off-by: Reynold Xin <rxin@apache.org>

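  For reference, the Scala API the Python wrapper delegates to takes the same
  hint; a usage sketch (paths and values are illustrative):

  ```scala
  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(
    new SparkConf().setAppName("WholeTextFiles").setMaster("local[2]"))

  // minPartitions is a hint for the minimum number of partitions of the
  // resulting (filename, content) pair RDD.
  val files = sc.wholeTextFiles("data/", minPartitions = 4)
  println(files.keys.collect().mkString("\n"))
  sc.stop()
  ```
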
* [Typo] Stoped -> Stopped (Andrew Or, 2014-05-21; 1 file, -1/+1)

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #847 from andrewor14/yarn-typo and squashes the following commits:
  c1906af [Andrew Or] Stoped -> Stopped

  (cherry picked from commit ba5d4a99425a2083fea2a9759050c5e770197e23)
  Signed-off-by: Reynold Xin <rxin@apache.org>

* [Minor] Move JdbcRDDSuite to the correct package (Andrew Or, 2014-05-21; 1 file, -6/+6)

  It was in the wrong package.

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #839 from andrewor14/jdbc-suite and squashes the following commits:
  f948c5a [Andrew Or] cache -> cache()
  b215279 [Andrew Or] Move JdbcRDDSuite to the correct package

  (cherry picked from commit 7c79ef7d43de258ad9a5de15c590132bd78ce8dd)
  Signed-off-by: Reynold Xin <rxin@apache.org>

* [Docs] Correct example of creating a new SparkConf (Andrew Or, 2014-05-21; 1 file, -1/+1)

  The example code on the configuration page currently does not compile.

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #842 from andrewor14/conf-docs and squashes the following commits:
  aabff57 [Andrew Or] Correct example of creating a new SparkConf

  (cherry picked from commit 1014668f2727863fe46f9c75201ee459d093bf0c)
  Signed-off-by: Reynold Xin <rxin@apache.org>

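  A compiling example of the pattern the configuration page describes (the
  app name and settings here are illustrative, not necessarily the docs'
  exact snippet):

  ```scala
  import org.apache.spark.{SparkConf, SparkContext}

  // Note the parentheses on the constructor; each set* call returns the
  // conf, so the whole chain builds a single object.
  val conf = new SparkConf()
    .setMaster("local[2]")
    .setAppName("CountingSheep")
    .set("spark.executor.memory", "1g")
  val sc = new SparkContext(conf)
  ```
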
* [SPARK-1250] Fixed misleading comments in bin/pyspark, bin/spark-class (Sumedh Mungee, 2014-05-21; 2 files, -2/+2)

  Fixed a couple of misleading comments in bin/pyspark and bin/spark-class.
  The comments make it seem like the script is looking for the Scala
  installation when in fact it is looking for Spark.

  Author: Sumedh Mungee <smungee@gmail.com>

  Closes #843 from smungee/spark-1250-fix-comments and squashes the following commits:
  26870f3 [Sumedh Mungee] [SPARK-1250] Fixed misleading comments in bin/pyspark and bin/spark-class

  (cherry picked from commit 6e337380fc47071fc7fb28d744e8209c729fe1e9)
  Signed-off-by: Reynold Xin <rxin@apache.org>

* [maven-release-plugin] prepare for next development iteration (Tathagata Das, 2014-05-20; 21 files, -22/+22)

* [maven-release-plugin] prepare release v1.0.0-rc10 (Tathagata Das, 2014-05-20; 21 files, -24/+24)

* [Hotfix] Blacklisted flaky HiveCompatibility test (Tathagata Das, 2014-05-20; 1 file, -2/+4)

  The `lateral_view_outer` query sometimes returns a different set of 10 rows.

  Author: Tathagata Das <tathagata.das1565@gmail.com>

  Closes #838 from tdas/hive-test-fix2 and squashes the following commits:
  9128a0d [Tathagata Das] Blacklisted flaky HiveCompatibility test.

  (cherry picked from commit 7f0cfe47f4709843d70ceccc25dee7551206ce0d)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

* Revert "[maven-release-plugin] prepare release v1.0.0-rc9"Tathagata Das2014-05-1921-24/+24
| | | | This reverts commit 920f947eb5a22a679c0c3186cf69ee75f6041c75.
* Revert "[maven-release-plugin] prepare for next development iteration"Tathagata Das2014-05-1921-22/+22
| | | | This reverts commit f8e611955096c5c1c7db5764b9d2851b1d295f0d.
* Updated CHANGES.txt (Tathagata Das, 2014-05-19; 1 file, -0/+355)

* [Spark 1877] ClassNotFoundException when loading RDD with serialized objects (Tathagata Das, 2014-05-19; 1 file, -1/+1)

  Updated version of #821.

  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Author: Ghidireac <bogdang@u448a5b0a73d45358d94a.ant.amazon.com>

  Closes #835 from tdas/SPARK-1877 and squashes the following commits:
  f346f71 [Tathagata Das] Addressed Patrick's comments.
  fee0c5d [Ghidireac] SPARK-1877: ClassNotFoundException when loading RDD with serialized objects

  (cherry picked from commit 52eb54d02403a3c37d84b9da7cc1cdb261048cf8)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

* [SPARK-1874][MLLIB] Clean up MLlib sample data (Xiangrui Meng, 2014-05-19; 6 files, -2/+2138)

  1. Added synthetic datasets for `MovieLensALS`, `LinearRegression`,
     `BinaryClassification`.
  2. Embedded instructions in the help message of those example apps.

  Per discussion with Matei on the JIRA page, new example data is under
  `data/mllib`.

  Author: Xiangrui Meng <meng@databricks.com>

  Closes #833 from mengxr/mllib-sample-data and squashes the following commits:
  59f0a18 [Xiangrui Meng] add sample binary classification data
  3c2f92f [Xiangrui Meng] add linear regression data
  050f1ca [Xiangrui Meng] add a sample dataset for MovieLensALS example

  (cherry picked from commit bcb9dce6f444a977c714117811bce0c54b417650)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

* SPARK-1689: Spark application should die when removed by Master (Aaron Davidson, 2014-05-19; 1 file, -0/+2)

  scheduler.error() will mask the error if there are active tasks. Being
  removed is a cataclysmic event for Spark applications, and should probably
  be treated as such.

  Author: Aaron Davidson <aaron@databricks.com>

  Closes #832 from aarondav/i-love-u and squashes the following commits:
  9f1200f [Aaron Davidson] SPARK-1689: Spark application should die when removed by Master

  (cherry picked from commit b0ce22e071da4cc62ec5e29abf7b1299b8e4a6b0)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

* [SPARK-1875] NoClassDefFoundError: StringUtils when building with hadoop 1.x and hive (witgo, 2014-05-19; 2 files, -10/+1)

  Author: witgo <witgo@qq.com>

  Closes #824 from witgo/SPARK-1875_commons-lang-2.6 and squashes the following commits:
  ef7231d [witgo] review commit
  ead3c3b [witgo] SPARK-1875: NoClassDefFoundError: StringUtils when building against Hadoop 1

  (cherry picked from commit 6a2c5c610c259f62cb12d8cfc18bf59cdb334bb2)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

* SPARK-1879. Increase MaxPermSize since some of our builds have many classes (Matei Zaharia, 2014-05-19; 3 files, -5/+7)

  See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop2
  and Hive ran out of PermGen space in spark-shell, when those things added
  up with the Scala compiler.

  Note that users can still override it by setting their own Java options
  with this change. Their options will come later in the command string than
  the -XX:MaxPermSize=128m.

  Author: Matei Zaharia <matei@databricks.com>

  Closes #823 from mateiz/spark-1879 and squashes the following commits:
  6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes

  (cherry picked from commit 5af99d7617ba3b9fbfdb345ef9571b7dd41f45a1)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

* SPARK-1878: Fix the incorrect initialization order (zsxwing, 2014-05-19; 2 files, -3/+7)

  JIRA: https://issues.apache.org/jira/browse/SPARK-1878

  Author: zsxwing <zsxwing@gmail.com>

  Closes #822 from zsxwing/SPARK-1878 and squashes the following commits:
  4a47e27 [zsxwing] SPARK-1878: Fix the incorrect initialization order

  (cherry picked from commit 1811ba8ccb580979aa2e12019e6a82805f09ab53)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

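  Illustrative only (toy classes, not the actual Spark code): in Scala, vals
  in a class body initialize top to bottom, so reading a val before its
  definition runs yields the type's default value rather than the assigned one.

  ```scala
  class Wrong {
    val doubled = base * 2 // base is still 0 here, so doubled == 0
    val base = 21
  }

  class Right {
    val base = 21
    val doubled = base * 2 // base is initialized, so doubled == 42
  }

  println(new Wrong().doubled) // 0
  println(new Right().doubled) // 42
  ```
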
* [SPARK-1876] Windows fixes to deal with latest distribution layout changes (Matei Zaharia, 2014-05-19; 7 files, -30/+81)

  - Look for JARs in the right place
  - Launch examples the same way as on Unix
  - Load datanucleus JARs if they exist
  - Don't attempt to parse local paths as URIs in SparkSubmit, since paths
    with C:\ are not valid URIs
  - Also fixed POM exclusion rules for datanucleus (it wasn't properly
    excluding it, whereas SBT was)

  Author: Matei Zaharia <matei@databricks.com>

  Closes #819 from mateiz/win-fixes and squashes the following commits:
  d558f96 [Matei Zaharia] Fix comment
  228577b [Matei Zaharia] Review comments
  d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly
  144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout

  (cherry picked from commit 7b70a7071894dd90ea1d0091542b3e13e7ef8d3a)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

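  A small sketch of the URI pitfall mentioned above (the helper is
  hypothetical; the behavior of java.net.URI is real):

  ```scala
  import java.net.URI
  import scala.util.Try

  // A Windows path is not a valid URI, so blindly parsing command-line
  // paths as URIs blows up on C:\-style input.
  def schemeOf(path: String): Option[String] =
    Try(new URI(path)).toOption.flatMap(uri => Option(uri.getScheme))

  println(schemeOf("hdfs://host/app.jar"))    // Some(hdfs)
  println(schemeOf("/home/me/app.jar"))       // None: plain local path
  println(schemeOf("C:\\Users\\me\\app.jar")) // None: backslashes are illegal in URIs
  ```
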
* [WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0 (Xiangrui Meng, 2014-05-18; 10 files, -90/+153)

  Some improvements to the MLlib guide:

  1. [SPARK-1872] Update API links for unidoc.
  2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is
     defined, use it instead of `page.title` for title display.
  3. Add more Java/Python examples.

  Author: Xiangrui Meng <meng@databricks.com>

  Closes #816 from mengxr/mllib-doc and squashes the following commits:
  ec2e407 [Xiangrui Meng] format scala example for ALS
  cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types
  4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example
  d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles
  561fdc0 [Xiangrui Meng] add a displayTitle option to global layout
  195d06f [Xiangrui Meng] add Java example for summary stats and minor fix
  9f1ff89 [Xiangrui Meng] update java api links in mllib-basics
  7dad18e [Xiangrui Meng] update java api links in NB
  3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python
  35bdeb9 [Xiangrui Meng] api/mllib -> api/scala
  e4afaa8 [Xiangrui Meng] explicitly state what might change

  (cherry picked from commit df0aa8353ab6d3b19d838c6fa95a93a64948309f)
  Signed-off-by: Matei Zaharia <matei@databricks.com>

* SPARK-1873: Add README.md file when making distributions (Patrick Wendell, 2014-05-18; 1 file, -0/+1)

  Author: Patrick Wendell <pwendell@gmail.com>

  Closes #818 from pwendell/reamde and squashes the following commits:
  4020b11 [Patrick Wendell] SPARK-1873: Add README.md file when making distributions

  (cherry picked from commit 4ce479324bdcf603806fc90b5b0f4968c6de690e)
  Signed-off-by: Matei Zaharia <matei@databricks.com>

* Fix spark-submit path in spark-shell & pyspark (Neville Li, 2014-05-18; 2 files, -5/+5)

  Author: Neville Li <neville@spotify.com>

  Closes #812 from nevillelyh/neville/v1.0 and squashes the following commits:
  0dc33ed [Neville Li] Fix spark-submit path in pyspark
  becec64 [Neville Li] Fix spark-submit path in spark-shell

* [maven-release-plugin] prepare for next development iteration (Patrick Wendell, 2014-05-17; 21 files, -22/+22)

* [maven-release-plugin] prepare release v1.0.0-rc9 (Patrick Wendell, 2014-05-17; 21 files, -24/+24)

* Revert "[maven-release-plugin] prepare release v1.0.0-rc8"Patrick Wendell2014-05-1621-24/+24
| | | | This reverts commit 80eea0f111c06260ffaa780d2f3f7facd09c17bc.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-05-1621-22/+22
| | | | This reverts commit e5436b8c1a79ce108f3af402455ac5f6dc5d1eb3.
* Make deprecation warning less severe (Patrick Wendell, 2014-05-16; 1 file, -6/+6)

  Just a small change. I think it's good not to scare people who are using
  the old options.

  Author: Patrick Wendell <pwendell@gmail.com>

  Closes #810 from pwendell/warnings and squashes the following commits:
  cb8a311 [Patrick Wendell] Make deprecation warning less severe

  (cherry picked from commit 442808a7482b81c8de887c901b424683da62022e)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>

* [SPARK-1824] Remove <master> from Python examples (Andrew Or, 2014-05-16; 12 files, -72/+77)

  A recent PR (#552) fixed this for all Scala / Java examples. We need to do
  it for python too. Note that this blocks on #799, which makes `bin/pyspark`
  go through Spark submit.

  With only the changes in this PR, the only way to run these examples is
  through Spark submit. Once #799 goes in, you can use `bin/pyspark` to run
  them too. For example,

  ```
  bin/pyspark examples/src/main/python/pi.py 100 --master local-cluster[4,1,512]
  ```

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #802 from andrewor14/python-examples and squashes the following commits:
  cf50b9f [Andrew Or] De-indent python comments (minor)
  50f80b1 [Andrew Or] Remove pyFiles from SparkContext construction
  c362f69 [Andrew Or] Update docs to use spark-submit for python applications
  7072c6a [Andrew Or] Merge branch 'master' of github.com:apache/spark into python-examples
  427a5f0 [Andrew Or] Update docs
  d32072c [Andrew Or] Remove <master> from examples + update usages

  (cherry picked from commit cf6cbe9f76c3b322a968c836d039fc5b70d4ce43)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>

* [SPARK-1808] Route bin/pyspark through Spark submit (Andrew Or, 2014-05-16; 10 files, -34/+107)

  **Problem.** For `bin/pyspark`, there is currently no way to specify Spark
  configuration properties other than through `SPARK_JAVA_OPTS` in
  `conf/spark-env.sh`. However, this mechanism is supposedly deprecated.
  Instead, it needs to pick up configurations explicitly specified in
  `conf/spark-defaults.conf`.

  **Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its
  counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`).
  This has the additional benefit of making the invocation of all the user
  facing Spark scripts consistent.

  **Details.** `bin/pyspark` inherently handles two cases: (1) running python
  applications and (2) running the python shell. For (1), Spark submit
  already handles running python applications. For cases in which
  `bin/pyspark` is given a python file, we can simply pass the file directly
  to Spark submit and let it handle the rest. For case (2), `bin/pyspark`
  starts a python process as before, which launches the JVM as a sub-process.
  The existing code already provides a code path to do this. All we needed to
  change is to use `bin/spark-submit` instead of `spark-class` to launch the
  JVM. This requires modifications to Spark submit to handle the pyspark
  shell as a special case.

  This has been tested locally (OSX and Windows 7), on a standalone cluster,
  and on a YARN cluster. Running IPython also works as before, except now it
  takes in Spark submit arguments too.

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #799 from andrewor14/pyspark-submit and squashes the following commits:
  bf37e36 [Andrew Or] Minor changes
  01066fa [Andrew Or] bin/pyspark for Windows
  c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes)
  1866f85 [Andrew Or] Windows is not cooperating
  456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set
  7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
  b7ba0d8 [Andrew Or] Address a few comments (minor)
  06eb138 [Andrew Or] Use shlex instead of writing our own parser
  05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
  a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly
  6fba412 [Andrew Or] Deal with quotes + address various comments
  fe4c8a7 [Andrew Or] Update --help for bin/pyspark
  afe47bf [Andrew Or] Fix spark shell
  f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
  a371d26 [Andrew Or] Route bin/pyspark through Spark submit

  (cherry picked from commit 4b8ec6fcfd7a7ef0857d5b21917183c181301c95)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>

* Version bump of spark-ec2 scripts (Patrick Wendell, 2014-05-16; 1 file, -1/+1)

  This will allow us to change things in spark-ec2 related to the 1.0 release.

  Author: Patrick Wendell <pwendell@gmail.com>

  Closes #809 from pwendell/spark-ec2 and squashes the following commits:
  59117fb [Patrick Wendell] Version bump of spark-ec2 scripts

  (cherry picked from commit c0ab85d7320cea90e6331fb03a70349bc804c1b1)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>

* SPARK-1864 Look in spark conf instead of system properties when propagating configuration to executors (Michael Armbrust, 2014-05-16; 1 file, -4/+5)

  Author: Michael Armbrust <michael@databricks.com>

  Closes #808 from marmbrus/confClasspath and squashes the following commits:
  4c31d57 [Michael Armbrust] Look in spark conf instead of system properties when propagating configuration to executors.

  (cherry picked from commit a80a6a139e729ee3f81ec4f0028e084d2d9f7e82)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>

* Tweaks to Mesos docs (Matei Zaharia, 2014-05-16; 1 file, -37/+34)

  - Mention Apache downloads first
  - Shorten some wording

  Author: Matei Zaharia <matei@databricks.com>

  Closes #806 from mateiz/doc-update and squashes the following commits:
  d9345cd [Matei Zaharia] typo
  a179f8d [Matei Zaharia] Tweaks to Mesos docs

  (cherry picked from commit fed6303f29250bd5e656dbdd731b38938c933a61)
  Signed-off-by: Matei Zaharia <matei@databricks.com>

* [SQL] Implement between in hql (Michael Armbrust, 2014-05-16; 3 files, -0/+21)

  Author: Michael Armbrust <michael@databricks.com>

  Closes #804 from marmbrus/between and squashes the following commits:
  ae24672 [Michael Armbrust] add golden answer.
  d9997ef [Michael Armbrust] Implement between in hql.
  9bd4433 [Michael Armbrust] Better error on parse failures.

  (cherry picked from commit 032d6632ad4ab88c97c9e568b63169a114220a02)
  Signed-off-by: Reynold Xin <rxin@apache.org>

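  A usage sketch of the new predicate through the 1.0-era `hql` entry point
  (the `src` table and column names are illustrative):

  ```scala
  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  val sc = new SparkContext(
    new SparkConf().setAppName("BetweenExample").setMaster("local[2]"))
  val hiveContext = new HiveContext(sc)

  // BETWEEN is inclusive on both ends: equivalent to key >= 1 AND key <= 3.
  val rows = hiveContext.hql("SELECT key, value FROM src WHERE key BETWEEN 1 AND 3")
  rows.collect().foreach(println)
  ```
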
* bugfix: overflow of graphx Edge compare function (Zhen Peng, 2014-05-16; 2 files, -2/+47)

  Author: Zhen Peng <zhenpeng01@baidu.com>

  Closes #769 from zhpengg/bugfix-graphx-edge-compare and squashes the following commits:
  8a978ff [Zhen Peng] add ut for graphx Edge.lexicographicOrdering.compare
  413c258 [Zhen Peng] there may be an overflow for two Longs' subtraction

  (cherry picked from commit fa6de408a131a3e84350a60af74a92c323dfc5eb)
  Signed-off-by: Reynold Xin <rxin@apache.org>

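  An illustration of the bug class (toy comparator, not the GraphX code):

  ```scala
  // Broken: truncating a Long difference to Int loses the sign on overflow.
  def badCompare(a: Long, b: Long): Int = (a - b).toInt

  // Long.MaxValue - (-1L) overflows to Long.MinValue, whose low 32 bits are
  // all zero, so the comparator wrongly reports the two values as equal.
  println(badCompare(Long.MaxValue, -1L)) // 0, but Long.MaxValue > -1

  // Safe: compare directly instead of subtracting.
  def safeCompare(a: Long, b: Long): Int =
    if (a < b) -1 else if (a > b) 1 else 0

  println(safeCompare(Long.MaxValue, -1L)) // 1
  ```
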
* [maven-release-plugin] prepare for next development iteration (Patrick Wendell, 2014-05-16; 21 files, -22/+22)

* [maven-release-plugin] prepare release v1.0.0-rc8 (Patrick Wendell, 2014-05-16; 21 files, -24/+24)

* Revert "[maven-release-plugin] prepare release v1.0.0-rc7"Patrick Wendell2014-05-1621-24/+24
| | | | This reverts commit 9212b3e5bb5545ccfce242da8d89108e6fb1c464.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-05-1621-22/+22
| | | | This reverts commit c4746aa6fe4aaf383e69e34353114d36d1eb9ba6.