diff options
author | Andrew Or <andrewor14@gmail.com> | 2014-05-16 22:36:23 -0700 |
---|---|---|
committer | Patrick Wendell <pwendell@gmail.com> | 2014-05-16 22:36:23 -0700 |
commit | cf6cbe9f76c3b322a968c836d039fc5b70d4ce43 (patch) | |
tree | 7f1269166db1364d6f9393bd65d830a9948ce884 /docs | |
parent | 4b8ec6fcfd7a7ef0857d5b21917183c181301c95 (diff) | |
download | spark-cf6cbe9f76c3b322a968c836d039fc5b70d4ce43.tar.gz spark-cf6cbe9f76c3b322a968c836d039fc5b70d4ce43.tar.bz2 spark-cf6cbe9f76c3b322a968c836d039fc5b70d4ce43.zip |
[SPARK-1824] Remove <master> from Python examples
A recent PR (#552) fixed this for all Scala / Java examples. We need to do it for python too.
Note that this blocks on #799, which makes `bin/pyspark` go through Spark submit. With only the changes in this PR, the only way to run these examples is through Spark submit. Once #799 goes in, you can use `bin/pyspark` to run them too. For example,
```
bin/pyspark examples/src/main/python/pi.py 100 --master local-cluster[4,1,512]
```
Author: Andrew Or <andrewor14@gmail.com>
Closes #802 from andrewor14/python-examples and squashes the following commits:
cf50b9f [Andrew Or] De-indent python comments (minor)
50f80b1 [Andrew Or] Remove pyFiles from SparkContext construction
c362f69 [Andrew Or] Update docs to use spark-submit for python applications
7072c6a [Andrew Or] Merge branch 'master' of github.com:apache/spark into python-examples
427a5f0 [Andrew Or] Update docs
d32072c [Andrew Or] Remove <master> from examples + update usages
Diffstat (limited to 'docs')
-rw-r--r-- | docs/index.md | 11 | ||||
-rw-r--r-- | docs/python-programming-guide.md | 32 |
2 files changed, 24 insertions, 19 deletions
diff --git a/docs/index.md b/docs/index.md index 48182a27d2..c9b10376cc 100644 --- a/docs/index.md +++ b/docs/index.md @@ -43,12 +43,15 @@ The `--master` option specifies the locally with one thread, or `local[N]` to run locally with N threads. You should start by using `local` for testing. For a full list of options, run Spark shell with the `--help` option. -Spark also provides a Python interface. To run an example Spark application written in Python, use -`bin/pyspark <program> [params]`. For example, +Spark also provides a Python interface. To run Spark interactively in a Python interpreter, use +`bin/pyspark`. As in Spark shell, you can also pass in the `--master` option to configure your +master URL. - ./bin/pyspark examples/src/main/python/pi.py local[2] 10 + ./bin/pyspark --master local[2] -or simply `bin/pyspark` without any arguments to run Spark interactively in a python interpreter. +Example applications are also provided in Python. For example, + + ./bin/spark-submit examples/src/main/python/pi.py 10 # Launching on a Cluster diff --git a/docs/python-programming-guide.md b/docs/python-programming-guide.md index 17675acba6..b686bee1ae 100644 --- a/docs/python-programming-guide.md +++ b/docs/python-programming-guide.md @@ -60,13 +60,9 @@ By default, PySpark requires `python` to be available on the system `PATH` and u All of PySpark's library dependencies, including [Py4J](http://py4j.sourceforge.net/), are bundled with PySpark and automatically imported. -Standalone PySpark applications should be run using the `bin/spark-submit` script, which automatically -configures the Java and Python environment for running Spark. - - # Interactive Use -The `bin/pyspark` script launches a Python interpreter that is configured to run PySpark applications. To use `pyspark` interactively, first build Spark, then launch it directly from the command line without any options: +The `bin/pyspark` script launches a Python interpreter that is configured to run PySpark applications. To use `pyspark` interactively, first build Spark, then launch it directly from the command line: {% highlight bash %} $ sbt/sbt assembly @@ -83,20 +79,24 @@ The Python shell can be used explore data interactively and is a simple way to l {% endhighlight %} By default, the `bin/pyspark` shell creates SparkContext that runs applications locally on all of -your machine's logical cores. -To connect to a non-local cluster, or to specify a number of cores, set the `MASTER` environment variable. -For example, to use the `bin/pyspark` shell with a [standalone Spark cluster](spark-standalone.html): +your machine's logical cores. To connect to a non-local cluster, or to specify a number of cores, +set the `--master` flag. For example, to use the `bin/pyspark` shell with a +[standalone Spark cluster](spark-standalone.html): {% highlight bash %} -$ MASTER=spark://IP:PORT ./bin/pyspark +$ ./bin/pyspark --master spark://1.2.3.4:7077 {% endhighlight %} Or, to use exactly four cores on the local machine: {% highlight bash %} -$ MASTER=local[4] ./bin/pyspark +$ ./bin/pyspark --master local[4] {% endhighlight %} +Under the hood `bin/pyspark` is a wrapper around the +[Spark submit script](cluster-overview.html#launching-applications-with-spark-submit), so these +two scripts share the same list of options. For a complete list of options, run `bin/pyspark` with +the `--help` option. ## IPython @@ -115,13 +115,14 @@ the [IPython Notebook](http://ipython.org/notebook.html) with PyLab graphing sup $ IPYTHON_OPTS="notebook --pylab inline" ./bin/pyspark {% endhighlight %} -IPython also works on a cluster or on multiple cores if you set the `MASTER` environment variable. +IPython also works on a cluster or on multiple cores if you set the `--master` flag. # Standalone Programs -PySpark can also be used from standalone Python scripts by creating a SparkContext in your script and running the script using `bin/spark-submit`. -The Quick Start guide includes a [complete example](quick-start.html#standalone-applications) of a standalone Python application. +PySpark can also be used from standalone Python scripts by creating a SparkContext in your script +and running the script using `bin/spark-submit`. The Quick Start guide includes a +[complete example](quick-start.html#standalone-applications) of a standalone Python application. Code dependencies can be deployed by passing .zip or .egg files in the `--py-files` option of `spark-submit`: @@ -138,6 +139,7 @@ You can set [configuration properties](configuration.html#spark-properties) by p {% highlight python %} from pyspark import SparkConf, SparkContext conf = (SparkConf() + .setMaster("local") .setAppName("My app") .set("spark.executor.memory", "1g")) sc = SparkContext(conf = conf) @@ -164,6 +166,6 @@ some example applications. PySpark also includes several sample programs in the [`examples/src/main/python` folder](https://github.com/apache/spark/tree/master/examples/src/main/python). You can run them by passing the files to `pyspark`; e.g.: - ./bin/spark-submit examples/src/main/python/wordcount.py local[2] README.md + ./bin/spark-submit examples/src/main/python/wordcount.py README.md -Each program prints usage help when run without arguments. +Each program prints usage help when run without the sufficient arguments. |