author | Matei Zaharia <matei@eecs.berkeley.edu> | 2013-09-07 00:34:12 -0400
---|---|---
committer | Matei Zaharia <matei@eecs.berkeley.edu> | 2013-09-08 00:29:11 -0700
commit | 651a96adf7b53085bd810e153f8eabf52eed1994 (patch) |
tree | 70e9c70470c93c4630de0f958eaed4b98706d2ba /docs/python-programming-guide.md |
parent | 98fb69822cf780160bca51abeaab7c82e49fab54 (diff) |
More fair scheduler docs and property names.
Also changed uses of "job" terminology to "application" when they
referred to an entire Spark program, to avoid confusion.
Diffstat (limited to 'docs/python-programming-guide.md')
-rw-r--r-- | docs/python-programming-guide.md | 12 |
1 file changed, 6 insertions, 6 deletions
diff --git a/docs/python-programming-guide.md b/docs/python-programming-guide.md
index 8c33a953a4..5662e7d02a 100644
--- a/docs/python-programming-guide.md
+++ b/docs/python-programming-guide.md
@@ -53,20 +53,20 @@ In addition, PySpark fully supports interactive use---simply run `./pyspark` to
 
 # Installing and Configuring PySpark
 
 PySpark requires Python 2.6 or higher.
-PySpark jobs are executed using a standard CPython interpreter in order to support Python modules that use C extensions.
+PySpark applications are executed using a standard CPython interpreter in order to support Python modules that use C extensions.
 We have not tested PySpark with Python 3 or with alternative Python interpreters, such as [PyPy](http://pypy.org/) or [Jython](http://www.jython.org/).
 
 By default, PySpark requires `python` to be available on the system `PATH` and use it to run programs; an alternate Python executable may be specified by setting the `PYSPARK_PYTHON` environment variable in `conf/spark-env.sh` (or `.cmd` on Windows).
 
 All of PySpark's library dependencies, including [Py4J](http://py4j.sourceforge.net/), are bundled with PySpark and automatically imported.
 
-Standalone PySpark jobs should be run using the `pyspark` script, which automatically configures the Java and Python environment using the settings in `conf/spark-env.sh` or `.cmd`.
+Standalone PySpark applications should be run using the `pyspark` script, which automatically configures the Java and Python environment using the settings in `conf/spark-env.sh` or `.cmd`.
 The script automatically adds the `pyspark` package to the `PYTHONPATH`.
 
 # Interactive Use
 
-The `pyspark` script launches a Python interpreter that is configured to run PySpark jobs. To use `pyspark` interactively, first build Spark, then launch it directly from the command line without any options:
+The `pyspark` script launches a Python interpreter that is configured to run PySpark applications. To use `pyspark` interactively, first build Spark, then launch it directly from the command line without any options:
 
 {% highlight bash %}
 $ sbt/sbt assembly
@@ -82,7 +82,7 @@ The Python shell can be used explore data interactively and is a simple way to l
 >>> help(pyspark) # Show all pyspark functions
 {% endhighlight %}
 
-By default, the `pyspark` shell creates SparkContext that runs jobs locally on a single core.
+By default, the `pyspark` shell creates SparkContext that runs applications locally on a single core.
 To connect to a non-local cluster, or use multiple cores, set the `MASTER` environment variable.
 For example, to use the `pyspark` shell with a [standalone Spark cluster](spark-standalone.html):
 
@@ -119,13 +119,13 @@ IPython also works on a cluster or on multiple cores if you set the `MASTER` env
 
 # Standalone Programs
 
 PySpark can also be used from standalone Python scripts by creating a SparkContext in your script and running the script using `pyspark`.
-The Quick Start guide includes a [complete example](quick-start.html#a-standalone-job-in-python) of a standalone Python job.
+The Quick Start guide includes a [complete example](quick-start.html#a-standalone-app-in-python) of a standalone Python application.
 
 Code dependencies can be deployed by listing them in the `pyFiles` option in the SparkContext constructor:
 
 {% highlight python %}
 from pyspark import SparkContext
-sc = SparkContext("local", "Job Name", pyFiles=['MyFile.py', 'lib.zip', 'app.egg'])
+sc = SparkContext("local", "App Name", pyFiles=['MyFile.py', 'lib.zip', 'app.egg'])
 {% endhighlight %}
 
 Files listed here will be added to the `PYTHONPATH` and shipped to remote worker machines.
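For reference, a minimal standalone script matching the updated wording might look like the sketch below. It is illustrative only: the dependency file names, the computation, and the "App Name" string are placeholders, not part of this commit; the SparkContext constructor call itself mirrors the line changed in the diff above.

```python
from pyspark import SparkContext

# "local" runs on a single core; point this at a cluster master URL (or set
# the MASTER environment variable when using the pyspark shell/script) to use
# a cluster instead. pyFiles ships code dependencies to the worker machines.
sc = SparkContext("local", "App Name", pyFiles=['MyFile.py', 'lib.zip', 'app.egg'])

# A trivial computation to confirm the context works.
doubled = sc.parallelize(range(10)).map(lambda x: x * 2).collect()
print(doubled)
```

Run with the `pyspark` script so the settings in `conf/spark-env.sh` (for example `PYSPARK_PYTHON`) are applied automatically.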