Diffstat (limited to 'docs')
 -rw-r--r--  docs/configuration.md             | 12 ++++++------
 -rw-r--r--  docs/python-programming-guide.md  | 11 +++++++++--
 2 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index 036a0df480..a7054b4321 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -202,7 +202,7 @@ Apart from these, the following properties are also available, and may be useful
   <td>10</td>
   <td>
     Maximum message size to allow in "control plane" communication (for serialized tasks and task
-    results), in MB. Increase this if your tasks need to send back large results to the master
+    results), in MB. Increase this if your tasks need to send back large results to the driver
     (e.g. using <code>collect()</code> on a large dataset).
   </td>
 </tr>
@@ -211,7 +211,7 @@ Apart from these, the following properties are also available, and may be useful
   <td>4</td>
   <td>
     Number of actor threads to use for communication. Can be useful to increase on large clusters
-    when the master has a lot of CPU cores.
+    when the driver has a lot of CPU cores.
   </td>
 </tr>
 <tr>
@@ -222,17 +222,17 @@ Apart from these, the following properties are also available, and may be useful
   </td>
 </tr>
 <tr>
-  <td>spark.master.host</td>
+  <td>spark.driver.host</td>
   <td>(local hostname)</td>
   <td>
-    Hostname or IP address for the master to listen on.
+    Hostname or IP address for the driver to listen on.
   </td>
 </tr>
 <tr>
-  <td>spark.master.port</td>
+  <td>spark.driver.port</td>
   <td>(random)</td>
   <td>
-    Port for the master to listen on.
+    Port for the driver to listen on.
   </td>
 </tr>
 <tr>
diff --git a/docs/python-programming-guide.md b/docs/python-programming-guide.md
index a840b9b34b..4e84d23edf 100644
--- a/docs/python-programming-guide.md
+++ b/docs/python-programming-guide.md
@@ -67,13 +67,20 @@ The script automatically adds the `pyspark` package to the `PYTHONPATH`.
 
 # Interactive Use
 
-The `pyspark` script launches a Python interpreter that is configured to run PySpark jobs.
-When run without any input files, `pyspark` launches a shell that can be used explore data interactively, which is a simple way to learn the API:
+The `pyspark` script launches a Python interpreter that is configured to run PySpark jobs. To use `pyspark` interactively, first build Spark, then launch it directly from the command line without any options:
+
+{% highlight bash %}
+$ sbt/sbt package
+$ ./pyspark
+{% endhighlight %}
+
+The Python shell can be used to explore data interactively and is a simple way to learn the API:
 
 {% highlight python %}
 >>> words = sc.textFile("/usr/share/dict/words")
 >>> words.filter(lambda w: w.startswith("spar")).take(5)
 [u'spar', u'sparable', u'sparada', u'sparadrap', u'sparagrass']
+>>> help(pyspark) # Show all pyspark functions
 {% endhighlight %}
 
 By default, the `pyspark` shell creates SparkContext that runs jobs locally.
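
For readers tracking the rename in the first file: the settings above are plain Java system properties, so under the new names they are set exactly as before. A minimal sketch, assuming the `SPARK_JAVA_OPTS` hook that Spark of this era reads for `-D` options (e.g. from `conf/spark-env.sh`); the hostname and port values are hypothetical:

```bash
# Pin the driver to a known host and port (hypothetical values) so that
# executors behind a firewall can reach it. These were previously named
# spark.master.host and spark.master.port.
export SPARK_JAVA_OPTS="-Dspark.driver.host=driver.example.com -Dspark.driver.port=51000"

# The shell picks the options up from the environment at launch.
./pyspark
```

Leaving `spark.driver.port` unset keeps the default of binding to a random free port, which is fine outside restrictive network setups. The control-plane message cap shown in the first hunk (10 MB default) can be raised through the same mechanism if large `collect()` results hit the limit; its property name falls outside the visible hunk context, so it is not reproduced here.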