author    Ankur Dave <ankurdave@gmail.com> 2013-10-30 15:59:09 -0700
committer Ankur Dave <ankurdave@gmail.com> 2013-10-30 15:59:09 -0700
commit    5064f9b2d22b9d28734bf19d825d20292a3b0fd9 (patch)
tree      5dc2b23dcb54091585dccd3d11810e3110706428 /docs
parent    a0c86c36896c20cd70a8fecfe23284486f898883 (diff)
parent    618c1f6cf3008caae7a8c0202721a6bd77d29a0f (diff)
Merge remote-tracking branch 'spark-upstream/master'
Conflicts: project/SparkBuild.scala
Diffstat (limited to 'docs')
-rw-r--r--  docs/configuration.md               | 10
-rw-r--r--  docs/python-programming-guide.md    | 11
-rw-r--r--  docs/scala-programming-guide.md     |  6
-rw-r--r--  docs/streaming-programming-guide.md |  4
4 files changed, 27 insertions, 4 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index 7940d41a27..97183bafdb 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -149,7 +149,7 @@ Apart from these, the following properties are also available, and may be useful
<td>spark.io.compression.codec</td>
<td>org.apache.spark.io.<br />LZFCompressionCodec</td>
<td>
- The compression codec class to use for various compressions. By default, Spark provides two
+ The codec used to compress internal data such as RDD partitions and shuffle outputs. By default, Spark provides two
codecs: <code>org.apache.spark.io.LZFCompressionCodec</code> and <code>org.apache.spark.io.SnappyCompressionCodec</code>.
</td>
</tr>
@@ -319,6 +319,14 @@ Apart from these, the following properties are also available, and may be useful
Should be greater than or equal to 1. Number of allowed retries = this value - 1.
</td>
</tr>
+<tr>
+ <td>spark.broadcast.blockSize</td>
+ <td>4096</td>
+ <td>
+ Size of each piece of a broadcast block, in kilobytes, used by <code>TorrentBroadcastFactory</code>.
+ If the value is too large, parallelism during broadcast suffers (it becomes slower); if it is too small, <code>BlockManager</code> might take a performance hit.
+ </td>
+</tr>
</table>
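Both properties above are applied the same way in the Spark version this patch targets: set them as Java system properties before the SparkContext is created. A minimal Scala sketch, not part of the patch; the codec choice, block size, master, and app name are illustrative:

{% highlight scala %}
import org.apache.spark.SparkContext

// Illustrative values: pick the Snappy codec and an 8 MB broadcast piece size (in KB).
System.setProperty("spark.io.compression.codec", "org.apache.spark.io.SnappyCompressionCodec")
System.setProperty("spark.broadcast.blockSize", "8192")

val sc = new SparkContext("local", "App Name")
{% endhighlight %}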
diff --git a/docs/python-programming-guide.md b/docs/python-programming-guide.md
index 6c2336ad0c..55e39b1de1 100644
--- a/docs/python-programming-guide.md
+++ b/docs/python-programming-guide.md
@@ -131,6 +131,17 @@ sc = SparkContext("local", "App Name", pyFiles=['MyFile.py', 'lib.zip', 'app.egg
Files listed here will be added to the `PYTHONPATH` and shipped to remote worker machines.
Code dependencies can be added to an existing SparkContext using its `addPyFile()` method.
+You can set [system properties](configuration.html#system-properties)
+using the `SparkContext.setSystemProperty()` class method *before*
+instantiating SparkContext. For example, to set the amount of memory
+per executor process:
+
+{% highlight python %}
+from pyspark import SparkContext
+SparkContext.setSystemProperty('spark.executor.memory', '2g')
+sc = SparkContext("local", "App Name")
+{% endhighlight %}
+
# API Docs
[API documentation](api/pyspark/index.html) for PySpark is available as Epydoc.
diff --git a/docs/scala-programming-guide.md b/docs/scala-programming-guide.md
index 03647a2ad2..94e8563a8b 100644
--- a/docs/scala-programming-guide.md
+++ b/docs/scala-programming-guide.md
@@ -142,7 +142,7 @@ All transformations in Spark are <i>lazy</i>, in that they do not compute their
By default, each transformed RDD is recomputed each time you run an action on it. However, you may also *persist* an RDD in memory using the `persist` (or `cache`) method, in which case Spark will keep the elements around on the cluster for much faster access the next time you query it. There is also support for persisting datasets on disk, or replicated across the cluster. The next section in this document describes these options.
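As a usage illustration of the laziness and caching behavior described above (not part of this patch; assumes an existing SparkContext `sc` and a local file `data.txt`):

{% highlight scala %}
val lines = sc.textFile("data.txt")   // lazy: nothing is read yet
val lengths = lines.map(_.length)     // lazy: just records the transformation
lengths.cache()                       // mark the RDD for in-memory persistence

lengths.reduce(_ + _)                 // first action: computes and caches the RDD
lengths.count()                       // second action: served from the cache
{% endhighlight %}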
-The following tables list the transformations and actions currently supported (see also the [RDD API doc](api/core/index.html#org.apache.spark.RDD) for details):
+The following tables list the transformations and actions currently supported (see also the [RDD API doc](api/core/index.html#org.apache.spark.rdd.RDD) for details):
### Transformations
@@ -211,7 +211,7 @@ The following tables list the transformations and actions currently supported (s
</tr>
</table>
-A complete list of transformations is available in the [RDD API doc](api/core/index.html#org.apache.spark.RDD).
+A complete list of transformations is available in the [RDD API doc](api/core/index.html#org.apache.spark.rdd.RDD).
### Actions
@@ -259,7 +259,7 @@ A complete list of transformations is available in the [RDD API doc](api/core/in
</tr>
</table>
-A complete list of actions is available in the [RDD API doc](api/core/index.html#org.apache.spark.RDD).
+A complete list of actions is available in the [RDD API doc](api/core/index.html#org.apache.spark.rdd.RDD).
## RDD Persistence
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 835b257238..851e30fe76 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -73,6 +73,10 @@ DStreams support many of the transformations available on normal Spark RDDs:
Iterator[T] => Iterator[U] when running on a DStream of type T. </td>
</tr>
<tr>
+ <td> <b>repartition</b>(<i>numPartitions</i>) </td>
+ <td> Changes the level of parallelism in this DStream by creating more or fewer partitions. </td>
+</tr>
+<tr>
<td> <b>union</b>(<i>otherStream</i>) </td>
<td> Return a new DStream that contains the union of the elements in the source DStream and the argument DStream. </td>
</tr>
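For context on the newly documented `repartition` row, a minimal sketch of how it might be used (not part of this patch; assumes an existing StreamingContext `ssc` and a socket source on localhost:9999):

{% highlight scala %}
// Receive text lines over a socket; the data initially lands in few partitions.
val lines = ssc.socketTextStream("localhost", 9999)

// Spread the received data across 10 partitions so downstream stages run in parallel.
val words = lines.repartition(10).flatMap(_.split(" "))
words.print()
{% endhighlight %}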