Diffstat (limited to 'docs')
-rw-r--r-- | docs/configuration.md | 41
1 file changed, 6 insertions, 35 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index da70cabba2..3bb655075f 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -18,8 +18,8 @@ Spark provides three locations to configure the system:
 Spark properties control most application settings and are configured separately for each
 application. The preferred way to set them is by passing a [SparkConf](api/core/index.html#org.apache.spark.SparkConf)
 class to your SparkContext constructor.
-Alternatively, Spark will also load them from Java system properties (for compatibility with old versions
-of Spark) and from a [`spark.conf` file](#configuration-files) on your classpath.
+Alternatively, Spark will also load them from Java system properties, for compatibility with old versions
+of Spark.
 
 SparkConf lets you configure most of the common properties to initialize a cluster (e.g., master URL and
 application name), as well as arbitrary key-value pairs through the `set()` method. For example, we could
@@ -98,7 +98,7 @@ Apart from these, the following properties are also available, and may be useful
   <td>spark.default.parallelism</td>
   <td>8</td>
   <td>
-    Default number of tasks to use for distributed shuffle operations (<code>groupByKey</code>,
+    Default number of tasks to use across the cluster for distributed shuffle operations (<code>groupByKey</code>,
     <code>reduceByKey</code>, etc) when not set by user.
   </td>
 </tr>
@@ -158,7 +158,9 @@ Apart from these, the following properties are also available, and may be useful
   <td>spark.shuffle.spill.compress</td>
   <td>true</td>
   <td>
-    Whether to compress data spilled during shuffles.
+    Whether to compress data spilled during shuffles. If enabled, spill compression
+    always uses the `org.apache.spark.io.LZFCompressionCodec` codec,
+    regardless of the value of `spark.io.compression.codec`.
   </td>
 </tr>
 <tr>
@@ -379,13 +381,6 @@ Apart from these, the following properties are also available, and may be useful
     Too large a value decreases parallelism during broadcast (makes it slower); however, if it is
     too small, <code>BlockManager</code> might take a performance hit.
   </td>
 </tr>
-<tr>
-  <td>akka.x.y....</td>
-  <td>value</td>
-  <td>
-    An arbitrary akka configuration can be set directly on spark conf and it is applied for all the ActorSystems created spark wide for that SparkContext and its assigned executors as well.
-  </td>
-</tr>
 <tr>
   <td>spark.shuffle.consolidateFiles</td>
@@ -468,30 +463,6 @@ Apart from these, the following properties are also available, and may be useful
 The application web UI at `http://<driver>:4040` lists Spark properties in the "Environment" tab.
 This is a useful place to check to make sure that your properties have been set correctly.
 
-## Configuration Files
-
-You can also configure Spark properties through a `spark.conf` file on your Java classpath.
-Because these properties are usually application-specific, we recommend putting this fine *only* on your
-application's classpath, and not in a global Spark classpath.
-
-The `spark.conf` file uses Typesafe Config's [HOCON format](https://github.com/typesafehub/config#json-superset),
-which is a superset of Java properties files and JSON. For example, the following is a simple config file:
-
-{% highlight awk %}
-# Comments are allowed
-spark.executor.memory = 512m
-spark.serializer = org.apache.spark.serializer.KryoSerializer
-{% endhighlight %}
-
-The format also allows hierarchical nesting, as follows:
-
-{% highlight awk %}
-spark.akka {
-  threads = 8
-  timeout = 200
-}
-{% endhighlight %}
-
 # Environment Variables
 
 Certain Spark settings can be configured through environment variables, which are read from the `conf/spark-env.sh`
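With the `spark.conf` file removed, the two mechanisms the updated prose describes are a SparkConf object and Java system properties. A minimal Scala sketch of that pattern, assuming illustrative values for the master URL, application name, and executor memory:

{% highlight scala %}
import org.apache.spark.{SparkConf, SparkContext}

// Preferred: build a SparkConf and pass it to the SparkContext constructor.
val conf = new SparkConf()
  .setMaster("local[4]")                  // illustrative master URL
  .setAppName("ConfigExample")            // illustrative application name
  .set("spark.executor.memory", "512m")   // arbitrary key-value pair via set()

val sc = new SparkContext(conf)

// Fallback: Spark also reads Java system properties for compatibility with
// older versions, e.g. launching the JVM with -Dspark.executor.memory=512m.
{% endhighlight %}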