aboutsummaryrefslogtreecommitdiff
path: root/docs/configuration.md
diff options
context:
space:
mode:
authorgatorsmile <gatorsmile@gmail.com>2015-12-18 20:06:05 -0800
committerDavies Liu <davies.liu@gmail.com>2015-12-18 20:06:05 -0800
commit499ac3e69a102f9b10a1d7e14382fa191516f7b5 (patch)
treedcb84cda0ddb75094a39946dfd09f0ed29dd058c /docs/configuration.md
parenta78a91f4d7239c14bd5d0b18cdc87d55594a8d8a (diff)
downloadspark-499ac3e69a102f9b10a1d7e14382fa191516f7b5.tar.gz
spark-499ac3e69a102f9b10a1d7e14382fa191516f7b5.tar.bz2
spark-499ac3e69a102f9b10a1d7e14382fa191516f7b5.zip
[SPARK-12091] [PYSPARK] Deprecate the JAVA-specific deserialized storage levels
The current default storage level of Python persist API is MEMORY_ONLY_SER. This is different from the default level MEMORY_ONLY in the official document and RDD APIs. davies Is this inconsistency intentional? Thanks! Updates: Since the data is always serialized on the Python side, the storage levels of JAVA-specific deserialization are not removed, such as MEMORY_ONLY. Updates: Based on the reviewers' feedback. In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html) library, so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`, `MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, `DISK_ONLY_2` and `OFF_HEAP`. Author: gatorsmile <gatorsmile@gmail.com> Closes #10092 from gatorsmile/persistStorageLevel.
Diffstat (limited to 'docs/configuration.md')
-rw-r--r--docs/configuration.md7
1 files changed, 4 insertions, 3 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index 38d3d059f9..85e7d1202d 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -687,9 +687,10 @@ Apart from these, the following properties are also available, and may be useful
<td><code>spark.rdd.compress</code></td>
<td>false</td>
<td>
- Whether to compress serialized RDD partitions (e.g. for
- <code>StorageLevel.MEMORY_ONLY_SER</code>). Can save substantial space at the cost of some
- extra CPU time.
+ Whether to compress serialized RDD partitions (e.g. for
+ <code>StorageLevel.MEMORY_ONLY_SER</code> in Java
+ and Scala or <code>StorageLevel.MEMORY_ONLY</code> in Python).
+ Can save substantial space at the cost of some extra CPU time.
</td>
</tr>
<tr>