author | Reynold Xin <rxin@databricks.com> | 2016-05-18 19:16:28 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-05-18 19:16:28 -0700 |
commit | 4987f39ac7a694e1c8b8b82246eb4fbd863201c4 (patch) | |
tree | ab3752196641559a226ec4b9a8afab5357070ac4 /python/pyspark/sql | |
parent | 9c2a376e413b0701097b0784bd725e4ca87cd837 (diff) | |
[SPARK-14463][SQL] Document the semantics for read.text
## What changes were proposed in this pull request?
This patch is a follow-up to https://github.com/apache/spark/pull/13104 and adds documentation to clarify the semantics of read.text with respect to partitioning.
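To make "partitioning information" in the directory structure concrete, here is a minimal sketch in plain Python. It assumes the usual `key=value` directory naming convention; the helper `partition_columns` and the example paths are hypothetical, and this is not Spark's implementation. It only shows which values a reader could derive from a path.

```python
from pathlib import PurePosixPath


def partition_columns(file_path, base_path):
    """Collect key=value directory components found between base_path and the file.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    relative = PurePosixPath(file_path).relative_to(base_path)
    columns = {}
    # Every component except the last one (the file name) is a directory.
    for part in relative.parts[:-1]:
        key, sep, value = part.partition("=")
        if sep and key:
            columns[key] = value
    return columns


# Hypothetical layout: /data/logs/year=2016/month=05/part-00000.txt
example = partition_columns(
    "/data/logs/year=2016/month=05/part-00000.txt", "/data/logs"
)
print(example)  # {'year': '2016', 'month': '05'}
```

Per the docstring added in this patch, `read.text` on such a layout returns only the `value` column and ignores `year` and `month`, while `read.format('text').load(...)` is the documented way to get them as columns.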
## How was this patch tested?
N/A
Author: Reynold Xin <rxin@databricks.com>
Closes #13184 from rxin/SPARK-14463.
Diffstat (limited to 'python/pyspark/sql')
-rw-r--r-- | python/pyspark/sql/readwriter.py | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 8e6bce9001..855c9d666f 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -286,6 +286,9 @@ class DataFrameReader(object):
     @since(1.6)
     def text(self, paths):
         """Loads a text file and returns a [[DataFrame]] with a single string column named "value".
+        If the directory structure of the text files contains partitioning information,
+        those are ignored in the resulting DataFrame. To include partitioning information as
+        columns, use ``read.format('text').load(...)``.
         Each line in the text file is a new row in the resulting DataFrame.