author    Reynold Xin <rxin@databricks.com>  2016-05-18 19:16:28 -0700
committer Reynold Xin <rxin@databricks.com>  2016-05-18 19:16:28 -0700
commit    4987f39ac7a694e1c8b8b82246eb4fbd863201c4
tree      ab3752196641559a226ec4b9a8afab5357070ac4
parent    9c2a376e413b0701097b0784bd725e4ca87cd837
[SPARK-14463][SQL] Document the semantics for read.text
## What changes were proposed in this pull request?
This patch is a follow-up to https://github.com/apache/spark/pull/13104 and adds documentation to clarify the semantics of read.text with respect to partitioning.
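The partitioning information mentioned above is encoded in Hive-style `key=value` directory segments. As an illustration (a plain-Python sketch, not Spark code; `partition_columns` is a hypothetical helper), this is the kind of path metadata that `read.text` ignores and `read.format("text").load(...)` would surface as columns:

```python
def partition_columns(path):
    """Extract Hive-style key=value partition segments from a file path."""
    parts = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

# A partitioned text-file layout encodes 'year' and 'month' in the path:
print(partition_columns("logs/year=2016/month=05/part-00000.txt"))
# {'year': '2016', 'month': '05'}
```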
## How was this patch tested?
N/A
Author: Reynold Xin <rxin@databricks.com>
Closes #13184 from rxin/SPARK-14463.
 R/pkg/R/SQLContext.R                                               | 2 +
 python/pyspark/sql/readwriter.py                                   | 3 +
 sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala | 8 +-
 3 files changed, 11 insertions(+), 2 deletions(-)
```diff
diff --git a/R/pkg/R/SQLContext.R b/R/pkg/R/SQLContext.R
index 3824e0a995..6b7a341bee 100644
--- a/R/pkg/R/SQLContext.R
+++ b/R/pkg/R/SQLContext.R
@@ -298,6 +298,8 @@ parquetFile <- function(sqlContext, ...) {
 #' Create a SparkDataFrame from a text file.
 #'
 #' Loads a text file and returns a SparkDataFrame with a single string column named "value".
+#' If the directory structure of the text files contains partitioning information, those are
+#' ignored in the resulting DataFrame.
 #' Each line in the text file is a new row in the resulting SparkDataFrame.
 #'
 #' @param sqlContext SQLContext to use
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 8e6bce9001..855c9d666f 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -286,6 +286,9 @@ class DataFrameReader(object):
     @since(1.6)
     def text(self, paths):
         """Loads a text file and returns a [[DataFrame]] with a single string column named "value".
+        If the directory structure of the text files contains partitioning information,
+        those are ignored in the resulting DataFrame. To include partitioning information as
+        columns, use ``read.format('text').load(...)``.
 
         Each line in the text file is a new row in the resulting DataFrame.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index e33fd831ab..57a2091fe8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -440,10 +440,14 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   }
 
   /**
-   * Loads a text file and returns a [[Dataset]] of String. The underlying schema of the Dataset
+   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    * contains a single string column named "value".
    *
-   * Each line in the text file is a new row in the resulting Dataset. For example:
+   * If the directory structure of the text files contains partitioning information, those are
+   * ignored in the resulting Dataset. To include partitioning information as columns, use
+   * `read.format("text").load("...")`.
+   *
+   * Each line in the text files is a new element in the resulting Dataset. For example:
    * {{{
    * // Scala:
    * spark.read.text("/path/to/spark/README.md")
```
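The contrast the new docs draw can be summarized with a small simulation (a plain-Python sketch of the documented schemas; `text_read_columns` is a hypothetical helper, not a Spark API):

```python
def text_read_columns(paths, include_partitions=False):
    """Return the column names a text read would produce.

    include_partitions=False mimics spark.read.text, which ignores
    partition directories; include_partitions=True mimics
    read.format("text").load(...), which surfaces them as columns.
    """
    columns = ["value"]
    if include_partitions:
        for p in paths:
            for segment in p.split("/"):
                if "=" in segment:
                    key = segment.split("=", 1)[0]
                    if key not in columns:
                        columns.append(key)
    return columns

paths = ["logs/year=2016/month=05/part-00000.txt"]
print(text_read_columns(paths))                           # ['value']
print(text_read_columns(paths, include_partitions=True))  # ['value', 'year', 'month']
```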