author     Reynold Xin <rxin@databricks.com>  2016-05-18 19:16:28 -0700
committer  Reynold Xin <rxin@databricks.com>  2016-05-18 19:16:28 -0700
commit     4987f39ac7a694e1c8b8b82246eb4fbd863201c4 (patch)
tree       ab3752196641559a226ec4b9a8afab5357070ac4
parent     9c2a376e413b0701097b0784bd725e4ca87cd837 (diff)
[SPARK-14463][SQL] Document the semantics for read.text
## What changes were proposed in this pull request?

This patch is a follow-up to https://github.com/apache/spark/pull/13104 and adds documentation to clarify the semantics of read.text with respect to partitioning.

## How was this patch tested?

N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #13184 from rxin/SPARK-14463.
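To illustrate the documented semantics, here is a minimal Scala sketch; the path `/tmp/text-demo` and the sample lines are hypothetical, and the expected schema follows the doc text added in this patch rather than output captured from a real run:

```scala
// A minimal sketch of the semantics documented by this patch.
// The path and data below are hypothetical.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("read-text-demo")
  .getOrCreate()
import spark.implicits._

// Write a few lines under a partition-style directory (year=2016).
Seq("hello", "world").toDS().write.text("/tmp/text-demo/year=2016")

// Per the doc added here, read.text ignores the partitioning information:
// the result should contain only the single string column "value".
spark.read.text("/tmp/text-demo").printSchema()
// root
//  |-- value: string (nullable = true)
```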
-rw-r--r--  R/pkg/R/SQLContext.R                                                | 2 ++
-rw-r--r--  python/pyspark/sql/readwriter.py                                    | 3 +++
-rw-r--r--  sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala | 8 ++++++--
3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/R/pkg/R/SQLContext.R b/R/pkg/R/SQLContext.R
index 3824e0a995..6b7a341bee 100644
--- a/R/pkg/R/SQLContext.R
+++ b/R/pkg/R/SQLContext.R
@@ -298,6 +298,8 @@ parquetFile <- function(sqlContext, ...) {
#' Create a SparkDataFrame from a text file.
#'
#' Loads a text file and returns a SparkDataFrame with a single string column named "value".
+#' If the directory structure of the text files contains partitioning information, it is
+#' ignored in the resulting DataFrame.
#' Each line in the text file is a new row in the resulting SparkDataFrame.
#'
#' @param sqlContext SQLContext to use
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 8e6bce9001..855c9d666f 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -286,6 +286,9 @@ class DataFrameReader(object):
@since(1.6)
def text(self, paths):
"""Loads a text file and returns a [[DataFrame]] with a single string column named "value".
+ If the directory structure of the text files contains partitioning information,
+ it is ignored in the resulting DataFrame. To include partitioning information as
+ columns, use ``read.format('text').load(...)``.
Each line in the text file is a new row in the resulting DataFrame.
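The alternative named in the docstring above, `read.format('text').load(...)`, can be sketched the same way. Continuing the hypothetical `/tmp/text-demo` layout from the earlier example, the partition directory should surface as a column, typically type-inferred as an integer:

```scala
// Continuing the hypothetical /tmp/text-demo example: per the doc added in
// this patch, format("text").load keeps partition columns alongside "value".
val withPartitions = spark.read.format("text").load("/tmp/text-demo")
withPartitions.printSchema()
// root
//  |-- value: string (nullable = true)
//  |-- year: integer (nullable = true)
```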
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index e33fd831ab..57a2091fe8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -440,10 +440,14 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
}

/**
- * Loads a text file and returns a [[Dataset]] of String. The underlying schema of the Dataset
+ * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
* contains a single string column named "value".
*
- * Each line in the text file is a new row in the resulting Dataset. For example:
+ * If the directory structure of the text files contains partitioning information, it is
+ * ignored in the resulting Dataset. To include partitioning information as columns, use
+ * `read.format("text").load("...")`.
+ *
+ * Each line in the text files is a new element in the resulting Dataset. For example:
* {{{
* // Scala:
* spark.read.text("/path/to/spark/README.md")