[SPARK-8437] [DOCS] Corrected: Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles

Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/' (now fixed scaladoc by using HTML entity for *)

Author: Sean Owen <sowen@cloudera.com>

Closes #7126 from srowen/SPARK-8437.2 and squashes the following commits:

7bb45da [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/' (now fixed scaladoc by using HTML entity for *)

(cherry picked from commit ada384b785c663392a0b69fad5bfe7a0a0584ee0)
Signed-off-by: Andrew Or <andrew@databricks.com>
author: Sean Owen <sowen@cloudera.com> 2015-06-30 10:07:26 -0700
committer: Andrew Or <andrew@databricks.com> 2015-06-30 10:07:34 -0700
commit: 255b2be94bbd2b527175d8e7a5a2b89fecf8a835 (patch)
tree: 82dac6883c74bb01741fe522876bd9e803a13a63
parent: eab1d16a7abebbf901fcfe7e997ac015ed4e4cf7 (diff)
download: spark-255b2be94bbd2b527175d8e7a5a2b89fecf8a835.tar.gz
spark-255b2be94bbd2b527175d8e7a5a2b89fecf8a835.tar.bz2
spark-255b2be94bbd2b527175d8e7a5a2b89fecf8a835.zip
1 files changed, 5 insertions, 3 deletions
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index b4c0d4c2f5..d499aba790 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -824,7 +824,8 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * }}}
    *
    * @note Small files are preferred, large file is also allowable, but may cause bad performance.
-   *
+   * @note On some filesystems, `.../path/&#42;` can be a more efficient way to read all files
+   *       in a directory rather than `.../path/` or `.../path`
    * @param minPartitions A suggestion value of the minimal splitting number for input data.
    */
   def wholeTextFiles(
@@ -871,9 +872,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    *   (a-hdfs-path/part-nnnnn, its content)
    * }}}
    *
-   * @param minPartitions A suggestion value of the minimal splitting number for input data.
-   *
    * @note Small files are preferred; very large files may cause bad performance.
+   * @note On some filesystems, `.../path/&#42;` can be a more efficient way to read all files
+   *       in a directory rather than `.../path/` or `.../path`
+   * @param minPartitions A suggestion value of the minimal splitting number for input data.
    */
   @Experimental
   def binaryFiles(
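
The note added in this patch can be illustrated with a small helper that turns a plain directory path into its trailing-wildcard form before it is handed to `wholeTextFiles` or `binaryFiles`. This is only a sketch: `DirGlob` and `allFilesIn` are hypothetical names that are not part of Spark, and whether the wildcard form is actually faster depends on the Hadoop FS implementation in use.

```scala
// Hypothetical helper, not part of Spark: rewrites a directory path so that it
// ends in a wildcard, which some Hadoop FS implementations list more efficiently.
object DirGlob {
  // Characters treated here as a sign that the caller already passed a glob pattern.
  private val globChars = Set('*', '?', '[', '{')

  def allFilesIn(dir: String): String =
    if (dir.exists(globChars)) dir        // already a pattern: leave it untouched
    else if (dir.endsWith("/")) dir + "*" // directory with trailing slash: add the wildcard
    else dir + "/*"                       // bare directory: add slash and wildcard

  def main(args: Array[String]): Unit = {
    // Prints the wildcard form of each way of naming the same directory.
    Seq("hdfs://a-hdfs-path", "hdfs://a-hdfs-path/", "hdfs://a-hdfs-path/*")
      .foreach(p => println(allFilesIn(p)))
  }
}
```

Given an existing `SparkContext` named `sc` (assumed here), the result would be passed straight through, for example `sc.wholeTextFiles(DirGlob.allFilesIn("hdfs://a-hdfs-path"))`, and likewise for `sc.binaryFiles`.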