[SPARK-18723][DOC] Expanded programming guide information on wholeTex…

## What changes were proposed in this pull request?

Add additional information to wholeTextFiles in the Programming Guide. Also explain partitioning policy difference in relation to textFile and its impact on performance.

Also added reference to the underlying CombineFileInputFormat.

## How was this patch tested?

Manual build of documentation and inspection in browser.

```
cd docs
jekyll serve --watch
```

Author: Michal Senkyr <mike.senkyr@gmail.com>

Closes #16157 from michalsenkyr/wholeTextFilesExpandedDocs.
author: Michal Senkyr <mike.senkyr@gmail.com> 2016-12-16 17:43:39 +0000
committer: Sean Owen <sowen@cloudera.com> 2016-12-16 17:43:39 +0000
commit: 836c95b108ddd350b10796c97fc30b13371fb0fb (patch)
tree: ea74001197a1a42b97a7b7039d34e80cb18a108d /core/src
parent: dc2a4d4ad478fdb0486cc0515d4fe8b402d24db4 (diff)
1 files changed, 4 insertions, 0 deletions
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 02c009cdb5..bd3f454485 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -851,6 +851,8 @@ class SparkContext(config: SparkConf) extends Logging {
    * @note Small files are preferred, large file is also allowable, but may cause bad performance.
    * @note On some filesystems, `.../path/&#42;` can be a more efficient way to read all files
    *       in a directory rather than `.../path/` or `.../path`
+   * @note Partitioning is determined by data locality. This may result in too few partitions
+   *       by default.
    *
    * @param path Directory to the input data files, the path can be comma separated paths as the
    *             list of inputs.
@@ -900,6 +902,8 @@ class SparkContext(config: SparkConf) extends Logging {
    * @note Small files are preferred; very large files may cause bad performance.
    * @note On some filesystems, `.../path/&#42;` can be a more efficient way to read all files
    *       in a directory rather than `.../path/` or `.../path`
+   * @note Partitioning is determined by data locality. This may result in too few partitions
+   *       by default.
    *
    * @param path Directory to the input data files, the path can be comma separated paths as the
    *             list of inputs.
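
The added note says that the default partitioning may yield too few partitions. A minimal sketch of how a caller might check and work around that with `wholeTextFiles`, the method named in the commit message, follows. It assumes Spark is on the classpath and runs in local mode; the object name, the helper `partitionCounts`, the input path, and the target of 8 partitions are all hypothetical and chosen only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; not part of the patch above.
object WholeTextFilesPartitions {

  /** Returns (default, hinted, repartitioned) partition counts for `inputDir`. */
  def partitionCounts(sc: SparkContext, inputDir: String, target: Int): (Int, Int, Int) = {
    // Default call: the partition count is whatever the input format picks.
    val byDefault = sc.wholeTextFiles(inputDir)

    // minPartitions is only a suggestion, so the resulting count may differ from `target`.
    val hinted = sc.wholeTextFiles(inputDir, minPartitions = target)

    // repartition shuffles the data and yields exactly `target` partitions.
    val repartitioned = byDefault.repartition(target)

    (byDefault.getNumPartitions, hinted.getNumPartitions, repartitioned.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    // Assumed input directory; pass a real one as the first argument.
    val inputDir = args.headOption.getOrElse("/tmp/small-files")
    val conf = new SparkConf().setAppName("whole-text-files-partitions").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      val (d, h, r) = partitionCounts(sc, inputDir, target = 8)
      println(s"default=$d hinted=$h repartitioned=$r")
    } finally {
      sc.stop()
    }
  }
}
```

Passing `minPartitions` avoids a shuffle but leaves the final count up to the input format, while `repartition` guarantees the count at the cost of moving whole file contents across the cluster. Which trade-off is preferable likely depends on file sizes and on how expensive the downstream work is.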