[DOCS][SQL] Add a Note on jsonFile having separate JSON objects per line

* This commit hopes to avoid the confusion I faced when trying to submit a regular, valid multi-line JSON file, also see
  http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html

Author: Peter Vandenabeele <peter@vandenabeele.com>

Closes #3517 from petervandenabeele/pv-docs-note-on-jsonFile-format/01 and squashes the following commits:

1f98e52 [Peter Vandenabeele] Revert to people.json and simple Note text
6b6e062 [Peter Vandenabeele] Change the "JSON" connotation to "txt"
fca7dfb [Peter Vandenabeele] Add a Note on jsonFile having separate JSON objects per line
author: Peter Vandenabeele <peter@vandenabeele.com> 2014-12-16 13:57:55 -0800
committer: Michael Armbrust <michael@databricks.com> 2014-12-16 13:58:01 -0800
commit: 1a9e35e57ab80984b81802ffc461d19cc9239edd (patch)
tree: 2e0502a145c2d18396f171ede71e01550e08d740 /docs/sql-programming-guide.md
parent: 17688d14299f18a93591818ae5fef69e9dc20eb5 (diff)
download: spark-1a9e35e57ab80984b81802ffc461d19cc9239edd.tar.gz
spark-1a9e35e57ab80984b81802ffc461d19cc9239edd.tar.bz2
spark-1a9e35e57ab80984b81802ffc461d19cc9239edd.zip
1 files changed, 12 insertions, 0 deletions
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index ad51b9cf41..2aea8a8aed 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -625,6 +625,10 @@ This conversion can be done using one of two methods in a SQLContext:
 * `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
 * `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.
 
+Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
+line must contain a separate, self-contained valid JSON object. As a consequence,
+a regular multi-line JSON file will most often fail.
+
 {% highlight scala %}
 // sc is an existing SparkContext.
 val sqlContext = new org.apache.spark.sql.SQLContext(sc)
@@ -663,6 +667,10 @@ This conversion can be done using one of two methods in a JavaSQLContext :
 * `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
 * `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.
 
+Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
+line must contain a separate, self-contained valid JSON object. As a consequence,
+a regular multi-line JSON file will most often fail.
+
 {% highlight java %}
 // sc is an existing JavaSparkContext.
 JavaSQLContext sqlContext = new org.apache.spark.sql.api.java.JavaSQLContext(sc);
@@ -701,6 +709,10 @@ This conversion can be done using one of two methods in a SQLContext:
 * `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
 * `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.
 
+Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
+line must contain a separate, self-contained valid JSON object. As a consequence,
+a regular multi-line JSON file will most often fail.
+
 {% highlight python %}
 # sc is an existing SparkContext.
 from pyspark.sql import SQLContext
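
The Note added by this patch says that each line must hold one self-contained JSON object. As a rough illustration of what that means in practice, the sketch below rewrites a regular multi-line JSON document into that one-object-per-line layout using only Python's standard `json` module. It is not part of the patch: the helper name `to_json_lines` and the sample records are hypothetical, and it assumes the top-level value is either a single object or an array of objects.

```python
import json


def to_json_lines(text):
    """Convert a regular JSON document into one-object-per-line text.

    Hypothetical helper, not a Spark API. Assumes the top-level value is
    a single JSON object or an array of JSON objects.
    """
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    # json.dumps without `indent` emits no raw newlines (newlines inside
    # strings are escaped), so each record stays on exactly one line.
    return "\n".join(json.dumps(record) for record in records)


# A valid but multi-line JSON document (illustrative sample data).
multi_line = """
[
  {"name": "Michael"},
  {"name": "Andy", "age": 30}
]
"""

print(to_json_lines(multi_line))
```

Writing the returned text to a file should give input in the layout the Note describes, with every line parseable on its own.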