author | Peter Vandenabeele <peter@vandenabeele.com> | 2014-12-16 13:57:55 -0800 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2014-12-16 13:58:19 -0800 |
commit | 4f9916f1e8ffb1ffc647a036ee35702d7d7e6646 (patch) | |
tree | 09e32d4e3d655743df77099a2c512653be6b243a /docs | |
parent | 6bd8a9666a2ff5e3f603dba5a7de4687b72c08c1 (diff) | |
[DOCS][SQL] Add a Note on jsonFile having separate JSON objects per line
* This commit hopes to avoid the confusion I faced when trying
to submit a regular, valid multi-line JSON file. See also
http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html
Author: Peter Vandenabeele <peter@vandenabeele.com>
Closes #3517 from petervandenabeele/pv-docs-note-on-jsonFile-format/01 and squashes the following commits:
1f98e52 [Peter Vandenabeele] Revert to people.json and simple Note text
6b6e062 [Peter Vandenabeele] Change the "JSON" connotation to "txt"
fca7dfb [Peter Vandenabeele] Add a Note on jsonFile having separate JSON objects per line
(cherry picked from commit 1a9e35e57ab80984b81802ffc461d19cc9239edd)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Diffstat (limited to 'docs')
-rw-r--r-- | docs/sql-programming-guide.md | 12 |
1 files changed, 12 insertions, 0 deletions
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index be284fbe21..7e3e9c061a 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -625,6 +625,10 @@ This conversion can be done using one of two methods in a SQLContext:
 * `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
 * `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.
 
+Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
+line must contain a separate, self-contained valid JSON object. As a consequence,
+a regular multi-line JSON file will most often fail.
+
 {% highlight scala %}
 // sc is an existing SparkContext.
 val sqlContext = new org.apache.spark.sql.SQLContext(sc)
@@ -663,6 +667,10 @@ This conversion can be done using one of two methods in a JavaSQLContext :
 * `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
 * `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.
 
+Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
+line must contain a separate, self-contained valid JSON object. As a consequence,
+a regular multi-line JSON file will most often fail.
+
 {% highlight java %}
 // sc is an existing JavaSparkContext.
 JavaSQLContext sqlContext = new org.apache.spark.sql.api.java.JavaSQLContext(sc);
@@ -701,6 +709,10 @@ This conversion can be done using one of two methods in a SQLContext:
 * `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
 * `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.
 
+Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
+line must contain a separate, self-contained valid JSON object. As a consequence,
+a regular multi-line JSON file will most often fail.
+
 {% highlight python %}
 # sc is an existing SparkContext.
 from pyspark.sql import SQLContext
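
The note added by this commit says each line must hold one self-contained JSON object. As an illustration only, here is a minimal sketch of how a regular multi-line JSON document could be rewritten into that one-object-per-line layout before loading it. It uses only the Python standard library; `to_json_lines` and the sample data are hypothetical and are not part of Spark or of this commit.

```python
import json


def to_json_lines(text):
    """Rewrite a regular JSON document as one JSON object per line.

    Hypothetical helper for illustration; not part of Spark.
    Accepts either a single JSON object or an array of objects.
    """
    data = json.loads(text)
    # Treat a lone object as a one-element list of records.
    records = data if isinstance(data, list) else [data]
    # json.dumps emits no raw newlines by default, so each record
    # ends up on exactly one line.
    return "\n".join(json.dumps(record) for record in records)


# A valid multi-line JSON file of the kind the note warns about.
multi_line = """
[
  {"name": "Michael"},
  {"name": "Andy", "age": 30}
]
"""

lines = to_json_lines(multi_line)
print(lines)
```

Each line of the result parses on its own with `json.loads`, which is the property the note describes. Writing `lines` to a file would give input in the shape the note asks for, assuming the records are flat enough for the reader's schema inference.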