aboutsummaryrefslogtreecommitdiff
path: root/docs/sql-programming-guide.md
diff options
context:
space:
mode:
authorFelix Cheung <felixcheung_m@hotmail.com>2016-10-26 23:06:11 -0700
committerFelix Cheung <felixcheung@apache.org>2016-10-26 23:06:11 -0700
commit44c8bfda793b7655e2bd1da5e9915a09ed9d42ce (patch)
tree4f42de29a093fe5f845307dca278a80d51bae96c /docs/sql-programming-guide.md
parent1dbe9896b7f30538a5fad2f5d718d035c7906936 (diff)
downloadspark-44c8bfda793b7655e2bd1da5e9915a09ed9d42ce.tar.gz
spark-44c8bfda793b7655e2bd1da5e9915a09ed9d42ce.tar.bz2
spark-44c8bfda793b7655e2bd1da5e9915a09ed9d42ce.zip
[SQL][DOC] updating doc for JSON source to link to jsonlines.org
## What changes were proposed in this pull request? API and programming guide doc changes for Scala, Python and R. ## How was this patch tested? manual test Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #15629 from felixcheung/jsondoc.
Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r--docs/sql-programming-guide.md22
1 files changed, 13 insertions, 9 deletions
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 064af41965..b9be7a7545 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -316,7 +316,7 @@ Serializable and has getters and setters for all of its fields.
Spark SQL can convert an RDD of Row objects to a DataFrame, inferring the datatypes. Rows are constructed by passing a list of
key/value pairs as kwargs to the Row class. The keys of this list define the column names of the table,
-and the types are inferred by sampling the whole datase, similar to the inference that is performed on JSON files.
+and the types are inferred by sampling the whole dataset, similar to the inference that is performed on JSON files.
{% include_example schema_inferring python/sql/basic.py %}
</div>
@@ -832,8 +832,9 @@ This conversion can be done using `SparkSession.read.json()` on either an RDD of
or a JSON file.
Note that the file that is offered as _a json file_ is not a typical JSON file. Each
-line must contain a separate, self-contained valid JSON object. As a consequence,
-a regular multi-line JSON file will most often fail.
+line must contain a separate, self-contained valid JSON object. For more information, please see
+[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/). As a
+consequence, a regular multi-line JSON file will most often fail.
{% include_example json_dataset scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
</div>
@@ -844,8 +845,9 @@ This conversion can be done using `SparkSession.read().json()` on either an RDD
or a JSON file.
Note that the file that is offered as _a json file_ is not a typical JSON file. Each
-line must contain a separate, self-contained valid JSON object. As a consequence,
-a regular multi-line JSON file will most often fail.
+line must contain a separate, self-contained valid JSON object. For more information, please see
+[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/). As a
+consequence, a regular multi-line JSON file will most often fail.
{% include_example json_dataset java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
</div>
@@ -855,8 +857,9 @@ Spark SQL can automatically infer the schema of a JSON dataset and load it as a
This conversion can be done using `SparkSession.read.json` on a JSON file.
Note that the file that is offered as _a json file_ is not a typical JSON file. Each
-line must contain a separate, self-contained valid JSON object. As a consequence,
-a regular multi-line JSON file will most often fail.
+line must contain a separate, self-contained valid JSON object. For more information, please see
+[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/). As a
+consequence, a regular multi-line JSON file will most often fail.
{% include_example json_dataset python/sql/datasource.py %}
</div>
@@ -867,8 +870,9 @@ the `read.json()` function, which loads data from a directory of JSON files wher
files is a JSON object.
Note that the file that is offered as _a json file_ is not a typical JSON file. Each
-line must contain a separate, self-contained valid JSON object. As a consequence,
-a regular multi-line JSON file will most often fail.
+line must contain a separate, self-contained valid JSON object. For more information, please see
+[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/). As a
+consequence, a regular multi-line JSON file will most often fail.
{% include_example json_dataset r/RSparkSQLExample.R %}