Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r--  docs/sql-programming-guide.md  48
1 file changed, 5 insertions(+), 43 deletions(-)
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 2fdc97f8a0..2d9849d032 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1467,37 +1467,6 @@ Configuration of Parquet can be done using the `setConf` method on `SQLContext`
</td>
</tr>
<tr>
- <td><code>spark.sql.parquet.output.committer.class</code></td>
- <td><code>org.apache.parquet.hadoop.<br />ParquetOutputCommitter</code></td>
- <td>
- <p>
- The output committer class used by Parquet. The specified class needs to be a subclass of
- <code>org.apache.hadoop.<br />mapreduce.OutputCommitter</code>. Typically, it's also a
- subclass of <code>org.apache.parquet.hadoop.ParquetOutputCommitter</code>.
- </p>
- <p>
- <b>Note:</b>
- <ul>
- <li>
- This option is automatically ignored if <code>spark.speculation</code> is turned on.
- </li>
- <li>
- This option must be set via Hadoop <code>Configuration</code> rather than Spark
- <code>SQLConf</code>.
- </li>
- <li>
- This option overrides <code>spark.sql.sources.<br />outputCommitterClass</code>.
- </li>
- </ul>
- </p>
- <p>
- Spark SQL comes with a built-in
- <code>org.apache.spark.sql.<br />parquet.DirectParquetOutputCommitter</code>, which can be more
- efficient than the default Parquet output committer when writing data to S3.
- </p>
- </td>
-</tr>
-<tr>
<td><code>spark.sql.parquet.mergeSchema</code></td>
<td><code>false</code></td>
<td>
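The rows removed above documented `spark.sql.parquet.output.committer.class`, which (per the deleted note) had to be set on the Hadoop `Configuration` rather than through Spark's `SQLConf`. A minimal sketch of what that looked like on pre-removal Spark releases, assuming `sc` is an existing `SparkContext` as in the guide's other examples:

```scala
// Sketch only: this option is removed by this commit, so the snippet applies
// to older Spark versions. It must be set on the Hadoop Configuration;
// setting it via SQLConf (sqlContext.setConf) was not supported.
sc.hadoopConfiguration.set(
  "spark.sql.parquet.output.committer.class",
  "org.apache.parquet.hadoop.ParquetOutputCommitter")
```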
@@ -1533,7 +1502,7 @@ val people = sqlContext.read.json(path)
// The inferred schema can be visualized using the printSchema() method.
people.printSchema()
// root
-// |-- age: integer (nullable = true)
+// |-- age: long (nullable = true)
// |-- name: string (nullable = true)
// Register this DataFrame as a table.
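This hunk (and the Java, Python, and R variants below) corrects the documented inferred type for `age` from `integer` to `long`, since Spark SQL infers JSON integral values as `LongType`. A quick check with a hypothetical inline record, assuming the `sc` and `sqlContext` from the surrounding examples:

```scala
// Hypothetical one-record JSON dataset; illustrates why the schema output
// above now reads "age: long" -- integral JSON numbers infer as LongType.
val rdd = sc.parallelize(Seq("""{"name":"Michael", "age":30}"""))
val df = sqlContext.read.json(rdd)
df.printSchema()
// root
//  |-- age: long (nullable = true)
//  |-- name: string (nullable = true)
```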
@@ -1571,7 +1540,7 @@ DataFrame people = sqlContext.read().json("examples/src/main/resources/people.js
// The inferred schema can be visualized using the printSchema() method.
people.printSchema();
// root
-// |-- age: integer (nullable = true)
+// |-- age: long (nullable = true)
// |-- name: string (nullable = true)
// Register this DataFrame as a table.
@@ -1609,7 +1578,7 @@ people = sqlContext.read.json("examples/src/main/resources/people.json")
# The inferred schema can be visualized using the printSchema() method.
people.printSchema()
# root
-# |-- age: integer (nullable = true)
+# |-- age: long (nullable = true)
# |-- name: string (nullable = true)
# Register this DataFrame as a table.
@@ -1648,7 +1617,7 @@ people <- jsonFile(sqlContext, path)
# The inferred schema can be visualized using the printSchema() method.
printSchema(people)
# root
-# |-- age: integer (nullable = true)
+# |-- age: long (nullable = true)
# |-- name: string (nullable = true)
# Register this DataFrame as a table.
@@ -1687,12 +1656,7 @@ on all of the worker nodes, as they will need access to the Hive serialization a
(SerDes) in order to access data stored in Hive.
Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` (for security configuration),
- `hdfs-site.xml` (for HDFS configuration) file in `conf/`. Please note when running
-the query on a YARN cluster (`cluster` mode), the `datanucleus` jars under the `lib` directory
-and `hive-site.xml` under `conf/` directory need to be available on the driver and all executors launched by the
-YARN cluster. A convenient way to do this is to add them through the `--jars` and `--files` options of the
-`spark-submit` command.
-
+`hdfs-site.xml` (for HDFS configuration) files in `conf/`.
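With the YARN-specific caveat dropped, the remaining guidance is simply to place those config files in `conf/`. A minimal sketch of the Spark 1.x API this section documents, assuming an existing `SparkContext` named `sc`:

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumes hive-site.xml (plus core-site.xml / hdfs-site.xml) are in conf/;
// HiveContext picks them up from the classpath with no extra wiring.
val hiveContext = new HiveContext(sc)
hiveContext.sql("SHOW TABLES").show()
```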
<div class="codetabs">
@@ -2170,8 +2134,6 @@ options.
- In the `sql` dialect, floating point numbers are now parsed as decimal. HiveQL parsing remains
unchanged.
 - The canonical names of SQL/DataFrame functions are now lower case (e.g. sum vs SUM).
- - It has been determined that using the DirectOutputCommitter when speculation is enabled is unsafe
- and thus this output committer will not be used when speculation is on, independent of configuration.
- JSON data source will not automatically load new files that are created by other applications
(i.e. files that are not inserted to the dataset through Spark SQL).
For a JSON persistent table (i.e. the metadata of the table is stored in Hive Metastore),