author | hyukjinkwon <gurwls223@gmail.com> | 2016-06-11 23:20:40 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-06-11 23:20:40 -0700 |
commit | 9e204c62c6800e03759e04ef68268105d4b86bf2 (patch) | |
tree | cb218a5da7c64b4cbe4ff74e318171d35affb4ad /examples | |
parent | e1f986c7a3fcc3864d53ef99ef7f14fa4d262ac3 (diff) | |
download | spark-9e204c62c6800e03759e04ef68268105d4b86bf2.tar.gz spark-9e204c62c6800e03759e04ef68268105d4b86bf2.tar.bz2 spark-9e204c62c6800e03759e04ef68268105d4b86bf2.zip |
[SPARK-15840][SQL] Add two missing options in documentation and some option related changes
## What changes were proposed in this pull request?
This PR
1. Adds documentation for two missing options, `inferSchema` and `mergeSchema`, for Python and Scala.
2. Fixes `[[DataFrame]]` to ```:class:`DataFrame` ``` so that it is rendered as a class link:
- from
![2016-06-09 9 31 16](https://cloud.githubusercontent.com/assets/6477701/15929721/8b864734-2e89-11e6-83f6-207527de4ac9.png)
- to (with class link)
![2016-06-09 9 31 00](https://cloud.githubusercontent.com/assets/6477701/15929717/8a03d728-2e89-11e6-8a3f-08294964db22.png)
(Please refer to [the latest documentation](https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/python/pyspark.sql.html))
3. Moves the `mergeSchema` option to `ParquetOptions` and removes the unused options `metastoreSchema` and `metastoreTableName`.
They are not used anymore. They were removed in https://github.com/apache/spark/commit/e720dda42e806229ccfd970055c7b8a93eb447bf and no usages remain, as the search below shows:
```bash
grep -r -e METASTORE_SCHEMA -e \"metastoreSchema\" -e \"metastoreTableName\" -e METASTORE_TABLE_NAME .
```
```
./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala: private[sql] val METASTORE_SCHEMA = "metastoreSchema"
./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala: private[sql] val METASTORE_TABLE_NAME = "metastoreTableName"
./sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala: ParquetFileFormat.METASTORE_TABLE_NAME -> TableIdentifier(
```
The last match only sets `metastoreTableName`; the table name itself is never used.
4. Sets the correct default values (in the documentation) for the `compression` option for ORC (`snappy`, see [OrcOptions.scala#L33-L42](https://github.com/apache/spark/blob/3ded5bc4db2badc9ff49554e73421021d854306b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala#L33-L42)) and Parquet (the value specified in SQLConf, see [ParquetOptions.scala#L38-L47](https://github.com/apache/spark/blob/3ded5bc4db2badc9ff49554e73421021d854306b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala#L38-L47)), and for the `columnNameOfCorruptRecord` option for JSON (the value specified in SQLConf, see [JsonFileFormat.scala#L53-L55](https://github.com/apache/spark/blob/4538443e276597530a27c6922e48503677b13956/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala#L53-L55) and [JsonFileFormat.scala#L105-L106](https://github.com/apache/spark/blob/4538443e276597530a27c6922e48503677b13956/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala#L105-L106)).
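
For readers unfamiliar with the "value specified in SQLConf" default, the fallback can be illustrated in plain Python without Spark. This is a minimal sketch, not the code in `ParquetOptions`: the function `resolve_parquet_options` and the sample configuration values are hypothetical, and it assumes `mergeSchema` falls back to the session configuration in the same way as `compression`.

```python
from typing import Dict, Mapping

# Session-level configuration keys consulted when an option is not given.
CODEC_KEY = "spark.sql.parquet.compression.codec"
MERGE_KEY = "spark.sql.parquet.mergeSchema"


def resolve_parquet_options(
    options: Mapping[str, str], session_conf: Mapping[str, str]
) -> Dict[str, object]:
    """Hypothetical helper: resolve per-read/write options against session defaults."""
    # An explicit option wins; otherwise fall back to the session configuration.
    compression = options.get("compression", session_conf[CODEC_KEY]).lower()
    merge_raw = options.get("mergeSchema", session_conf[MERGE_KEY])
    # Options arrive as strings, so the boolean is parsed here.
    merge_schema = merge_raw.strip().lower() == "true"
    return {"compression": compression, "mergeSchema": merge_schema}


if __name__ == "__main__":
    # Illustrative values only; they are not claimed to be Spark's defaults.
    conf = {CODEC_KEY: "SNAPPY", MERGE_KEY: "false"}
    print(resolve_parquet_options({}, conf))
    print(resolve_parquet_options({"compression": "GZIP", "mergeSchema": "true"}, conf))
```

The point of the sketch is that the documented default is not a fixed string: when the option is absent, the effective value depends on whatever the session configuration holds at that time.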
## How was this patch tested?
Existing tests should cover this.
Author: hyukjinkwon <gurwls223@gmail.com>
Author: Hyukjin Kwon <gurwls223@gmail.com>
Closes #13576 from HyukjinKwon/SPARK-15840.
Diffstat (limited to 'examples')
0 files changed, 0 insertions, 0 deletions