path: root/docs/sql-programming-guide.md
Commit log (message, author, date, files changed, lines changed):
* [SPARK-12919][SPARKR] Implement dapply() on DataFrame in SparkR. (Sun Rui, 2016-04-29; 1 file, -0/+5)

  What changes were proposed in this pull request? dapply() applies an R function to each partition of a DataFrame and returns a new DataFrame. The function signature is `dapply(df, function(localDF) {}, schema = NULL)`. The R function takes as input a local data.frame holding one partition's data on the local node, and returns a local data.frame. The schema specifies the Row format of the resulting DataFrame and must match the R function's output. If no schema is specified, each partition of the result DataFrame is serialized in R into a single byte array; such a DataFrame can then be processed by successive calls to dapply().

  How was this patch tested? SparkR unit tests.

  Author: Sun Rui <rui.sun@intel.com>, Sun Rui <sunrui2016@gmail.com>. Closes #12493 from sun-rui/SPARK-12919.
* [SPARK-14883][DOCS] Fix wrong R examples and make them up-to-date (Dongjoon Hyun, 2016-04-24; 1 file, -17/+13)

  What changes were proposed in this pull request? This issue aims to fix some errors in R examples and make them up-to-date in docs and example modules:
  - Remove the wrong usage of `map`. We would need `lapply` in `sparkR`, but `lapply` is private so far; a corrected example will be added later.
  - Fix the wrong example in the `Generic Load/Save Functions` section of `docs/sql-programming-guide.md` for consistency.
  - Fix datatypes in `sparkr.md`.
  - Update a data result in `sparkr.md`.
  - Replace deprecated functions to remove warnings: `jsonFile` -> `read.json`, `parquetFile` -> `read.parquet`.
  - Use up-to-date R-like functions: `loadDF` -> `read.df`, `saveDF` -> `write.df`, `saveAsParquetFile` -> `write.parquet`.
  - Replace `SparkR DataFrame` with `SparkDataFrame` in `dataframe.R` and `data-manipulation.R`.
  - Other minor syntax fixes and a typo.

  How was this patch tested? Manual.

  Author: Dongjoon Hyun <dongjoon@apache.org>. Closes #12649 from dongjoon-hyun/SPARK-14883.
* [SPARK-14601][DOC] Minor doc/usage changes related to removal of Spark assembly (Mark Grover, 2016-04-14; 1 file, -2/+2)

  What changes were proposed in this pull request? Remove references to the assembly jar in the documentation, and add a previously undocumented usage of spark-submit for running examples.

  How was this patch tested? Ran the spark-submit usage to ensure formatting was fine. Ran examples using SparkSubmit.

  Author: Mark Grover <mark@apache.org>. Closes #12365 from markgrover/spark-14601.
* [MINOR][DOCS] Fix wrong data types in JSON Datasets example. (Dongjoon Hyun, 2016-04-11; 1 file, -4/+4)

  What changes were proposed in this pull request? This PR fixes the `age` data type from `integer` to `long` in `SQL Programming Guide: JSON Datasets`.

  How was this patch tested? Manual.

  Author: Dongjoon Hyun <dongjoon@apache.org>. Closes #12290 from dongjoon-hyun/minor_fix_type_in_json_example.
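  A minimal sketch of the corrected behavior, assuming the people.json file bundled with the Spark examples and an existing SparkContext `sc`: Spark SQL infers whole-number JSON fields as `long`, not `integer`.

  ```scala
  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)
  val people = sqlContext.read.json("examples/src/main/resources/people.json")
  people.printSchema()
  // root
  //  |-- age: long (nullable = true)
  //  |-- name: string (nullable = true)
  ```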
* [SPARK-10063][SQL] Remove DirectParquetOutputCommitter (Reynold Xin, 2016-04-07; 1 file, -33/+0)

  What changes were proposed in this pull request? This patch removes DirectParquetOutputCommitter. It was initially created by Databricks as a faster way to write Parquet data to S3. However, given how the underlying S3 Hadoop implementation works, this committer only works when there are no failures: if there are multiple attempts of the same task (e.g. speculation, task failures, or node failures), the output data can be corrupted. I don't think this performance optimization outweighs the correctness issue.

  How was this patch tested? Removed the related tests also.

  Author: Reynold Xin <rxin@databricks.com>. Closes #12229 from rxin/SPARK-10063.
* [SPARK-13579][BUILD] Stop building the main Spark assembly. (Marcelo Vanzin, 2016-04-04; 1 file, -6/+1)

  This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and to remove duplicate jars packaged in the examples module). I also made some minor adjustments to dependencies to remove some test jars from the final packaging, and to remove jars that conflict with each other when packaged separately (e.g. servlet api).

  Also note that this change restores guava in applications' classpaths, even though it's still shaded inside Spark. This is now needed for the Hadoop libraries that are packaged with Spark, which are no longer processed by the shade plugin.

  Author: Marcelo Vanzin <vanzin@cloudera.com>. Closes #11796 from vanzin/SPARK-13579.
* [SPARK-12855][MINOR][SQL][DOC][TEST] remove spark.sql.dialect from doc and test (Daoyuan Wang, 2016-03-16; 1 file, -7/+0)

  What changes were proposed in this pull request? Since the developer API for a pluggable parser was removed in #10801, the docs should be updated accordingly.

  How was this patch tested? This patch does not affect the real code path.

  Author: Daoyuan Wang <daoyuan.wang@intel.com>. Closes #11758 from adrian-wang/spark12855.
* [SPARK-13942][CORE][DOCS] Remove Shark-related docs for 2.x (Dongjoon Hyun, 2016-03-16; 1 file, -45/+0)

  What changes were proposed in this pull request? `Shark` was merged into `Spark SQL` in [July 2014](https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html). The following sections of the **Migration Guide** seem to be the only legacy; for Spark 2.x, we had better clean up those docs:

  ```
  - ## Migration Guide for Shark Users
  - ...
  - ### Scheduling
  - ...
  - ### Reducer number
  - ...
  - ### Caching
  ```

  How was this patch tested? Pass the Jenkins test.

  Author: Dongjoon Hyun <dongjoon@apache.org>. Closes #11770 from dongjoon-hyun/SPARK-13942.
* [SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance creation in Java code. (Dongjoon Hyun, 2016-03-09; 1 file, -2/+2)

  What changes were proposed in this pull request? In order to make `docs/examples` (and other related code) more simple/readable/user-friendly, this PR replaces code like the following by using the `diamond` operator:

  ```
  - final ArrayList<Product2<Object, Object>> dataToWrite =
  -   new ArrayList<Product2<Object, Object>>();
  + final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>();
  ```

  Java 7 and higher support the **diamond** operator, which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark's Java code mixes both styles.

  How was this patch tested? Manual. Pass the existing tests.

  Author: Dongjoon Hyun <dongjoon@apache.org>. Closes #11541 from dongjoon-hyun/SPARK-13702.
* [MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments (Dongjoon Hyun, 2016-02-22; 1 file, -1/+1)

  What changes were proposed in this pull request? This PR tries to fix all typos in all markdown files under the `docs` module, and fixes similar typos in other comments, too.

  How was this patch tested? Manual tests.

  Author: Dongjoon Hyun <dongjoon@apache.org>. Closes #11300 from dongjoon-hyun/minor_fix_typos.
* [SPARK-13264][DOC] Removed multi-byte characters in spark-env.sh.template (Sasaki Toru, 2016-02-11; 1 file, -1/+1)

  spark-env.sh.template contains multi-byte characters; this PR removes them.

  Author: Sasaki Toru <sasakitoa@nttdata.co.jp>. Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.
* [SPARK-13040][DOCS] Update JDBC deprecated SPARK_CLASSPATH documentation (Sebastián Ramírez, 2016-02-09; 1 file, -1/+1)

  Update the JDBC documentation based on http://stackoverflow.com/a/30947090/219530, as SPARK_CLASSPATH is deprecated. That is also how it actually worked; it didn't work with SPARK_CLASSPATH or with `--jars` alone. This solves https://issues.apache.org/jira/browse/SPARK-13040.

  Author: Sebastián Ramírez <tiangolo@gmail.com>. Closes #10948 from tiangolo/patch-docs-jdbc.
* [DOCS] Fix the jar location of datanucleus in sql-programming-guid.md (Takeshi YAMAMURO, 2016-02-01; 1 file, -1/+1)

  It seems to me that `lib` is the better location, because the `datanucleus` jars are located in `lib` for release builds.

  Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>. Closes #10901 from maropu/DocFix.
* [SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR. (Sun Rui, 2016-01-20; 1 file, -0/+13)

  Author: Sun Rui <rui.sun@intel.com>. Closes #10201 from sun-rui/SPARK-12204.
* [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting (Brandon Bradley, 2016-01-11; 1 file, -0/+5)

  Warns users about casting changes.

  Author: Brandon Bradley <bradleytastic@gmail.com>. Closes #10708 from blbradley/spark-12758.
* [SPARK-12579][SQL] Force user-specified JDBC driver to take precedence (Josh Rosen, 2016-01-04; 1 file, -3/+1)

  Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection. In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.

  This patch addresses the issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly). If a user did not specify a JDBC driver to use, we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was a rare potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different).

  This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).

  Author: Josh Rosen <joshrosen@databricks.com>. Closes #10519 from JoshRosen/jdbc-driver-precedence.
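  A minimal sketch of what the `driver` option looks like from the user's side, assuming an existing SQLContext `sqlContext` and a hypothetical PostgreSQL database; this is the option whose precedence the patch enforces.

  ```scala
  val jdbcDF = sqlContext.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/mydb") // hypothetical URL
    .option("dbtable", "public.users")                    // hypothetical table
    .option("driver", "org.postgresql.Driver")            // explicitly chosen driver class
    .load()
  ```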
* [SPARK-11678][SQL][DOCS] Document basePath in the programming guide. (Yin Huai, 2015-12-09; 1 file, -0/+7)

  This PR adds documentation for `basePath`, a new parameter used by `HadoopFsRelation`. The compiled doc is shown at https://cloud.githubusercontent.com/assets/2072857/11673132/1ba01192-9dcb-11e5-98d9-ac0b4e92e98c.png. JIRA: https://issues.apache.org/jira/browse/SPARK-11678

  Author: Yin Huai <yhuai@databricks.com>. Closes #10211 from yhuai/basePathDoc.
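  A sketch of how `basePath` is used, assuming an existing SQLContext `sqlContext` and a hypothetical partitioned layout under /data/events: passing `basePath` tells partition discovery where the table root is, so `date` is still treated as a partition column even when only one subdirectory is loaded.

  ```scala
  val df = sqlContext.read
    .option("basePath", "/data/events")
    .parquet("/data/events/date=2015-12-01")
  ```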
* [SPARK-12069][SQL] Update documentation with Datasets (Michael Armbrust, 2015-12-08; 1 file, -98/+170)

  Author: Michael Armbrust <michael@databricks.com>. Closes #10060 from marmbrus/docs.
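  A short sketch of the 1.6-era Datasets API that the rewritten guide introduces; the `Person` case class and values here are illustrative only, and `sqlContext` is assumed to exist.

  ```scala
  import sqlContext.implicits._

  case class Person(name: String, age: Long)

  // A strongly typed collection: transformations are checked at compile time.
  val ds = Seq(Person("Andy", 32), Person("Justin", 19)).toDS()
  val names = ds.map(_.name).collect()
  ```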
* [SPARK-11821] Propagate Kerberos keytab for all environments (woj-i, 2015-12-01; 1 file, -3/+4)

  andrewor14: the same PR as in branch 1.5. harishreedharan

  Author: woj-i <wojciechindyk@gmail.com>. Closes #9859 from woj-i/master.
* Updated sql programming guide to include jdbc fetch size (Stephen Samuel, 2015-11-23; 1 file, -0/+8)

  Author: Stephen Samuel <sam@sksamuel.com>. Closes #9377 from sksamuel/master.
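  A sketch of the fetch-size setting the guide now documents, using the Properties-based reader; the URL and table are hypothetical, `sqlContext` is assumed to exist, and the option name follows the guide of this era.

  ```scala
  import java.util.Properties

  val props = new Properties()
  props.setProperty("fetchSize", "1000") // rows fetched per database round trip
  val orders = sqlContext.read.jdbc("jdbc:mysql://db-host:3306/mydb", "orders", props)
  ```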
* [SPARK-11089][SQL] Adds option for disabling multi-session in Thrift server (Cheng Lian, 2015-11-17; 1 file, -0/+14)

  This PR adds a new option, `spark.sql.hive.thriftServer.singleSession`, for disabling multi-session support in the Thrift server. Note that this option is added as a Spark configuration (retrieved from `SparkConf`) rather than a Spark SQL configuration (retrieved from `SQLConf`). This is because all SQL configurations are session-ized. Since multi-session support is on by default, no JDBC connection could modify global configurations like the newly added one.

  Author: Cheng Lian <lian@databricks.com>. Closes #9740 from liancheng/spark-11089.single-session-option.
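  Because the option lives in `SparkConf` rather than the session-ized `SQLConf`, it has to be set when the Thrift server JVM starts; a minimal sketch:

  ```scala
  import org.apache.spark.SparkConf

  // Set at server startup (e.g. also settable via --conf on start-thriftserver.sh).
  val conf = new SparkConf()
    .set("spark.sql.hive.thriftServer.singleSession", "true")
  ```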
* [SPARK-11360][DOC] Loss of nullability when writing parquet files (gatorsmile, 2015-11-09; 1 file, -1/+2)

  This fix adds one line explaining the current behavior of Spark SQL when writing Parquet files: all columns are forced to be nullable for compatibility reasons.

  Author: gatorsmile <gatorsmile@gmail.com>. Closes #9314 from gatorsmile/lossNull.
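  A small sketch demonstrating the documented behavior, assuming existing `sc`/`sqlContext` and a hypothetical output path: a column declared non-nullable comes back nullable after a Parquet round trip.

  ```scala
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

  val schema = StructType(Seq(StructField("id", IntegerType, nullable = false)))
  val df = sqlContext.createDataFrame(sc.parallelize(Seq(Row(1), Row(2))), schema)
  df.write.parquet("/tmp/ids.parquet")
  sqlContext.read.parquet("/tmp/ids.parquet").printSchema()
  // root
  //  |-- id: integer (nullable = true)   <- forced to nullable on write
  ```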
* [DOC][MINOR][SQL] Fix internal link (Rohit Agarwal, 2015-11-09; 1 file, -1/+1)

  It doesn't show up as a hyperlink currently; it will after this change.

  Author: Rohit Agarwal <mindprince@gmail.com>. Closes #9544 from mindprince/patch-2.
* [SPARK-10046][SQL] Hive warehouse dir not set in current directory when not … (xin Wu, 2015-11-08; 1 file, -2/+4)

  Doc change to align with the HiveConf default in terms of where the `warehouse` directory is created.

  Author: xin Wu <xinwu@us.ibm.com>. Closes #9365 from xwu0226/spark-10046-commit.
* [DOC][SQL] Remove redundant out-of-place python snippet (Rohit Agarwal, 2015-11-08; 1 file, -9/+0)

  This snippet seems to have been mistakenly introduced in two places in #5348.

  Author: Rohit Agarwal <mindprince@gmail.com>. Closes #9540 from mindprince/patch-1.
* [SPARK-11197][SQL] add doc for run SQL on files directly (Wenchen Fan, 2015-11-04; 1 file, -0/+38)

  Author: Wenchen Fan <wenchen@databricks.com>. Closes #9467 from cloud-fan/doc.
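  A sketch of the feature the new section documents, assuming an existing SQLContext and a hypothetical file path: a file can be queried directly by prefixing its path with the data source name.

  ```scala
  val df = sqlContext.sql("SELECT * FROM parquet.`/data/users.parquet`")
  ```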
* [DOC] Missing link to R DataFrame API doc (lewuathe, 2015-11-03; 1 file, -1/+1)

  Author: lewuathe <lewuathe@me.com>, Lewuathe <lewuathe@me.com>. Closes #9394 from Lewuathe/missing-link-to-R-dataframe.
* [SPARK-11299][DOC] Fix link to Scala DataFrame Functions reference (Josh Rosen, 2015-10-25; 1 file, -1/+1)

  The SQL programming guide's link to the DataFrame functions reference points to the wrong location; this patch fixes that.

  Author: Josh Rosen <joshrosen@databricks.com>. Closes #9269 from JoshRosen/SPARK-11299.
* [SPARK-9570] [DOCS] Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'. (Sean Owen, 2015-10-04; 1 file, -1/+1)

  Recommend `--master yarn --deploy-mode {cluster,client}` consistently in the docs. Follow-on to https://github.com/apache/spark/pull/8385. CC nssalian

  Author: Sean Owen <sowen@cloudera.com>. Closes #8968 from srowen/SPARK-9570.
* [SPARK-10662] [DOCS] Code snippets are not properly formatted in tables (Jacek Laskowski, 2015-09-21; 1 file, -8/+8)

  - Backticks are processed properly in the Spark Properties table
  - Removed unnecessary spaces
  - See http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/running-on-yarn.html

  Author: Jacek Laskowski <jacek.laskowski@deepsense.io>. Closes #8795 from jaceklaskowski/docs-yarn-formatting.
* [SPARK-10710] Remove ability to disable spilling in core and SQL (Josh Rosen, 2015-09-19; 1 file, -7/+0)

  It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but those operators are now mature and well-tested. In addition, these configurations are no longer handled consistently: SQL's Tungsten codepath ignores them and will continue to use spilling operators, and Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`. This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling.

  Author: Josh Rosen <joshrosen@databricks.com>. Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.
* [SPARK-10584] [SQL] [DOC] Documentation about the compatible Hive version is wrong. (Kousuke Saruta, 2015-09-19; 1 file, -3/+5)

  In Spark 1.5.0, Spark SQL is compatible with Hive 0.12.0 through 1.2.1, but the documentation is wrong. /CC yhuai

  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>. Closes #8776 from sarutak/SPARK-10584-2.
* [SPARK-10584] [DOC] [SQL] Documentation about spark.sql.hive.metastore.version is wrong. (Kousuke Saruta, 2015-09-14; 1 file, -1/+1)

  The default Hive metastore version is 1.2.1, but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1. Also, we cannot get the default value by `sqlContext.getConf("spark.sql.hive.metastore.version")`.

  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>. Closes #8739 from sarutak/SPARK-10584.
* [SPARK-10350] [DOC] [SQL] Removed duplicated option description from SQL guide (GuoQiang Li, 2015-08-29; 1 file, -10/+0)

  Author: GuoQiang Li <witgo@qq.com>. Closes #8520 from witgo/SPARK-10350.
* [SPARK-10287] [SQL] Fixes JSONRelation refreshing on read path (Yin Huai, 2015-08-27; 1 file, -0/+6)

  https://issues.apache.org/jira/browse/SPARK-10287 After porting JSON to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with the others (ORC and Parquet).

  Author: Yin Huai <yhuai@databricks.com>. Closes #8469 from yhuai/jsonRefresh.
* [SPARK-9148] [SPARK-10252] [SQL] Update SQL Programming Guide (Michael Armbrust, 2015-08-27; 1 file, -19/+73)

  Author: Michael Armbrust <michael@databricks.com>. Closes #8441 from marmbrus/documentation.
* [SPARK-9424] [SQL] Parquet programming guide updates for 1.5 (Cheng Lian, 2015-08-26; 1 file, -8/+37)

  Author: Cheng Lian <lian@databricks.com>. Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5.
* Fix Broken Link (Bill Chambers, 2015-08-19; 1 file, -1/+1)

  The link was broken because it included tick marks.

  Author: Bill Chambers <wchambers@ischool.berkeley.edu>. Closes #8302 from anabranch/patch-1.
* [SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe (Davies Liu, 2015-08-06; 1 file, -3/+3)

  `spark.sql.tungsten.enabled` will provide the default value for both codegen and unsafe; those options are kept internally for debug/testing. cc marmbrus rxin

  Author: Davies Liu <davies@databricks.com>. Closes #7998 from davies/tungsten and squashes the following commits:
  - c1c16da [Davies Liu] update doc
  - 1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe

  (cherry picked from commit 4e70e8256ce2f45b438642372329eac7b1e9e8cf)
  Signed-off-by: Reynold Xin <rxin@databricks.com>
* Revert "[SPARK-9228] [SQL] use tungsten.enabled in public for both of ↵Davies Liu2015-08-061-3/+3
| | | | | | codegen/unsafe" This reverts commit 4e70e8256ce2f45b438642372329eac7b1e9e8cf.
* [SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe (Davies Liu, 2015-08-06; 1 file, -3/+3)

  `spark.sql.tungsten.enabled` will provide the default value for both codegen and unsafe; those options are kept internally for debug/testing. cc marmbrus rxin

  Author: Davies Liu <davies@databricks.com>. Closes #7998 from davies/tungsten and squashes the following commits:
  - c1c16da [Davies Liu] update doc
  - 1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe
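  A minimal sketch of toggling the single public switch, assuming an existing SQLContext; per this commit, the separate codegen/unsafe options remain internal debugging knobs.

  ```scala
  sqlContext.setConf("spark.sql.tungsten.enabled", "true")
  ```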
* [SPARK-9535][SQL][DOCS] Modify document for codegen. (KaiXinXiaoLei, 2015-08-02; 1 file, -4/+3)

  #7142 made codegen enabled by default, so let's modify the corresponding documents. Closes #7142.

  Author: KaiXinXiaoLei <huleilei1@huawei.com>, Kousuke Saruta <sarutak@oss.nttdata.co.jp>. Closes #7863 from sarutak/SPARK-9535 and squashes the following commits:
  - 0884424 [Kousuke Saruta] Removed a line which mentioned about the effect of codegen enabled
  - 3c11af0 [Kousuke Saruta] Merge branch 'sqlconfig' of https://github.com/KaiXinXiaoLei/spark into SPARK-9535
  - 4ee531d [KaiXinXiaoLei] delete space
  - 4cfd11d [KaiXinXiaoLei] change spark.sql.planner.externalSort
  - d624cf8 [KaiXinXiaoLei] sql config is wrong
* [SPARK-9490] [DOCS] [MLLIB] MLlib evaluation metrics guide example python code uses deprecated print statement (Sean Owen, 2015-07-31; 1 file, -3/+3)

  Use print(x), not print x, for Python 3 in the eval examples. CC sethah mengxr; just wanted to close this out before 1.5.

  Author: Sean Owen <sowen@cloudera.com>. Closes #7822 from srowen/SPARK-9490 and squashes the following commits:
  - 01abeba [Sean Owen] Change "print x" to "print(x)" in the rest of the docs too
  - bd7f7fb [Sean Owen] Use print(x) not print x for Python 3 in eval examples
* [SPARK-9207] [SQL] Enables Parquet filter push-down by default (Cheng Lian, 2015-07-23; 1 file, -7/+2)

  PARQUET-136 and PARQUET-173 have been fixed in parquet-mr 1.7.0. It's time to enable filter push-down by default now.

  Author: Cheng Lian <lian@databricks.com>. Closes #7612 from liancheng/spark-9207 and squashes the following commits:
  - 77e6b5e [Cheng Lian] Enables Parquet filter push-down by default
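  Push-down is now on by default, so no action is needed; the sketch below (assuming an existing SQLContext) only shows how the flag could still be flipped off if a parquet-mr regression surfaced.

  ```scala
  sqlContext.setConf("spark.sql.parquet.filterPushdown", "false")
  ```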
* [SPARK-8909][Documentation] Change the scala example in sql-programming-guide#Manually Specifying Options to be in sync with the java, python, R versions (Alok Singh, 2015-07-08; 1 file, -1/+1)

  Author: Alok Singh <singhal@us.ibm.com>. Closes #7299 from aloknsingh/aloknsingh_SPARK-8909 and squashes the following commits:
  - d3c20ba [Alok Singh] fix the file to .parquet from .json
  - d476140 [Alok Singh] [SPARK-8909][Documentation] Change the scala example in sql-programming-guide#Manually Specifying Options to be in sync with java,python, R version
* [SPARK-8894] [SPARKR] [DOC] Example code errors in SparkR documentation. (Sun Rui, 2015-07-08; 1 file, -1/+1)

  Author: Sun Rui <rui.sun@intel.com>. Closes #7287 from sun-rui/SPARK-8894 and squashes the following commits:
  - da63898 [Sun Rui] [SPARK-8894][SPARKR][DOC] Example code errors in SparkR documentation.
* [SPARK-8886][Documentation] python Style update (Tijo Thomas, 2015-07-07; 1 file, -1/+1)

  Fixed a comment given by rxin.

  Author: Tijo Thomas <tijoparacka@gmail.com>. Closes #7281 from tijoparacka/modification_for_python_style and squashes the following commits:
  - 6334e21 [Tijo Thomas] removed space
  - 3de4cd8 [Tijo Thomas] python Style update
* [SPARK-8615] [DOCUMENTATION] Fixed Sample deprecated code (Tijo Thomas, 2015-06-30; 1 file, -5/+5)

  Modified the deprecated jdbc api in the documentation.

  Author: Tijo Thomas <tijoparacka@gmail.com>. Closes #7039 from tijoparacka/JIRA_8615 and squashes the following commits:
  - 6e73b8a [Tijo Thomas] Reverted new lines
  - 4042fcf [Tijo Thomas] updated to sql documentation
  - a27949c [Tijo Thomas] Fixed Sample deprecated code
* [SPARK-8139] [SQL] Updates docs and comments of data sources and Parquet output committer options (Cheng Lian, 2015-06-23; 1 file, -1/+29)

  This PR only applies to the master branch (1.5.0-SNAPSHOT) since it references `org.apache.parquet` classes which only appear in Parquet 1.7.0.

  Author: Cheng Lian <lian@databricks.com>. Closes #6683 from liancheng/output-committer-docs and squashes the following commits:
  - b4648b8 [Cheng Lian] Removes spark.sql.sources.outputCommitterClass as it's not a public option
  - ee63923 [Cheng Lian] Updates docs and comments of data sources and Parquet output committer options
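  A sketch of the documented Parquet committer option, assuming an existing SQLContext; the value shown is the stock parquet-mr committer, and alternative committers would be plugged in the same way.

  ```scala
  sqlContext.setConf(
    "spark.sql.parquet.output.committer.class",
    classOf[org.apache.parquet.hadoop.ParquetOutputCommitter].getName)
  ```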
* [DOC] [SQL] Addes Hive metastore Parquet table conversion section (Cheng Lian, 2015-06-23; 1 file, -6/+88)

  This PR adds a section about Hive metastore Parquet table conversion. It documents:

  1. Schema reconciliation rules introduced in #5214 (see [this comment](https://github.com/apache/spark/pull/5188#issuecomment-86531248) in #5188)
  2. Metadata refreshing requirement introduced in #5339

  Author: Cheng Lian <lian@databricks.com>. Closes #5348 from liancheng/sql-doc-parquet-conversion and squashes the following commits:
  - 42ae0d0 [Cheng Lian] Adds Python `refreshTable` snippet
  - 4c9847d [Cheng Lian] Resorts to SQL for Python metadata refreshing snippet
  - 756e660 [Cheng Lian] Adds Python snippet for metadata refreshing
  - 50675db [Cheng Lian] Addes Hive metastore Parquet table conversion section
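  A sketch of the metadata-refreshing call the new section documents, assuming `sqlContext` is an existing HiveContext and `my_table` is a hypothetical metastore Parquet table whose underlying files changed outside Spark:

  ```scala
  sqlContext.refreshTable("my_table")
  ```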