path: root/docs/sql-programming-guide.md
author    hyukjinkwon <gurwls223@gmail.com>  2016-10-10 22:22:41 -0700
committer gatorsmile <gatorsmile@gmail.com>  2016-10-10 22:22:41 -0700
commit    0c0ad436ad909364915b910867d08262c62bc95d (patch)
tree      b8afcc1aa41d83596258fc541ab99e01231b851e /docs/sql-programming-guide.md
parent    19a5bae47f69929d00d9de43387c7df37a05ee25 (diff)
download  spark-0c0ad436ad909364915b910867d08262c62bc95d.tar.gz
          spark-0c0ad436ad909364915b910867d08262c62bc95d.tar.bz2
          spark-0c0ad436ad909364915b910867d08262c62bc95d.zip
[SPARK-17719][SPARK-17776][SQL] Unify and tie up options in a single place in JDBC datasource package
## What changes were proposed in this pull request?

This PR proposes to fix the arbitrary mixture of `Map[String, String]`, `Properties` and `JDBCOptions` instances used for options in the `execution/jdbc` package, and to make the connection properties exclude Spark-only options. This PR includes the changes below:

- Unify `Map[String, String]`, `Properties` and `JDBCOptions` in the `execution/jdbc` package into `JDBCOptions`.
- Move the `batchsize`, `fetchsize`, `driver` and `isolationlevel` options into the `JDBCOptions` instance.
- Document `batchsize` and `isolationlevel`, marking both read-only and write-only options. This also includes minor typo fixes and more detailed explanations for some statements such as `url`.
- Throw exceptions early by checking arguments first rather than at execution time (e.g., for `fetchsize`).
- Exclude Spark-only options from the connection properties.

## How was this patch tested?

Existing tests should cover this.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #15292 from HyukjinKwon/SPARK-17719.
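As a user-facing sketch of where these options end up (illustrative only, not part of this patch; the URL, table name and credentials below are placeholders), the snippet shows a JDBC read in which `fetchsize` is one of the Spark-handled options while `user` and `password` are forwarded to the driver as connection properties:

```scala
import org.apache.spark.sql.SparkSession

object JdbcReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-read-sketch").getOrCreate()

    // `fetchsize` is interpreted by Spark's JDBC source; `user` and `password`
    // are passed through to the JDBC driver as connection properties.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost/test")  // placeholder URL
      .option("dbtable", "schema.tablename")              // placeholder table
      .option("user", "username")
      .option("password", "password")
      .option("fetchsize", "100")
      .load()

    df.show()
    spark.stop()
  }
}
```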
Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r--  docs/sql-programming-guide.md  36
1 file changed, 27 insertions, 9 deletions
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 835cb6981f..d0f43ab0a9 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1049,16 +1049,20 @@ bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.
{% endhighlight %}
Tables from the remote database can be loaded as a DataFrame or Spark SQL Temporary table using
-the Data Sources API. The following options are supported:
+the Data Sources API. Users can specify the JDBC connection properties in the data source options.
+<code>user</code> and <code>password</code> are normally provided as connection properties for
+logging into the data sources. In addition to the connection properties, Spark also supports
+the following case-sensitive options:
<table class="table">
<tr><th>Property Name</th><th>Meaning</th></tr>
<tr>
<td><code>url</code></td>
<td>
- The JDBC URL to connect to.
+ The JDBC URL to connect to. The source-specific connection properties may be specified in the URL. e.g., <code>jdbc:postgresql://localhost/test?user=fred&password=secret</code>
</td>
</tr>
+
<tr>
<td><code>dbtable</code></td>
<td>
@@ -1083,28 +1087,42 @@ the Data Sources API. The following options are supported:
<code>partitionColumn</code> must be a numeric column from the table in question. Notice
that <code>lowerBound</code> and <code>upperBound</code> are just used to decide the
partition stride, not for filtering the rows in table. So all rows in the table will be
- partitioned and returned.
+ partitioned and returned. This option applies only to reading.
</td>
</tr>
<tr>
<td><code>fetchsize</code></td>
<td>
- The JDBC fetch size, which determines how many rows to fetch per round trip. This can help performance on JDBC drivers which default to low fetch size (eg. Oracle with 10 rows).
+ The JDBC fetch size, which determines how many rows to fetch per round trip. This can help performance on JDBC drivers which default to low fetch size (eg. Oracle with 10 rows). This option applies only to reading.
</td>
</tr>
<tr>
+ <td><code>batchsize</code></td>
+ <td>
+ The JDBC batch size, which determines how many rows to insert per round trip. This can help performance on JDBC drivers. This option applies only to writing. It defaults to <code>1000</code>.
+ </td>
+ </tr>
+
+ <tr>
+ <td><code>isolationLevel</code></td>
+ <td>
+ The transaction isolation level, which applies to the current connection. It can be one of <code>NONE</code>, <code>READ_COMMITTED</code>, <code>READ_UNCOMMITTED</code>, <code>REPEATABLE_READ</code>, or <code>SERIALIZABLE</code>, corresponding to the standard transaction isolation levels defined by JDBC's Connection object, with a default of <code>READ_UNCOMMITTED</code>. This option applies only to writing. Please refer to the documentation in <code>java.sql.Connection</code>.
+ </td>
+ </tr>
+
+ <tr>
<td><code>truncate</code></td>
<td>
- This is a JDBC writer related option. When <code>SaveMode.Overwrite</code> is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g. indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to <code>false</code>.
+ This is a JDBC writer related option. When <code>SaveMode.Overwrite</code> is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it. This can be more efficient, and prevents the table metadata (e.g., indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to <code>false</code>. This option applies only to writing.
</td>
</tr>
<tr>
<td><code>createTableOptions</code></td>
<td>
- This is a JDBC writer related option. If specified, this option allows setting of database-specific table and partition options when creating a table. For example: <code>CREATE TABLE t (name string) ENGINE=InnoDB.</code>
+ This is a JDBC writer related option. If specified, this option allows setting of database-specific table and partition options when creating a table (e.g., <code>CREATE TABLE t (name string) ENGINE=InnoDB</code>). This option applies only to writing.
</td>
</tr>
</table>
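For context (illustrative only, not part of this patch), a minimal write-side sketch using the newly documented write-only options, assuming an existing DataFrame `df` and placeholder URL, table name and credentials:

{% highlight scala %}
import org.apache.spark.sql.SaveMode

// Illustrative placeholders for URL, table and credentials.
df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/test")
  .option("dbtable", "schema.target_table")
  .option("user", "username")
  .option("password", "password")
  .option("batchsize", "1000")                // rows inserted per round trip
  .option("isolationLevel", "READ_COMMITTED") // transaction isolation for the write
  .option("truncate", "true")                 // truncate instead of drop on Overwrite
  .mode(SaveMode.Overwrite)
  .save()
{% endhighlight %}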
@@ -1328,7 +1346,7 @@ options.
- Dataset API and DataFrame API are unified. In Scala, `DataFrame` becomes a type alias for
`Dataset[Row]`, while Java API users must replace `DataFrame` with `Dataset<Row>`. Both the typed
- transformations (e.g. `map`, `filter`, and `groupByKey`) and untyped transformations (e.g.
+ transformations (e.g., `map`, `filter`, and `groupByKey`) and untyped transformations (e.g.,
`select` and `groupBy`) are available on the Dataset class. Since compile-time type-safety in
Python and R is not a language feature, the concept of Dataset does not apply to these languages’
APIs. Instead, `DataFrame` remains the primary programming abstraction, which is analogous to the
@@ -1377,7 +1395,7 @@ options.
- Timestamps are now stored at a precision of 1us, rather than 1ns
- In the `sql` dialect, floating point numbers are now parsed as decimal. HiveQL parsing remains
unchanged.
- - The canonical name of SQL/DataFrame functions are now lower case (e.g. sum vs SUM).
+ - The canonical name of SQL/DataFrame functions are now lower case (e.g., sum vs SUM).
- JSON data source will not automatically load new files that are created by other applications
(i.e. files that are not inserted to the dataset through Spark SQL).
For a JSON persistent table (i.e. the metadata of the table is stored in Hive Metastore),
@@ -1392,7 +1410,7 @@ options.
Based on user feedback, we created a new, more fluid API for reading data in (`SQLContext.read`)
and writing data out (`DataFrame.write`),
-and deprecated the old APIs (e.g. `SQLContext.parquetFile`, `SQLContext.jsonFile`).
+and deprecated the old APIs (e.g., `SQLContext.parquetFile`, `SQLContext.jsonFile`).
See the API docs for `SQLContext.read` (
<a href="api/scala/index.html#org.apache.spark.sql.SQLContext@read:DataFrameReader">Scala</a>,