path: root/docs/sql-programming-guide.md
author     Dongjoon Hyun <dongjoon@apache.org>  2016-11-21 13:57:36 +0000
committer  Sean Owen <sowen@cloudera.com>       2016-11-21 13:57:36 +0000
commit     07beb5d21c6803e80733149f1560c71cd3cacc86 (patch)
tree       2ac23532eb040d56ac1ba86aaabea492aac42a53 /docs/sql-programming-guide.md
parent     9f262ae163b6dca6526665b3ad12b3b2ea8fb873 (diff)
[SPARK-18413][SQL] Add `maxConnections` JDBCOption
## What changes were proposed in this pull request?

This PR adds a new JDBC option, `maxConnections`, which specifies the maximum number of simultaneous JDBC connections allowed. This option applies only to writing, and a coalesce operation is performed if needed. It defaults to the number of partitions of the RDD. Previously, SQL users could not control this, while Scala/Java/Python users could use the `coalesce` (or `repartition`) API.

**Reported Scenario**

In the following case, the number of connections becomes 200 and the database cannot handle all of them.

```sql
CREATE OR REPLACE TEMPORARY VIEW resultview
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:oracle:thin:10.129.10.111:1521:BKDB",
  dbtable "result",
  user "HIVE",
  password "HIVE"
);
-- set spark.sql.shuffle.partitions=200
INSERT OVERWRITE TABLE resultview
SELECT g, count(1) AS COUNT
FROM tnet.DT_LIVE_INFO
GROUP BY g
```

## How was this patch tested?

Manual. Do the following and check the Spark UI.

**Step 1 (MySQL)**

```
CREATE TABLE t1 (a INT);
CREATE TABLE data (a INT);
INSERT INTO data VALUES (1);
INSERT INTO data VALUES (2);
INSERT INTO data VALUES (3);
```

**Step 2 (Spark)**

```scala
SPARK_HOME=$PWD bin/spark-shell --driver-memory 4G --driver-class-path mysql-connector-java-5.1.40-bin.jar
scala> sql("SET spark.sql.shuffle.partitions=3")
scala> sql("CREATE OR REPLACE TEMPORARY VIEW data USING org.apache.spark.sql.jdbc OPTIONS (url 'jdbc:mysql://localhost:3306/t', dbtable 'data', user 'root', password '')")
scala> sql("CREATE OR REPLACE TEMPORARY VIEW t1 USING org.apache.spark.sql.jdbc OPTIONS (url 'jdbc:mysql://localhost:3306/t', dbtable 't1', user 'root', password '', maxConnections '1')")
scala> sql("INSERT OVERWRITE TABLE t1 SELECT a FROM data GROUP BY a")
scala> sql("CREATE OR REPLACE TEMPORARY VIEW t1 USING org.apache.spark.sql.jdbc OPTIONS (url 'jdbc:mysql://localhost:3306/t', dbtable 't1', user 'root', password '', maxConnections '2')")
scala> sql("INSERT OVERWRITE TABLE t1 SELECT a FROM data GROUP BY a")
scala> sql("CREATE OR REPLACE TEMPORARY VIEW t1 USING org.apache.spark.sql.jdbc OPTIONS (url 'jdbc:mysql://localhost:3306/t', dbtable 't1', user 'root', password '', maxConnections '3')")
scala> sql("INSERT OVERWRITE TABLE t1 SELECT a FROM data GROUP BY a")
scala> sql("CREATE OR REPLACE TEMPORARY VIEW t1 USING org.apache.spark.sql.jdbc OPTIONS (url 'jdbc:mysql://localhost:3306/t', dbtable 't1', user 'root', password '', maxConnections '4')")
scala> sql("INSERT OVERWRITE TABLE t1 SELECT a FROM data GROUP BY a")
```

![maxconnections](https://cloud.githubusercontent.com/assets/9700541/20287987/ed8409c2-aa84-11e6-8aab-ae28e63fe54d.png)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #15868 from dongjoon-hyun/SPARK-18413.
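For readers who use the DataFrame API rather than SQL temporary views, a minimal sketch of passing the same option through the writer is shown below (run in spark-shell, where `spark` is predefined); the connection details and the sample DataFrame are illustrative placeholders, not part of this patch.

```scala
// Minimal sketch (illustrative placeholders, not part of the patch): the same
// `maxConnections` option can be supplied through the DataFrame writer, capping
// the number of concurrent JDBC connections used for this particular write.
val df = spark.range(0, 100).selectExpr("CAST(id AS INT) AS a")  // any DataFrame to write
df.write
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/t")
  .option("dbtable", "t1")
  .option("user", "root")
  .option("password", "")
  .option("maxConnections", "2")  // at most 2 simultaneous JDBC connections for this write
  .mode("overwrite")
  .save()
```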
Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r--  docs/sql-programming-guide.md  7
1 file changed, 7 insertions, 0 deletions
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index ba3e55fc06..656e7ecdab 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1087,6 +1087,13 @@ the following case-sensitive options:
</tr>
<tr>
+ <td><code>maxConnections</code></td>
+ <td>
+ The maximum number of concurrent JDBC connections that can be used, if set. This option applies only to writing. It works by limiting the operation's parallelism, which depends on the input's partition count. If the input's partition count exceeds this limit, the operation will coalesce the input to fewer partitions before writing.
+ </td>
+ </tr>
+
+ <tr>
<td><code>isolationLevel</code></td>
<td>
The transaction isolation level, which applies to the current connection. It can be one of <code>NONE</code>, <code>READ_COMMITTED</code>, <code>READ_UNCOMMITTED</code>, <code>REPEATABLE_READ</code>, or <code>SERIALIZABLE</code>, corresponding to the standard transaction isolation levels defined by JDBC's Connection object, with a default of <code>READ_UNCOMMITTED</code>. This option applies only to writing. Please refer to the documentation in <code>java.sql.Connection</code>.
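For reference, the coalesce behaviour described in the <code>maxConnections</code> entry of the diff above amounts to the following illustrative sketch; the helper name is hypothetical and this is not the actual internal implementation.

```scala
import org.apache.spark.sql.DataFrame

// Illustrative sketch only (hypothetical helper, not Spark's internal code):
// when the input has more partitions than maxConnections, coalesce it so that
// at most that many write tasks (and therefore JDBC connections) run at once.
def limitWriteParallelism(df: DataFrame, maxConnections: Int): DataFrame =
  if (df.rdd.getNumPartitions > maxConnections) df.coalesce(maxConnections) else df
```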