path: root/docs/sql-programming-guide.md
author    Josh Rosen <joshrosen@databricks.com>  2016-01-04 10:39:42 -0800
committer Yin Huai <yhuai@databricks.com>  2016-01-04 10:39:42 -0800
commit 6c83d938cc61bd5fabaf2157fcc3936364a83f02 (patch)
tree   2b67c0ee623e1536a36a1554c80c810572d3bdb3 /docs/sql-programming-guide.md
parent 8f659393b270c46e940c4e98af2d996bd4fd6442 (diff)
[SPARK-12579][SQL] Force user-specified JDBC driver to take precedence
Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection. In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.

This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly). If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different).

This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10519 from JoshRosen/jdbc-driver-precedence.
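A minimal sketch of the driver-selection approach described above, assuming a hypothetical `createConnection` helper; the actual change lives in Spark's JDBC data source internals (e.g. `DriverRegistry`), which this page does not show:

```scala
import java.sql.{Connection, Driver, DriverManager}
import java.util.Properties

import scala.collection.JavaConverters._

// Sketch: pick the JDBC driver explicitly instead of letting
// DriverManager.getConnection() choose any driver that claims the subprotocol.
def createConnection(
    url: String,
    userSpecifiedDriverClass: Option[String],
    properties: Properties): Connection = {
  // Loading the class runs its static initializer, which registers the
  // driver with DriverManager (standard JDBC driver behavior).
  userSpecifiedDriverClass.foreach(Class.forName)

  val driver: Driver = userSpecifiedDriverClass match {
    case Some(className) =>
      // Scan the registered drivers for the exact class the user asked for.
      DriverManager.getDrivers.asScala
        .find(_.getClass.getName == className)
        .getOrElse(throw new IllegalStateException(
          s"Did not find registered driver with class $className"))
    case None =>
      // No explicit driver: let DriverManager resolve one for this URL; the
      // real patch also ships this driver's class name to executors so the
      // driver and executor JVMs agree on which driver to use.
      DriverManager.getDriver(url)
  }
  // Connect through the chosen driver, not DriverManager.getConnection().
  driver.connect(url, properties)
}
```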
Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r--  docs/sql-programming-guide.md  4
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 3f9a831edd..b058833616 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1895,9 +1895,7 @@ the Data Sources API. The following options are supported:
<tr>
<td><code>driver</code></td>
<td>
- The class name of the JDBC driver needed to connect to this URL. This class will be loaded
- on the master and workers before running an JDBC commands to allow the driver to
- register itself with the JDBC subsystem.
+ The class name of the JDBC driver to use to connect to this URL.
</td>
</tr>
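For illustration, a hedged usage sketch of the `driver` option documented above; the URL, table name, and driver class are placeholder assumptions, not values from the patch:

```scala
// Hypothetical example: forcing a specific JDBC driver class via the
// `driver` option (all option values below are placeholders).
val jdbcDF = sqlContext.read.format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")
  .option("dbtable", "schema.tablename")
  .option("driver", "org.postgresql.Driver") // user-specified driver now takes precedence
  .load()
```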