path: root/docs/sql-programming-guide.md
author    Josh Rosen <joshrosen@databricks.com>  2016-01-04 10:39:42 -0800
committer Yin Huai <yhuai@databricks.com>  2016-01-04 10:39:42 -0800
commit 6c83d938cc61bd5fabaf2157fcc3936364a83f02 (patch)
tree   2b67c0ee623e1536a36a1554c80c810572d3bdb3 /docs/sql-programming-guide.md
parent 8f659393b270c46e940c4e98af2d996bd4fd6442 (diff)
[SPARK-12579][SQL] Force user-specified JDBC driver to take precedence
Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection. In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.

This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly). If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different).

This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10519 from JoshRosen/jdbc-driver-precedence.
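A minimal sketch of the driver-selection approach described above, assuming a hypothetical `createConnection` helper; the actual change lives in Spark's JDBC data source internals (e.g. `DriverRegistry`), which this page does not show:

```scala
import java.sql.{Connection, Driver, DriverManager}
import java.util.Properties

import scala.collection.JavaConverters._

// Sketch: pick the JDBC driver explicitly instead of letting
// DriverManager.getConnection() choose any driver that claims the subprotocol.
def createConnection(
    url: String,
    userSpecifiedDriverClass: Option[String],
    properties: Properties): Connection = {
  // Loading the class runs its static initializer, which registers the
  // driver with DriverManager (standard JDBC driver behavior).
  userSpecifiedDriverClass.foreach(Class.forName)

  val driver: Driver = userSpecifiedDriverClass match {
    case Some(className) =>
      // Scan the registered drivers for the exact class the user asked for.
      DriverManager.getDrivers.asScala
        .find(_.getClass.getName == className)
        .getOrElse(throw new IllegalStateException(
          s"Did not find registered driver with class $className"))
    case None =>
      // No explicit driver: let DriverManager resolve one for this URL; the
      // real patch also ships this driver's class name to executors so the
      // driver and executor JVMs agree on which driver to use.
      DriverManager.getDriver(url)
  }
  // Connect through the chosen driver, not DriverManager.getConnection().
  driver.connect(url, properties)
}
```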
Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r--  docs/sql-programming-guide.md  4
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 3f9a831edd..b058833616 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1895,9 +1895,7 @@ the Data Sources API. The following options are supported:
<tr>
<td><code>driver</code></td>
<td>
- The class name of the JDBC driver needed to connect to this URL. This class will be loaded
- on the master and workers before running an JDBC commands to allow the driver to
- register itself with the JDBC subsystem.
+ The class name of the JDBC driver to use to connect to this URL.
</td>
</tr>
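For illustration, a hedged usage sketch of the `driver` option documented above; the URL, table name, and driver class are placeholder assumptions, not values from the patch:

```scala
// Hypothetical example: forcing a specific JDBC driver class via the
// `driver` option (all option values below are placeholders).
val jdbcDF = sqlContext.read.format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")
  .option("dbtable", "schema.tablename")
  .option("driver", "org.postgresql.Driver") // user-specified driver now takes precedence
  .load()
```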