author    Josh Rosen <joshrosen@databricks.com>    2016-01-04 10:39:42 -0800
committer Yin Huai <yhuai@databricks.com>          2016-01-04 10:39:42 -0800
commit    6c83d938cc61bd5fabaf2157fcc3936364a83f02 (patch)
tree      2b67c0ee623e1536a36a1554c80c810572d3bdb3 /licenses/LICENSE-boto.txt
parent    8f659393b270c46e940c4e98af2d996bd4fd6442 (diff)
[SPARK-12579][SQL] Force user-specified JDBC driver to take precedence
Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection.
In a nutshell, the problem is that multiple JDBC drivers on the classpath may claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.
This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly).
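The lookup described above can be sketched as follows. This is a minimal illustration, not Spark's actual code: the class and method names (`DriverLookup`, `findDriver`) are hypothetical, but the core idea matches the patch — scan `DriverManager.getDrivers()` for the driver whose class name matches the user-specified one, instead of letting `DriverManager.getConnection()` pick whichever registered driver first accepts the URL.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Collections;

public class DriverLookup {
    // Return the registered driver whose class name matches the
    // user-specified class, rather than letting DriverManager pick the
    // first registered driver that accepts the JDBC URL.
    static Driver findDriver(String userSpecifiedClass) throws SQLException {
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            if (d.getClass().getName().equals(userSpecifiedClass)) {
                return d;
            }
        }
        throw new IllegalStateException(
            "Did not find registered driver with class " + userSpecifiedClass);
    }
}
```

The selected driver would then be used directly, e.g. `driver.connect(url, properties)`, bypassing `DriverManager`'s own URL-based selection.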
If a user did not specify a JDBC driver to use, we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to the executors. This guards against corner-case bugs in situations where the driver and executor JVMs have different sets of JDBC drivers on their classpaths; previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths differed.
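The fallback path can be sketched like this. Again a hypothetical helper (`resolveDriverClass` is not a name from the patch): when no driver is specified, resolve the driver class once on the Spark driver via `DriverManager.getDriver(url)` and ship that class name to the executors, so both sides agree on the driver even when their classpaths differ.

```java
import java.sql.DriverManager;
import java.sql.SQLException;

public class DriverClassResolver {
    // If the user specified a driver class, honor it. Otherwise ask
    // DriverManager (on the Spark driver) which registered driver accepts
    // the URL, and return that driver's class name so executors can use
    // the same class.
    static String resolveDriverClass(String userSpecifiedClass, String url)
            throws SQLException {
        if (userSpecifiedClass != null) {
            return userSpecifiedClass;
        }
        return DriverManager.getDriver(url).getClass().getName();
    }
}
```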
This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10519 from JoshRosen/jdbc-driver-precedence.
Diffstat (limited to 'licenses/LICENSE-boto.txt')
0 files changed, 0 insertions, 0 deletions