author    Josh Rosen <joshrosen@databricks.com>    2016-01-04 10:39:42 -0800
committer Yin Huai <yhuai@databricks.com>          2016-01-04 10:39:42 -0800
commit    6c83d938cc61bd5fabaf2157fcc3936364a83f02 (patch)
tree      2b67c0ee623e1536a36a1554c80c810572d3bdb3 /licenses/LICENSE-boto.txt
parent    8f659393b270c46e940c4e98af2d996bd4fd6442 (diff)
[SPARK-12579][SQL] Force user-specified JDBC driver to take precedence
Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection.
In a nutshell, the problem is that multiple JDBC drivers on the classpath may claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.
This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly).
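The lookup described above can be sketched as follows. This is a minimal illustration, not Spark's actual code: the class and method names (`DriverLookup`, `findDriver`) are hypothetical, but the core idea matches the patch — scan `DriverManager.getDrivers()` for the driver whose class name matches the user-specified one, instead of letting `DriverManager.getConnection()` pick whichever registered driver first accepts the URL.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Collections;

public class DriverLookup {
    // Return the registered driver whose class name matches the
    // user-specified class, rather than letting DriverManager pick the
    // first registered driver that accepts the JDBC URL.
    static Driver findDriver(String userSpecifiedClass) throws SQLException {
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            if (d.getClass().getName().equals(userSpecifiedClass)) {
                return d;
            }
        }
        throw new IllegalStateException(
            "Did not find registered driver with class " + userSpecifiedClass);
    }
}
```

The selected driver would then be used directly, e.g. `driver.connect(url, properties)`, bypassing `DriverManager`'s own URL-based selection.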
If a user did not specify a JDBC driver to use, we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to the executors. This guards against corner-case bugs in situations where the driver and executor JVMs have different sets of JDBC drivers on their classpaths; previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths differed.
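The fallback path can be sketched like this. Again a hypothetical helper (`resolveDriverClass` is not a name from the patch): when no driver is specified, resolve the driver class once on the Spark driver via `DriverManager.getDriver(url)` and ship that class name to the executors, so both sides agree on the driver even when their classpaths differ.

```java
import java.sql.DriverManager;
import java.sql.SQLException;

public class DriverClassResolver {
    // If the user specified a driver class, honor it. Otherwise ask
    // DriverManager (on the Spark driver) which registered driver accepts
    // the URL, and return that driver's class name so executors can use
    // the same class.
    static String resolveDriverClass(String userSpecifiedClass, String url)
            throws SQLException {
        if (userSpecifiedClass != null) {
            return userSpecifiedClass;
        }
        return DriverManager.getDriver(url).getClass().getName();
    }
}
```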
This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10519 from JoshRosen/jdbc-driver-precedence.
Diffstat (limited to 'licenses/LICENSE-boto.txt')
0 files changed, 0 insertions, 0 deletions