author     Wenchen Fan <wenchen@databricks.com>  2016-11-02 18:05:14 -0700
committer  Yin Huai <yhuai@databricks.com>       2016-11-02 18:05:14 -0700
commit     3a1bc6f4780f8384c1211b1335e7394a4a28377e (patch)
tree       5d6e8f5d035d4a8c1078d93348087129d5582750 /R/pkg
parent     fd90541c35af2bccf0155467bec8cea7c8865046 (diff)
[SPARK-17470][SQL] unify path for data source table and locationUri for hive serde table
## What changes were proposed in this pull request?

Due to a limitation of the Hive metastore (a table location must be a directory path, not a file path), we have always stored the `path` of a data source table in its storage properties instead of in the `locationUri` field. However, this difference should not be exposed at the `CatalogTable` level; it should be treated as an implementation detail of `HiveExternalCatalog`, just as the table schema of a data source table is stored in table properties. This PR unifies `path` and `locationUri` outside of `HiveExternalCatalog`: both data source tables and Hive serde tables now use the `locationUri` field.

This PR also unifies how the default location of a managed table is handled. Previously, the default location of a managed Hive serde table was set by the external catalog, while that of a data source table was set by the command. After this PR, we follow the Hive convention: the default table location is always set by the external catalog.

For managed non-file-based tables, we assign a default location and create an empty directory for it; the directory is removed when the table is dropped. This is reasonable because the metastore does not care whether a table is file-based, and an empty table directory does no harm.

For external non-file-based tables, we would ideally omit the table location entirely, but due to a Hive metastore issue we assign a random location and remove it right after the table is created. See SPARK-15269 for details. This is acceptable because the workaround is well isolated in `HiveExternalCatalog`.

To preserve the existing behaviour of the `path` option, this PR always adds the `locationUri` to the storage properties under the key `path` before passing the storage properties to `DataSource` as data source options.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #15024 from cloud-fan/path.
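To make the `path`/`locationUri` conversion concrete, here is a minimal Scala sketch of the two directions the commit message describes. It is illustrative only: the object name `PathLocationUriSketch`, the helper names, and the simplified `Map`-based stand-in for the storage format are assumptions, not Spark's actual `HiveExternalCatalog` code.

```scala
// Illustrative sketch only; not Spark's actual implementation.
// Models the two directions described in the commit message:
//  - inside HiveExternalCatalog, locationUri is stashed in storage
//    properties under the key "path" (Hive can only store directory
//    locations, not arbitrary file paths);
//  - before handing properties to DataSource, locationUri is mirrored
//    under "path" so the existing `path` option keeps working.
object PathLocationUriSketch {

  // Hypothetical, simplified stand-in for CatalogStorageFormat.
  case class Storage(locationUri: Option[String],
                     properties: Map[String, String])

  // Hive-side representation: move locationUri into properties("path").
  def toHiveCompatible(s: Storage): Storage =
    s.locationUri match {
      case Some(loc) => Storage(None, s.properties + ("path" -> loc))
      case None      => s
    }

  // Catalog-side representation: restore locationUri from properties("path").
  def fromHiveCompatible(s: Storage): Storage =
    s.properties.get("path") match {
      case Some(loc) => Storage(Some(loc), s.properties - "path")
      case None      => s
    }

  // Options passed to DataSource: locationUri is always mirrored as "path".
  def dataSourceOptions(s: Storage): Map[String, String] =
    s.locationUri.map(loc => s.properties + ("path" -> loc))
      .getOrElse(s.properties)
}
```

With a sketch like this, a round trip such as `fromHiveCompatible(toHiveCompatible(s))` recovers the original `locationUri`, which is the invariant that keeps the hack invisible outside `HiveExternalCatalog`.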
Diffstat (limited to 'R/pkg')
-rw-r--r--  R/pkg/inst/tests/testthat/test_sparkSQL.R  4
1 file changed, 2 insertions, 2 deletions
diff --git a/R/pkg/inst/tests/testthat/test_sparkSQL.R b/R/pkg/inst/tests/testthat/test_sparkSQL.R
index d7fe6b3282..ee48baa59c 100644
--- a/R/pkg/inst/tests/testthat/test_sparkSQL.R
+++ b/R/pkg/inst/tests/testthat/test_sparkSQL.R
@@ -2659,7 +2659,7 @@ test_that("Call DataFrameWriter.save() API in Java without path and check argume
# It makes sure that we can omit path argument in write.df API and then it calls
# DataFrameWriter.save() without path.
expect_error(write.df(df, source = "csv"),
- "Error in save : illegal argument - 'path' is not specified")
+ "Error in save : illegal argument - Expected exactly one path to be specified")
expect_error(write.json(df, jsonPath),
"Error in json : analysis error - path file:.*already exists")
expect_error(write.text(df, jsonPath),
@@ -2667,7 +2667,7 @@ test_that("Call DataFrameWriter.save() API in Java without path and check argume
expect_error(write.orc(df, jsonPath),
"Error in orc : analysis error - path file:.*already exists")
expect_error(write.parquet(df, jsonPath),
- "Error in parquet : analysis error - path file:.*already exists")
+ "Error in parquet : analysis error - path file:.*already exists")
# Arguments checking in R side.
expect_error(write.df(df, "data.tmp", source = c(1, 2)),
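The R test change above tracks a new error message on the JVM side: calling `save()` without a path now fails with "Expected exactly one path to be specified" rather than "'path' is not specified". Below is a hedged Scala sketch of the shape such a validation could take; the names `SaveValidation` and `requireSinglePath` are hypothetical, chosen only to illustrate the check, not Spark's actual code.

```scala
// Hypothetical sketch of an "exactly one path" validation; the names
// are illustrative and do not correspond to Spark's implementation.
object SaveValidation {
  def requireSinglePath(paths: Seq[String]): String = {
    // Both zero paths and multiple paths are rejected with the same
    // message prefix the updated R test expectation matches.
    if (paths.length != 1) {
      throw new IllegalArgumentException(
        "Expected exactly one path to be specified")
    }
    paths.head
  }
}
```

Called with an empty sequence, this throws the illegal-argument error whose message the `expect_error(write.df(df, source = "csv"), ...)` assertion above matches through the R-to-JVM bridge.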