| author | hyukjinkwon <gurwls223@gmail.com> | 2016-04-14 15:43:44 +0800 |
|---|---|---|
| committer | Wenchen Fan <wenchen@databricks.com> | 2016-04-14 15:43:44 +0800 |
| commit | b4819404a65f9b97c1f8deb1fcb8419969831574 (patch) | |
| tree | 3e8fa19af63386bf700a3890f924600ee2b3e9a1 /core/src | |
| parent | 62b7f306fbf77de7f6cbb36181ebebdb4a55acc5 (diff) | |
[SPARK-14596][SQL] Remove not used SqlNewHadoopRDD and some more unused imports
## What changes were proposed in this pull request?
The old `HadoopFsRelation` API included `buildInternalScan()`, which used `SqlNewHadoopRDD` in `ParquetRelation`.
Now that the old API has been removed, `SqlNewHadoopRDD` is no longer used anywhere.
So, this PR removes `SqlNewHadoopRDD` and several unused imports.
This was discussed in https://github.com/apache/spark/pull/12326.
## How was this patch tested?
Several related existing unit tests and `sbt scalastyle`.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #12354 from HyukjinKwon/SPARK-14596.
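For context, the `InputFileNameHolder` this PR introduces (by renaming `SqlNewHadoopRDDState`) is essentially a thread-local holder for the name of the file the current task is reading. The following is a minimal standalone sketch of that pattern, not the actual Spark source: it uses a hypothetical `InputFileNameHolderSketch` object and plain `String` in place of Spark's `UTF8String`.

```scala
// Sketch of a thread-local input-file-name holder (hypothetical, not Spark's code).
object InputFileNameHolderSketch {

  // One value per thread; defaults to the empty string when nothing is being read.
  private val inputFileName: ThreadLocal[String] =
    new ThreadLocal[String] {
      override def initialValue(): String = ""
    }

  // Read by input_file_name()-style expressions on the same thread.
  def getInputFileName(): String = inputFileName.get()

  // Called by the task before it starts consuming a file split.
  def setInputFileName(file: String): Unit = inputFileName.set(file)

  // Called when the record reader is closed.
  def unsetInputFileName(): Unit = inputFileName.remove()
}
```

In the diff below, `HadoopRDD` follows the same lifecycle: it sets the name when the input split is a `FileSplit`, and unsets it when the reader closes.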
Diffstat (limited to 'core/src')
| -rw-r--r-- | core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala | 8 |
| -rw-r--r-- | core/src/main/scala/org/apache/spark/rdd/InputFileNameHolder.scala (renamed from core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDDState.scala) | 6 |
2 files changed, 6 insertions, 8 deletions
```diff
diff --git a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
index 08db96edd6..ac5ba9e79f 100644
--- a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
@@ -213,15 +213,13 @@ class HadoopRDD[K, V](
       logInfo("Input split: " + split.inputSplit)

       val jobConf = getJobConf()
-      // TODO: there is a lot of duplicate code between this and NewHadoopRDD and SqlNewHadoopRDD
-
       val inputMetrics = context.taskMetrics().registerInputMetrics(DataReadMethod.Hadoop)
       val existingBytesRead = inputMetrics.bytesRead

       // Sets the thread local variable for the file's name
       split.inputSplit.value match {
-        case fs: FileSplit => SqlNewHadoopRDDState.setInputFileName(fs.getPath.toString)
-        case _ => SqlNewHadoopRDDState.unsetInputFileName()
+        case fs: FileSplit => InputFileNameHolder.setInputFileName(fs.getPath.toString)
+        case _ => InputFileNameHolder.unsetInputFileName()
       }

       // Find a function that will return the FileSystem bytes read by this thread. Do this before
@@ -271,7 +269,7 @@ class HadoopRDD[K, V](
       override def close() {
         if (reader != null) {
-          SqlNewHadoopRDDState.unsetInputFileName()
+          InputFileNameHolder.unsetInputFileName()
           // Close the reader and release it. Note: it's very important that we don't close the
           // reader more than once, since that exposes us to MAPREDUCE-5918 when running against
           // Hadoop 1.x and older Hadoop 2.x releases. That bug can lead to non-deterministic
diff --git a/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDDState.scala b/core/src/main/scala/org/apache/spark/rdd/InputFileNameHolder.scala
index 3f15fff793..108e9d2558 100644
--- a/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDDState.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/InputFileNameHolder.scala
@@ -20,10 +20,10 @@ package org.apache.spark.rdd
 import org.apache.spark.unsafe.types.UTF8String

 /**
- * State for SqlNewHadoopRDD objects. This is split this way because of the package splits.
- * TODO: Move/Combine this with org.apache.spark.sql.datasources.SqlNewHadoopRDD
+ * This holds file names of the current Spark task. This is used in HadoopRDD,
+ * FileScanRDD and InputFileName function in Spark SQL.
  */
-private[spark] object SqlNewHadoopRDDState {
+private[spark] object InputFileNameHolder {
   /**
    * The thread variable for the name of the current file being read. This is used by
    * the InputFileName function in Spark SQL.
```