| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
local list of java beans
Similar to SPARK-10630 it would be nice if Java users didn't have to parallelize there data explicitly (as Scala users already can skip). Issue came up in http://stackoverflow.com/questions/32613413/apache-spark-machine-learning-cant-get-estimator-example-to-work
Author: Holden Karau <holden@pigscanfly.ca>
Closes #8879 from holdenk/SPARK-10720-add-a-java-wrapper-to-create-a-dataframe-from-a-local-list-of-java-beans.
|
|
|
|
|
|
|
|
|
|
|
| |
working
https://issues.apache.org/jira/browse/SPARK-10741
I choose the second approach: do not change output exprIds when convert MetastoreRelation to LogicalRelation
Author: Wenchen Fan <cloud0fan@163.com>
Closes #8889 from cloud-fan/hot-bug.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes two changes:
- Allow reduce tasks to fetch multiple map output partitions -- this is a pretty small change to HashShuffleFetcher
- Move shuffle locality computation out of DAGScheduler and into ShuffledRDD / MapOutputTracker; this was needed because the code in DAGScheduler wouldn't work for RDDs that fetch multiple map output partitions from each reduce task
I also added an AdaptiveSchedulingSuite that creates RDDs depending on multiple map output partitions.
Author: Matei Zaharia <matei@databricks.com>
Closes #8844 from mateiz/spark-9852.
|
|
|
|
|
|
|
|
|
|
| |
JIRA: https://issues.apache.org/jira/browse/SPARK-10705
As described in the JIRA ticket, `DataFrame.toJSON` uses `DataFrame.mapPartitions`, which converts internal rows to external rows. We should use `queryExecution.toRdd.mapPartitions` that directly uses internal rows for better performance.
Author: Liang-Chi Hsieh <viirya@appier.com>
Closes #8865 from viirya/df-tojson-internalrow.
|
|
|
|
|
|
| |
Author: Wenchen Fan <cloud0fan@163.com>
Closes #8874 from cloud-fan/hive-agg.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(round 2)
This patch reverts most of the changes in a previous fix #8827.
The real cause of the issue is that in `TungstenAggregate`'s prepare method we only reserve 1 page, but later when we switch to sort-based aggregation we try to acquire 1 page AND a pointer array. The longer-term fix should be to reserve also the pointer array, but for now ***we will simply not track the pointer array***. (Note that elsewhere we already don't track the pointer array, e.g. [here](https://github.com/apache/spark/blob/a18208047f06a4244703c17023bb20cbe1f59d73/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java#L88))
Note: This patch reuses the unit test added in #8827 so it doesn't show up in the diff.
Author: Andrew Or <andrew@databricks.com>
Closes #8888 from andrewor14/dont-track-pointer-array.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Python DataFrame.
Python DataFrame.head/take now requires scanning all the partitions. This pull request changes them to delegate the actual implementation to Scala DataFrame (by calling DataFrame.take).
This is more of a hack for fixing this issue in 1.5.1. A more proper fix is to change executeCollect and executeTake to return InternalRow rather than Row, and thus eliminate the extra round-trip conversion.
Author: Reynold Xin <rxin@databricks.com>
Closes #8876 from rxin/SPARK-10731.
|
|
|
|
|
|
|
|
|
|
| |
ShuffleManager
This patch attempts to fix an issue where Spark SQL's UnsafeRowSerializer was incompatible with the `tungsten-sort` ShuffleManager.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #8873 from JoshRosen/SPARK-10403.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch refactors Python UDF handling:
1. Extract the per-partition Python UDF calling logic from PythonRDD into a PythonRunner. PythonRunner itself expects iterator as input/output, and thus has no dependency on RDD. This way, we can use PythonRunner directly in a mapPartitions call, or in the future in an environment without RDDs.
2. Use PythonRunner in Spark SQL's BatchPythonEvaluation.
3. Updated BatchPythonEvaluation to only use its input once, rather than twice. This should fix Python UDF performance regression in Spark 1.5.
There are a number of small cleanups I wanted to do when I looked at the code, but I kept most of those out so the diff looks small.
This basically implements the approach in https://github.com/apache/spark/pull/8833, but with some code moving around so the correctness doesn't depend on the inner workings of Spark serialization and task execution.
Author: Reynold Xin <rxin@databricks.com>
Closes #8835 from rxin/python-iter-refactor.
|
|
|
|
|
|
|
|
|
|
| |
results
https://issues.apache.org/jira/browse/SPARK-10737
Author: Yin Huai <yhuai@databricks.com>
Closes #8854 from yhuai/SMJBug.
|
|
|
|
|
|
|
|
|
|
| |
operations
https://issues.apache.org/jira/browse/SPARK-10740
Author: Wenchen Fan <cloud0fan@163.com>
Closes #8858 from cloud-fan/non-deter.
|
|
|
|
|
|
|
|
| |
DataFrame.explain should use foreach to print the explain content.
Author: Reynold Xin <rxin@databricks.com>
Closes #8862 from rxin/map-foreach.
|
|
|
|
|
|
|
|
|
|
|
|
| |
usingColumns
JIRA: https://issues.apache.org/jira/browse/SPARK-10446
Currently the method `join(right: DataFrame, usingColumns: Seq[String])` only supports inner join. It is more convenient to have it support other join types.
Author: Liang-Chi Hsieh <viirya@appier.com>
Closes #8600 from viirya/usingcolumns_df.
|
|
|
|
|
|
|
|
|
|
|
|
| |
JdbcDialects
Reading from Microsoft SQL Server over jdbc fails when the table contains datetimeoffset types.
This patch registers a SQLServer JDBC Dialect that maps datetimeoffset to a String, as Microsoft suggest.
Author: Ewan Leith <ewan.leith@realitymine.com>
Closes #8575 from realitymine-coordinator/sqlserver.
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-10681
Author: Yin Huai <yhuai@databricks.com>
Closes #8806 from yhuai/SPARK-10495.
|
|
|
|
|
|
|
|
| |
It would be nice to support creating a DataFrame directly from a Java List of Row.
Author: Holden Karau <holden@pigscanfly.ca>
Closes #8779 from holdenk/SPARK-10630-create-DataFrame-from-Java-List.
|
|
|
|
|
|
|
|
|
|
| |
It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`.
This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.
|
|
|
|
|
|
|
|
|
|
| |
Since `scala.util.parsing.combinator.Parsers` is thread-safe since Scala 2.10 (See [SI-4929](https://issues.scala-lang.org/browse/SI-4929)), we can change SqlParser to object to avoid memory leak.
I didn't change other subclasses of `scala.util.parsing.combinator.Parsers` because there is only one instance in one SQLContext, which should not be an issue.
Author: zsxwing <zsxwing@gmail.com>
Closes #8357 from zsxwing/sql-memory-leak.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When `TungstenAggregation` hits memory pressure, it switches from hash-based to sort-based aggregation in-place. However, in the process we try to allocate the pointer array for writing to the new `UnsafeExternalSorter` *before* actually freeing the memory from the hash map. This lead to the following exception:
```
java.io.IOException: Could not acquire 65536 bytes of memory
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:126)
at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
```
Author: Andrew Or <andrew@databricks.com>
Closes #8827 from andrewor14/allocate-pointer-array.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Except #8742
Intersect and Except are both set operators and they use the all the columns to compare equality between rows. When pushing their Project parent down, the relations they based on would change, therefore not an equivalent transformation.
JIRA: https://issues.apache.org/jira/browse/SPARK-10539
I added some comments based on the fix of https://github.com/apache/spark/pull/8742.
Author: Yijie Shen <henry.yijieshen@gmail.com>
Author: Yin Huai <yhuai@databricks.com>
Closes #8823 from yhuai/fix_set_optimization.
|
|
|
|
|
|
|
|
|
|
|
|
| |
InMemoryColumnarTableScan
Many of the fields in InMemoryColumnar scan and InMemoryRelation can be made transient.
This reduces my 1000ms job to abt 700 ms . The task size reduces from 2.8 mb to ~1300kb
Author: Yash Datta <Yash.Datta@guavus.com>
Closes #8604 from saucam/serde.
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-10639
Author: Yin Huai <yhuai@databricks.com>
Closes #8788 from yhuai/udafConversion.
|
|
|
|
|
|
|
|
|
|
|
|
| |
JIRA: https://issues.apache.org/jira/browse/SPARK-10459
As mentioned in the JIRA, `PythonUDF` actually could process `UnsafeRow`.
Specially, the rows in `childResults` in `BatchPythonEvaluation` will be projected to a `MutableRow`. So I think we can enable `canProcessUnsafeRows` for `BatchPythonEvaluation` and get rid of redundant `ConvertToSafe`.
Author: Liang-Chi Hsieh <viirya@appier.com>
Closes #8616 from viirya/pyudf-unsafe.
|
|
|
|
|
|
|
|
|
| |
1. Support collecting data of MapType from DataFrame.
2. Support data of MapType in createDataFrame.
Author: Sun Rui <rui.sun@intel.com>
Closes #8711 from sun-rui/SPARK-10050.
|
|
|
|
|
|
|
|
|
|
|
|
| |
the table.
Current implementation uses query with a LIMIT clause to find if table already exists. This syntax works only in some database systems. This patch changes the default query to the one that is likely to work on most databases, and adds a new method to the JdbcDialect abstract class to allow dialects to override the default query.
I looked at using the JDBC meta data calls, it turns out there is no common way to find the current schema, catalog..etc. There is a new method Connection.getSchema() , but that is available only starting jdk1.7 , and existing jdbc drivers may not have implemented it. Other option was to use jdbc escape syntax clause for LIMIT, not sure on how well this supported in all the databases also. After looking at all the jdbc metadata options my conclusion was most common way is to use the simple select query with 'where 1 =0' , and allow dialects to customize as needed
Author: sureshthalamati <suresh.thalamati@gmail.com>
Closes #8676 from sureshthalamati/table_exists_spark-9078.
|
|
|
|
|
|
|
|
|
|
|
|
| |
SQLContext
Instead of relying on `DataFrames` to verify our answers, we can just use simple arrays. This significantly simplifies the test logic for `LocalNode`s and reduces a lot of code duplicated from `SparkPlanTest`.
This also fixes an additional issue [SPARK-10624](https://issues.apache.org/jira/browse/SPARK-10624) where the output of `TakeOrderedAndProjectNode` is not actually ordered.
Author: Andrew Or <andrew@databricks.com>
Closes #8764 from andrewor14/sql-local-tests-cleanup.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OutputCommitCoordinator
When speculative execution is enabled, consider a scenario where the authorized committer of a particular output partition fails during the OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator is supposed to release that committer's exclusive lock on committing once that task fails. However, due to a unit mismatch (we used task attempt number in one place and task attempt id in another) the lock will not be released, causing Spark to go into an infinite retry loop.
This bug was masked by the fact that the OutputCommitCoordinator does not have enough end-to-end tests (the current tests use many mocks). Other factors contributing to this bug are the fact that we have many similarly-named identifiers that have different semantics but the same data types (e.g. attemptNumber and taskAttemptId, with inconsistent variable naming which makes them difficult to distinguish).
This patch adds a regression test and fixes this bug by always using task attempt numbers throughout this code.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #8544 from JoshRosen/SPARK-10381.
|
|
|
|
|
|
|
|
| |
The idea is that we should separate the function call that does memory reservation (i.e. prepare) from the function call that consumes the input (e.g. open()), so all operators can be a chance to reserve memory before they are all consumed.
Author: Reynold Xin <rxin@databricks.com>
Closes #8761 from rxin/SPARK-10612.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
*Note: this is for master branch only.* The fix for branch-1.5 is at #8721.
The query execution ID is currently passed from a thread to its children, which is not the intended behavior. This led to `IllegalArgumentException: spark.sql.execution.id is already set` when running queries in parallel, e.g.:
```
(1 to 100).par.foreach { _ =>
sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
}
```
The cause is `SparkContext`'s local properties are inherited by default. This patch adds a way to exclude keys we don't want to be inherited, and makes SQL go through that code path.
Author: Andrew Or <andrew@databricks.com>
Closes #8710 from andrewor14/concurrent-sql-executions.
|
|
|
|
|
|
|
|
|
|
| |
JIRA: https://issues.apache.org/jira/browse/SPARK-10437
If an expression in `SortOrder` is a resolved one, such as `count(1)`, the corresponding rule in `Analyzer` to make it work in order by will not be applied.
Author: Liang-Chi Hsieh <viirya@appier.com>
Closes #8599 from viirya/orderby-agg.
|
|
|
|
|
|
| |
run-tests.py."
This reverts commit 8abef21dac1a6538c4e4e0140323b83d804d602b.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change does two things:
- tag a few tests and adds the mechanism in the build to be able to disable those tags,
both in maven and sbt, for both junit and scalatest suites.
- add some logic to run-tests.py to disable some tags depending on what files have
changed; that's used to disable expensive tests when a module hasn't explicitly
been changed, to speed up testing for changes that don't directly affect those
modules.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #8437 from vanzin/test-tags.
|
|
|
|
|
|
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #8350 from rxin/1.6.
|
|
|
|
|
|
|
|
| |
This PR is in conflict with #8535 and #8573. Will update this one when they are merged.
Author: zsxwing <zsxwing@gmail.com>
Closes #8642 from zsxwing/expand-nest-join.
|
|
|
|
|
|
|
|
|
|
| |
Alternative to PR #6122; in this case the refactored out classes are replaced by inner classes with the same name for backwards binary compatibility
* process in a lighter-weight, backwards-compatible way
Author: Edoardo Vacchi <uncommonnonsense@gmail.com>
Closes #6356 from evacchi/sqlctx-refactoring-lite.
|
|
|
|
|
|
|
|
|
|
| |
JobContext methods
This is a followup to #8499 which adds a Scalastyle rule to mandate the use of SparkHadoopUtil's JobContext accessor methods and fixes the existing violations.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #8521 from JoshRosen/SPARK-10330-part2.
|
|
|
|
|
|
|
|
|
|
|
| |
Adding STDDEV support for DataFrame using 1-pass online /parallel algorithm to compute variance. Please review the code change.
Author: JihongMa <linlin200605@gmail.com>
Author: Jihong MA <linlin200605@gmail.com>
Author: Jihong MA <jihongma@jihongs-mbp.usca.ibm.com>
Author: Jihong MA <jihongma@Jihongs-MacBook-Pro.local>
Closes #6297 from JihongMA/SPARK-SQL.
|
|
|
|
|
|
|
|
| |
Fix a few Java API test style issues: unused generic types, exceptions, wrong assert argument order
Author: Sean Owen <sowen@cloudera.com>
Closes #8706 from srowen/SPARK-10547.
|
|
|
|
|
|
|
|
|
| |
1. Hide `LocalNodeIterator` behind the `LocalNode#asIterator` method
2. Add tests for this
Author: Andrew Or <andrew@databricks.com>
Closes #8708 from andrewor14/local-hash-join-follow-up.
|
|
|
|
|
|
|
|
|
|
| |
sample and intersect operators
This PR is in conflict with #8535. I will update this one when #8535 gets merged.
Author: zsxwing <zsxwing@gmail.com>
Closes #8573 from zsxwing/more-local-operators.
|
|
|
|
|
|
|
|
| |
Before this fix, `MyDenseVectorUDT.typeName` gives `mydensevecto`, which is not desirable.
Author: Cheng Lian <lian@databricks.com>
Closes #8640 from liancheng/spark-10472/udt-type-name.
|
|
|
|
|
|
|
|
| |
`LeftOutputIterator` and `RightOutputIterator` are symmetrically identical and can share a lot of code. If someone makes a change in one but forgets to do the same thing in the other we'll end up with inconsistent behavior. This patch also adds inline comments to clarify the intention of the code.
Author: Andrew Or <andrew@databricks.com>
Closes #8596 from andrewor14/smoj-cleanup.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this PR :
1. Enhance reflection in RBackend. Automatically matching a Java array to Scala Seq when finding methods. Util functions like seq(), listToSeq() in R side can be removed, as they will conflict with the Serde logic that transferrs a Scala seq to R side.
2. Enhance the SerDe to support transferring a Scala seq to R side. Data of ArrayType in DataFrame
after collection is observed to be of Scala Seq type.
3. Support ArrayType in createDataFrame().
Author: Sun Rui <rui.sun@intel.com>
Closes #8458 from sun-rui/SPARK-10049.
|
|
|
|
|
|
|
|
|
|
|
| |
This PR includes the following changes:
- Add SQLConf to LocalNode
- Add HashJoinNode
- Add ConvertToUnsafeNode and ConvertToSafeNode.scala to test unsafe hash join.
Author: zsxwing <zsxwing@gmail.com>
Closes #8535 from zsxwing/SPARK-9990.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Data Spill with UnsafeRow causes assert failure.
```
java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.writeKey(UnsafeRowSerializer.scala:75)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:180)
at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2$$anonfun$apply$1.apply(ExternalSorter.scala:688)
at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2$$anonfun$apply$1.apply(ExternalSorter.scala:687)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:687)
at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:683)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:683)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:80)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
```
To reproduce that with code (thanks andrewor14):
```scala
bin/spark-shell --master local
--conf spark.shuffle.memoryFraction=0.005
--conf spark.shuffle.sort.bypassMergeThreshold=0
sc.parallelize(1 to 2 * 1000 * 1000, 10)
.map { i => (i, i) }.toDF("a", "b").groupBy("b").avg().count()
```
Author: Cheng Hao <hao.cheng@intel.com>
Closes #8635 from chenghao-intel/unsafe_spill.
|
|
|
|
|
|
|
|
| |
for master
Author: Cheng Lian <lian@databricks.com>
Closes #8670 from liancheng/spark-10301/address-pr-comments.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR is based on #8383 , thanks to viirya
JIRA: https://issues.apache.org/jira/browse/SPARK-9730
This patch adds the Full Outer Join support for SortMergeJoin. A new class SortMergeFullJoinScanner is added to scan rows from left and right iterators. FullOuterIterator is simply a wrapper of type RowIterator to consume joined rows from SortMergeFullJoinScanner.
Closes #8383
Author: Liang-Chi Hsieh <viirya@appier.com>
Author: Davies Liu <davies@databricks.com>
Closes #8579 from davies/smj_fullouter.
|
|
|
|
|
|
|
|
|
|
|
| |
The bulk of the changes are on `transient` annotation on class parameter. Often the compiler doesn't generate a field for this parameters, so the the transient annotation would be unnecessary.
But if the class parameter are used in methods, then fields are created. So it is safer to keep the annotations.
The remainder are some potential bugs, and deprecated syntax.
Author: Luc Bourlier <luc.bourlier@typesafe.com>
Closes #8433 from skyluc/issue/sbt-2.11.
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #8659 from marmbrus/testBuildBreak.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
its project list
```scala
import org.apache.spark.sql.hive.execution.HiveTableScan
sql("select key, value, key + 1 from src").registerTempTable("abc")
cacheTable("abc")
val sparkPlan = sql(
"""select a.key, b.key, c.key from
|abc a join abc b on a.key=b.key
|join abc c on a.key=c.key""".stripMargin).queryExecution.sparkPlan
assert(sparkPlan.collect { case e: InMemoryColumnarTableScan => e }.size === 3) // failed
assert(sparkPlan.collect { case e: HiveTableScan => e }.size === 0) // failed
```
The actual plan is:
```
== Parsed Logical Plan ==
'Project [unresolvedalias('a.key),unresolvedalias('b.key),unresolvedalias('c.key)]
'Join Inner, Some(('a.key = 'c.key))
'Join Inner, Some(('a.key = 'b.key))
'UnresolvedRelation [abc], Some(a)
'UnresolvedRelation [abc], Some(b)
'UnresolvedRelation [abc], Some(c)
== Analyzed Logical Plan ==
key: int, key: int, key: int
Project [key#14,key#61,key#66]
Join Inner, Some((key#14 = key#66))
Join Inner, Some((key#14 = key#61))
Subquery a
Subquery abc
Project [key#14,value#15,(key#14 + 1) AS _c2#16]
MetastoreRelation default, src, None
Subquery b
Subquery abc
Project [key#61,value#62,(key#61 + 1) AS _c2#58]
MetastoreRelation default, src, None
Subquery c
Subquery abc
Project [key#66,value#67,(key#66 + 1) AS _c2#63]
MetastoreRelation default, src, None
== Optimized Logical Plan ==
Project [key#14,key#61,key#66]
Join Inner, Some((key#14 = key#66))
Project [key#14,key#61]
Join Inner, Some((key#14 = key#61))
Project [key#14]
InMemoryRelation [key#14,value#15,_c2#16], true, 10000, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc)
Project [key#61]
MetastoreRelation default, src, None
Project [key#66]
MetastoreRelation default, src, None
== Physical Plan ==
TungstenProject [key#14,key#61,key#66]
BroadcastHashJoin [key#14], [key#66], BuildRight
TungstenProject [key#14,key#61]
BroadcastHashJoin [key#14], [key#61], BuildRight
ConvertToUnsafe
InMemoryColumnarTableScan [key#14], (InMemoryRelation [key#14,value#15,_c2#16], true, 10000, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc))
ConvertToUnsafe
HiveTableScan [key#61], (MetastoreRelation default, src, None)
ConvertToUnsafe
HiveTableScan [key#66], (MetastoreRelation default, src, None)
```
Author: Cheng Hao <hao.cheng@intel.com>
Closes #8494 from chenghao-intel/weird_cache.
|