| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
unsupported types in Schema creation
Currently if trying to register an RDD (or DataFrame in 1.3) as a table that has types that have no supported Schema representation (e.g. type "Any") - it would throw a match error. e.g. scala.MatchError: Any (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
This fix is just to have a nicer error message than a MatchError
Author: Eran Medan <ehrann.mehdan@gmail.com>
Closes #5235 from eranation/patch-2 and squashes the following commits:
af4b1a2 [Eran Medan] Line should be under 100 chars
0c69e9d [Eran Medan] Change from sys.error UnsupportedOperationException
524be86 [Eran Medan] better exception than scala.MatchError: Any
|
|
|
|
|
|
|
|
|
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #5226 from rxin/empty-df and squashes the following commits:
1306d88 [Reynold Xin] Proper fix.
e135bb9 [Reynold Xin] [SPARK-6564][SQL] SQLContext.emptyDataFrame should contain 0 rows, not 1 row.
|
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #5191 from marmbrus/kryoRowsWithSchema and squashes the following commits:
bb83522 [Michael Armbrust] Fix serialization of GenericRowWithSchema using kryo
f914f16 [Michael Armbrust] Add no arg constructor to GenericRowWithSchema
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously this could result in sets compare equals when in fact the right was a subset of the left.
Based on #5133 by sisihj
Author: sisihj <jun.hejun@huawei.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #5194 from marmbrus/pr/5133 and squashes the following commits:
5ed4615 [Michael Armbrust] fix imports
d4cbbc0 [Michael Armbrust] Add test cases
0a0834f [sisihj] AttributeSet.equal should compare size
|
|
|
|
|
|
|
|
|
|
|
| |
Current `castStruct` should be very slow. This pr slightly improves it.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #5017 from viirya/faster_caststruct and squashes the following commits:
385d5b0 [Liang-Chi Hsieh] Further improved.
746fcfb [Liang-Chi Hsieh] Make castStruct faster.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As issue [SPARK-6483](https://issues.apache.org/jira/browse/SPARK-6483) description, ScalaUdf is low performance because of calling *asInstanceOf* to convert per record.
With this, the performance of ScalaUdf is the same as other case.
thank lianhuiwang for telling me how to resolve this problem.
Author: zzcclp <xm_zzc@sina.com>
Closes #5154 from zzcclp/SPARK-6483 and squashes the following commits:
5ac6e09 [zzcclp] Add a newline at the end of source file
cc6868e [zzcclp] Fix for fail on unit test.
0a8cdc3 [zzcclp] indention issue
b73836a [zzcclp] Access Seq[Expression] element by :: operator, and update the code gen script.
7763848 [zzcclp] rebase from master
|
|
|
|
|
|
|
|
|
|
| |
I think after this PR, we can finally turn the rule on. There are still some smaller ones that need to be fixed, but those are easier.
Author: Reynold Xin <rxin@databricks.com>
Closes #5162 from rxin/catalyst-explicit-types and squashes the following commits:
e7eac03 [Reynold Xin] [SPARK-6428][SQL] Added explicit types for all public methods in catalyst.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously it was okay to throw away subqueries after analysis, as we would never try to use that tree for resolution again. However, with eager analysis in `DataFrame`s this can cause errors for queries such as:
```scala
val df = Seq(1,2,3).map(i => (i, i.toString)).toDF("int", "str")
df.as('x).join(df.as('y), $"x.str" === $"y.str").groupBy("x.str").count()
```
As a result, in this PR we defer the elimination of subqueries until the optimization phase.
Author: Michael Armbrust <michael@databricks.com>
Closes #5160 from marmbrus/subqueriesInDfs and squashes the following commits:
a9bb262 [Michael Armbrust] Update Optimizer.scala
27d25bf [Michael Armbrust] fix hive tests
9137e03 [Michael Armbrust] add type
81cd597 [Michael Armbrust] Avoid eliminating subqueries until optimization
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #5155 from marmbrus/errorMessages and squashes the following commits:
b898188 [Michael Armbrust] Fix formatting of error messages.
|
|
|
|
|
|
|
|
|
|
| |
Due to a recent change that made `StructType` a `Seq` we started inadvertently turning `StructType`s into generic `Traversable` when attempting nested tree transformations. In this PR we explicitly avoid descending into `DataType`s to avoid this bug.
Author: Michael Armbrust <michael@databricks.com>
Closes #5157 from marmbrus/udfFix and squashes the following commits:
26f7087 [Michael Armbrust] Fix transformations of TreeNodes that hold StructTypes
|
|
|
|
|
|
|
|
|
|
| |
This is used by ML pipelines to embed ML attributes in columns created by ML transformers/estimators. marmbrus
Author: Xiangrui Meng <meng@databricks.com>
Closes #5151 from mengxr/SPARK-6361 and squashes the following commits:
bb30de3 [Xiangrui Meng] support adding a column with metadata in DF
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
all types of operator
In `CheckAnalysis`, `Filter` and `Aggregate` are checked in separate case clauses, thus never hit those clauses for unresolved operators and missing input attributes.
This PR also removes the `prettyString` call when generating error message for missing input attributes. Because result of `prettyString` doesn't contain expression ID, and may give confusing messages like
> resolved attributes a missing from a
cc rxin
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5129)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes #5129 from liancheng/spark-6452 and squashes the following commits:
52cdc69 [Cheng Lian] Addresses comments
029f9bd [Cheng Lian] Checks for missing attributes and unresolved operator for all types of operator
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://github.com/apache/spark/pull/5082
/cc liancheng
Author: Yadong Qi <qiyadong2010@gmail.com>
Closes #5132 from watermen/sql-missingInput-new and squashes the following commits:
1e5bdc5 [Yadong Qi] Check the missingInput simply
|
|
|
|
| |
This reverts commit e566fe5982bac5d24e6be76e5d7d6270544a85e6.
|
|
|
|
|
|
|
|
| |
Author: q00251598 <qiyadong@huawei.com>
Closes #5082 from watermen/sql-missingInput and squashes the following commits:
25766b9 [q00251598] Check the missingInput simply
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DDL parser.
This PR creates a trait `DataTypeParser` used to parse data types. This trait aims to be single place to provide the functionality of parsing data types' string representation. It is currently mixed in with `DDLParser` and `SqlParser`. It is also used to parse the data type for `DataFrame.cast` and to convert Hive metastore's data type string back to a `DataType`.
JIRA: https://issues.apache.org/jira/browse/SPARK-6250
Author: Yin Huai <yhuai@databricks.com>
Closes #5078 from yhuai/ddlKeywords and squashes the following commits:
0e66097 [Yin Huai] Special handle struct<>.
fea6012 [Yin Huai] Style.
c9733fb [Yin Huai] Create a trait to parse data types.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SELECT sum('a'), avg('a'), variance('a'), std('a') FROM src;
Should give output as
0.0 NULL NULL NULL
This fixes hive udaf_number_format.q
Author: Venkata Ramana G <ramana.gollamudihuawei.com>
Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>
Closes #4466 from gvramana/sum_fix and squashes the following commits:
42e14d1 [Venkata Ramana Gollamudi] Added comments
39415c0 [Venkata Ramana Gollamudi] Handled the partitioned Sum expression scenario
df66515 [Venkata Ramana Gollamudi] code style fix
4be2606 [Venkata Ramana Gollamudi] Add udaf_number_format to whitelist and golden answer
330fd64 [Venkata Ramana Gollamudi] fix sum function for all null data
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because of no statistics override, in spute of super class say 'LeafNode must override'.
fix issue
[SPARK-5320: Joins on simple table created using select gives error](https://issues.apache.org/jira/browse/SPARK-5320)
Author: x1- <viva008@gmail.com>
Closes #5105 from x1-/SPARK-5320 and squashes the following commits:
e561aac [x1-] Add statistics method at NoRelation (override super).
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also implemented equals/hashCode when they are missing.
This is done in order to enable automatic public method type checking.
Author: Reynold Xin <rxin@databricks.com>
Closes #5104 from rxin/sql-hashcode-explicittype and squashes the following commits:
ffce6f3 [Reynold Xin] Code review feedback.
8b36733 [Reynold Xin] [SPARK-6428][SQL] Added explicit type for all public methods.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #5056 from vanzin/SPARK-6371 and squashes the following commits:
63220df [Marcelo Vanzin] Merge branch 'master' into SPARK-6371
6506f75 [Marcelo Vanzin] Use more fine-grained exclusion.
178ba71 [Marcelo Vanzin] Oops.
75b2375 [Marcelo Vanzin] Exclude VertexRDD in MiMA.
a45a62c [Marcelo Vanzin] Work around MIMA warning.
1d8a670 [Marcelo Vanzin] Re-group jetty exclusion.
0e8e909 [Marcelo Vanzin] Ignore ml, don't ignore graphx.
cef4603 [Marcelo Vanzin] Indentation.
296cf82 [Marcelo Vanzin] [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
orphaned temp files
Use `Utils.createTempDir()` to replace other temp file mechanisms used in some tests, to further ensure they are cleaned up, and simplify
Author: Sean Owen <sowen@cloudera.com>
Closes #5029 from srowen/SPARK-6338 and squashes the following commits:
27b740a [Sean Owen] Fix hive-thriftserver tests that don't expect an existing dir
4a212fa [Sean Owen] Standardize a bit more temp dir management
9004081 [Sean Owen] Revert some added recursive-delete calls
57609e4 [Sean Owen] Use Utils.createTempDir() to replace other temp file mechanisms used in some tests, to further ensure they are cleaned up, and simplify
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to handle ambiguous `exprId`s that are produced by new aliases as well as those caused by leaf nodes (`MultiInstanceRelation`).
Attempting to fix this revealed a bug in `equals` for `Alias` as these objects were comparing equal even when the expression ids did not match. Additionally, `LocalRelation` did not correctly provide statistics, and some tests in `catalyst` and `hive` were not using the helper functions for comparing plans.
Based on #4991 by chenghao-intel
Author: Michael Armbrust <michael@databricks.com>
Closes #5062 from marmbrus/selfJoins and squashes the following commits:
8e9b84b [Michael Armbrust] check qualifier too
8038a36 [Michael Armbrust] handle aggs too
0b9c687 [Michael Armbrust] fix more tests
c3c574b [Michael Armbrust] revert change.
725f1ab [Michael Armbrust] add statistics
a925d08 [Michael Armbrust] check for conflicting attributes in join resolution
b022ef7 [Michael Armbrust] Handle project aliases.
d8caa40 [Michael Armbrust] test case: SPARK-6247
f9c67c2 [Michael Armbrust] Check for duplicate attributes in join resolution.
898af73 [Michael Armbrust] Fix Alias equality.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By default, the statistic for logical plan with multiple children is quite aggressive, and those statistic are quite critical for the join optimization, hence we need to estimate the statistics as accurate as possible.
For `Union`, which has 2 children, and overwrite the default implementation by `adding` its children `byteInSize` instead of `multiplying`.
For `Expand`, which only has a single child, but it will grows the size, and we need to multiply its inflating factor.
Author: Cheng Hao <hao.cheng@intel.com>
Closes #4914 from chenghao-intel/statistic and squashes the following commits:
d466bbc [Cheng Hao] Update the default statistic
|
|
|
|
|
|
|
|
|
|
| |
use prettyString instead of toString() (which include id of expression) as column name in agg()
Author: Davies Liu <davies@databricks.com>
Closes #5006 from davies/prettystring and squashes the following commits:
cb1fdcf [Davies Liu] use prettyString as column name in agg()
|
|
|
|
|
|
|
|
|
|
| |
Removed an repeated "from" in the comments.
Author: Hongbo Liu <liuhb86@gmail.com>
Closes #4976 from liuhb86/mine and squashes the following commits:
e280e7c [Hongbo Liu] [SQL][Minor] fix typo in comments
|
|
|
|
|
|
|
|
|
|
| |
The extra blank line is preventing the first lines from showing up in the package summary page.
Author: Reynold Xin <rxin@databricks.com>
Closes #4955 from rxin/datatype-docs and squashes the following commits:
1621114 [Reynold Xin] Minor doc: Remove the extra blank line in data types javadoc.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on #4904 with style errors fixed.
`LogicalPlan#resolve` will not only produce `Attribute`, but also "`GetField` chain".
So in `ResolveSortReferences`, after resolve the ordering expressions, we should not just collect the `Attribute` results, but also `Attribute` at the bottom of "`GetField` chain".
Author: Wenchen Fan <cloud0fan@outlook.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #4918 from marmbrus/pr/4904 and squashes the following commits:
997f84e [Michael Armbrust] fix style
3eedbfc [Wenchen Fan] fix 6145
|
|
|
|
|
|
|
|
|
|
| |
Option 1 of 2: Convert spark-parent module name to spark-parent_2.10 / spark-parent_2.11
Author: Sean Owen <sowen@cloudera.com>
Closes #4912 from srowen/SPARK-6182.1 and squashes the following commits:
eff60de [Sean Owen] Convert spark-parent module name to spark-parent_2.10 / spark-parent_2.11
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LongType value in defaultPrimitive
In `CodeGenerator`, the casting on `FloatType` should use `FloatType` instead of `IntegerType`.
Besides, `defaultPrimitive` for `LongType` should be `-1L` instead of `1L`.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #4870 from viirya/codegen_type and squashes the following commits:
76311dd [Liang-Chi Hsieh] Fix wrong datatype for casting on FloatType. Fix the wrong value for LongType in defaultPrimitive.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
work when using datasource api
This PR contains the following changes:
1. Add a new method, `DataType.equalsIgnoreCompatibleNullability`, which is the middle ground between DataType's equality check and `DataType.equalsIgnoreNullability`. For two data types `from` and `to`, it does `equalsIgnoreNullability` as well as if the nullability of `from` is compatible with that of `to`. For example, the nullability of `ArrayType(IntegerType, containsNull = false)` is compatible with that of `ArrayType(IntegerType, containsNull = true)` (for an array without null values, we can always say it may contain null values). However, the nullability of `ArrayType(IntegerType, containsNull = true)` is incompatible with that of `ArrayType(IntegerType, containsNull = false)` (for an array that may have null values, we cannot say it does not have null values).
2. For the `resolved` field of `InsertIntoTable`, use `equalsIgnoreCompatibleNullability` to replace the equality check of the data types.
3. For our data source write path, when appending data, we always use the schema of existing table to write the data. This is important for parquet, since nullability direct impacts the way to encode/decode values. If we do not do this, we may see corrupted values when reading values from a set of parquet files generated with different nullability settings.
4. When generating a new parquet table, we always set nullable/containsNull/valueContainsNull to true. So, we will not face situations that we cannot append data because containsNull/valueContainsNull in an Array/Map column of the existing table has already been set to `false`. This change makes the whole data pipeline more robust.
5. Update the equality check of JSON relation. Since JSON does not really cares nullability, `equalsIgnoreNullability` seems a better choice to compare schemata from to JSON tables.
JIRA: https://issues.apache.org/jira/browse/SPARK-5950
Thanks viirya for the initial work in #4729.
cc marmbrus liancheng
Author: Yin Huai <yhuai@databricks.com>
Closes #4826 from yhuai/insertNullabilityCheck and squashes the following commits:
3b61a04 [Yin Huai] Revert change on equals.
80e487e [Yin Huai] asNullable in UDT.
587d88b [Yin Huai] Make methods private.
0cb7ea2 [Yin Huai] marmbrus's comments.
3cec464 [Yin Huai] Cheng's comments.
486ed08 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertNullabilityCheck
d3747d1 [Yin Huai] Remove unnecessary change.
8360817 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertNullabilityCheck
8a3f237 [Yin Huai] Use equalsIgnoreNullability instead of equality check.
0eb5578 [Yin Huai] Fix tests.
f6ed813 [Yin Huai] Update old parquet path.
e4f397c [Yin Huai] Unit tests.
b2c06f8 [Yin Huai] Ignore nullability in JSON relation's equality check.
8bd008b [Yin Huai] nullable, containsNull, and valueContainsNull will be always true for parquet data.
bf50d73 [Yin Huai] When appending data, we use the schema of the existing table instead of the schema of the new data.
0a703e7 [Yin Huai] Test failed again since we cannot read correct content.
9a26611 [Yin Huai] Make InsertIntoTable happy.
8f19fe5 [Yin Huai] equalsIgnoreCompatibleNullability
4ec17fd [Yin Huai] Failed test.
|
|
|
|
|
|
|
|
|
|
| |
It should be `true` instead of `false`?
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #4762 from viirya/doc_fix and squashes the following commits:
2e37482 [Liang-Chi Hsieh] Fix doc.
|
|
|
|
|
|
|
|
| |
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #4760 from viirya/dup_literal and squashes the following commits:
06e7516 [Liang-Chi Hsieh] Remove duplicate Literal matching block.
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #4736 from marmbrus/asExprs and squashes the following commits:
5ba97e4 [Michael Armbrust] [SPARK-5910][SQL] Support for as in selectExpr
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #4684 from marmbrus/explainAnalysis and squashes the following commits:
afbaa19 [Michael Armbrust] fix python
d93278c [Michael Armbrust] fix hive
e5fa0a4 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explainAnalysis
52119f2 [Michael Armbrust] more tests
82a5431 [Michael Armbrust] fix tests
25753d2 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explainAnalysis
aee1e6a [Michael Armbrust] fix hive
b23a844 [Michael Armbrust] newline
de8dc51 [Michael Armbrust] more comments
acf620a [Michael Armbrust] [SPARK-5873][SQL] Show partially analyzed plans in query execution
|
|
|
|
|
|
|
|
|
|
|
|
| |
aggregates or generators
https://issues.apache.org/jira/browse/SPARK-5875 has a case to reproduce the bug and explain the root cause.
Author: Yin Huai <yhuai@databricks.com>
Closes #4663 from yhuai/projectResolved and squashes the following commits:
472f7b6 [Yin Huai] If a logical.Project has any AggregateExpression or Generator, it's resolved field should be false.
|
|
|
|
|
|
|
|
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #4640 from rxin/SPARK-5853 and squashes the following commits:
9c6f569 [Reynold Xin] [SPARK-5853][SQL] Schema support in Row.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Existing implementation of arithmetic operators and BinaryComparison operators have redundant type checking codes, e.g.:
Expression.n2 is used by Add/Subtract/Multiply.
(1) n2 always checks left.dataType == right.dataType. However, this checking should be done once when we resolve expression types;
(2) n2 requires dataType is a NumericType. This can be done once.
This PR optimizes arithmetic and predicate operators by removing such redundant type-checking codes.
Some preliminary benchmarking on 10G TPC-H data over 5 r3.2xlarge EC2 machines shows that this PR can reduce the query time by 5.5% to 11%.
The benchmark queries follow the template below, where OP is plus/minus/times/divide/remainder/bitwise and/bitwise or/bitwise xor.
SELECT l_returnflag, l_linestatus, SUM(l_quantity OP cnt1), SUM(l_quantity OP cnt2), ...., SUM(l_quantity OP cnt700)
FROM (
SELECT l_returnflag, l_linestatus, l_quantity, 1 AS cnt1, 2 AS cnt2, ..., 700 AS cnt700
FROM lineitem
WHERE l_shipdate <= '1998-09-01'
)
GROUP BY l_returnflag, l_linestatus;
Author: kai <kaizeng@eecs.berkeley.edu>
Closes #4472 from kai-zeng/arithmetic-optimize and squashes the following commits:
fef0cf1 [kai] Merge branch 'master' of github.com:apache/spark into arithmetic-optimize
4b3a1bb [kai] chmod a-x
5a41e49 [kai] chmod a-x Expression.scala
cb37c94 [kai] rebase onto spark master
7f6e968 [kai] chmod 100755 -> 100644
6cddb46 [kai] format
7490dbc [kai] fix unresolved-expression exception for EqualTo
9c40bc0 [kai] fix bitwisenot
3cbd363 [kai] clean up test code
ca47801 [kai] override evalInternal for bitwise ops
8fa84a1 [kai] add bitwise or and xor
6892fc4 [kai] revert override evalInternal
f8eba24 [kai] override evalInternal
31ccdd4 [kai] rewrite all bitwise op and remove evalInternal
86297e2 [kai] generalized
cb92ae1 [kai] bitwise-and: override eval
97a7d6c [kai] bitwise-and: override evalInternal using and func
0906c39 [kai] add bitwise test
62abbbc [kai] clean up predicate and arithmetic
b34d58d [kai] add caching and benmark option
12c5b32 [kai] override eval
1cd7571 [kai] fix sqrt and maxof
03fd0c3 [kai] fix predicate
16fd84c [kai] optimize + - * / % -(unary) abs < > <= >=
fd95823 [kai] remove unnecessary type checking
24d062f [kai] test suite
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
JIRA: https://issues.apache.org/jira/browse/SPARK-5746
liancheng marmbrus
Author: Yin Huai <yhuai@databricks.com>
Closes #4617 from yhuai/insertOverwrite and squashes the following commits:
8e3019d [Yin Huai] Fix compilation error.
499e8e7 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertOverwrite
e76e85a [Yin Huai] Address comments.
ac31b3c [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertOverwrite
f30bdad [Yin Huai] Use toDF.
99da57e [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertOverwrite
6b7545c [Yin Huai] Add a pre write check to the data source API.
a88c516 [Yin Huai] DDLParser will take a parsering function to take care CTAS statements.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lifts `HiveMetastoreCatalog.refreshTable` to `Catalog`. Adds `RefreshTable` command to refresh (possibly cached) metadata in external data sources tables.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4624)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes #4624 from liancheng/refresh-table and squashes the following commits:
8d1aa4c [Cheng Lian] Adds REFRESH TABLE command
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #4587 from marmbrus/position and squashes the following commits:
0810052 [Michael Armbrust] fix tests
395c019 [Michael Armbrust] Merge remote-tracking branch 'marmbrus/position' into position
e155dce [Michael Armbrust] more errors
f3efa51 [Michael Armbrust] Update AnalysisException.scala
d45ff60 [Michael Armbrust] [SQL] Initial support for reporting location of error in sql string
|
|
|
|
|
|
|
|
|
|
| |
When profiling the Join / Aggregate queries via VisualVM, I noticed lots of `SpecificMutableRow` objects created, as well as the `MutableValue`, since the `SpecificMutableRow` are mostly used in data source implementation, but the `copy` method could be called multiple times in upper modules (e.g. in Join / aggregation etc.), duplicated instances created should be avoid.
Author: Cheng Hao <hao.cheng@intel.com>
Closes #4619 from chenghao-intel/specific_mutable_row and squashes the following commits:
9300d23 [Cheng Hao] update the SpecificMutableRow.copy
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- The old implicit would convert RDDs directly to DataFrames, and that added too many methods.
- toDataFrame -> toDF
- Dsl -> functions
- implicits moved into SQLContext.implicits
- addColumn -> withColumn
- renameColumn -> withColumnRenamed
Python changes:
- toDataFrame -> toDF
- Dsl -> functions package
- addColumn -> withColumn
- renameColumn -> withColumnRenamed
- add toDF functions to RDD on SQLContext init
- add flatMap to DataFrame
Author: Reynold Xin <rxin@databricks.com>
Author: Davies Liu <davies@databricks.com>
Closes #4556 from rxin/SPARK-5752 and squashes the following commits:
5ef9910 [Reynold Xin] More fix
61d3fca [Reynold Xin] Merge branch 'df5' of github.com:davies/spark into SPARK-5752
ff5832c [Reynold Xin] Fix python
749c675 [Reynold Xin] count(*) fixes.
5806df0 [Reynold Xin] Fix build break again.
d941f3d [Reynold Xin] Fixed explode compilation break.
fe1267a [Davies Liu] flatMap
c4afb8e [Reynold Xin] style
d9de47f [Davies Liu] add comment
b783994 [Davies Liu] add comment for toDF
e2154e5 [Davies Liu] schema() -> schema
3a1004f [Davies Liu] Dsl -> functions, toDF()
fb256af [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
0dd74eb [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
97dd47c [Davies Liu] fix mistake
6168f74 [Davies Liu] fix test
1fc0199 [Davies Liu] fix test
a075cd5 [Davies Liu] clean up, toPandas
663d314 [Davies Liu] add test for agg('*')
9e214d5 [Reynold Xin] count(*) fixes.
1ed7136 [Reynold Xin] Fix build break again.
921b2e3 [Reynold Xin] Fixed explode compilation break.
14698d4 [Davies Liu] flatMap
ba3e12d [Reynold Xin] style
d08c92d [Davies Liu] add comment
5c8b524 [Davies Liu] add comment for toDF
a4e5e66 [Davies Liu] schema() -> schema
d377fc9 [Davies Liu] Dsl -> functions, toDF()
6b3086c [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
807e8b1 [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
select k from (select key k, max(value) v from src group by k) t
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #4415 from adrian-wang/groupprune and squashes the following commits:
5d2d8a3 [Daoyuan Wang] address Michael's comments
61f8ef7 [Daoyuan Wang] add a unit test
80ddcc6 [Daoyuan Wang] keep project
b69d385 [Daoyuan Wang] add a prune rule for grouping set
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR fix the issue SPARK-3365.
The reason is Spark generated wrong schema for the type `List` in `ScalaReflection.scala`
for example:
the generated schema for type `Seq[String]` is:
```
{"name":"x","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}}`
```
the generated schema for type `List[String]` is:
```
{"name":"x","type":{"type":"struct","fields":[]},"nullable":true,"metadata":{}}`
```
Author: tianyi <tianyi.asiainfo@gmail.com>
Closes #4581 from tianyi/SPARK-3365 and squashes the following commits:
a097e86 [tianyi] change the order of resolution in ScalaReflection.scala
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-3299
Author: Yin Huai <yhuai@databricks.com>
Closes #4547 from yhuai/tables and squashes the following commits:
6c8f92e [Yin Huai] Add tableNames.
acbb281 [Yin Huai] Update Python test.
7793dcb [Yin Huai] Fix scala test.
572870d [Yin Huai] Address comments.
aba2e88 [Yin Huai] Format.
12c86df [Yin Huai] Add tables() to SQLContext to return a DataFrame containing existing tables.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #4546 from marmbrus/explode and squashes the following commits:
eefd33a [Michael Armbrust] whitespace
a8d496c [Michael Armbrust] Merge remote-tracking branch 'apache/master' into explode
4af740e [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explode
dc86a5c [Michael Armbrust] simple version
d633d01 [Michael Armbrust] add scala specific
950707a [Michael Armbrust] fix comments
ba8854c [Michael Armbrust] [SPARK-5573][SQL] Add explode to dataframes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Author: wangfei <wangfei1@huawei.com>
Closes #4558 from marmbrus/errorMessages and squashes the following commits:
5e5ab50 [Michael Armbrust] Merge pull request #15 from scwf/errorMessages
fa38881 [wangfei] fix for grouping__id
f279a71 [wangfei] make right references for ScriptTransformation
d29fbde [Michael Armbrust] extra case
1a797b4 [Michael Armbrust] comments
d4e9015 [Michael Armbrust] add comment
af9e668 [Michael Armbrust] no braces
34eb3a4 [Michael Armbrust] more work
6197cd5 [Michael Armbrust] [SQL] Better error messages for analysis failures
|
|
|
|
|
|
|
|
|
|
|
| |
As a follow-up to https://github.com/apache/spark/pull/4524
Author: Reynold Xin <rxin@databricks.com>
Closes #4539 from rxin/SPARK-3688 and squashes the following commits:
5ac56c7 [Reynold Xin] exists
da8eea4 [Reynold Xin] [SPARK-3688][SQL] More inline comments for LogicalPlan.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR fixed the resolving problem described in https://issues.apache.org/jira/browse/SPARK-3688
```
CREATE TABLE t1(x INT);
CREATE TABLE t2(a STRUCT<x: INT>, k INT);
SELECT a.x FROM t1 a JOIN t2 b ON a.x = b.k;
```
Author: tianyi <tianyi.asiainfo@gmail.com>
Closes #4524 from tianyi/SPARK-3688 and squashes the following commits:
237a256 [tianyi] resolve a name with table.column pattern first.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also I fix a bunch of bad output in test cases.
Author: Michael Armbrust <michael@databricks.com>
Closes #4520 from marmbrus/selfJoin and squashes the following commits:
4f4a85c [Michael Armbrust] comments
49c8e26 [Michael Armbrust] fix tests
6fc38de [Michael Armbrust] fix style
55d64b3 [Michael Armbrust] fix dataframe selfjoins
|