| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
This reverts commit d807023479ce10aec28ef3c1ab646ddefc2e663c.
|
|
|
|
| |
This reverts commit 67dd53d2556f03ce292e6889128cf441f1aa48f8.
| |
finding join keys.
When tables are equi-joined on multiple keys, `HashJoin` should be used, but `CartesianProduct` followed by `Filter` is used instead.
The join keys are paired by an `And` expression, so we need to apply `splitConjunctivePredicates` to the join condition while finding join keys.
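As a rough illustration of the idea (a Python sketch, not Catalyst's Scala code; the `Col`, `Eq`, and `And` classes and the `find_join_keys` helper are hypothetical stand-ins), flattening the condition on `And` exposes each equality so that one key pair can be extracted per conjunct:

```python
from dataclasses import dataclass

# Hypothetical miniature expression tree; names are illustrative only.
@dataclass(frozen=True)
class Col:
    table: str
    name: str

@dataclass(frozen=True)
class Eq:
    left: object
    right: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

def split_conjunctive_predicates(cond):
    """Flatten nested And nodes into a flat list of conjuncts."""
    if isinstance(cond, And):
        return (split_conjunctive_predicates(cond.left)
                + split_conjunctive_predicates(cond.right))
    return [cond]

def find_join_keys(cond, left_table, right_table):
    """Return (left_keys, right_keys, leftover_predicates)."""
    left_keys, right_keys, rest = [], [], []
    for p in split_conjunctive_predicates(cond):
        is_col_eq = (isinstance(p, Eq) and isinstance(p.left, Col)
                     and isinstance(p.right, Col))
        if is_col_eq and (p.left.table, p.right.table) == (left_table, right_table):
            left_keys.append(p.left)
            right_keys.append(p.right)
        elif is_col_eq and (p.left.table, p.right.table) == (right_table, left_table):
            # Equality written the other way round: swap the sides.
            left_keys.append(p.right)
            right_keys.append(p.left)
        else:
            rest.append(p)
    return left_keys, right_keys, rest

# a.k1 = b.k1 AND b.k2 = a.k2
cond = And(Eq(Col("a", "k1"), Col("b", "k1")), Eq(Col("b", "k2"), Col("a", "k2")))
left_keys, right_keys, rest = find_join_keys(cond, "a", "b")
```

Without the flattening step the whole `And` node would be treated as a single non-equality predicate, so no keys would be found and a hash join could not be chosen.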
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #836 from ueshin/issues/SPARK-1889 and squashes the following commits:
fe1c387 [Takuya UESHIN] Apply splitConjunctivePredicates to join condition while finding join keys.
(cherry picked from commit bb88875ad52e8209c25e8350af1fe4b7159086ae)
Signed-off-by: Reynold Xin <rxin@apache.org>
|
| |
|
| |
|
|
|
|
| |
This reverts commit 920f947eb5a22a679c0c3186cf69ee75f6041c75.
|
|
|
|
| |
This reverts commit f8e611955096c5c1c7db5764b9d2851b1d295f0d.
|
| |
|
| |
|
|
|
|
| |
This reverts commit 80eea0f111c06260ffaa780d2f3f7facd09c17bc.
|
|
|
|
| |
This reverts commit e5436b8c1a79ce108f3af402455ac5f6dc5d1eb3.
|
| |
|
| |
|
|
|
|
| |
This reverts commit 9212b3e5bb5545ccfce242da8d89108e6fb1c464.
|
|
|
|
| |
This reverts commit c4746aa6fe4aaf383e69e34353114d36d1eb9ba6.
| |
This patch unifies the foldable & nullable interface for Expression.
1) Non-deterministic UDFs (like Rand()) cannot be folded.
2) Short-circuit evaluation significantly improves the performance of expression evaluation; however, a stateful UDF must not be skipped by short-circuit evaluation (e.g. in the expression: col1 > 0 and row_sequence() < 1000, row_sequence() cannot be skipped even if col1 > 0 is false).
I borrowed the concept of DeferredObject from Hive, which has two kinds of child classes (EagerResult / DeferredResult): the former requires the evaluation to be triggered before it is created, while the latter triggers the evaluation when its get() method is first called.
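The distinction between the two wrappers can be sketched as follows. This is a minimal Python illustration, not the patch's Scala code or Hive's Java classes; `row_sequence`, `and_deferred`, and `and_eager_right` are hypothetical helpers introduced only for the example.

```python
class EagerResult:
    """Holds a value that was computed before the wrapper was created."""
    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value

class DeferredResult:
    """Runs the wrapped computation on the first get() call and caches it."""
    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def get(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value

calls = []

def row_sequence():
    # Stateful function: every call has a visible side effect.
    calls.append(1)
    return len(calls)

def and_deferred(left, right):
    # Short-circuit: right.get() only runs when the left side is true.
    return left.get() and right.get()

def and_eager_right(left, right_fn):
    # Evaluate the stateful side first and wrap it as an EagerResult,
    # so it runs even when the left side is false.
    right = EagerResult(right_fn())
    return left.get() and right.get()

r1 = and_deferred(EagerResult(False), DeferredResult(lambda: row_sequence() < 1000))
calls_after_deferred = len(calls)   # the stateful call was skipped
r2 = and_eager_right(EagerResult(False), lambda: row_sequence() < 1000)
calls_after_eager = len(calls)      # the stateful call ran once
```

Both calls return the same boolean, but only the eager variant advances the state of `row_sequence`, which is the behavior a stateful UDF needs.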
Author: Cheng Hao <hao.cheng@intel.com>
Closes #446 from chenghao-intel/expression_deferred_evaluation and squashes the following commits:
d2729de [Cheng Hao] Fix the codestyle issues
a08f09c [Cheng Hao] fix bug in or/and short-circuit evaluation
af2236b [Cheng Hao] revert the short-circuit expression evaluation for IF
b7861d2 [Cheng Hao] Add Support for Deferred Expression Evaluation
(cherry picked from commit a20fea98811d98958567780815fcf0d4fb4e28d4)
Signed-off-by: Reynold Xin <rxin@apache.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`GetField.nullable` should be `true` not only when `field.nullable` is `true` but also when `child.nullable` is `true`.
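The rule amounts to a logical OR of the two flags. A minimal Python sketch, assuming hypothetical `Field`, `StructExpr`, and `GetField` stand-ins rather than the real Catalyst classes:

```python
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    nullable: bool

@dataclass
class StructExpr:
    fields: dict      # field name -> Field
    nullable: bool    # whether the struct value itself may be null

@dataclass
class GetField:
    child: StructExpr
    field_name: str

    @property
    def nullable(self):
        field = self.child.fields[self.field_name]
        # A null struct yields a null field even if the field is declared non-null.
        return self.child.nullable or field.nullable

non_null_field = {"x": Field("x", nullable=False)}
from_nullable_struct = GetField(StructExpr(non_null_field, nullable=True), "x").nullable
from_non_null_struct = GetField(StructExpr(non_null_field, nullable=False), "x").nullable
```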
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #757 from ueshin/issues/SPARK-1819 and squashes the following commits:
8781a11 [Takuya UESHIN] Modify a test to use named parameters.
5bfc77d [Takuya UESHIN] Fix GetField.nullable.
(cherry picked from commit 94c9d6f59859ebc77fae112c2c42c64b7a4d7f83)
Signed-off-by: Reynold Xin <rxin@apache.org>
|
| |
|
| |
|
|
|
|
| |
This reverts commit 54133abdce0246f6643a1112a5204afb2c4caa82.
|
|
|
|
| |
This reverts commit e480bcfbd269ae1d7a6a92cfb50466cf192fe1fb.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are a few changes based on the original patch by @scrapcodes.
Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Patrick Wendell <pwendell@gmail.com>
Closes #785 from pwendell/package-docs and squashes the following commits:
c32b731 [Patrick Wendell] Changes based on Prashant's patch
c0463d3 [Prashant Sharma] added eof new line
ce8bf73 [Prashant Sharma] Added eof new line to all files.
4c35f2e [Prashant Sharma] SPARK-1563 Add package-info.java and package.scala files for all packages that appear in docs
(cherry picked from commit 46324279dae2fa803267d788f7c56b0ed643b4c8)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: wangfei <scnbwf@yeah.net>
Closes #765 from scwf/dslfix and squashes the following commits:
d2d1a9d [wangfei] Update package.scala
66ff53b [wangfei] fix the head notation of package object dsl
(cherry picked from commit 44165fc91a31e6293a79031c89571e139d2c5356)
Signed-off-by: Reynold Xin <rxin@apache.org>
|
| |
|
| |
|
|
|
|
| |
This reverts commit 18f062303303824139998e8fc8f4158217b0dbc3.
|
|
|
|
| |
This reverts commit d08e9604fc9958b7c768e91715c8152db2ed6fd0.
| |
Fixed a bug that was preventing us from ever pruning beneath Joins.
## TPC-DS Q3
### Before:
```
Aggregate false, [d_year#12,i_brand#65,i_brand_id#64], [d_year#12,i_brand_id#64 AS brand_id#0,i_brand#65 AS brand#1,SUM(PartialSum#79) AS sum_agg#2]
Exchange (HashPartitioning [d_year#12:0,i_brand#65:1,i_brand_id#64:2], 150)
Aggregate true, [d_year#12,i_brand#65,i_brand_id#64], [d_year#12,i_brand#65,i_brand_id#64,SUM(CAST(ss_ext_sales_price#49, DoubleType)) AS PartialSum#79]
Project [d_year#12:6,i_brand#65:59,i_brand_id#64:58,ss_ext_sales_price#49:43]
HashJoin [ss_item_sk#36], [i_item_sk#57], BuildRight
Exchange (HashPartitioning [ss_item_sk#36:30], 150)
HashJoin [d_date_sk#6], [ss_sold_date_sk#34], BuildRight
Exchange (HashPartitioning [d_date_sk#6:0], 150)
Filter (d_moy#14:8 = 12)
HiveTableScan [d_date_sk#6,d_date_id#7,d_date#8,d_month_seq#9,d_week_seq#10,d_quarter_seq#11,d_year#12,d_dow#13,d_moy#14,d_dom#15,d_qoy#16,d_fy_year#17,d_fy_quarter_seq#18,d_fy_week_seq#19,d_day_name#20,d_quarter_name#21,d_holiday#22,d_weekend#23,d_following_holiday#24,d_first_dom#25,d_last_dom#26,d_same_day_ly#27,d_same_day_lq#28,d_current_day#29,d_current_week#30,d_current_month#31,d_current_quarter#32,d_current_year#33], (MetastoreRelation default, date_dim, Some(dt)), None
Exchange (HashPartitioning [ss_sold_date_sk#34:0], 150)
HiveTableScan [ss_sold_date_sk#34,ss_sold_time_sk#35,ss_item_sk#36,ss_customer_sk#37,ss_cdemo_sk#38,ss_hdemo_sk#39,ss_addr_sk#40,ss_store_sk#41,ss_promo_sk#42,ss_ticket_number#43,ss_quantity#44,ss_wholesale_cost#45,ss_list_price#46,ss_sales_price#47,ss_ext_discount_amt#48,ss_ext_sales_price#49,ss_ext_wholesale_cost#50,ss_ext_list_price#51,ss_ext_tax#52,ss_coupon_amt#53,ss_net_paid#54,ss_net_paid_inc_tax#55,ss_net_profit#56], (MetastoreRelation default, store_sales, None), None
Exchange (HashPartitioning [i_item_sk#57:0], 150)
Filter (i_manufact_id#70:13 = 436)
HiveTableScan [i_item_sk#57,i_item_id#58,i_rec_start_date#59,i_rec_end_date#60,i_item_desc#61,i_current_price#62,i_wholesale_cost#63,i_brand_id#64,i_brand#65,i_class_id#66,i_class#67,i_category_id#68,i_category#69,i_manufact_id#70,i_manufact#71,i_size#72,i_formulation#73,i_color#74,i_units#75,i_container#76,i_manager_id#77,i_product_name#78], (MetastoreRelation default, item, None), None
```
### After:
```
Aggregate false, [d_year#172,i_brand#225,i_brand_id#224], [d_year#172,i_brand_id#224 AS brand_id#160,i_brand#225 AS brand#161,SUM(PartialSum#239) AS sum_agg#162]
Exchange (HashPartitioning [d_year#172:0,i_brand#225:1,i_brand_id#224:2], 150)
Aggregate true, [d_year#172,i_brand#225,i_brand_id#224], [d_year#172,i_brand#225,i_brand_id#224,SUM(CAST(ss_ext_sales_price#209, DoubleType)) AS PartialSum#239]
Project [d_year#172:1,i_brand#225:5,i_brand_id#224:3,ss_ext_sales_price#209:0]
HashJoin [ss_item_sk#196], [i_item_sk#217], BuildRight
Exchange (HashPartitioning [ss_item_sk#196:2], 150)
Project [ss_ext_sales_price#209:2,d_year#172:1,ss_item_sk#196:3]
HashJoin [d_date_sk#166], [ss_sold_date_sk#194], BuildRight
Exchange (HashPartitioning [d_date_sk#166:0], 150)
Project [d_date_sk#166:0,d_year#172:1]
Filter (d_moy#174:2 = 12)
HiveTableScan [d_date_sk#166,d_year#172,d_moy#174], (MetastoreRelation default, date_dim, Some(dt)), None
Exchange (HashPartitioning [ss_sold_date_sk#194:2], 150)
HiveTableScan [ss_ext_sales_price#209,ss_item_sk#196,ss_sold_date_sk#194], (MetastoreRelation default, store_sales, None), None
Exchange (HashPartitioning [i_item_sk#217:1], 150)
Project [i_brand_id#224:0,i_item_sk#217:1,i_brand#225:2]
Filter (i_manufact_id#230:3 = 436)
HiveTableScan [i_brand_id#224,i_item_sk#217,i_brand#225,i_manufact_id#230], (MetastoreRelation default, item, None), None
```
Author: Michael Armbrust <michael@databricks.com>
Closes #729 from marmbrus/fixPruning and squashes the following commits:
5feeff0 [Michael Armbrust] Improve column pruning.
(cherry picked from commit 6ce0884446d3571fd6e9d967a080a59c657543b1)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the implementation for ApproximateCountDistinct to SparkSql. We use the HyperLogLog algorithm implemented in stream-lib, and do the count in two phases: 1) counting the number of distinct elements in each partition, and 2) merging the HyperLogLog results from different partitions.
A simple serializer and test cases are added as well.
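The two-phase scheme works because HyperLogLog sketches merge losslessly: taking the register-wise maximum of the per-partition sketches gives the same registers as one sketch built over all the data. The following is a self-contained Python sketch of that property. It is a toy HyperLogLog written for illustration, not stream-lib's implementation, and the precision `P`, the SHA-1 based hash, and the sample partitions are all assumptions.

```python
import hashlib
import math

P = 12          # number of index bits (assumed precision)
M = 1 << P      # number of registers

def _hash64(value):
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def new_sketch():
    return [0] * M

def add(sketch, value):
    h = _hash64(value)
    idx = h >> (64 - P)                        # top P bits pick the register
    rest = h & ((1 << (64 - P)) - 1)           # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1    # position of the first 1 bit
    sketch[idx] = max(sketch[idx], rank)

def merge(a, b):
    # Phase 2: register-wise maximum combines two partial sketches.
    return [max(x, y) for x, y in zip(a, b)]

def estimate(sketch):
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros > 0:
        return M * math.log(M / zeros)         # small-range (linear counting) correction
    return raw

# Phase 1: one sketch per partition (the partitions overlap on purpose).
partitions = [range(0, 6000), range(4000, 10000)]
partials = []
for part in partitions:
    s = new_sketch()
    for v in part:
        add(s, v)
    partials.append(s)

# Phase 2: merge the partial sketches and estimate once.
merged = partials[0]
for s in partials[1:]:
    merged = merge(merged, s)
approx = estimate(merged)
```

Values that appear in more than one partition are counted once after the merge, since they hit the same register with the same rank in every partial sketch.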
Author: larvaboy <larvaboy@gmail.com>
Closes #737 from larvaboy/master and squashes the following commits:
bd8ef3f [larvaboy] Add support of user-provided standard deviation to ApproxCountDistinct.
9ba8360 [larvaboy] Fix alignment and null handling issues.
95b4067 [larvaboy] Add a test case for count distinct and approximate count distinct.
f57917d [larvaboy] Add the parser for the approximate count.
a2d5d10 [larvaboy] Add ApproximateCountDistinct aggregates and functions.
7ad273a [larvaboy] Add SparkSql serializer for HyperLogLog.
1d9aacf [larvaboy] Fix a minor typo in the toString method of the Count case class.
653542b [larvaboy] Fix a couple of minor typos.
(cherry picked from commit c33b8dcbf65a3a0c5ee5e65cd1dcdbc7da36aa5f)
Signed-off-by: Reynold Xin <rxin@apache.org>
|
| |
|
| |
|
|
|
|
| |
This reverts commit 3d0a44833ab50360bf9feccc861cb5e8c44a4866.
|
|
|
|
| |
This reverts commit 9772d85c6f3893d42044f4bab0e16f8b6287613a.
|
| |
|
| |
| |
https://issues.apache.org/jira/browse/SPARK-1757
The first test succeeds, but the second test fails with exception:
```
[info] - save and load case class RDD with Nones as parquet *** FAILED *** (14 milliseconds)
[info] java.lang.RuntimeException: Unsupported datatype StructType(List())
[info] at scala.sys.package$.error(package.scala:27)
[info] at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetRelation.scala:201)
[info] at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$1.apply(ParquetRelation.scala:235)
[info] at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$1.apply(ParquetRelation.scala:235)
[info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[info] at scala.collection.immutable.List.foreach(List.scala:318)
[info] at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
[info] at scala.collection.AbstractTraversable.map(Traversable.scala:105)
[info] at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertFromAttributes(ParquetRelation.scala:234)
[info] at org.apache.spark.sql.parquet.ParquetTypesConverter$.writeMetaData(ParquetRelation.scala:267)
[info] at org.apache.spark.sql.parquet.ParquetRelation$.createEmpty(ParquetRelation.scala:143)
[info] at org.apache.spark.sql.parquet.ParquetRelation$.create(ParquetRelation.scala:122)
[info] at org.apache.spark.sql.execution.SparkStrategies$ParquetOperations$.apply(SparkStrategies.scala:139)
[info] at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
[info] at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
[info] at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[info] at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
[info] at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:264)
[info] at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:264)
[info] at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:265)
[info] at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:265)
[info] at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:268)
[info] at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:268)
[info] at org.apache.spark.sql.SchemaRDDLike$class.saveAsParquetFile(SchemaRDDLike.scala:66)
[info] at org.apache.spark.sql.SchemaRDD.saveAsParquetFile(SchemaRDD.scala:98)
```
Author: Andrew Ash <andrew@andrewash.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #690 from ash211/rdd-parquet-save and squashes the following commits:
747a0b9 [Andrew Ash] Merge pull request #1 from marmbrus/pr/690
54bd00e [Michael Armbrust] Need to put Option first since Option <: Seq.
8f3f281 [Andrew Ash] SPARK-1757 Add failing test for saving SparkSQL Schemas with Option[?] fields as parquet
(cherry picked from commit 156df87e7ca0e6cda2cc970ecd1466ce06f7576f)
Signed-off-by: Reynold Xin <rxin@apache.org>
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add missing arithmetic DSL operations: `unary_-`, `%`.
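An expression DSL of this kind maps host-language operators onto expression tree nodes. A minimal Python analogue, assuming hypothetical `Attr`, `UnaryMinus`, `Remainder`, and `Not` node classes (the real DSL is Scala, where `unary_-` and `unary_!` are the method names behind prefix `-` and `!`):

```python
from dataclasses import dataclass

class Expr:
    """Base class: operators build tree nodes instead of computing values."""
    def __neg__(self):
        return UnaryMinus(self)

    def __mod__(self, other):
        return Remainder(self, other)

    def __invert__(self):
        # Python cannot overload `not`, so `~` stands in for the DSL's `!`.
        return Not(self)

@dataclass(frozen=True)
class Attr(Expr):
    name: str

@dataclass(frozen=True)
class UnaryMinus(Expr):
    child: Expr

@dataclass(frozen=True)
class Remainder(Expr):
    left: Expr
    right: Expr

@dataclass(frozen=True)
class Not(Expr):
    child: Expr

a, b = Attr("a"), Attr("b")
negated = -a
remainder = a % b
inverted = ~(a % b)
```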
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #689 from ueshin/issues/SPARK-1754 and squashes the following commits:
a09ef69 [Takuya UESHIN] Add also missing ! (not) operation.
f73ae2c [Takuya UESHIN] Remove redundant tests.
5b3f087 [Takuya UESHIN] Add tests relating DSL operations.
e09c5b8 [Takuya UESHIN] Add missing arithmetic DSL operations.
(cherry picked from commit 322b1808d21143dc323493203929488d69e8878a)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Add native min/max (was using hive before).
* Handle nulls correctly in Avg and Sum.
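The commit message does not spell out what "correctly" means here. Assuming standard SQL semantics (nulls are skipped, and an aggregate over no non-null inputs is null), the behavior can be sketched in Python with `None` standing in for SQL null; `sql_sum` and `sql_avg` are hypothetical names:

```python
def sql_sum(values):
    """SUM under assumed SQL semantics: skip nulls, null if nothing is left."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None

def sql_avg(values):
    """AVG under assumed SQL semantics: nulls count toward neither sum nor count."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) / len(non_null) if non_null else None

total = sql_sum([1, None, 2])
mean = sql_avg([1, None, 3])
all_null = sql_sum([None, None])
```

Note that the average divides by the number of non-null values, not by the number of rows.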
Author: Michael Armbrust <michael@databricks.com>
Closes #683 from marmbrus/aggFixes and squashes the following commits:
64fe30b [Michael Armbrust] Improve SparkSQL Aggregates * Add native min/max (was using hive before). * Handle nulls correctly in Avg and Sum.
(cherry picked from commit 19c8fb02bc2c2f76c3c45bfff4b8d093be9d7c66)
Signed-off-by: Reynold Xin <rxin@apache.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use a lazy val instead of a function in the Cast class, which improved performance by nearly 2X in my local micro-benchmark.
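The gain comes from resolving the conversion routine once per expression instead of once per evaluated row. A Python analogue of the two styles, using `functools.cached_property` in place of Scala's `lazy val`; the `Cast` class below is a hypothetical stand-in and the sketch does not attempt to reproduce the reported speedup:

```python
from functools import cached_property

class Cast:
    """Hypothetical stand-in for a cast expression."""
    def __init__(self, to_type):
        self.to_type = to_type
        self.resolutions = 0   # counts how often the converter is looked up

    def _resolve(self):
        self.resolutions += 1
        return {"int": int, "float": float, "str": str}[self.to_type]

    def eval_with_function(self, value):
        # Function style: the converter is re-resolved for every row.
        return self._resolve()(value)

    @cached_property
    def _cast(self):
        # Lazy style: resolved on first use, then cached on the instance.
        return self._resolve()

    def eval_with_lazy(self, value):
        return self._cast(value)

rows = ["1", "2", "3"]
per_row = Cast("int")
per_row_out = [per_row.eval_with_function(r) for r in rows]
cached = Cast("int")
cached_out = [cached.eval_with_lazy(r) for r in rows]
```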
Author: Cheng Hao <hao.cheng@intel.com>
Closes #679 from chenghao-intel/fix_type_casting and squashes the following commits:
71b0902 [Cheng Hao] using lazy val object instead of function for data type casting
(cherry picked from commit ca43186867f0120c29d1b27cfee0c7ff4a107d84)
Signed-off-by: Reynold Xin <rxin@apache.org>
| |
Currently, expressions do not handle "constant null" well in constant folding.
e.g. Sum(a, 0) actually always produces Literal(0, NumericType) at runtime.
For example:
```
explain select isnull(key+null) from src;
== Logical Plan ==
Project [HiveGenericUdf#isnull((key#30 + CAST(null, IntegerType))) AS c_0#28]
MetastoreRelation default, src, None
== Optimized Logical Plan ==
Project [true AS c_0#28]
MetastoreRelation default, src, None
== Physical Plan ==
Project [true AS c_0#28]
HiveTableScan [], (MetastoreRelation default, src, None), None
```
I've created a new optimization rule called NullPropagation for this kind of constant folding.
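The rewrite shown in the plan above (an `isnull` over an addition with a null operand folding to `true`) can be sketched as a bottom-up tree transformation. This is a simplified Python illustration with hypothetical `Literal`, `Attr`, `Add`, and `IsNull` node classes; it ignores the `CAST` around the null and covers only these two operators, so it is not the actual rule:

```python
from dataclasses import dataclass

class Expr:
    pass

@dataclass(frozen=True)
class Literal(Expr):
    value: object

@dataclass(frozen=True)
class Attr(Expr):
    name: str

@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr

@dataclass(frozen=True)
class IsNull(Expr):
    child: Expr

NULL = Literal(None)

def null_propagation(e):
    """Fold sub-expressions whose result is known because an input is null."""
    if isinstance(e, Add):
        left, right = null_propagation(e.left), null_propagation(e.right)
        if left == NULL or right == NULL:
            return NULL                  # x + null is always null
        return Add(left, right)
    if isinstance(e, IsNull):
        child = null_propagation(e.child)
        if child == NULL:
            return Literal(True)         # isnull(null) is always true
        if isinstance(child, Literal):
            return Literal(False)        # isnull(non-null constant) is false
        return IsNull(child)
    return e

folded = null_propagation(IsNull(Add(Attr("key"), NULL)))
untouched = null_propagation(IsNull(Attr("key")))
```

Because the result no longer references `key`, a later pruning step can drop the column from the scan, which matches the empty `HiveTableScan []` column list in the physical plan.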
Author: Cheng Hao <hao.cheng@intel.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #482 from chenghao-intel/optimize_constant_folding and squashes the following commits:
2f14b50 [Cheng Hao] Fix code style issues
68b9fad [Cheng Hao] Remove the Literal pattern matching for NullPropagation
29c8166 [Cheng Hao] Update the code for feedback of code review
50444cc [Cheng Hao] Remove the unnecessary null checking
80f9f18 [Cheng Hao] Update the UnitTest for aggregation constant folding
27ea3d7 [Cheng Hao] Fix Constant Folding Bugs & Add More Unittests
b28e03a [Cheng Hao] Merge pull request #1 from marmbrus/pr/482
9ccefdb [Michael Armbrust] Add tests for optimized expression evaluation.
543ef9d [Cheng Hao] fix code style issues
9cf0396 [Cheng Hao] update code according to the code review comment
536c005 [Cheng Hao] Add Exceptional case for constant folding
3c045c7 [Cheng Hao] Optimize the Constant Folding by adding more rules
2645d4f [Cheng Hao] Constant Folding(null propagation)
(cherry picked from commit 3eb53bd59e828275471d41730e6de601a887416d)
Signed-off-by: Reynold Xin <rxin@apache.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I also removed a println that I bumped into.
Author: Michael Armbrust <michael@databricks.com>
Closes #658 from marmbrus/nullPrimitives and squashes the following commits:
a3ec4f3 [Michael Armbrust] Remove println.
695606b [Michael Armbrust] Support for null primatives from using scala and java reflection.
(cherry picked from commit 3c64750bdd4c2d0a5562f90aead37be81627cc9d)
Signed-off-by: Matei Zaharia <matei@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #616 from marmbrus/ruleLogging and squashes the following commits:
39c09fe [Michael Armbrust] Fix off by one error.
5af3537 [Michael Armbrust] Better logging when applying rules.
(cherry picked from commit b295714708476b2904e205ac6dc58867e205546e)
Signed-off-by: Reynold Xin <rxin@apache.org>
|
| |
|
| |
|
| |
| |
1, Fix SPARK-1441: compile error in Spark core with hadoop 0.23.x
2, Fix SPARK-1491: maven hadoop-provided profile fails to build
3, Fix inconsistent dependency versions for org.scala-lang:* and org.apache.avro:*
4, Reformat sql/catalyst/pom.xml, sql/hive/pom.xml and sql/core/pom.xml (four-space indentation changed to two spaces)
Author: witgo <witgo@qq.com>
Closes #480 from witgo/format_pom and squashes the following commits:
03f652f [witgo] review commit
b452680 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
bee920d [witgo] revert fix SPARK-1629: Spark Core missing commons-lang dependence
7382a07 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
6902c91 [witgo] fix SPARK-1629: Spark Core missing commons-lang dependence
0da4bc3 [witgo] merge master
d1718ed [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
e345919 [witgo] add avro dependency to yarn-alpha
77fad08 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
62d0862 [witgo] Fix org.scala-lang: * inconsistent versions dependency
1a162d7 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
934f24d [witgo] review commit
cf46edc [witgo] exclude jruby
06e7328 [witgo] Merge branch 'SparkBuild' into format_pom
99464d2 [witgo] fix maven hadoop-provided profile fails to build
0c6c1fc [witgo] Fix compile spark core error with hadoop 0.23.x
6851bec [witgo] Maintain consistent SparkBuild.scala, pom.xml
(cherry picked from commit 030f2c2126d5075576cd6d83a1ee7462c48b953b)
Conflicts:
sql/catalyst/pom.xml
sql/core/pom.xml
sql/hive/pom.xml
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NumericType/TimestampType.
`Cast.nullable` should be `true` when casting from `StringType` to `NumericType` or `TimestampType`,
because if a `StringType` expression holds an illegal number string or an illegal timestamp string, the cast value becomes `null`.
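A small Python sketch of why the flag has to be `true` even for a non-nullable input. The helper names and the string type tags are hypothetical, and Python's `int()` parsing rules are only a stand-in for Spark's:

```python
def cast_string_to_int(s):
    """Return the parsed integer, or None (null) for an illegal number string."""
    try:
        return int(s)
    except (TypeError, ValueError):
        return None

def cast_nullable(child_nullable, from_type, to_type):
    """Nullability of a cast expression under the rule described above."""
    if from_type == "string" and to_type in ("numeric", "timestamp"):
        return True            # parsing may fail and produce null
    return child_nullable      # otherwise the cast cannot introduce nulls

ok = cast_string_to_int("42")
bad = cast_string_to_int("forty-two")
string_to_numeric = cast_nullable(False, "string", "numeric")
int_to_string = cast_nullable(False, "int", "string")
```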
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #532 from ueshin/issues/SPARK-1608 and squashes the following commits:
065d37c [Takuya UESHIN] Add tests to check nullabilities of cast expressions.
f278ed7 [Takuya UESHIN] Revert test to keep it readable and concise.
9fc9380 [Takuya UESHIN] Fix Cast.nullable when cast from StringType to NumericType/TimestampType.
(cherry picked from commit 8e37ed6eb81687140b6cdb00f4ec609ec7ba9be1)
Signed-off-by: Reynold Xin <rxin@apache.org>
|