spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc11"	Tathagata Das	2014-05-26	3	-3/+3
	This reverts commit 2f1dc868e5714882cf40d2633fb66772baf34789.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Tathagata Das	2014-05-26	3	-3/+3
	This reverts commit 832dc594e7666f1d402334f8015ce29917d9c888.
*	[SQL] Minor: Introduce SchemaRDD#aggregate() for simple aggregations	Aaron Davidson	2014-05-25	2	-2/+24
	```scala rdd.aggregate(Sum('val)) ``` is just shorthand for ```scala rdd.groupBy()(Sum('val)) ``` but seems to be more natural than doing a groupBy with no grouping expressions when you really just want an aggregation over all rows. Did not add a JavaSchemaRDD or Python API, as these seem to be lacking several other methods like groupBy() already -- leaving that cleanup for future patches. Author: Aaron Davidson <aaron@databricks.com> Closes #874 from aarondav/schemardd and squashes the following commits: e9e68ee [Aaron Davidson] Add comment db6afe2 [Aaron Davidson] Introduce SchemaRDD#aggregate() for simple aggregations (cherry picked from commit c3576ffcd7910e38928f233a824dd9e037cde05f) Signed-off-by: Reynold Xin <rxin@apache.org>
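The shorthand described in this commit message can be pictured with a small standalone sketch. It is written in plain Python rather than against Spark, and `TinyTable`, `Sum`, `group_by` and `aggregate` are hypothetical names invented for illustration, not the SchemaRDD API. It only shows why an aggregation over all rows is the same thing as a group-by with no grouping expressions.

```python
from collections import defaultdict


class Sum:
    """Hypothetical aggregate expression: sums one named column."""

    def __init__(self, column):
        self.column = column

    def apply(self, rows):
        return sum(row[self.column] for row in rows)


class TinyTable:
    """Hypothetical stand-in for a table of dict rows (not SchemaRDD)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def group_by(self, *keys):
        # Curried like the Scala form: group_by(keys...)(aggregates...)
        def run(*aggs):
            groups = defaultdict(list)
            for row in self.rows:
                groups[tuple(row[k] for k in keys)].append(row)
            return [key + tuple(a.apply(g) for a in aggs)
                    for key, g in groups.items()]
        return run

    def aggregate(self, *aggs):
        # No grouping keys: every row falls into the single group ().
        return self.group_by()(*aggs)


rows = [{"k": "a", "val": 1}, {"k": "a", "val": 2}, {"k": "b", "val": 3}]
table = TinyTable(rows)
print(table.aggregate(Sum("val")))   # same result as table.group_by()(Sum("val"))
```

With no keys, every row maps to the empty tuple, so the whole table collapses into one group and `aggregate` returns a single result row.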
*	[maven-release-plugin] prepare for next development iteration	Tathagata Das	2014-05-25	3	-3/+3
*	[maven-release-plugin] prepare release v1.0.0-rc11	Tathagata Das	2014-05-25	3	-3/+3
*	SPARK-1822: Some minor cleanup work on SchemaRDD.count()	Reynold Xin	2014-05-25	3	-6/+6
	Minor cleanup following #841. Author: Reynold Xin <rxin@apache.org> Closes #868 from rxin/schema-count and squashes the following commits: 5442651 [Reynold Xin] SPARK-1822: Some minor cleanup work on SchemaRDD.count() (cherry picked from commit d66642e3978a76977414c2fdaedebaad35662667) Signed-off-by: Reynold Xin <rxin@apache.org>
*	[SPARK-1822] SchemaRDD.count() should use query optimizer	Kan Zhang	2014-05-25	4	-7/+19
	Author: Kan Zhang <kzhang@apache.org> Closes #841 from kanzhang/SPARK-1822 and squashes the following commits: 2f8072a [Kan Zhang] [SPARK-1822] Minor style update cf4baa4 [Kan Zhang] [SPARK-1822] Adding Scaladoc e67c910 [Kan Zhang] [SPARK-1822] SchemaRDD.count() should use optimizer (cherry picked from commit 6052db9dc10c996215658485e805200e4f0cf549) Signed-off-by: Reynold Xin <rxin@apache.org>
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc10"	Tathagata Das	2014-05-25	3	-3/+3
	This reverts commit d807023479ce10aec28ef3c1ab646ddefc2e663c.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Tathagata Das	2014-05-25	3	-3/+3
	This reverts commit 67dd53d2556f03ce292e6889128cf441f1aa48f8.
*	[SPARK-1889] [SQL] Apply splitConjunctivePredicates to join condition while finding join keys.	Takuya UESHIN	2014-05-21	2	-6/+24
	When tables are equi-joined by multiple keys, `HashJoin` should be used, but `CartesianProduct` and then `Filter` are used. The join keys are paired by an `And` expression, so we need to apply `splitConjunctivePredicates` to the join condition while finding join keys. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #836 from ueshin/issues/SPARK-1889 and squashes the following commits: fe1c387 [Takuya UESHIN] Apply splitConjunctivePredicates to join condition while finding join keys. (cherry picked from commit bb88875ad52e8209c25e8350af1fe4b7159086ae) Signed-off-by: Reynold Xin <rxin@apache.org>
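The idea in this commit message, splitting an `And`-joined condition into its conjuncts before looking for equi-join keys, can be sketched outside Spark as follows. All class and function names here (`Col`, `Eq`, `Gt`, `And`, `split_conjunctive_predicates`, `find_join_keys`) are hypothetical Python stand-ins, not the Catalyst implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    table: str
    name: str


@dataclass(frozen=True)
class Eq:
    left: object
    right: object


@dataclass(frozen=True)
class Gt:
    left: object
    right: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


def split_conjunctive_predicates(cond):
    """Flatten nested And nodes into a list of individual predicates."""
    if isinstance(cond, And):
        return (split_conjunctive_predicates(cond.left)
                + split_conjunctive_predicates(cond.right))
    return [cond]


def find_join_keys(cond, left_table, right_table):
    """Return (equi-join key pairs, remaining predicates)."""
    keys, other = [], []
    for pred in split_conjunctive_predicates(cond):
        if (isinstance(pred, Eq) and isinstance(pred.left, Col)
                and isinstance(pred.right, Col)):
            l, r = pred.left, pred.right
            if l.table == left_table and r.table == right_table:
                keys.append((l, r))
                continue
            if l.table == right_table and r.table == left_table:
                keys.append((r, l))  # normalise to (left side, right side)
                continue
        other.append(pred)
    return keys, other
```

Without the split, a condition such as `a.k1 = b.k1 AND a.k2 = b.k2` is a single `And` node rather than an equality, so no key is found and a planner in this toy model would have to fall back to a cartesian product followed by a filter.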
*	[maven-release-plugin] prepare for next development iteration	Tathagata Das	2014-05-20	3	-3/+3
*	[maven-release-plugin] prepare release v1.0.0-rc10	Tathagata Das	2014-05-20	3	-3/+3
*	[Hotfix] Blacklisted flaky HiveCompatibility test	Tathagata Das	2014-05-20	1	-2/+4
	`lateral_view_outer` query sometimes returns a different set of 10 rows. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #838 from tdas/hive-test-fix2 and squashes the following commits: 9128a0d [Tathagata Das] Blacklisted flaky HiveCompatibility test. (cherry picked from commit 7f0cfe47f4709843d70ceccc25dee7551206ce0d) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc9"	Tathagata Das	2014-05-19	3	-3/+3
	This reverts commit 920f947eb5a22a679c0c3186cf69ee75f6041c75.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Tathagata Das	2014-05-19	3	-3/+3
	This reverts commit f8e611955096c5c1c7db5764b9d2851b1d295f0d.
*	[SPARK-1875]NoClassDefFoundError: StringUtils when building with hadoop 1.x and hive	witgo	2014-05-19	1	-8/+0
	Author: witgo <witgo@qq.com> Closes #824 from witgo/SPARK-1875_commons-lang-2.6 and squashes the following commits: ef7231d [witgo] review commit ead3c3b [witgo] SPARK-1875:NoClassDefFoundError: StringUtils when building against Hadoop 1 (cherry picked from commit 6a2c5c610c259f62cb12d8cfc18bf59cdb334bb2) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-17	3	-3/+3
*	[maven-release-plugin] prepare release v1.0.0-rc9	Patrick Wendell	2014-05-17	3	-3/+3
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc8"	Patrick Wendell	2014-05-16	3	-3/+3
	This reverts commit 80eea0f111c06260ffaa780d2f3f7facd09c17bc.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-16	3	-3/+3
	This reverts commit e5436b8c1a79ce108f3af402455ac5f6dc5d1eb3.
*	[SQL] Implement between in hql	Michael Armbrust	2014-05-16	3	-0/+21
	Author: Michael Armbrust <michael@databricks.com> Closes #804 from marmbrus/between and squashes the following commits: ae24672 [Michael Armbrust] add golden answer. d9997ef [Michael Armbrust] Implement between in hql. 9bd4433 [Michael Armbrust] Better error on parse failures. (cherry picked from commit 032d6632ad4ab88c97c9e568b63169a114220a02) Signed-off-by: Reynold Xin <rxin@apache.org>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-16	3	-3/+3
*	[maven-release-plugin] prepare release v1.0.0-rc8	Patrick Wendell	2014-05-16	3	-3/+3
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc7"	Patrick Wendell	2014-05-16	3	-3/+3
	This reverts commit 9212b3e5bb5545ccfce242da8d89108e6fb1c464.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-16	3	-3/+3
	This reverts commit c4746aa6fe4aaf383e69e34353114d36d1eb9ba6.
*	[Spark-1461] Deferred Expression Evaluation (short-circuit evaluation)	Cheng Hao	2014-05-15	2	-22/+53
	This patch unifies the foldable & nullable interface for Expression. 1) Non-deterministic UDFs (like Rand()) cannot be folded. 2) Short-circuiting significantly improves the performance of expression evaluation; however, a stateful UDF should not be ignored in a short-circuit evaluation (e.g. in the expression col1 > 0 and row_sequence() < 1000, row_sequence() cannot be ignored even if col1 > 0 is false). I borrowed the concept of DeferredObject from Hive, which has 2 kinds of child classes (EagerResult / DeferredResult): the former requires triggering the evaluation before it is created, while the latter triggers the evaluation when its get() method is first called. Author: Cheng Hao <hao.cheng@intel.com> Closes #446 from chenghao-intel/expression_deferred_evaluation and squashes the following commits: d2729de [Cheng Hao] Fix the codestyle issues a08f09c [Cheng Hao] fix bug in or/and short-circuit evaluation af2236b [Cheng Hao] revert the short-circuit expression evaluation for IF b7861d2 [Cheng Hao] Add Support for Deferred Expression Evaluation (cherry picked from commit a20fea98811d98958567780815fcf0d4fb4e28d4) Signed-off-by: Reynold Xin <rxin@apache.org>
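The EagerResult / DeferredResult distinction described in this commit message can be illustrated with a minimal Python sketch. The class names are taken from the message, but the bodies and the `short_circuit_and` helper are assumptions made for illustration, not the code from the patch. The sketch uses plain booleans, ignores SQL NULL semantics, and does not address the stateful-UDF caveat the message raises.

```python
class EagerResult:
    """Wraps a value that was already computed before construction."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


class DeferredResult:
    """Wraps a zero-argument callable; runs it on the first get() only."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._evaluated = False
        self._value = None

    def get(self):
        if not self._evaluated:
            self._value = self._thunk()
            self._evaluated = True
        return self._value


def short_circuit_and(left, right):
    """Hypothetical helper: skip the right operand when the left is false."""
    if not left.get():
        return False
    return bool(right.get())


calls = []

def expensive_predicate():
    calls.append("evaluated")   # record each real evaluation
    return True

result = short_circuit_and(EagerResult(False), DeferredResult(expensive_predicate))
print(result, calls)   # the deferred operand was never evaluated
```

Because `DeferredResult` caches its value, calling `get()` repeatedly runs the underlying computation at most once.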
*	[SQL] Fix tiny/small ints from HiveMetastore.	Michael Armbrust	2014-05-15	1	-2/+4
	Author: Michael Armbrust <michael@databricks.com> Closes #797 from marmbrus/smallInt and squashes the following commits: 2db9dae [Michael Armbrust] Fix tiny/small ints from HiveMetastore. (cherry picked from commit a4aafe5f9fb191533400caeafddf04986492c95f) Signed-off-by: Reynold Xin <rxin@apache.org>
*	SPARK-1803 Replaced colon in filenames with a dash	Stevo Slavić	2014-05-15	16	-15/+15
	This patch replaces colon in several filenames with dash to make these filenames Windows compatible. Author: Stevo Slavić <sslavic@gmail.com> Author: Stevo Slavic <sslavic@gmail.com> Closes #739 from sslavic/SPARK-1803 and squashes the following commits: 3ec66eb [Stevo Slavic] Removed extra empty line which was causing test to fail b967cc3 [Stevo Slavić] Aligned tests and names of test resources 2b12776 [Stevo Slavić] Fixed a typo in file name 1c5dfff [Stevo Slavić] Replaced colon in file name with dash 8f5bf7f [Stevo Slavić] Replaced colon in file name with dash c5b5083 [Stevo Slavić] Replaced colon in file name with dash a49801f [Stevo Slavić] Replaced colon in file name with dash 401d99e [Stevo Slavić] Replaced colon in file name with dash 40a9621 [Stevo Slavić] Replaced colon in file name with dash 4774580 [Stevo Slavić] Replaced colon in file name with dash 004f8bb [Stevo Slavić] Replaced colon in file name with dash d6a3e2c [Stevo Slavić] Replaced colon in file name with dash b585126 [Stevo Slavić] Replaced colon in file name with dash 028e48a [Stevo Slavić] Replaced colon in file name with dash ece0507 [Stevo Slavić] Replaced colon in file name with dash 84f5d2f [Stevo Slavić] Replaced colon in file name with dash 2fc7854 [Stevo Slavić] Replaced colon in file name with dash 9e1467d [Stevo Slavić] Replaced colon in file name with dash (cherry picked from commit e66e31be51f396c8f6b7a45119b8b31c4d8cdf79) Signed-off-by: Reynold Xin <rxin@apache.org>
*	[SPARK-1819] [SQL] Fix GetField.nullable.	Takuya UESHIN	2014-05-15	2	-1/+14
	`GetField.nullable` should be `true` not only when `field.nullable` is `true` but also when `child.nullable` is `true`. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #757 from ueshin/issues/SPARK-1819 and squashes the following commits: 8781a11 [Takuya UESHIN] Modify a test to use named parameters. 5bfc77d [Takuya UESHIN] Fix GetField.nullable. (cherry picked from commit 94c9d6f59859ebc77fae112c2c42c64b7a4d7f83) Signed-off-by: Reynold Xin <rxin@apache.org>
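The nullability rule stated in this commit message amounts to a logical OR over the struct expression and the selected field. The following Python sketch models that rule only; `StructField`, `StructExpr` and this `GetField` are hypothetical simplified stand-ins, not the Catalyst classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructField:
    name: str
    nullable: bool


@dataclass(frozen=True)
class StructExpr:
    """Hypothetical expression producing a struct value."""
    fields: tuple
    nullable: bool


@dataclass(frozen=True)
class GetField:
    child: StructExpr
    field_name: str

    @property
    def field(self):
        return next(f for f in self.child.fields if f.name == self.field_name)

    @property
    def nullable(self):
        # A null struct yields a null field even if the field itself is
        # declared non-nullable, so both sources must be considered.
        return self.child.nullable or self.field.nullable
```

The case the fix targets is a nullable struct with a non-nullable field: looking only at `field.nullable` would report `False` even though extracting the field from a null struct gives null.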
*	[SPARK-1845] [SQL] Use AllScalaRegistrar for SparkSqlSerializer to register serializers of Scala collections.	Takuya UESHIN	2014-05-15	4	-26/+66
	When I execute `orderBy` or `limit` for `SchemaRDD` including `ArrayType` or `MapType`, `SparkSqlSerializer` throws the following exception: ``` com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): scala.collection.immutable.$colon$colon ``` or ``` com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): scala.collection.immutable.Vector ``` or ``` com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): scala.collection.immutable.HashMap$HashTrieMap ``` and so on. This is because registrations of serializers for each concrete collection are missing in `SparkSqlSerializer`. I believe it should use `AllScalaRegistrar`. `AllScalaRegistrar` covers a lot of serializers for concrete classes of `Seq`, `Map` for `ArrayType`, `MapType`. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #790 from ueshin/issues/SPARK-1845 and squashes the following commits: d1ed992 [Takuya UESHIN] Use AllScalaRegistrar for SparkSqlSerializer to register serializers of Scala collections. (cherry picked from commit db8cc6f28abe4326cea6f53feb604920e4867a27) Signed-off-by: Reynold Xin <rxin@apache.org>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-15	3	-3/+3
*	[maven-release-plugin] prepare release v1.0.0-rc7	Patrick Wendell	2014-05-15	3	-3/+3
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc6"	Patrick Wendell	2014-05-14	3	-3/+3
	This reverts commit 54133abdce0246f6643a1112a5204afb2c4caa82.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-14	3	-3/+3
	This reverts commit e480bcfbd269ae1d7a6a92cfb50466cf192fe1fb.
*	fix different versions of commons-lang dependency and apache/spark#746 addendum	witgo	2014-05-14	1	-0/+8
	Author: witgo <witgo@qq.com> Closes #754 from witgo/commons-lang and squashes the following commits: 3ebab31 [witgo] merge master f3b8fa2 [witgo] merge master 2083fae [witgo] repeat definition 5599cdb [witgo] multiple version of sbt dependency c1b66a1 [witgo] fix different versions of commons-lang dependency (cherry picked from commit bae07e36a6e0fb7982405316646b452b4ff06acc) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	Package docs	Prashant Sharma	2014-05-14	3	-0/+59
	These are a few changes based on the original patch by @scrapcodes. Author: Prashant Sharma <prashant.s@imaginea.com> Author: Patrick Wendell <pwendell@gmail.com> Closes #785 from pwendell/package-docs and squashes the following commits: c32b731 [Patrick Wendell] Changes based on Prashant's patch c0463d3 [Prashant Sharma] added eof new line ce8bf73 [Prashant Sharma] Added eof new line to all files. 4c35f2e [Prashant Sharma] SPARK-1563 Add package-info.java and package.scala files for all packages that appear in docs (cherry picked from commit 46324279dae2fa803267d788f7c56b0ed643b4c8) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	[SPARK-1826] fix the head notation of package object dsl	wangfei	2014-05-14	1	-9/+12
	Author: wangfei <scnbwf@yeah.net> Closes #765 from scwf/dslfix and squashes the following commits: d2d1a9d [wangfei] Update package.scala 66ff53b [wangfei] fix the head notation of package object dsl (cherry picked from commit 44165fc91a31e6293a79031c89571e139d2c5356) Signed-off-by: Reynold Xin <rxin@apache.org>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-14	3	-3/+3
*	[maven-release-plugin] prepare release v1.0.0-rc6	Patrick Wendell	2014-05-14	3	-3/+3
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc5"	Patrick Wendell	2014-05-14	3	-3/+3
	This reverts commit 18f062303303824139998e8fc8f4158217b0dbc3.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-14	3	-3/+3
	This reverts commit d08e9604fc9958b7c768e91715c8152db2ed6fd0.
*	SPARK-1828: Created forked version of hive-exec that doesn't bundle other dependencies	Patrick Wendell	2014-05-14	1	-3/+3
	See https://issues.apache.org/jira/browse/SPARK-1828 for more information. This is being submitted to Jenkins for testing. The dependency won't fully propagate in Maven central for a few more hours. Author: Patrick Wendell <pwendell@gmail.com> Closes #767 from pwendell/hive-shaded and squashes the following commits: ea10ac5 [Patrick Wendell] SPARK-1828: Created forked version of hive-exec that doesn't bundle other dependencies (cherry picked from commit d58cb33ffa9e98a64cecea7b40ce7bfbed145079) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	[SQL] Improve column pruning.	Michael Armbrust	2014-05-13	1	-5/+11
	Fixed a bug that was preventing us from ever pruning beneath Joins. ## TPC-DS Q3 ### Before: ``` Aggregate false, [d_year#12,i_brand#65,i_brand_id#64], [d_year#12,i_brand_id#64 AS brand_id#0,i_brand#65 AS brand#1,SUM(PartialSum#79) AS sum_agg#2] Exchange (HashPartitioning [d_year#12:0,i_brand#65:1,i_brand_id#64:2], 150) Aggregate true, [d_year#12,i_brand#65,i_brand_id#64], [d_year#12,i_brand#65,i_brand_id#64,SUM(CAST(ss_ext_sales_price#49, DoubleType)) AS PartialSum#79] Project [d_year#12:6,i_brand#65:59,i_brand_id#64:58,ss_ext_sales_price#49:43] HashJoin [ss_item_sk#36], [i_item_sk#57], BuildRight Exchange (HashPartitioning [ss_item_sk#36:30], 150) HashJoin [d_date_sk#6], [ss_sold_date_sk#34], BuildRight Exchange (HashPartitioning [d_date_sk#6:0], 150) Filter (d_moy#14:8 = 12) HiveTableScan [d_date_sk#6,d_date_id#7,d_date#8,d_month_seq#9,d_week_seq#10,d_quarter_seq#11,d_year#12,d_dow#13,d_moy#14,d_dom#15,d_qoy#16,d_fy_year#17,d_fy_quarter_seq#18,d_fy_week_seq#19,d_day_name#20,d_quarter_name#21,d_holiday#22,d_weekend#23,d_following_holiday#24,d_first_dom#25,d_last_dom#26,d_same_day_ly#27,d_same_day_lq#28,d_current_day#29,d_current_week#30,d_current_month#31,d_current_quarter#32,d_current_year#33], (MetastoreRelation default, date_dim, Some(dt)), None Exchange (HashPartitioning [ss_sold_date_sk#34:0], 150) HiveTableScan [ss_sold_date_sk#34,ss_sold_time_sk#35,ss_item_sk#36,ss_customer_sk#37,ss_cdemo_sk#38,ss_hdemo_sk#39,ss_addr_sk#40,ss_store_sk#41,ss_promo_sk#42,ss_ticket_number#43,ss_quantity#44,ss_wholesale_cost#45,ss_list_price#46,ss_sales_price#47,ss_ext_discount_amt#48,ss_ext_sales_price#49,ss_ext_wholesale_cost#50,ss_ext_list_price#51,ss_ext_tax#52,ss_coupon_amt#53,ss_net_paid#54,ss_net_paid_inc_tax#55,ss_net_profit#56], (MetastoreRelation default, store_sales, None), None Exchange (HashPartitioning [i_item_sk#57:0], 150) Filter (i_manufact_id#70:13 = 436) HiveTableScan [i_item_sk#57,i_item_id#58,i_rec_start_date#59,i_rec_end_date#60,i_item_desc#61,i_current_price#62,i_wholesale_cost#63,i_brand_id#64,i_brand#65,i_class_id#66,i_class#67,i_category_id#68,i_category#69,i_manufact_id#70,i_manufact#71,i_size#72,i_formulation#73,i_color#74,i_units#75,i_container#76,i_manager_id#77,i_product_name#78], (MetastoreRelation default, item, None), None ``` ### After ``` Aggregate false, [d_year#172,i_brand#225,i_brand_id#224], [d_year#172,i_brand_id#224 AS brand_id#160,i_brand#225 AS brand#161,SUM(PartialSum#239) AS sum_agg#162] Exchange (HashPartitioning [d_year#172:0,i_brand#225:1,i_brand_id#224:2], 150) Aggregate true, [d_year#172,i_brand#225,i_brand_id#224], [d_year#172,i_brand#225,i_brand_id#224,SUM(CAST(ss_ext_sales_price#209, DoubleType)) AS PartialSum#239] Project [d_year#172:1,i_brand#225:5,i_brand_id#224:3,ss_ext_sales_price#209:0] HashJoin [ss_item_sk#196], [i_item_sk#217], BuildRight Exchange (HashPartitioning [ss_item_sk#196:2], 150) Project [ss_ext_sales_price#209:2,d_year#172:1,ss_item_sk#196:3] HashJoin [d_date_sk#166], [ss_sold_date_sk#194], BuildRight Exchange (HashPartitioning [d_date_sk#166:0], 150) Project [d_date_sk#166:0,d_year#172:1] Filter (d_moy#174:2 = 12) HiveTableScan [d_date_sk#166,d_year#172,d_moy#174], (MetastoreRelation default, date_dim, Some(dt)), None Exchange (HashPartitioning [ss_sold_date_sk#194:2], 150) HiveTableScan [ss_ext_sales_price#209,ss_item_sk#196,ss_sold_date_sk#194], (MetastoreRelation default, store_sales, None), None Exchange (HashPartitioning [i_item_sk#217:1], 150) Project [i_brand_id#224:0,i_item_sk#217:1,i_brand#225:2] Filter (i_manufact_id#230:3 = 436) HiveTableScan [i_brand_id#224,i_item_sk#217,i_brand#225,i_manufact_id#230], (MetastoreRelation default, item, None), None ``` Author: Michael Armbrust <michael@databricks.com> Closes #729 from marmbrus/fixPruning and squashes the following commits: 5feeff0 [Michael Armbrust] Improve column pruning. (cherry picked from commit 6ce0884446d3571fd6e9d967a080a59c657543b1) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
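The effect visible in the before/after plans of this commit, where each table scan below a join shrinks to the columns actually needed, can be approximated by the following Python sketch. `prune_join_inputs` is a hypothetical helper and not the optimizer rule from the patch. It assumes column names are unique across both inputs and considers only the join's output columns and its keys; columns needed by filters below the join, such as `d_moy` in the plans shown, would have to be added separately.

```python
def prune_join_inputs(output_cols, join_keys, left_cols, right_cols):
    """Return the columns each join input still has to produce.

    output_cols: columns required above the join
    join_keys:   list of (left_col, right_col) equality pairs
    left_cols / right_cols: all columns available on each side
    """
    needed = set(output_cols)
    for left_key, right_key in join_keys:
        needed.add(left_key)
        needed.add(right_key)
    # Preserve each side's original column order.
    left_needed = [c for c in left_cols if c in needed]
    right_needed = [c for c in right_cols if c in needed]
    return left_needed, right_needed


left, right = prune_join_inputs(
    output_cols=["d_year", "ss_ext_sales_price", "ss_item_sk"],
    join_keys=[("d_date_sk", "ss_sold_date_sk")],
    left_cols=["d_date_sk", "d_date_id", "d_year", "d_moy"],
    right_cols=["ss_sold_date_sk", "ss_item_sk", "ss_quantity",
                "ss_ext_sales_price"],
)
print(left)
print(right)
```

The column lists in the usage example are a shortened subset of the `date_dim` and `store_sales` columns that appear in the plans, chosen to keep the example small.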
*	Implement ApproximateCountDistinct for SparkSql	larvaboy	2014-05-13	4	-4/+119
	Add the implementation for ApproximateCountDistinct to SparkSql. We use the HyperLogLog algorithm implemented in stream-lib, and do the count in two phases: 1) counting the number of distinct elements in each partition, and 2) merging the HyperLogLog results from different partitions. A simple serializer and test cases are added as well. Author: larvaboy <larvaboy@gmail.com> Closes #737 from larvaboy/master and squashes the following commits: bd8ef3f [larvaboy] Add support of user-provided standard deviation to ApproxCountDistinct. 9ba8360 [larvaboy] Fix alignment and null handling issues. 95b4067 [larvaboy] Add a test case for count distinct and approximate count distinct. f57917d [larvaboy] Add the parser for the approximate count. a2d5d10 [larvaboy] Add ApproximateCountDistinct aggregates and functions. 7ad273a [larvaboy] Add SparkSql serializer for HyperLogLog. 1d9aacf [larvaboy] Fix a minor typo in the toString method of the Count case class. 653542b [larvaboy] Fix a couple of minor typos. (cherry picked from commit c33b8dcbf65a3a0c5ee5e65cd1dcdbc7da36aa5f) Signed-off-by: Reynold Xin <rxin@apache.org>
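The two-phase scheme described in this commit message (one sketch per partition, then a merge) works because HyperLogLog sketches combine by taking the register-wise maximum. The Python sketch below is a simplified from-scratch HyperLogLog written for illustration; it is not the stream-lib implementation the commit uses, and the register count, the SHA-1 based hash and all function names are assumptions.

```python
import hashlib
import math

P = 10          # assumed number of index bits
M = 1 << P      # number of registers


def _hash64(item):
    """64-bit hash of the item's string form (assumed hash choice)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_sketch(items):
    """Phase 1: summarise one partition as a list of M registers."""
    registers = [0] * M
    for item in items:
        h = _hash64(item)
        idx = h >> (64 - P)                       # top P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)          # remaining 54 bits
        rank = (64 - P) - rest.bit_length() + 1   # leading zeros + 1
        registers[idx] = max(registers[idx], rank)
    return registers


def merge_sketches(sketches):
    """Phase 2: combine partition sketches by register-wise maximum."""
    return [max(column) for column in zip(*sketches)]


def estimate(registers):
    """Standard HyperLogLog estimate with the small-range correction."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


data = [f"user-{i % 10000}" for i in range(40000)]   # 10000 distinct values
partitions = [data[i::4] for i in range(4)]
merged = merge_sketches([build_sketch(p) for p in partitions])
print(round(estimate(merged)))
```

Merging the partition sketches gives exactly the same registers as building one sketch over all the data, so the second phase adds no error beyond the approximation inherent in the sketch itself.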
*	[SQL] Make it possible to create Java/Python SQLContexts from an existing Scala SQLContext.	Michael Armbrust	2014-05-13	1	-2/+2
	Author: Michael Armbrust <michael@databricks.com> Closes #761 from marmbrus/existingContext and squashes the following commits: 4651051 [Michael Armbrust] Make it possible to create Java/Python SQLContexts from an existing Scala SQLContext. (cherry picked from commit 44233865cf8020741d862d33cc660c88e9315dea) Signed-off-by: Reynold Xin <rxin@apache.org>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-13	3	-3/+3
*	[maven-release-plugin] prepare release v1.0.0-rc5	Patrick Wendell	2014-05-13	3	-3/+3
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc4"	Patrick Wendell	2014-05-12	3	-3/+3
	This reverts commit 3d0a44833ab50360bf9feccc861cb5e8c44a4866.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-12	3	-3/+3
	This reverts commit 9772d85c6f3893d42044f4bab0e16f8b6287613a.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-13	3	-3/+3