spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	Preparing development version 1.2.1-SNAPSHOT	Patrick Wendell	2014-11-26	4	-4/+4
\|
*	Preparing Spark release v1.2.0-rc1	Patrick Wendell	2014-11-26	4	-4/+4
\|
*	Revert "Preparing Spark release v1.2.0-rc1"	Patrick Wendell	2014-11-26	4	-4/+4
\| \| \| \|	This reverts commit 5247dd859b95a440baa562b9827bdeb26aa6530e.
*	Revert "Preparing development version 1.2.1-SNAPSHOT"	Patrick Wendell	2014-11-26	4	-4/+4
\| \| \| \|	This reverts commit 79df6b43ae762263a8120f423ddb4a0811dd4b6f.
*	Preparing development version 1.2.1-SNAPSHOT	Patrick Wendell	2014-11-26	4	-4/+4
\|
*	Preparing Spark release v1.2.0-rc1	Patrick Wendell	2014-11-26	4	-4/+4
\|
*	Revert "Preparing Spark release v1.2.0-rc1"	Patrick Wendell	2014-11-26	4	-4/+4
\| \| \| \|	This reverts commit db7f4a898af22a02b36428507f8ef2b429d78dc1.
*	Revert "Preparing development version 1.2.1-SNAPSHOT"	Patrick Wendell	2014-11-26	4	-4/+4
\| \| \| \|	This reverts commit d7b1ecb25676d228deb6fe05efdb4e2ab9c3e30b.
*	Preparing development version 1.2.1-SNAPSHOT	Ubuntu	2014-11-26	4	-4/+4
\|
*	Preparing Spark release v1.2.0-rc1	Ubuntu	2014-11-26	4	-4/+4
\|
*	Revert "Preparing Spark release v1.2.0-snapshot1"	Patrick Wendell	2014-11-26	4	-4/+4
\| \| \| \|	This reverts commit 38c1fbd9694430cefd962c90bc36b0d108c6124b.
*	Revert "Preparing development version 1.2.1-SNAPSHOT"	Patrick Wendell	2014-11-26	4	-4/+4
\| \| \| \|	This reverts commit d7ac6013483e83caff8ea54c228f37aeca159db8.
*	[SQL] Compute timeTaken correctly	w00228970	2014-11-24	1	-7/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	```timeTaken``` should not count the time of printing result. Author: w00228970 <wangfei1@huawei.com> Closes #3423 from scwf/time-taken-bug and squashes the following commits: da7e102 [w00228970] compute time taken correctly (cherry picked from commit 723be60e233d0f85944d948efd06845ef546c9f5) Signed-off-by: Reynold Xin <rxin@databricks.com>
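A minimal sketch of the idea behind this fix, assuming the timer should stop before results are printed. `run_query` and `print_rows` are hypothetical stand-ins for illustration, not Spark APIs.

```python
import time


def run_query(sql):
    # Hypothetical stand-in for query execution; returns result rows.
    return [("row", i) for i in range(3)]


def print_rows(rows):
    # Hypothetical stand-in for writing results to the console.
    for row in rows:
        print(row)


def timed_query(sql):
    start = time.perf_counter()
    rows = run_query(sql)
    # Stop the clock before printing, so output time is not counted.
    time_taken = time.perf_counter() - start
    print_rows(rows)
    return rows, time_taken
```

Measuring after `print_rows` instead would fold console I/O into the reported time, which is the behavior the commit message objects to.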
*	[SPARK-4548] [SPARK-4517] improve performance of python broadcast	Davies Liu	2014-11-24	2	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Re-implement the Python broadcast using files:
	1) Serialize the Python object using cPickle and write it to disk.
	2) Create a wrapper in the JVM (for the dumped file); it reads the data from the file during serialization.
	3) Use TorrentBroadcast or HttpBroadcast to transfer the (compressed) data to the executors.
	4) During deserialization, write the data to disk.
	5) Pass the path to the Python worker, which reads the data from disk and unpickles it into a Python object, deferred until the first access.
	This fixes the performance regression introduced in #2659. Performance is similar to 1.1, but objects larger than 2G are supported and memory efficiency is improved (only one compressed copy in the driver and executor).
	Testing with a 500M broadcast and 4 tasks (excluding the benefit from reused workers in 1.2):
	name | 1.1 | 1.2 with this patch | improvement
	---------|--------|---------|--------
	python-broadcast-w-bytes | 25.20 | 9.33 | 170.13%
	python-broadcast-w-set | 4.13 | 4.50 | -8.35%
	Testing with 100 tasks (16 CPUs):
	name | 1.1 | 1.2 with this patch | improvement
	---------|--------|---------|--------
	python-broadcast-w-bytes | 38.16 | 8.40 | 353.98%
	python-broadcast-w-set | 23.29 | 9.59 | 142.80%
	Author: Davies Liu <davies@databricks.com> Closes #3417 from davies/pybroadcast and squashes the following commits: 50a58e0 [Davies Liu] address comments b98de1d [Davies Liu] disable gc while unpickle e5ee6b9 [Davies Liu] support large string 09303b8 [Davies Liu] read all data into memory dde02dd [Davies Liu] improve performance of python broadcast (cherry picked from commit 6cf507685efd01df77d663145ae08e48c7f92948) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
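The worker-side part of the steps above (pass a path, unpickle on first access, with the garbage collector disabled while unpickling) can be sketched roughly as follows. This is an illustration only: `FileBackedValue` is a hypothetical name, and it uses Python 3's `pickle` rather than `cPickle`.

```python
import gc
import os
import pickle
import tempfile


class FileBackedValue:
    """Hypothetical wrapper: holds only a file path until first access."""

    def __init__(self, path):
        self._path = path
        self._loaded = False
        self._value = None

    @classmethod
    def dump(cls, obj):
        # Serialize the object to a temporary file on disk.
        fd, path = tempfile.mkstemp(suffix=".pkl")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
        return cls(path)

    @property
    def value(self):
        if not self._loaded:
            # Disable the garbage collector while unpickling, then
            # restore it if it was enabled before.
            was_enabled = gc.isenabled()
            gc.disable()
            try:
                with open(self._path, "rb") as f:
                    self._value = pickle.load(f)
            finally:
                if was_enabled:
                    gc.enable()
            self._loaded = True
        return self._value
```

Only the path travels with the wrapper, so nothing is deserialized for tasks that never touch `value`.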
*	[SPARK-4487][SQL] Fix attribute reference resolution error when using ORDER BY.	Kousuke Saruta	2014-11-24	2	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we use ORDER BY clause, at first, attributes referenced by projection are resolved (1). And then, attributes referenced at ORDER BY clause are resolved (2). But when resolving attributes referenced at ORDER BY clause, the resolution result generated in (1) is discarded so for example, following query fails. SELECT c1 + c2 FROM mytable ORDER BY c1; The query above fails because when resolving the attribute reference 'c1', the resolution result of 'c2' is discarded. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3363 from sarutak/SPARK-4487 and squashes the following commits: fd314f3 [Kousuke Saruta] Fixed attribute resolution logic in Analyzer 6e60c20 [Kousuke Saruta] Fixed conflicts cb5b7e9 [Kousuke Saruta] Added test case for SPARK-4487 282d529 [Kousuke Saruta] Fixed attributes reference resolution error b6123e6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into concat-feature 317b7fb [Kousuke Saruta] WIP (cherry picked from commit dd1c9cb36cde8202cede8014b5641ae8a0197812) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SQL] Fix comment in HiveShim	Daniel Darabos	2014-11-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This file is for Hive 0.13.1 I think. Author: Daniel Darabos <darabos.daniel@gmail.com> Closes #3432 from darabos/patch-2 and squashes the following commits: 4fd22ed [Daniel Darabos] Fix comment. This file is for Hive 0.13.1. (cherry picked from commit d5834f0732b586731034a7df5402c25454770fc5) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4479][SQL] Avoids unnecessary defensive copies when sort based shuffle is on	Cheng Lian	2014-11-24	1	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR is a workaround for SPARK-4479. Two changes are introduced: when merge sort is bypassed in `ExternalSorter`, 1. also bypass RDD element buffering, as buffering is the reason that `MutableRow` backed row objects must be copied, and 2. avoid defensive copies in the `Exchange` operator. Author: Cheng Lian <lian@databricks.com> Closes #3422 from liancheng/avoids-defensive-copies and squashes the following commits: 591f2e9 [Cheng Lian] Passes all shuffle suites 0c3c91e [Cheng Lian] Fixes shuffle write metrics when merge sort is bypassed ed5df3c [Cheng Lian] Fixes styling changes f75089b [Cheng Lian] Avoids unnecessary defensive copies when sort based shuffle is on (cherry picked from commit a6d7b61f92dc7c1f9632cecb232afa8040ab2b4d) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4522][SQL] Parse schema with missing metadata.	Michael Armbrust	2014-11-20	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is just a quick fix for 1.2. SPARK-4523 describes a more complete solution. Author: Michael Armbrust <michael@databricks.com> Closes #3392 from marmbrus/parquetMetadata and squashes the following commits: bcc6626 [Michael Armbrust] Parse schema with missing metadata. (cherry picked from commit 90a6a46bd11030672597f015dd443d954107123a) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4413][SQL] Parquet support through datasource API	Michael Armbrust	2014-11-20	5	-79/+458
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Goals: - Support for accessing parquet using SQL but not requiring Hive (thus allowing support of parquet tables with decimal columns) - Support for folder based partitioning with automatic discovery of available partitions - Caching of file metadata See scaladoc of `ParquetRelation2` for more details. Author: Michael Armbrust <michael@databricks.com> Closes #3269 from marmbrus/newParquet and squashes the following commits: 1dd75f1 [Michael Armbrust] Pass all paths for FileInputFormat at once. 645768b [Michael Armbrust] Review comments. abd8e2f [Michael Armbrust] Alternative implementation of parquet based on the datasources API. 938019e [Michael Armbrust] Add an experimental interface to data sources that exposes catalyst expressions. e9d2641 [Michael Armbrust] logging / formatting improvements. (cherry picked from commit 02ec058efe24348cdd3691b55942e6f0ef138732) Signed-off-by: Michael Armbrust <michael@databricks.com>
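The commit message does not spell out the directory layout used for partition discovery. As a rough sketch, assuming Hive-style `key=value` directory names (an assumption, as are the function names), partition values could be recovered from file paths like this:

```python
def parse_partition_path(path):
    """Extract partition column values from a relative file path.

    Assumes Hive-style `key=value` directory names (sketch assumption).
    """
    values = {}
    for part in path.strip("/").split("/"):
        key, sep, value = part.partition("=")
        if sep:  # only components containing '=' are partition directories
            values[key] = value
    return values


def group_by_partition(paths):
    """Group data files by the partition values encoded in their paths."""
    groups = {}
    for path in paths:
        key = tuple(sorted(parse_partition_path(path).items()))
        groups.setdefault(key, []).append(path)
    return groups
```

For example, `parse_partition_path("year=2014/month=11/part-0.parquet")` yields the values for `year` and `month` as strings; type inference is left out of this sketch.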
*	[SPARK-4244] [SQL] Support Hive Generic UDFs with constant object inspector parameters	Cheng Hao	2014-11-20	4	-8/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The query `SELECT named_struct(lower("AA"), "12", lower("Bb"), "13") FROM src LIMIT 1` throws an exception: some Hive generic UDFs/UDAFs require the input object inspector to be a `ConstantObjectInspector`, but we do not get one until expression optimization (constant folding) has been executed. This PR is a workaround for this. (Ideally, the `output` of a LogicalPlan should be identical before and after optimization.) Author: Cheng Hao <hao.cheng@intel.com> Closes #3109 from chenghao-intel/optimized and squashes the following commits: 487ff79 [Cheng Hao] rebase to the latest master & update the unittest (cherry picked from commit 84d79ee9ec47465269f7b0a7971176da93c96f3f) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SQL] fix function description mistake	Jacky Li	2014-11-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Sample code in the description of SchemaRDD.where is not correct Author: Jacky Li <jacky.likun@gmail.com> Closes #3344 from jackylk/patch-6 and squashes the following commits: 62cd126 [Jacky Li] [SQL] fix function description mistake (cherry picked from commit ad5f1f3ca240473261162c06ffc5aa70d15a5991) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-2918] [SQL] Support the CTAS in EXPLAIN command	Cheng Hao	2014-11-20	2	-1/+41
\| \| \| \| \| \| \| \| \| \| \| \| \|	Hive supports `explain` for CTAS statements. Spark SQL supported this previously, but it seems to have been reverted after the code refactoring in HiveQL. Author: Cheng Hao <hao.cheng@intel.com> Closes #3357 from chenghao-intel/explain and squashes the following commits: 7aace63 [Cheng Hao] Support the CTAS in EXPLAIN command (cherry picked from commit 6aa0fc9f4d95f09383cbcb5f79166c60697e6683) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4318][SQL] Fix empty sum distinct.	Takuya UESHIN	2014-11-20	4	-52/+195
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Executing sum distinct for empty table throws `java.lang.UnsupportedOperationException: empty.reduceLeft`. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #3184 from ueshin/issues/SPARK-4318 and squashes the following commits: 8168c42 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4318 66fdb0a [Takuya UESHIN] Re-refine aggregate functions. 6186eb4 [Takuya UESHIN] Fix Sum of GeneratedAggregate. d2975f6 [Takuya UESHIN] Refine Sum and Average of GeneratedAggregate. 1bba675 [Takuya UESHIN] Refine Sum, SumDistinct and Average functions. 917e533 [Takuya UESHIN] Use aggregate instead of groupBy(). 1a5f874 [Takuya UESHIN] Add tests to be executed as non-partial aggregation. a5a57d2 [Takuya UESHIN] Fix empty Average. 22799dc [Takuya UESHIN] Fix empty Sum and SumDistinct. 65b7dd2 [Takuya UESHIN] Fix empty sum distinct. (cherry picked from commit 2c2e7a44db2ebe44121226f3eac924a0668b991a) Signed-off-by: Michael Armbrust <michael@databricks.com>
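The failure mode is the usual one for reducing an empty collection without a starting value. A small Python analogue follows; the choice to return `None` for empty input follows the SQL convention that SUM over no rows is NULL, and is an assumption about the intended result rather than something stated in the commit message.

```python
from functools import reduce
import operator


def sum_distinct_unsafe(values):
    # Mirrors the bug: reducing an empty collection has no starting value.
    return reduce(operator.add, set(values))


def sum_distinct(values):
    """Sum of distinct non-null values, or None when there is nothing to sum."""
    distinct = {v for v in values if v is not None}
    if not distinct:
        return None  # assumption: SUM over no rows is NULL, as in SQL
    return reduce(operator.add, distinct)
```

Calling `sum_distinct_unsafe([])` raises `TypeError`, much as `empty.reduceLeft` throws on the JVM; `sum_distinct([])` returns `None` instead.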
*	[SPARK-4513][SQL] Support relational operator '<=>' in Spark SQL	ravipesala	2014-11-20	3	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	The relational operator '<=>' is not working in Spark SQL. Same works in Spark HiveQL Author: ravipesala <ravindra.pesala@huawei.com> Closes #3387 from ravipesala/<=> and squashes the following commits: 7198e90 [ravipesala] Supporting relational operator '<=>' in Spark SQL (cherry picked from commit 98e9419784a9ad5096cfd563fa9a433786a90bd4) Signed-off-by: Michael Armbrust <michael@databricks.com>
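For readers unfamiliar with `<=>`: it is null-safe equality. The sketch below contrasts it with ordinary `=` using `None` for SQL NULL; the function names are illustrative only.

```python
def null_safe_eq(a, b):
    """'<=>': NULL <=> NULL is true, NULL <=> x is false, else plain equality."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def sql_eq(a, b):
    """Ordinary '=': any NULL operand yields NULL (None here)."""
    if a is None or b is None:
        return None
    return a == b
```

Unlike `=`, the null-safe form never returns NULL, which makes it usable in join conditions over nullable keys.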
*	[SPARK-4228][SQL] SchemaRDD to JSON	Dan McClary	2014-11-20	4	-3/+208
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Here's a simple fix for SchemaRDD to JSON. Author: Dan McClary <dan.mcclary@gmail.com> Closes #3213 from dwmclary/SPARK-4228 and squashes the following commits: d714e1d [Dan McClary] fixed PEP 8 error cac2879 [Dan McClary] move pyspark comment and doctest to correct location f9471d3 [Dan McClary] added pyspark doc and doctest 6598cee [Dan McClary] adding complex type queries 1a5fd30 [Dan McClary] removing SPARK-4228 from SQLQuerySuite 4a651f0 [Dan McClary] cleaned PEP and Scala style failures. Moved tests to JsonSuite 47ceff6 [Dan McClary] cleaned up scala style issues 2ee1e70 [Dan McClary] moved rowToJSON to JsonRDD 4387dd5 [Dan McClary] Added UserDefinedType, cleaned up case formatting 8f7bfb6 [Dan McClary] Map type added to SchemaRDD.toJSON 1b11980 [Dan McClary] Map and UserDefinedTypes partially done 11d2016 [Dan McClary] formatting and unicode deserialization default fixed 6af72d1 [Dan McClary] deleted extaneous comment 4d11c0c [Dan McClary] JsonFactory rewrite of toJSON for SchemaRDD 149dafd [Dan McClary] wrapped scala toJSON in sql.py 5e5eb1b [Dan McClary] switched to Jackson for JSON processing 6c94a54 [Dan McClary] added toJSON to pyspark SchemaRDD aaeba58 [Dan McClary] added toJSON to pyspark SchemaRDD 1d171aa [Dan McClary] upated missing brace on if statement 319e3ba [Dan McClary] updated to upstream master with merged SPARK-4228 424f130 [Dan McClary] tests pass, ready for pull and PR 626a5b1 [Dan McClary] added toJSON to SchemaRDD f7d166a [Dan McClary] added toJSON method 5d34e37 [Dan McClary] merge resolved d6d19e9 [Dan McClary] pr example (cherry picked from commit b8e6886fb8ff8f667fb7e600cd727d8649cad1d1) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-3938][SQL] Names in-memory columnar RDD with corresponding table name	Cheng Lian	2014-11-20	6	-16/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR enables the Web UI storage tab to show the in-memory table name instead of the mysterious query plan string as the name of the in-memory columnar RDD. Note that after #2501, a single columnar RDD can be shared by multiple in-memory tables, as long as their query results are the same. In this case, only the first cached table name is shown. For example: ```sql CACHE TABLE first AS SELECT * FROM src; CACHE TABLE second AS SELECT * FROM src; ``` The Web UI only shows "In-memory table first". Author: Cheng Lian <lian@databricks.com> Closes #3383 from liancheng/columnar-rdd-name and squashes the following commits: 071907f [Cheng Lian] Fixes tests 12ddfa6 [Cheng Lian] Names in-memory columnar RDD with corresponding table name (cherry picked from commit abf29187f0342b607fcefe269391d4db58d2a957) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4468][SQL] Fixes Parquet filter creation for inequality predicates with literals on the left hand side	Cheng Lian	2014-11-18	2	-4/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For expressions like `10 < someVar`, we should create an `Operators.Gt` filter, but right now an `Operators.Lt` is created. This issue affects all inequality predicates with literals on the left hand side. (This bug existed before #3317 and affects branch-1.1. #3338 was opened to backport this to branch-1.1.) Author: Cheng Lian <lian@databricks.com> Closes #3334 from liancheng/fix-parquet-comp-filter and squashes the following commits: 0130897 [Cheng Lian] Fixes Parquet comparison filter generation (cherry picked from commit 423baea953996a66dde671ff6db2fb1f32fbe8cb) Signed-off-by: Michael Armbrust <michael@databricks.com>
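The rule at stake: when the literal is on the left, swapping the operands must also mirror the comparison operator. A minimal sketch, with column names as strings and literals as numbers (a simplification chosen for this example):

```python
# Mirror image of each inequality operator when operands are swapped.
FLIPPED = {"<": ">", "<=": ">=", ">": "<", ">=": "<="}


def normalize(left, op, right):
    """Return (column, op, literal) with the column on the left.

    Columns are str names and literals are numbers (sketch assumption).
    """
    if isinstance(left, str):
        return (left, op, right)
    # Literal on the left: swap operands and mirror the operator.
    # Symmetric operators such as "=" are left unchanged.
    return (right, FLIPPED.get(op, op), left)
```

So `10 < someVar` becomes `someVar > 10`; keeping `<` after the swap is exactly the bug described above.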
*	[SPARK-3721] [PySpark] broadcast objects larger than 2G	Davies Liu	2014-11-18	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for broadcasting objects larger than 2G. pickle, zlib, FrameSerializer and Array[Byte] cannot handle objects larger than 2G, so this patch introduces LargeObjectSerializer to serialize broadcast objects: the object is serialized and compressed into small chunks. It also changes the type Broadcast[Array[Byte]] to Broadcast[Array[Array[Byte]]]. Testing broadcast of objects larger than 2G is slow and memory hungry, so this was tested manually; a test could be added to SparkPerf. Author: Davies Liu <davies@databricks.com> Author: Davies Liu <davies.liu@gmail.com> Closes #2659 from davies/huge and squashes the following commits: 7b57a14 [Davies Liu] add more tests for broadcast 28acff9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into huge a2f6a02 [Davies Liu] bug fix 4820613 [Davies Liu] Merge branch 'master' of github.com:apache/spark into huge 5875c73 [Davies Liu] address comments 10a349b [Davies Liu] address comments 0c33016 [Davies Liu] Merge branch 'master' of github.com:apache/spark into huge 6182c8f [Davies Liu] Merge branch 'master' into huge d94b68f [Davies Liu] Merge branch 'master' of github.com:apache/spark into huge 2514848 [Davies Liu] address comments fda395b [Davies Liu] Merge branch 'master' of github.com:apache/spark into huge 1c2d928 [Davies Liu] fix scala style 091b107 [Davies Liu] broadcast objects larger than 2G (cherry picked from commit 4a377aff2d36b64a65b54192a987aba44b8f78e0) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
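The chunking idea can be illustrated as below. This is not LargeObjectSerializer itself: it pickles the object in one pass and only shows how a byte stream is split into independently compressed chunks and reassembled. `CHUNK_SIZE` and the function names are hypothetical.

```python
import pickle
import zlib

CHUNK_SIZE = 1 << 16  # hypothetical chunk size, kept small for illustration


def dump_chunks(obj, chunk_size=CHUNK_SIZE):
    """Serialize obj, then compress it as a list of fixed-size chunks."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def load_chunks(chunks):
    """Decompress each chunk, concatenate, and deserialize."""
    return pickle.loads(b"".join(zlib.decompress(c) for c in chunks))
```

Because each chunk stays well under any per-array size limit, the list of chunks plays the role that `Array[Array[Byte]]` plays on the JVM side.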
*	[SQL] Support partitioned parquet tables that have the key in both the directory and the file	Michael Armbrust	2014-11-18	2	-68/+108
\| \| \| \| \| \| \| \| \| \| \| \| \|	Author: Michael Armbrust <michael@databricks.com> Closes #3272 from marmbrus/keyInPartitionedTable and squashes the following commits: 447f08c [Michael Armbrust] Support partitioned parquet tables that have the key in both the directory and the file (cherry picked from commit 90d72ec8502f7ec11d2fe42f08c884ad2159266f) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4453][SPARK-4213][SQL] Simplifies Parquet filter generation code	Cheng Lian	2014-11-17	5	-693/+161
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While reviewing PR #3083 and #3161, I noticed that Parquet record filter generation code can be simplified significantly according to the clue stated in [SPARK-4453](https://issues.apache.org/jira/browse/SPARK-4213). This PR addresses both SPARK-4453 and SPARK-4213 with this simplification. While generating `ParquetTableScan` operator, we need to remove all Catalyst predicates that have already been pushed down to Parquet. Originally, we first generate the record filter, and then call `findExpression` to traverse the generated filter to find out all pushed down predicates [[1](https://github.com/apache/spark/blob/64c6b9bad559c21f25cd9fbe37c8813cdab939f2/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L213-L228)]. In this way, we have to introduce the `CatalystFilter` class hierarchy to bind the Catalyst predicates together with their generated Parquet filter, and complicate the code base a lot. The basic idea of this PR is that, we don't need `findExpression` after filter generation, because we already know a predicate can be pushed down if we can successfully generate its corresponding Parquet filter. SPARK-4213 is fixed by returning `None` for any unsupported predicate type. Author: Cheng Lian <lian@databricks.com> Closes #3317 from liancheng/simplify-parquet-filters and squashes the following commits: d6a9499 [Cheng Lian] Fixes import styling issue 43760e8 [Cheng Lian] Simplifies Parquet filter generation logic (cherry picked from commit 36b0956a3eadc7343ed0d25c79a6ce0496eaaccd) Signed-off-by: Michael Armbrust <michael@databricks.com>
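The simplification can be sketched as follows: filter creation returns `None` for anything it cannot translate, and whether a predicate was pushed down is decided by that result alone, with no second traversal. Predicates are modeled as `(op, column, value)` tuples and the set of supported operators is made up for this example.

```python
SUPPORTED_OPS = ("=", "<", "<=", ">", ">=")  # hypothetical supported set


def create_filter(pred):
    """Return a filter description for a supported predicate, else None."""
    op, column, value = pred
    if op in SUPPORTED_OPS:
        return {"op": op, "column": column, "value": value}
    return None  # unsupported predicate type: cannot be pushed down


def split_predicates(predicates):
    """Split predicates into generated filters and ones left to evaluate later."""
    pushed, remaining = [], []
    for pred in predicates:
        generated = create_filter(pred)
        if generated is None:
            remaining.append(pred)
        else:
            pushed.append(generated)
    return pushed, remaining
```

Predicates in `remaining` stay in the query plan; those in `pushed` can be dropped from it because their filters were generated successfully.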
*	[SPARK-4448] [SQL] unwrap for the ConstantObjectInspector	Cheng Hao	2014-11-17	1	-4/+32
\| \| \| \| \| \| \| \| \| \| \| \|	Author: Cheng Hao <hao.cheng@intel.com> Closes #3308 from chenghao-intel/unwrap_constant_oi and squashes the following commits: 156b500 [Cheng Hao] rebase the master c5b20ab [Cheng Hao] unwrap for the ConstantObjectInspector (cherry picked from commit ef7c464effa1510b24bd8e665e4df6c4839b0c87) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4443][SQL] Fix statistics for external table in spark sql hive	w00228970	2014-11-17	3	-3/+12
\| \| \| \| \| \| \| \| \| \| \| \| \|	The `totalSize` of external table is always zero, which will influence join strategy(always use broadcast join for external table). Author: w00228970 <wangfei1@huawei.com> Closes #3304 from scwf/statistics and squashes the following commits: 568f321 [w00228970] fix statistics for external table (cherry picked from commit 42389b1780311d90499b4ce2315ceabf5b6ab384) Signed-off-by: Michael Armbrust <michael@databricks.com>
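Why a zero `totalSize` matters can be seen in a toy size-based planner. The threshold value and strategy names below are hypothetical and chosen only for illustration.

```python
BROADCAST_THRESHOLD = 10 * 1024 * 1024  # hypothetical threshold in bytes


def choose_join(total_size, threshold=BROADCAST_THRESHOLD):
    """Pick a join strategy from the estimated table size in bytes."""
    return "broadcast" if total_size <= threshold else "shuffle"
```

With `total_size` reported as 0, `choose_join` always answers `"broadcast"`, even for a very large external table.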
*	[SPARK-4309][SPARK-4407][SQL] Date type support for Thrift server, and fixes for complex types	Cheng Lian	2014-11-17	4	-114/+141
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR is exactly the same as #3178 except it reverts the `FileStatus.isDir` to `FileStatus.isDirectory` change, since it doesn't compile with Hadoop 1. Author: Cheng Lian <lian@databricks.com> Closes #3298 from liancheng/date-for-thriftserver and squashes the following commits: 866037e [Cheng Lian] Revers isDirectory to isDir (it breaks Hadoop 1 profile) 6f71d0b [Cheng Lian] Makes toHiveString static 26fa955 [Cheng Lian] Fixes complex type support in Hive 0.13.1 shim a92882a [Cheng Lian] Updates HiveShim for 0.13.1 73f442b [Cheng Lian] Adds Date support for HiveThriftServer2 (Hive 0.12.0) (cherry picked from commit 6b7f2f753d16ff038881772f1958e3f4fd5597a7) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SQL] Construct the MutableRow from an Array	Cheng Hao	2014-11-17	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	Author: Cheng Hao <hao.cheng@intel.com> Closes #3217 from chenghao-intel/mutablerow and squashes the following commits: e8a10bd [Cheng Hao] revert the change of Row object 4681aea [Cheng Hao] Add toMutableRow method in object Row a751838 [Cheng Hao] Construct the MutableRow from an existed row (cherry picked from commit 69e858cc7748b6babadd0cbe20e65f3982161cbf) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4425][SQL] Handle NaN or Infinity cast to Timestamp correctly.	Takuya UESHIN	2014-11-17	2	-2/+17
\| \| \| \| \| \| \| \| \| \| \| \| \|	`Cast` from `NaN` or `Infinity` of `Double` or `Float` to `TimestampType` throws `NumberFormatException`. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #3283 from ueshin/issues/SPARK-4425 and squashes the following commits: 14def0c [Takuya UESHIN] Fix Cast to be able to handle NaN or Infinity to TimestampType. (cherry picked from commit 566c791931645bfaaaf57ee5a15b9ffad534f81e) Signed-off-by: Michael Armbrust <michael@databricks.com>
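A guarded cast might look like the sketch below. Mapping non-finite input to `None` (standing in for SQL NULL) is an assumption of this example; the commit message only says the cast should be handled correctly.

```python
import math
from datetime import datetime, timezone


def cast_double_to_timestamp(seconds):
    """Interpret a float as seconds since the epoch; None for NaN or infinity."""
    if math.isnan(seconds) or math.isinf(seconds):
        return None  # assumption: non-finite input maps to NULL
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Checking for NaN and infinity first avoids passing a value with no decimal representation into the conversion.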
*	[SPARK-4420][SQL] Change nullability of Cast from DoubleType/FloatType to DecimalType.	Takuya UESHIN	2014-11-17	2	-2/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a follow-up to [SPARK-4390](https://issues.apache.org/jira/browse/SPARK-4390) (#3256). Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #3278 from ueshin/issues/SPARK-4420 and squashes the following commits: 7fea558 [Takuya UESHIN] Add some tests. cb2301a [Takuya UESHIN] Fix tests. 133bad5 [Takuya UESHIN] Change nullability of Cast from DoubleType/FloatType to DecimalType. (cherry picked from commit 3a81a1c9e0963173534d96850f3c0b7a16350838) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SQL] Makes conjunction pushdown more aggressive for in-memory table	Cheng Lian	2014-11-17	2	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is inspired by the [Parquet record filter generation code](https://github.com/apache/spark/blob/64c6b9bad559c21f25cd9fbe37c8813cdab939f2/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala#L387-L400). Author: Cheng Lian <lian@databricks.com> Closes #3318 from liancheng/aggresive-conj-pushdown and squashes the following commits: 78b69d2 [Cheng Lian] Makes conjunction pushdown more aggressive (cherry picked from commit 5ce7dae859dc273b0fc532c9456b5960b1eca399) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	Preparing development version 1.2.1-SNAPSHOT	Ubuntu	2014-11-17	4	-4/+4
\|
*	Preparing Spark release v1.2.0-snapshot1	Ubuntu	2014-11-17	4	-4/+4
\|
*	Revert "Preparing Spark release v1.2.0-snapshot0"	Patrick Wendell	2014-11-16	4	-4/+4
\| \| \| \|	This reverts commit bc09875799aa373f4320d38b02618173ffa4c96f.
*	Revert "Preparing development version 1.2.1-SNAPSHOT"	Patrick Wendell	2014-11-16	4	-8/+8
\| \| \| \|	This reverts commit 6c6fd218c83a049c874b8a0ea737333c1899c94a.
*	[SPARK-4410][SQL] Add support for external sort	Michael Armbrust	2014-11-16	4	-6/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adds a new operator that uses Spark's `ExternalSort` class. It is off by default now, but we might consider making it the default if benchmarks show that it does not regress performance. Author: Michael Armbrust <michael@databricks.com> Closes #3268 from marmbrus/externalSort and squashes the following commits: 48b9726 [Michael Armbrust] comments b98799d [Michael Armbrust] Add test afd7562 [Michael Armbrust] Add support for external sort. (cherry picked from commit 64c6b9bad559c21f25cd9fbe37c8813cdab939f2) Signed-off-by: Reynold Xin <rxin@databricks.com>
*	Preparing development version 1.2.1-SNAPSHOT	Ubuntu	2014-11-17	4	-8/+8
\|
*	Preparing Spark release v1.2.0-snapshot0	Ubuntu	2014-11-17	4	-4/+4
\|
*	Revert "[SPARK-4309][SPARK-4407][SQL] Date type support for Thrift server, and fixes for complex types"	Michael Armbrust	2014-11-16	4	-142/+115
\| \| \| \| \| \| \| \| \| \| \| \| \|	Author: Michael Armbrust <michael@databricks.com> Closes #3292 from marmbrus/revert4309 and squashes the following commits: 808e96e [Michael Armbrust] Revert "[SPARK-4309][SPARK-4407][SQL] Date type support for Thrift server, and fixes for complex types" (cherry picked from commit 45ce3273cb618d14ec4d20c4c95699634b951086) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4309][SPARK-4407][SQL] Date type support for Thrift server, and fixes for complex types	Cheng Lian	2014-11-16	4	-115/+142
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SPARK-4407 was detected while working on SPARK-4309. Merged these two into a single PR since 1.2.0 RC is approaching. Author: Cheng Lian <lian@databricks.com> Closes #3178 from liancheng/date-for-thriftserver and squashes the following commits: 6f71d0b [Cheng Lian] Makes toHiveString static 26fa955 [Cheng Lian] Fixes complex type support in Hive 0.13.1 shim a92882a [Cheng Lian] Updates HiveShim for 0.13.1 73f442b [Cheng Lian] Adds Date support for HiveThriftServer2 (Hive 0.12.0) (cherry picked from commit cb6bd83a91d9b4a227dc6467255231869c1820e2) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4426][SQL][Minor] The symbol of BitwiseOr is wrong, should not be '&'	Kousuke Saruta	2014-11-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	The symbol of BitwiseOr is defined as '&' but I think it's wrong. It should be '\|'. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3284 from sarutak/bitwise-or-symbol-fix and squashes the following commits: aff4be5 [Kousuke Saruta] Fixed symbol of BitwiseOr (cherry picked from commit 84468b2e2031d646dcf035cb18947170ba326ccd) Signed-off-by: Reynold Xin <rxin@databricks.com>
*	Added contains(key) to Metadata	kai	2014-11-14	2	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add contains(key) to org.apache.spark.sql.catalyst.util.Metadata to test the existence of a key. Otherwise, Class Metadata's get methods may throw NoSuchElement exception if the key does not exist. Testcases are added to MetadataSuite as well. Author: kai <kaizeng@eecs.berkeley.edu> Closes #3273 from kai-zeng/metadata-fix and squashes the following commits: 74b3d03 [kai] Added contains(key) to Metadata (cherry picked from commit cbddac23696d89b672dce380cc7360a873e27b3b) Signed-off-by: Reynold Xin <rxin@databricks.com>
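The pattern this enables, checking for a key before calling a getter that throws, can be sketched in Python. The class below is a toy stand-in, not the Scala `Metadata` class, and `get_long` is only an illustrative getter name.

```python
class Metadata:
    """Toy stand-in for a read-only key/value metadata container."""

    def __init__(self, values):
        self._values = dict(values)

    def contains(self, key):
        # Test for existence without risking an exception.
        return key in self._values

    def get_long(self, key):
        # Raises KeyError when the key is absent, like a throwing getter.
        return self._values[key]
```

A caller can then write `m.get_long("max") if m.contains("max") else default` instead of catching the exception.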
*	[SPARK-4412][SQL] Fix Spark's control of Parquet logging.	Jim Carroll	2014-11-14	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Spark ParquetRelation.scala code makes the assumption that the parquet.Log class has already been loaded. If ParquetRelation.enableLogForwarding executes prior to the parquet.Log class being loaded, then the code in enableLogForwarding has no effect. ParquetRelation.scala attempts to override the parquet logger but, at least currently (and if your application simply reads a parquet file before it does anything else with Parquet), the parquet.Log class hasn't been loaded yet. Therefore the code in ParquetRelation.enableLogForwarding has no effect. If you look at the code in parquet.Log, there's a static initializer that needs to be called prior to enableLogForwarding, or whatever enableLogForwarding does gets undone by this static initializer. The "fix" would be to force the static initializer to get called in parquet.Log as part of enableLogForwarding. Author: Jim Carroll <jim@dontcallme.com> Closes #3271 from jimfcarroll/parquet-logging and squashes the following commits: 37bdff7 [Jim Carroll] Fix Spark's control of Parquet logging. (cherry picked from commit 37482ce5a7b875f17d32a5e8c561cc8e9772c9b3) Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4365][SQL] Remove unnecessary filter call on records returned from parquet library	Yash Datta	2014-11-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the parquet library has been updated, we no longer need to filter the records returned from the parquet library for null records, as the library now skips those. From parquet-hadoop/src/main/java/parquet/hadoop/InternalParquetRecordReader.java:
	public boolean nextKeyValue() throws IOException, InterruptedException {
	  boolean recordFound = false;
	  while (!recordFound) {
	    // no more records left
	    if (current >= total) { return false; }
	    try {
	      checkRead();
	      currentValue = recordReader.read();
	      current ++;
	      if (recordReader.shouldSkipCurrentRecord()) {
	        // this record is being filtered via the filter2 package
	        if (DEBUG) LOG.debug("skipping record");
	        continue;
	      }
	      if (currentValue == null) {
	        // only happens with FilteredRecordReader at end of block
	        current = totalCountLoadedSoFar;
	        if (DEBUG) LOG.debug("filtered record reader reached end of block");
	        continue;
	      }
	      recordFound = true;
	      if (DEBUG) LOG.debug("read value: " + currentValue);
	    } catch (RuntimeException e) {
	      throw new ParquetDecodingException(format("Can not read value at %d in block %d in file %s", current, currentBlock, file), e);
	    }
	  }
	  return true;
	}
	Author: Yash Datta <Yash.Datta@guavus.com> Closes #3229 from saucam/remove_filter and squashes the following commits: 8909ae9 [Yash Datta] SPARK-4365: Remove unnecessary filter call on records returned from parquet library (cherry picked from commit 63ca3af66f9680fd12adee82fb4d342caae5cea4) Signed-off-by: Michael Armbrust <michael@databricks.com>