spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	[SPARK-13278][CORE] Launcher fails to start with JDK 9 EA	Claes Redestad	2016-02-14	4	-7/+37
\| \| \| \| \| \| \| \|	See http://openjdk.java.net/jeps/223 for more information about the JDK 9 version string scheme. Author: Claes Redestad <claes.redestad@gmail.com> Closes #11160 from cl4es/master.
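To illustrate why the new version string scheme can trip up a launcher, here is a minimal sketch (not the launcher's actual code; `java_major_version` is a hypothetical helper) that reads the major version from both the legacy `1.x` strings and JEP 223 style strings:

```python
import re


def java_major_version(version: str) -> int:
    """Return the major Java version from a `java.version`-style string.

    Hypothetical helper for illustration only. It handles the legacy
    scheme ("1.8.0_66" means 8) and the JEP 223 scheme ("9-ea" means 9).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version string: {version!r}")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy strings start with "1." and carry the major version second.
    if first == 1 and second is not None:
        return int(second)
    # JEP 223 strings lead with the major version itself.
    return first
```

Code that assumes the version always starts with `1.` would misread an early-access string such as `9-ea`, which is the kind of failure this commit addresses.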
*	[SPARK-13300][DOCUMENTATION] Added pygments.rb dependency	Amit Dev	2016-02-14	1	-8/+13
\| \| \| \| \| \| \| \|	The pygments.rb gem also appears to be required for the jekyll build to work; at least on Ubuntu/RHEL, I could not run the build without this dependency. This adds it to the setup steps. Author: Amit Dev <amitdev@gmail.com> Closes #11180 from amitdev/master.
*	[SPARK-13296][SQL] Move UserDefinedFunction into sql.expressions.	Reynold Xin	2016-02-13	15	-217/+320
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This pull request has the following changes: 1. Moved UserDefinedFunction into expressions package. This is more consistent with how we structure the packages for window functions and UDAFs. 2. Moved UserDefinedPythonFunction into execution.python package, so we don't have a random private class in the top level sql package. 3. Move everything in execution/python.scala into the newly created execution.python package. Most of the diffs are just straight copy-paste. Author: Reynold Xin <rxin@databricks.com> Closes #11181 from rxin/SPARK-13296.
*	[SPARK-13172][CORE][SQL] Stop using RichException.getStackTraceString, which is deprecated	Sean Owen	2016-02-13	7	-12/+14
\| \| \| \| \| \| \| \|	Replace `getStackTraceString` with `Utils.exceptionString` Author: Sean Owen <sowen@cloudera.com> Closes #11182 from srowen/SPARK-13172.
*	Closes #11185	Reynold Xin	2016-02-13	0	-0/+0
*	[SPARK-12363][MLLIB] Remove setRun and fix PowerIterationClustering failed test	Liang-Chi Hsieh	2016-02-13	4	-85/+96
\| \| \| \| \| \| \| \| \| \| \|	JIRA: https://issues.apache.org/jira/browse/SPARK-12363 This issue was pointed out by yanboliang. When `setRuns` is removed from PowerIterationClustering, one of the tests fails. I found that some `dstAttr`s of the normalized graph are 0.0 instead of the correct values. Setting `TripletFields.All` in `mapTriplets` makes it work. Author: Liang-Chi Hsieh <viirya@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #10539 from viirya/fix-poweriter.
*	[SPARK-13142][WEB UI] Problem accessing Web UI /logPage/ on Microsoft Windows	markpavey	2016-02-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Due to being on a Windows platform I have been unable to run the tests as described in the "Contributing to Spark" instructions. As the change is only to two lines of code in the Web UI, which I have manually built and tested, I am submitting this pull request anyway. I hope this is OK. Is it worth considering also including this fix in any future 1.5.x releases (if any)? I confirm this is my own original work and license it to the Spark project under its open source license. Author: markpavey <mark.pavey@thefilter.com> Closes #11135 from markpavey/JIRA_SPARK-13142_WindowsWebUILogFix.
*	[SPARK-13293][SQL] generate Expand	Davies Liu	2016-02-12	2	-1/+140
\| \| \| \| \| \| \| \| \| \| \| \|	Expand suffers from creating the UnsafeRow from the same input multiple times; with codegen, it only needs to copy some of the columns. After this, we see a 3X improvement (from 43 seconds to 13 seconds) on a TPCDS query (Q67) that has eight columns in Rollup. Ideally, we could mask some of the columns based on a bitmask. I'd leave that for the future, because currently Aggregation (50 ns) is much slower than just copying the variables (1-2 ns). Author: Davies Liu <davies@databricks.com> Closes #11177 from davies/gen_expand.
*	[SPARK-5095] remove flaky test	Michael Gummelt	2016-02-12	1	-0/+5
\| \| \| \| \| \| \| \|	Overrode the start() method, which was previously starting a thread causing a race condition. I believe this should fix the flaky test. Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #11164 from mgummelt/fix_mesos_tests.
*	[SPARK-5095] Fix style in mesos coarse grained scheduler code	Michael Gummelt	2016-02-12	2	-10/+12
\| \| \| \| \| \| \| \|	andrewor14 This addressed your style comments from #10993 Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #11187 from mgummelt/fix_mesos_style.
*	[SPARK-12630][PYSPARK] [DOC] PySpark classification parameter desc to consistent format	vijaykiran	2016-02-12	1	-118/+143
\| \| \| \| \| \| \| \| \| \| \|	Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the classification module. Author: vijaykiran <mail@vijaykiran.com> Author: Bryan Cutler <cutlerb@gmail.com> Closes #11183 from BryanCutler/pyspark-consistent-param-classification-SPARK-12630.
*	[SPARK-12962] [SQL] [PySpark] PySpark support covar_samp and covar_pop	Yanbo Liang	2016-02-12	1	-6/+35
\| \| \| \| \| \| \| \| \| \|	PySpark support ```covar_samp``` and ```covar_pop```. cc rxin davies marmbrus Author: Yanbo Liang <ybliang8@gmail.com> Closes #10876 from yanboliang/spark-12962.
*	[SPARK-13260][SQL] count(*) does not work with CSV data source	hyukjinkwon	2016-02-12	2	-45/+41
\| \| \| \| \| \| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-13260 This is a quick fix for `count(*)`. When `requiredColumns` is empty, the CSV datasource currently returns `sqlContext.sparkContext.emptyRDD[Row]`, which does not carry the count. Just like the JSON datasource, this PR lets the CSV datasource count the rows without parsing each set of tokens. Author: hyukjinkwon <gurwls223@gmail.com> Closes #11169 from HyukjinKwon/SPARK-13260.
*	[SPARK-13282][SQL] LogicalPlan toSql should just return a String	Reynold Xin	2016-02-12	6	-156/+141
\| \| \| \| \| \| \| \|	Previously we were using Option[String] and None to indicate the case when Spark fails to generate SQL. It is easier to just use exceptions to propagate error cases, rather than having for comprehension everywhere. I also introduced a "build" function that simplifies string concatenation (i.e. no need to reason about whether we have an extra space or not). Author: Reynold Xin <rxin@databricks.com> Closes #11171 from rxin/SPARK-13282.
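The two ideas in this message (raise on failure instead of returning an optional, and a `build` helper for concatenation) can be sketched as follows. This is a minimal Python illustration, not the Scala code from the PR; `SqlGenerationError` and `to_sql` are hypothetical names, and the behavior of `build` is an assumption based on the description above.

```python
class SqlGenerationError(Exception):
    """Raised when a plan cannot be turned into SQL (hypothetical name)."""


def build(*segments: str) -> str:
    """Join non-empty SQL fragments with exactly one space between them."""
    return " ".join(segment for segment in segments if segment)


def to_sql(table: str, condition: str = "") -> str:
    """Return a SQL string, raising instead of returning an optional value."""
    if not table:
        raise SqlGenerationError("cannot generate SQL without a table name")
    # Empty fragments are dropped, so there is no stray double space
    # when the WHERE clause is absent.
    return build("SELECT * FROM", table, "WHERE" if condition else "", condition)
```

Callers get a plain string back and handle the failure case in one place, rather than threading an optional result through every step.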
*	[SPARK-12705] [SQL] push missing attributes for Sort	Davies Liu	2016-02-12	3	-83/+67
\| \| \| \| \| \| \| \|	The current implementation of ResolveSortReferences can only push one missing attribute into its child. It failed to analyze TPCDS Q98 because that query has two missing attributes (one from Window, another from Aggregate). Author: Davies Liu <davies@databricks.com> Closes #11153 from davies/resolve_sort.
*	[SPARK-13154][PYTHON] Add linting for pydocs	Holden Karau	2016-02-12	2	-0/+27
\| \| \| \| \| \| \| \| \| \|	We should have lint rules using sphinx to automatically catch the pydoc issues that are sometimes introduced. Right now ./dev/lint-python will skip building the docs if sphinx isn't present - but it might make sense to fail hard - just a matter of if we want to insist all PySpark developers have sphinx present. Author: Holden Karau <holden@us.ibm.com> Closes #11109 from holdenk/SPARK-13154-add-pydoc-lint-for-docs.
*	[SPARK-12974][ML][PYSPARK] Add Python API for spark.ml bisecting k-means	Yanbo Liang	2016-02-12	1	-1/+124
\| \| \| \| \| \| \| \|	Add Python API for spark.ml bisecting k-means. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10889 from yanboliang/spark-12974.
*	[SPARK-6166] Limit number of in flight outbound requests	Sanket	2016-02-11	5	-15/+49
\| \| \| \| \| \| \| \| \| \| \|	This JIRA is related to https://github.com/apache/spark/pull/5852 Had to do some minor rework and test to make sure it works with current version of spark. Author: Sanket <schintap@untilservice-lm> Closes #10838 from redsanket/limit-outbound-connections.
*	[SPARK-7889][WEBUI] HistoryServer updates UI for incomplete apps	Steve Loughran	2016-02-11	10	-74/+1654
\| \| \| \| \| \| \| \| \| \| \|	When the HistoryServer is showing an incomplete app, it needs to check if there is a newer version of the app available. It does this by checking if a version of the app has been loaded with a larger filesize. If so, it detaches the current UI, attaches the new one, and redirects back to the same URL to show the new UI. https://issues.apache.org/jira/browse/SPARK-7889 Author: Steve Loughran <stevel@hortonworks.com> Author: Imran Rashid <irashid@cloudera.com> Closes #11118 from squito/SPARK-7889-alternate.
*	[SPARK-13153][PYSPARK] ML persistence failed when handle no default value parameter	Tommy YU	2016-02-11	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Fix this defect by checking whether a default value exists. yanboliang Please help to review. Author: Tommy YU <tummyyu@163.com> Closes #11043 from Wenpei/spark-13153-handle-param-withnodefaultvalue.
*	[SPARK-12746][ML] ArrayType(_, true) should also accept ArrayType(_, false)	Earthson Lu	2016-02-11	1	-1/+2
\| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-12746 Author: Earthson Lu <Earthson.Lu@gmail.com> Closes #10697 from Earthson/SPARK-12746.
*	[SPARK-13277][BUILD] Follow-up ANTLR warnings are treated as build errors	Herman van Hovell	2016-02-11	1	-3/+6
\| \| \| \| \| \| \| \| \| \|	It is possible to create faulty but legal ANTLR grammars. ANTLR will produce warnings but still generate a valid, compilable parser. This PR makes sure we treat such warnings as build errors. cc rxin / viirya Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #11174 from hvanhovell/ANTLR-warnings-as-errors.
*	[SPARK-12915][SQL] add SQL metrics of numOutputRows for whole stage codegen	Davies Liu	2016-02-11	9	-31/+71
\| \| \| \| \| \| \| \| \| \|	This PR adds SQL metrics (numOutputRows) for generated operators (same as non-generated); the cost is about 0.2 nanoseconds per row. <img width="806" alt="gen metrics" src="https://cloud.githubusercontent.com/assets/40902/12994694/47f5881e-d0d7-11e5-9d47-78229f559ab0.png"> Author: Davies Liu <davies@databricks.com> Closes #11170 from davies/gen_metric.
*	[SPARK-12765][ML][COUNTVECTORIZER] fix CountVectorizer.transform's lost transformSchema	Liu Xiang	2016-02-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-12765 Author: Liu Xiang <lxmtlab@gmail.com> Closes #10720 from sloth2012/sloth.
*	[SPARK-13047][PYSPARK][ML] Pyspark Params.hasParam should not throw an error	sethah	2016-02-11	2	-4/+12
Pyspark Params class has a method `hasParam(paramName)` which returns `True` if the class has a parameter by that name, but throws an `AttributeError` otherwise. There is not currently a way of getting a Boolean to indicate if a class has a parameter. With Spark 2.0 we could modify the existing behavior of `hasParam` or add an additional method with this functionality.

In Python:

```python
from pyspark.ml.classification import NaiveBayes

nb = NaiveBayes()
print(nb.hasParam("smoothing"))
print(nb.hasParam("notAParam"))
```

produces:

> True
> AttributeError: 'NaiveBayes' object has no attribute 'notAParam'

However, in Scala:

```scala
import org.apache.spark.ml.classification.NaiveBayes

val nb = new NaiveBayes()
nb.hasParam("smoothing")
nb.hasParam("notAParam")
```

produces:

> true
> false

cc holdenk Author: sethah <seth.hendrickson16@gmail.com> Closes #10962 from sethah/SPARK-13047.
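The difference between the two behaviors can be reproduced without Spark. The following is a minimal sketch using toy classes (`Param`, `ToyParams` and `ToyNaiveBayes` are hypothetical stand-ins, not the PySpark classes, and this is not necessarily how the PR implements the fix):

```python
class Param:
    """Toy parameter descriptor (hypothetical, for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyParams:
    def has_param_strict(self, name: str) -> bool:
        # Mirrors the reported behavior: getattr without a default
        # raises AttributeError for unknown names.
        return isinstance(getattr(self, name), Param)

    def has_param(self, name: str) -> bool:
        # Supplying a default makes the lookup total, so the method
        # always returns a Boolean.
        return isinstance(getattr(self, name, None), Param)


class ToyNaiveBayes(ToyParams):
    smoothing = Param("smoothing")
```

With these toy classes, `ToyNaiveBayes().has_param("notAParam")` returns `False`, while `has_param_strict` raises `AttributeError` for the same name.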
*	[SPARK-13035][ML][PYSPARK] PySpark ml.clustering support export/import	Yanbo Liang	2016-02-11	1	-4/+25
\| \| \| \| \| \| \| \|	PySpark ml.clustering support export/import. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10999 from yanboliang/spark-13035.
*	[MINOR][ML][PYSPARK] Cleanup test cases of clustering.py	Yanbo Liang	2016-02-11	2	-15/+9
\| \| \| \| \| \| \| \| \|	Test cases should be removed from the annotation of the `setXXX` functions; otherwise they become part of the [Python API docs](https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.clustering.KMeans.setInitMode). cc mengxr jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #10975 from yanboliang/clustering-cleanup.
*	[SPARK-13037][ML][PYSPARK] PySpark ml.recommendation support export/import	Kai Jiang	2016-02-11	1	-4/+27
\| \| \| \| \| \| \| \|	PySpark ml.recommendation support export/import. Author: Kai Jiang <jiangkai@gmail.com> Closes #11044 from vectorijk/spark-13037.
*	[SPARK-11515][ML] QuantileDiscretizer should take random seed	Yu ISHIKAWA	2016-02-11	2	-6/+11
\| \| \| \| \| \| \| \|	cc jkbradley Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9535 from yu-iskw/SPARK-11515.
*	[SPARK-13265][ML] Refactoring of basic ML import/export for other file system besides HDFS	Yu ISHIKAWA	2016-02-11	1	-6/+7
\| \| \| \| \| \| \| \| \| \|	jkbradley I tried to improve the function that exports a model. When I tried to export a model to S3 under Spark 1.6, it did not work, so export should support S3 in addition to HDFS. Can you review it when you have time? Thanks! Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #11151 from yu-iskw/SPARK-13265.
*	Revert "[SPARK-13279] Remove O(n^2) operation from scheduler."	Reynold Xin	2016-02-11	1	-9/+6
\| \| \| \|	This reverts commit 50fa6fd1b365d5db7e2b2c59624a365cef0d1696.
*	[SPARK-13279] Remove O(n^2) operation from scheduler.	Sital Kedia	2016-02-11	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \|	This commit removes an unnecessary duplicate check in addPendingTask that meant that scheduling a task set took time proportional to (# tasks)^2. Author: Sital Kedia <skedia@fb.com> Closes #11167 from sitalkedia/fix_stuck_driver and squashes the following commits: 3fe1af8 [Sital Kedia] [SPARK-13279] Remove unnecessary duplicate check in addPendingTask function
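A rough sketch of why a duplicate check on a list makes scheduling quadratic (the function names are hypothetical and this is not the scheduler's Scala code):

```python
def add_pending_checked(pending: list, index: int) -> None:
    # `index not in pending` scans the whole list, so each call is O(n)
    # and adding n tasks costs O(n^2) in total.
    if index not in pending:
        pending.append(index)


def add_pending_unchecked(pending: list, index: int) -> None:
    # Appending is amortized O(1). This is only safe under the assumption
    # that callers never add the same task index twice.
    pending.append(index)
```

When each index is added once, both versions produce the same list, which is why the check can look unnecessary. Note that the log entry directly above this one reverts the commit.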
*	[SPARK-12982][SQL] Add table name validation in temp table registration	jayadevanmurali	2016-02-11	2	-1/+13
\| \| \| \| \| \| \| \|	Add table name validation at temp table creation. Author: jayadevanmurali <jayadevan.m@tcs.com> Closes #11051 from jayadevanmurali/branch-0.2-SPARK-12982.
*	[SPARK-13277][SQL] ANTLR ignores other rule using the USING keyword	Liang-Chi Hsieh	2016-02-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	JIRA: https://issues.apache.org/jira/browse/SPARK-13277 There is an ANTLR warning during compilation: "warning(200): org/apache/spark/sql/catalyst/parser/SparkSqlParser.g:938:7: Decision can match input such as "KW_USING Identifier" using multiple alternatives: 2, 3. As a result, alternative(s) 3 were disabled for that input". This patch is to fix it. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #11168 from viirya/fix-parser-using.
*	[STREAMING][TEST] Fix flaky streaming.FailureSuite	Tathagata Das	2016-02-11	2	-2/+6
\| \| \| \| \| \| \| \| \| \|	In some corner cases, the test suite failed to shut down the SparkContext, causing cascading failures. This fix does two things: (1) makes sure no SparkContext is active after every test; (2) makes sure the StreamingContext is always shut down (this prevents leaking of StreamingContexts as well, just in case). Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #11166 from tdas/fix-failuresuite.
*	[SPARK-13124][WEB UI] Fixed CSS and JS issues caused by addition of JQuery DataTables	Alex Bozarth	2016-02-11	3	-14/+20
\| \| \| \| \| \| \| \| \| \|	Made sure the old tables continue to use the old css and the new DataTables use the new css. Also fixed it so the Safari Web Inspector doesn't throw errors when on the new DataTables pages. Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #11038 from ajbozarth/spark13124.
*	[SPARK-13074][CORE] Add JavaSparkContext.getPersistentRDDs method	Junyang	2016-02-11	2	-0/+22
\| \| \| \| \| \| \| \| \| \|	The "getPersistentRDDs()" is a useful API of SparkContext to get cached RDDs. However, the JavaSparkContext does not have this API. Add a simple getPersistentRDDs() to get java.util.Map<Integer, JavaRDD> for Java users. Author: Junyang <fly.shenjy@gmail.com> Closes #10978 from flyjy/master.
*	[SPARK-13264][DOC] Removed multi-byte characters in spark-env.sh.template	Sasaki Toru	2016-02-11	5	-5/+5
\| \| \| \| \| \| \| \|	spark-env.sh.template contains multi-byte characters; this PR removes them. Author: Sasaki Toru <sasakitoa@nttdata.co.jp> Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.
*	[SPARK-13270][SQL] Remove extra new lines in whole stage codegen and include pipeline plan in comments.	Nong Li	2016-02-10	2	-2/+20
\| \| \| \| \| \| \| \|	Author: Nong Li <nong@databricks.com> Closes #11155 from nongli/spark-13270.
*	[SPARK-13235][SQL] Removed an Extra Distinct from the Plan when Using Union in SQL	gatorsmile	2016-02-11	2	-29/+32
Currently, the parser added two `Distinct` operators in the plan if we are using `Union` or `Union Distinct` in the SQL. This PR is to remove the extra `Distinct` from the plan.

For example, before the fix, the following query has a plan with two `Distinct`

```scala
sql("select * from t0 union select * from t0").explain(true)
```

```
== Parsed Logical Plan ==
'Project [unresolvedalias(*,None)]
+- 'Subquery u_2
   +- 'Distinct
      +- 'Project [unresolvedalias(*,None)]
         +- 'Subquery u_1
            +- 'Distinct
               +- 'Union
                  :- 'Project [unresolvedalias(*,None)]
                  :  +- 'UnresolvedRelation `t0`, None
                  +- 'Project [unresolvedalias(*,None)]
                     +- 'UnresolvedRelation `t0`, None

== Analyzed Logical Plan ==
id: bigint
Project [id#16L]
+- Subquery u_2
   +- Distinct
      +- Project [id#16L]
         +- Subquery u_1
            +- Distinct
               +- Union
                  :- Project [id#16L]
                  :  +- Subquery t0
                  :     +- Relation[id#16L] ParquetRelation
                  +- Project [id#16L]
                     +- Subquery t0
                        +- Relation[id#16L] ParquetRelation

== Optimized Logical Plan ==
Aggregate [id#16L], [id#16L]
+- Aggregate [id#16L], [id#16L]
   +- Union
      :- Project [id#16L]
      :  +- Relation[id#16L] ParquetRelation
      +- Project [id#16L]
         +- Relation[id#16L] ParquetRelation
```

After the fix, the plan is changed without the extra `Distinct` as follows:

```
== Parsed Logical Plan ==
'Project [unresolvedalias(*,None)]
+- 'Subquery u_1
   +- 'Distinct
      +- 'Union
         :- 'Project [unresolvedalias(*,None)]
         :  +- 'UnresolvedRelation `t0`, None
         +- 'Project [unresolvedalias(*,None)]
            +- 'UnresolvedRelation `t0`, None

== Analyzed Logical Plan ==
id: bigint
Project [id#17L]
+- Subquery u_1
   +- Distinct
      +- Union
         :- Project [id#16L]
         :  +- Subquery t0
         :     +- Relation[id#16L] ParquetRelation
         +- Project [id#16L]
            +- Subquery t0
               +- Relation[id#16L] ParquetRelation

== Optimized Logical Plan ==
Aggregate [id#17L], [id#17L]
+- Union
   :- Project [id#16L]
   :  +- Relation[id#16L] ParquetRelation
   +- Project [id#16L]
      +- Relation[id#16L] ParquetRelation
```

Author: gatorsmile <gatorsmile@gmail.com> Closes #11120 from gatorsmile/unionDistinct.
*	[SPARK-13276] Catch bad characters at the end of a Table Identifier/Expression string	Herman van Hovell	2016-02-11	3	-4/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The parser currently parses the following strings without a hitch. Table Identifier: `a.b.c` should fail, but results in the following table identifier `a.b`; `table!#` should fail, but results in the following table identifier `table`. Expression: `1+2 r+e` should fail, but results in the following expression `1 + 2`. This PR fixes this by adding terminated rules for both expression parsing and table identifier parsing. cc cloud-fan (we discussed this in https://github.com/apache/spark/pull/10649) jayadevanmurali (this causes your PR https://github.com/apache/spark/pull/11051 to fail) Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #11159 from hvanhovell/SPARK-13276.
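The effect of a "terminated" rule can be shown with a regular expression in place of the real ANTLR grammar. In this minimal sketch the identifier pattern, the two-part limit and both function names are assumptions made for illustration. A prefix match silently ignores trailing input, while a full match rejects it:

```python
import re

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
# Assumption for this sketch: a table identifier has at most two parts.
TABLE_IDENT = re.compile(rf"{IDENT}(?:\.{IDENT})?")


def parse_prefix(text: str):
    """Unterminated rule: accepts the longest valid prefix."""
    match = TABLE_IDENT.match(text)
    return match.group(0) if match else None


def parse_terminated(text: str) -> str:
    """Terminated rule: the whole input must be consumed."""
    match = TABLE_IDENT.fullmatch(text)
    if match is None:
        raise ValueError(f"unexpected trailing input in {text!r}")
    return match.group(0)
```

With the prefix version, `a.b.c` yields `a.b` and `table!#` yields `table`, matching the symptoms listed above; the terminated version raises for both.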
*	[SPARK-13234] [SQL] remove duplicated SQL metrics	Davies Liu	2016-02-10	24	-208/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Many SQL operators have metrics for both input and output rows. Since the number of input rows should be exactly the number of output rows of the child, metrics for output rows alone are enough. After we improved performance using whole stage codegen, the overhead of SQL metrics is not trivial anymore, so we should avoid it where it is not necessary. This PR removes all the SQL metrics for number of input rows and adds a SQL metric for number of output rows to all LeafNodes. It also removes the SQL metrics from operators that have the same number of input and output rows (for example, Projection, where we may not need them). The new SQL UI will look like: ![metrics](https://cloud.githubusercontent.com/assets/40902/12965227/63614e5e-d009-11e5-88b3-84fea04f9c20.png) Author: Davies Liu <davies@databricks.com> Closes #11163 from davies/remove_metrics.
*	[SPARK-12706] [SQL] grouping() and grouping_id()	Davies Liu	2016-02-10	17	-63/+309
\| \| \| \| \| \| \| \| \| \| \| \|	grouping() returns whether a column is aggregated or not; grouping_id() returns the aggregation levels. grouping()/grouping_id() can be used with window functions, but do not work in the having/sort clause; that will be fixed by another PR. The GROUPING__ID/grouping_id() in Hive is wrong (according to the docs), and we also did it wrongly; this PR changes it to match the behavior in most databases (and the docs of Hive). Author: Davies Liu <davies@databricks.com> Closes #10677 from davies/grouping.
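As a sketch of the convention used by most databases, grouping_id can be treated as a bit vector built from grouping(), with the first column as the most significant bit. This is an illustration of that convention under stated assumptions, not the code added by this PR, and the Python function names are hypothetical:

```python
def grouping(column: str, grouping_set: set) -> int:
    """Return 1 if the column is aggregated away in this grouping set, else 0."""
    return 0 if column in grouping_set else 1


def grouping_id(columns: list, grouping_set: set) -> int:
    """Combine grouping() bits: (grouping(c1) << (n-1)) + ... + grouping(cn)."""
    gid = 0
    for column in columns:
        gid = (gid << 1) | grouping(column, grouping_set)
    return gid
```

For `ROLLUP(a, b)` the grouping sets `{a, b}`, `{a}` and `{}` would get ids 0, 1 and 3 under this convention.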
*	[SPARK-13205][SQL] SQL Generation Support for Self Join	gatorsmile	2016-02-11	3	-2/+22
\| \| \| \| \| \| \| \| \| \| \| \|	This PR addresses two issues: - Self join does not work in SQL Generation - When creating new instances for `LogicalRelation`, `metastoreTableIdentifier` is lost. liancheng Could you please review the code changes? Thank you! Author: gatorsmile <gatorsmile@gmail.com> Closes #11084 from gatorsmile/selfJoinInSQLGen.
*	[SPARK-12725][SQL] Resolving Name Conflicts in SQL Generation and Name Ambiguity Caused by Internally Generated Expressions	gatorsmile	2016-02-11	9	-36/+78
Some analysis rules generate aliases or auxiliary attribute references with the same name but different expression IDs. For example, `ResolveAggregateFunctions` introduces `havingCondition` and `aggOrder`, and `DistinctAggregationRewriter` introduces `gid`.

This is OK for normal query execution since these attribute references get expression IDs. However, it's troublesome when converting resolved query plans back to SQL query strings since expression IDs are erased.

Here's an example Spark 1.6.0 snippet for illustration:

```scala
sqlContext.range(10).select('id as 'a, 'id as 'b).registerTempTable("t")
sqlContext.sql("SELECT SUM(a) FROM t GROUP BY a, b ORDER BY COUNT(a), COUNT(b)").explain(true)
```

The above code produces the following resolved plan:

```
== Analyzed Logical Plan ==
_c0: bigint
Project [_c0#101L]
+- Sort [aggOrder#102L ASC,aggOrder#103L ASC], true
   +- Aggregate [a#47L,b#48L], [(sum(a#47L),mode=Complete,isDistinct=false) AS _c0#101L,(count(a#47L),mode=Complete,isDistinct=false) AS aggOrder#102L,(count(b#48L),mode=Complete,isDistinct=false) AS aggOrder#103L]
      +- Subquery t
         +- Project [id#46L AS a#47L,id#46L AS b#48L]
            +- LogicalRDD [id#46L], MapPartitionsRDD[44] at range at <console>:26
```

Here we can see that both aggregate expressions in `ORDER BY` are extracted into an `Aggregate` operator, and both of them are named `aggOrder` with different expression IDs.

The solution is to automatically add the expression IDs into the attribute name for the Alias and AttributeReferences that are generated by Analyzer in SQL Generation.

In this PR, it also resolves another issue. Users could use the same name as the internally generated names. The duplicate names should not cause name ambiguity. When resolving the column, Catalyst should not pick the column that is internally generated.

Could you review the solution? marmbrus liancheng I did not set the newly added flag for all the alias and attribute reference generated by Analyzers. Please let me know if I should do it? Thank you!

Author: gatorsmile <gatorsmile@gmail.com> Closes #11050 from gatorsmile/namingConflicts.
*	[SPARK-13274] Fix Aggregator Links on GroupedDataset Scala API	raela	2016-02-10	1	-4/+8
\| \| \| \| \| \| \| \|	Update Aggregator links to point to #org.apache.spark.sql.expressions.Aggregator Author: raela <raela@databricks.com> Closes #11158 from raelawang/master.
*	[SPARK-13146][SQL] Management API for continuous queries	Tathagata Das	2016-02-10	17	-109/+1680
### Management API for Continuous Queries

API for getting status of each query
- Whether active or not
- Unique name of each query
- Status of the sources and sinks
- Exceptions

API for managing each query
- Immediately stop an active query
- Waiting for a query to be terminated, correctly or with error

API for managing multiple queries
- Listing all active queries
- Getting an active query by name
- Waiting for any one of the active queries to be terminated

API for listening to query life cycle events
- ContinuousQueryListener API for query start, progress and termination events.

Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #11030 from tdas/streaming-df-management-api.
*	[SPARK-12414][CORE] Remove closure serializer	Sean Owen	2016-02-10	4	-14/+3
\| \| \| \| \| \| \| \| \| \|	Remove spark.closure.serializer option and use JavaSerializer always CC andrewor14 rxin I see there's a discussion in the JIRA but just thought I'd offer this for a look at what the change would be. Author: Sean Owen <sowen@cloudera.com> Closes #11150 from srowen/SPARK-12414.
*	[SPARK-13057][SQL] Add benchmark codes and the performance results for implemented compression schemes for InMemoryRelation	Takeshi YAMAMURO	2016-02-10	1	-0/+240
\| \| \| \| \| \| \| \| \| \|	This PR adds benchmark code for in-memory cache compression to make future development and discussion smoother. Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #10965 from maropu/ImproveColumnarCache.
*	[HOTFIX] Fix Scala 2.10 build break in TakeOrderedAndProjectSuite.	Josh Rosen	2016-02-10	1	-2/+2
\|