Commit message | Author | Age | Files | Lines
...
* [SPARK-12862][SPARKR] Jenkins does not run R tests | felixcheung | 2016-01-17 | 2 | -2/+2
Slight correction: I'm leaving sparkR as-is (i.e. R file not supported) and fixed only run-tests.sh as shivaram described. I also assume we are going to cover all doc changes in https://issues.apache.org/jira/browse/SPARK-12846 instead of here. rxin shivaram zjffdu
Author: felixcheung <felixcheung_m@hotmail.com> Closes #10792 from felixcheung/sparkRcmd.
* [SPARK-12860] [SQL] speed up safe projection for primitive types | Wenchen Fan | 2016-01-17 | 1 | -2/+3
The idea is simple: use `SpecificMutableRow` instead of `GenericMutableRow` as the result row for safe projection. A simple benchmark shows about 1.5x speed up for primitive types; code: https://gist.github.com/cloud-fan/fa77713ccebf0823b2ab#file-safeprojectionbenchmark-scala
Author: Wenchen Fan <wenchen@databricks.com> Closes #10790 from cloud-fan/safe-projection.
* [SPARK-12796] [SQL] Whole stage codegen | Davies Liu | 2016-01-16 | 37 | -107/+694
This is the initial work for whole stage codegen; it supports Projection/Filter/Range, and we will continue working on this to support more physical operators. A micro benchmark shows that a query with range, filter and projection could be 3x faster than before. It's turned on by default. For a tree that has at least two chained plans, a WholeStageCodegen will be inserted into it. For example, the following plan
```
Limit 10
+- Project [(id#5L + 1) AS (id + 1)#6L]
   +- Filter ((id#5L & 1) = 1)
      +- Range 0, 1, 4, 10, [id#5L]
```
will be translated into
```
Limit 10
+- WholeStageCodegen
   +- Project [(id#1L + 1) AS (id + 1)#2L]
      +- Filter ((id#1L & 1) = 1)
         +- Range 0, 1, 4, 10, [id#1L]
```
Here is the call graph to generate Java source for A and B (A supports codegen, but B does not):
```
* WholeStageCodegen       Plan A               FakeInput        Plan B
* =========================================================================
*
* -> execute()
*      |
*  doExecute() --------> produce()
*                          |
*                       doProduce() -------> produce()
*                                              |
*                                           doProduce() ---> execute()
*                                                              |
*                                                           consume()
*                       doConsume() ------------|
*                          |
*  doConsume() <----- consume()
```
A SparkPlan that supports codegen needs to implement doProduce() and doConsume():
```
def doProduce(ctx: CodegenContext): (RDD[InternalRow], String)
def doConsume(ctx: CodegenContext, child: SparkPlan, input: Seq[ExprCode]): String
```
Author: Davies Liu <davies@databricks.com> Closes #10735 from davies/whole2.
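To make the fusion concrete, the sketch below shows roughly what the Range -> Filter -> Project example above collapses into once the operators share a single loop. It is hand-written Scala pseudocode, not the generated Java; `runFusedPipeline` and `emit` are made-up names standing in for the downstream consumer.
```scala
// Hand-written equivalent of the fused pipeline for the example plan above
// (Range 0..9 step 1, Filter ((id & 1) = 1), Project (id + 1), Limit 10).
// `emit` is a hypothetical sink standing in for the downstream consumer.
def runFusedPipeline(emit: Long => Unit): Unit = {
  var produced = 0
  var id = 0L
  while (id < 10L && produced < 10) {   // Range bound and Limit in one loop
    if ((id & 1L) == 1L) {              // Filter ((id & 1) = 1)
      emit(id + 1L)                     // Project [(id + 1)]
      produced += 1
    }
    id += 1L
  }
}
```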
* [SPARK-12722][DOCS] Fixed typo in Pipeline example | Jeff Lam | 2016-01-16 | 1 | -2/+2
http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline
```
val sameModel = Pipeline.load("/tmp/spark-logistic-regression-model")
```
should be
```
val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")
```
cc: jkbradley
Author: Jeff Lam <sha0lin@alumni.carnegiemellon.edu> Closes #10769 from Agent007/SPARK-12722.
* [SPARK-12856] [SQL] speed up hashCode of unsafe array | Wenchen Fan | 2016-01-16 | 1 | -5/+2
We used to iterate over the bytes to calculate hashCode, but now we have `Murmur3_x86_32.hashUnsafeBytes`, which doesn't require the bytes to be word aligned, so we should use that instead. A simple benchmark shows it's about 3x faster; benchmark code: https://gist.github.com/cloud-fan/fa77713ccebf0823b2ab#file-arrayhashbenchmark-scala
Author: Wenchen Fan <wenchen@databricks.com> Closes #10784 from cloud-fan/array-hashcode.
* [SPARK-12840] [SQL] Support passing arbitrary objects (not just expressions) into code generated classes | Davies Liu | 2016-01-15 | 11 | -49/+48
This is a refactor to support codegen for aggregation and broadcast join.
Author: Davies Liu <davies@databricks.com> Closes #10777 from davies/rename2.
* [SPARK-12644][SQL] Update parquet reader to be vectorized. | Nong Li | 2016-01-15 | 12 | -56/+625
This inlines a few of the Parquet decoders and adds vectorized APIs to support decoding in batch. There are a few particulars in the Parquet encodings that make this much more efficient. In particular, RLE encodings are very well suited for batch decoding. The Parquet 2.0 encodings are also very suited for this. This is a work in progress and does not affect the current execution. In subsequent patches, we will support more encodings and types before enabling this. Simple benchmarks indicate this can decode single ints more than 3x faster.
Author: Nong Li <nong@databricks.com> Author: Nong <nongli@gmail.com> Closes #10593 from nongli/spark-12644.
* [SPARK-12649][SQL] support reading bucketed table | Wenchen Fan | 2016-01-15 | 18 | -45/+314
This PR adds the support to read bucketed tables, and correctly populates `outputPartitioning`, so that we can avoid shuffle for some cases.
TODO (follow-up PRs):
  * bucket pruning
  * avoid shuffle for bucketed table join when using any super-set of the bucketing key (we should re-visit it after https://issues.apache.org/jira/browse/SPARK-12704 is fixed)
  * recognize hive bucketed table
Author: Wenchen Fan <wenchen@databricks.com> Closes #10604 from cloud-fan/bucket-read.
* [SPARK-12842][TEST-HADOOP2.7] Add Hadoop 2.7 build profile | Josh Rosen | 2016-01-15 | 7 | -2/+206
This patch adds a Hadoop 2.7 build profile in order to let us automate tests against that version. /cc rxin srowen
Author: Josh Rosen <joshrosen@databricks.com> Closes #10775 from JoshRosen/add-hadoop-2.7-profile.
* [SPARK-12833][HOT-FIX] Reset the locale after we set it. | Yin Huai | 2016-01-15 | 1 | -4/+9
Author: Yin Huai <yhuai@databricks.com> Closes #10778 from yhuai/resetLocale.
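The hot-fix title above describes a common reset pattern; a minimal sketch of that pattern (not the patched code) looks like this, assuming java.util.Locale and a hypothetical locale used by the test:
```scala
import java.util.Locale

// Sketch only: remember the current default, change it for the locale-sensitive
// block, and always restore it so other tests are not affected.
val original = Locale.getDefault
try {
  Locale.setDefault(Locale.GERMANY)   // hypothetical locale used by the test
  // ... locale-sensitive assertions ...
} finally {
  Locale.setDefault(original)
}
```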
* [SPARK-11925][ML][PYSPARK] Add PySpark missing methods for ml.feature during Spark 1.6 QA | Yanbo Liang | 2016-01-15 | 1 | -10/+62
Add PySpark missing methods and params for ml.feature:
  * `RegexTokenizer` should support setting `toLowercase`.
  * `MinMaxScalerModel` should support output `originalMin` and `originalMax`.
  * `PCAModel` should support output `pc`.
Author: Yanbo Liang <ybliang8@gmail.com> Closes #9908 from yanboliang/spark-11925.
* [SPARK-12575][SQL] Grammar parity with existing SQL parser | Herman van Hovell | 2016-01-15 | 33 | -972/+286
In this PR the new CatalystQl parser stack reaches grammar parity with the old Parser-Combinator based SQL Parser. This PR also replaces all uses of the old Parser, and removes it from the code base. Although the existing Hive and SQL parser dialects were mostly the same, some kinks had to be worked out:
  - The SQL Parser allowed syntax like `APPROXIMATE(0.01) COUNT(DISTINCT a)`. In order to make this work we needed to hardcode approximate operators in the parser, or we would have to create an approximate expression. `APPROXIMATE_COUNT_DISTINCT(a, 0.01)` would also do the job and is much easier to maintain. So, this PR **removes** this keyword.
  - The old SQL Parser supports `LIMIT` clauses in nested queries. This is **not supported** anymore. See https://github.com/apache/spark/pull/10689 for the rationale for this.
  - Hive supports a charset-name/character-set-literal combination; for instance the expression `_ISO-8859-1 0x4341464562616265` would yield the string `CAFEbabe`. Hive will only allow charset names to start with an underscore. This is quite annoying in Spark because as soon as you use a tuple, names will start with an underscore. In this PR we **remove** this feature from the parser. It would be quite easy to implement such a feature as an Expression later on.
  - Hive and the SQL Parser treat decimal literals differently. Hive will turn any decimal into a `Double`, whereas the SQL Parser would convert a non-scientific decimal into a `BigDecimal` and turn a scientific decimal into a Double. We follow Hive's behavior here. The new parser supports a big decimal literal, for instance `81923801.42BD`, which can be used when a big decimal is needed.
cc rxin viirya marmbrus yhuai cloud-fan
Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #10745 from hvanhovell/SPARK-12575-2.
* [SQL][MINOR] BoundReference do not need to be NamedExpression | Wenchen Fan | 2016-01-15 | 1 | -11/+1
We made it a `NamedExpression` to work around some hacky cases a long time ago, and now it seems safe to remove it.
Author: Wenchen Fan <wenchen@databricks.com> Closes #10765 from cloud-fan/minor.
* [SPARK-12716][WEB UI] Add a TOTALS row to the Executors Web UI | Alex Bozarth | 2016-01-15 | 1 | -10/+64
Added a Totals table to the top of the page to display the totals of each applicable column in the executors table.
Old Description: ~~Created a TOTALS row containing the totals of each column in the executors UI. By default the TOTALS row appears at the top of the table. When a column is sorted the TOTALS row will always sort to either the top or bottom of the table.~~
Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #10668 from ajbozarth/spark12716.
* Fix typo | Julien Baley | 2016-01-15 | 1 | -3/+3
disvoered => discovered
Author: Julien Baley <julien.baley@gmail.com> Closes #10773 from julienbaley/patch-1.
* [SPARK-12833][HOT-FIX] Fix scala 2.11 compilation. | Yin Huai | 2016-01-15 | 1 | -3/+3
Seems https://github.com/apache/spark/commit/5f83c6991c95616ecbc2878f8860c69b2826f56c breaks scala 2.11 compilation.
Author: Yin Huai <yhuai@databricks.com> Closes #10774 from yhuai/fixScala211Compile.
* [SPARK-12667] Remove block manager's internal "external block store" API | Reynold Xin | 2016-01-15 | 34 | -1212/+139
This pull request removes the external block store API. This is rarely used, and the file system interface is actually a better, more standard way to interact with external storage systems. There are some other things to remove also, as pointed out by JoshRosen. We will do those as follow-up pull requests.
Author: Reynold Xin <rxin@databricks.com> Closes #10752 from rxin/remove-offheap.
* [SPARK-12833][SQL] Initial import of spark-csv | Hossein | 2016-01-15 | 27 | -8/+1653
CSV is the most common data format in the "small data" world. It is often the first format people want to try when they see Spark on a single node. Having to rely on a 3rd party component for this leads to poor user experience for new users. This PR merges the popular spark-csv data source package (https://github.com/databricks/spark-csv) with SparkSQL. This is a first PR to bring the functionality to spark 2.0 master. We will complete items outlined in the design document (see JIRA attachment) in follow up pull requests.
Author: Hossein <hossein@databricks.com> Author: Reynold Xin <rxin@databricks.com> Closes #10766 from rxin/csv.
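For orientation, the spark-csv package being merged here is normally used through the DataFrameReader, roughly as below. This is a hedged usage sketch: the `header` option name and the file path come from the external spark-csv package and are assumptions about the post-merge surface, and `sqlContext` is assumed to be the shell-provided SQLContext.
```scala
// Hedged sketch of reading a CSV file through the data source API.
// Option names follow the spark-csv package; treat them as assumptions.
val people = sqlContext.read
  .format("csv")
  .option("header", "true")          // first line contains column names
  .load("data/people.csv")           // illustrative path
people.show()
```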
* [MINOR] [SQL] GeneratedExpressionCode -> ExprCode | Davies Liu | 2016-01-15 | 32 | -249/+249
GeneratedExpressionCode is too long
Author: Davies Liu <davies@databricks.com> Closes #10767 from davies/renaming.
* [SPARK-11031][SPARKR] Method str() on a DataFrame | Oscar D. Lara Yejas | 2016-01-15 | 5 | -22/+140
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com> Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com> Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net> Closes #9613 from olarayej/SPARK-11031.
* [SPARK-2930] clarify docs on using webhdfs with spark.yarn.access.namenodes | Tom Graves | 2016-01-15 | 1 | -4/+4
Author: Tom Graves <tgraves@yahoo-inc.com> Closes #10699 from tgravescs/SPARK-2930.
* [SPARK-12655][GRAPHX] GraphX does not unpersist RDDs | Jason Lee | 2016-01-15 | 3 | -2/+20
Some VertexRDD and EdgeRDD are created during the intermediate step of g.connectedComponents() but unnecessarily left cached after the method is done. The fix is to unpersist these RDDs once they are no longer in use. A test case is added to confirm the fix for the reported bug.
Author: Jason Lee <cjlee@us.ibm.com> Closes #10713 from jasoncl/SPARK-12655.
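The fix itself is internal to connectedComponents(), but the pattern it applies is the one sketched below: materialize the result, then unpersist intermediate RDDs that are no longer referenced. `graph`, `tempVertices` and `tempEdges` are hypothetical stand-ins, not names from the patch.
```scala
// Sketch of the caching hygiene the fix enforces inside GraphX: once the
// result is computed, release intermediate vertex/edge RDDs left behind.
val cc = graph.connectedComponents()
cc.vertices.count()                         // force computation of the result
tempVertices.unpersist(blocking = false)    // hypothetical intermediate VertexRDD
tempEdges.unpersist(blocking = false)       // hypothetical intermediate EdgeRDD
```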
* [SPARK-12830] Java style: disallow trailing whitespaces. | Reynold Xin | 2016-01-14 | 8 | -13/+19
Author: Reynold Xin <rxin@databricks.com> Closes #10764 from rxin/SPARK-12830.
* [SPARK-12829] Turn Java style checker on | Reynold Xin | 2016-01-14 | 1 | -2/+1
It was previously turned off because there was a problem with a pull request. We should turn it on now.
Author: Reynold Xin <rxin@databricks.com> Closes #10763 from rxin/SPARK-12829.
* [SPARK-12708][UI] Sorting task error in Stages Page when yarn mode. | Koyo Yoshida | 2016-01-15 | 6 | -18/+46
If the sort column contains a slash (e.g. "Executor ID / Host") in yarn mode, sorting fails with the following message. ![spark-12708](https://cloud.githubusercontent.com/assets/6679275/12193320/80814f8c-b62a-11e5-9914-7bf3907029df.png) It's similar to SPARK-4313.
Author: root <root@R520T1.(none)> Author: Koyo Yoshida <koyo0615@gmail.com> Closes #10663 from yoshidakuy/SPARK-12708.
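The summary above does not show the patch itself; a common remedy for this class of bug is to URL-encode the sort-column parameter so the slash survives the YARN proxy's path handling. A hedged sketch of that idea, not the actual fix:
```scala
import java.net.URLEncoder

// Assumption: encode the column name before it goes into the query string,
// so the slash is not treated as a path separator by the proxy.
val sortColumn = "Executor ID / Host"
val encoded = URLEncoder.encode(sortColumn, "UTF-8")   // "Executor+ID+%2F+Host"
```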
* [SPARK-12813][SQL] Eliminate serialization for back to back operations | Michael Armbrust | 2016-01-14 | 17 | -274/+518
The goal of this PR is to eliminate unnecessary translations when there are back-to-back `MapPartitions` operations. In order to achieve this I also made the following simplifications:
  - Operators no longer hold encoders, instead they have only the expressions that they need. The benefits here are twofold: the expressions are visible to transformations so they go through the normal resolution/binding process. Now that they are visible we can change them on a case by case basis.
  - Operators no longer have type parameters. Since the engine is responsible for its own type checking, having the types visible to the compiler was an unnecessary complication. We still leverage the scala compiler in the companion factory when constructing a new operator, but after this the types are discarded.
Deferred to a follow up PR:
  - Remove as much of the resolution/binding from Dataset/GroupedDataset as possible. We should still eagerly check resolution and throw an error though in the case of mismatches for an `as` operation.
  - Eliminate serializations in more cases by adding more cases to `EliminateSerialization`
Author: Michael Armbrust <michael@databricks.com> Closes #10747 from marmbrus/encoderExpressions.
* [SPARK-12174] Speed up BlockManagerSuite getRemoteBytes() test | Josh Rosen | 2016-01-14 | 1 | -41/+30
This patch significantly speeds up the BlockManagerSuite's "SPARK-9591: getRemoteBytes from another location when Exception throw" test, reducing the test time from 45s to ~250ms. The key change was to set `spark.shuffle.io.maxRetries` to 0 (the code previously set `spark.network.timeout` to `2s`, but this didn't make a difference because the slowdown was not due to this timeout). Along the way, I also cleaned up the way that we handle SparkConf in BlockManagerSuite: previously, each test would mutate a shared SparkConf instance, while now each test gets a fresh SparkConf.
Author: Josh Rosen <joshrosen@databricks.com> Closes #10759 from JoshRosen/SPARK-12174.
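The key change above is a small configuration tweak; a minimal sketch of the settings named in the summary, with the surrounding test scaffolding omitted:
```scala
import org.apache.spark.SparkConf

// Fail the remote fetch immediately instead of retrying, which is what turns
// the ~45s wait into ~250ms; a fresh SparkConf per test avoids shared state.
val conf = new SparkConf(false)
  .set("spark.shuffle.io.maxRetries", "0")
```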
* [SPARK-12821][BUILD] Style checker should run when some configuration files for style are modified but any source files are not. | Kousuke Saruta | 2016-01-14 | 1 | -2/+7
When running the `run-tests` script, style checkers run only when source files are modified; they should also run when configuration files related to style are modified.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10754 from sarutak/SPARK-12821.
* [SPARK-12771][SQL] Simplify CaseWhen code generation | Reynold Xin | 2016-01-14 | 1 | -25/+35
The generated code for CaseWhen uses a control variable "got" to make sure we do not evaluate more branches once a branch is true. Changing that to generate just simple "if / else" would be slightly more efficient. This closes #10737.
Author: Reynold Xin <rxin@databricks.com> Closes #10755 from rxin/SPARK-12771.
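Written as plain Scala rather than the generated Java, the change described above has roughly the following shape; the conditions and values are placeholders, not code from the patch.
```scala
// Placeholder inputs standing in for the WHEN conditions and values.
val (cond1, cond2) = (false, true)
val (value1, value2, elseValue) = (1, 2, 0)

// Before: a "got" flag guards every branch, so each branch test is emitted
// behind an extra flag check.
var got = false
var result = elseValue
if (!got && cond1) { result = value1; got = true }
if (!got && cond2) { result = value2; got = true }

// After: simple chained if / else, no control variable.
val result2 = if (cond1) value1 else if (cond2) value2 else elseValue
```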
* [SPARK-12784][UI] Fix Spark UI IndexOutOfBoundsException with dynamic allocation | Shixiong Zhu | 2016-01-14 | 2 | -6/+17
Add `listener.synchronized` to get `storageStatusList` and `execInfo` atomically.
Author: Shixiong Zhu <shixiong@databricks.com> Closes #10728 from zsxwing/SPARK-12784.
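A minimal sketch of the synchronization described above, assuming the two fields named in the summary are read while rendering the executors page; the field names are quoted from the summary, while the surrounding UI code and types are assumptions.
```scala
// Read both collections under the listener's lock so the page renders a
// consistent snapshot while dynamic allocation adds or removes executors.
val (storageStatusList, execInfo) = listener.synchronized {
  (listener.storageStatusList, listener.execInfo)
}
```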
* [SPARK-9844][CORE] File appender race condition during shutdown | Bryan Cutler | 2016-01-14 | 2 | -10/+95
When an Executor process is destroyed, the FileAppender that is asynchronously reading the stderr stream of the process can throw an IOException during read because the stream is closed. Before the ExecutorRunner destroys the process, the FileAppender thread is flagged to stop. This PR wraps the inputStream.read call of the FileAppender in a try/catch block so that if an IOException is thrown and the thread has been flagged to stop, it will safely ignore the exception. Additionally, the FileAppender thread was changed to use Utils.tryWithSafeFinally to better log any exceptions that do occur. Added unit tests to verify that an IOException is thrown and logged if the FileAppender is not flagged to stop, and that no IOException is raised when the flag is set.
Author: Bryan Cutler <cutlerb@gmail.com> Closes #10714 from BryanCutler/file-appender-read-ioexception-SPARK-9844.
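The guard described above has roughly this shape; it is a sketch, not the patched code, and `appendToFile`, `markedForStop` and `logError` are hypothetical stand-ins for the real members.
```scala
import java.io.{IOException, InputStream}

// Swallow the IOException only when the appender was already told to stop
// (the stream was closed as part of executor shutdown); otherwise log it.
def readLoop(in: InputStream, buf: Array[Byte]): Unit = {
  try {
    val n = in.read(buf)
    if (n > 0) appendToFile(buf, n)          // hypothetical append helper
  } catch {
    case _: IOException if markedForStop =>  // hypothetical stop flag
      // expected during shutdown; safe to ignore
    case e: IOException =>
      logError("Error reading stream", e)    // hypothetical logger
  }
}
```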
* [SPARK-12707][SPARK SUBMIT] Remove submit python/R scripts through pyspark/sparkR | Jeff Zhang | 2016-01-13 | 1 | -7/+6
Author: Jeff Zhang <zjffdu@apache.org> Closes #10658 from zjffdu/SPARK-12707.
* [SPARK-12756][SQL] use hash expression in Exchange | Wenchen Fan | 2016-01-13 | 12 | -64/+84
This PR makes bucketing and exchange share one common hash algorithm, so that we can guarantee the data distribution is the same between shuffle and bucketed data source, which enables us to only shuffle one side when joining a bucketed table with a normal one. This PR also fixes the tests that are broken by the new hash behaviour in shuffle.
Author: Wenchen Fan <wenchen@databricks.com> Closes #10703 from cloud-fan/use-hash-expr-in-shuffle.
* [SPARK-12819] Deprecate TaskContext.isRunningLocally() | Josh Rosen | 2016-01-13 | 5 | -19/+4
We've already removed local execution but didn't deprecate `TaskContext.isRunningLocally()`; we should deprecate it for 2.0.
Author: Josh Rosen <joshrosen@databricks.com> Closes #10751 from JoshRosen/remove-local-exec-from-taskcontext.
* [SPARK-12703][MLLIB][DOC][PYTHON] Fixed pyspark.mllib.clustering.KMeans user guide example | Joseph K. Bradley | 2016-01-13 | 1 | -5/+1
Fixed the WSSSE computeCost in the Python mllib KMeans user guide example by using the new computeCost method API in Python.
Author: Joseph K. Bradley <joseph@databricks.com> Closes #10707 from jkbradley/kmeans-doc-fix.
* [SPARK-12026][MLLIB] ChiSqTest gets slower and slower over time when number of features is large | Yuhao Yang | 2016-01-13 | 1 | -2/+4
jira: https://issues.apache.org/jira/browse/SPARK-12026 The issue is valid, as features.toArray.view.zipWithIndex.slice(startCol, endCol) becomes slower as startCol gets larger. I tested locally and the change can improve the performance; the running time was stable.
Author: Yuhao Yang <hhbyyh@gmail.com> Closes #10146 from hhbyyh/chiSq.
* [SPARK-12400][SHUFFLE] Avoid generating temp shuffle files for empty partitions | jerryshao | 2016-01-13 | 2 | -12/+51
This problem lies in `BypassMergeSortShuffleWriter`: an empty partition will also generate a temp shuffle file of several bytes. This change creates the file only when the partition is not empty. The problem only exists here; there is no such issue in `HashShuffleWriter`. Please help to review, thanks a lot.
Author: jerryshao <sshao@hortonworks.com> Closes #10376 from jerryshao/SPARK-12400.
* [SPARK-12690][CORE] Fix NPE in UnsafeInMemorySorter.free() | Carson Wang | 2016-01-13 | 1 | -2/+4
I hit the exception below. The `UnsafeKVExternalSorter` does pass `null` as the consumer when creating an `UnsafeInMemorySorter`. Normally the NPE doesn't occur because the `inMemSorter` is set to null later and the `free()` method is not called. It happens when there is another exception like OOM thrown before setting `inMemSorter` to null. Anyway, we can add the null check to avoid it.
```
ERROR spark.TaskContextImpl: Error in TaskCompletionListener
java.lang.NullPointerException
        at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.free(UnsafeInMemorySorter.java:110)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.cleanupResources(UnsafeExternalSorter.java:288)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter$1.onTaskCompletion(UnsafeExternalSorter.java:141)
        at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
        at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
        at org.apache.spark.scheduler.Task.run(Task.scala:91)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
```
Author: Carson Wang <carson.wang@intel.com> Closes #10637 from carsonwang/FixNPE.
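The patched file is Java (UnsafeInMemorySorter.free()); schematically the fix is just a null guard on the release path, since UnsafeKVExternalSorter may pass a null consumer. The member names below are assumptions based on the stack trace, not the exact patch.
```scala
// Schematic version of the null guard (the real code is Java): only hand the
// internal array back to the consumer if a consumer was actually provided.
if (consumer != null && array != null) {
  consumer.freeArray(array)   // assumed release call; illustrative only
}
```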
* [SPARK-12791][SQL] Simplify CaseWhen by breaking "branches" into "conditions" and "values" | Reynold Xin | 2016-01-13 | 12 | -138/+156
This pull request rewrites the CaseWhen expression to break the single, monolithic "branches" field into a sequence of tuples (Seq[(condition, value)]) and an explicit optional elseValue field. Prior to this pull request, each even position in "branches" represents the condition for each branch, and each odd position represents the value for each branch. Their use has been pretty confusing, with a lot of sliding-window or grouped(2) calls.
Author: Reynold Xin <rxin@databricks.com> Closes #10734 from rxin/simplify-case.
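In outline, the restructuring replaces the interleaved sequence with an explicit pair list plus an optional else value. The sketch below shows only the shape; the stand-in Expression class is not Catalyst's real one.
```scala
// Stand-in for Catalyst's Expression type, for illustration only.
case class Expression(sql: String)

// Each branch is an explicit (condition, value) pair and the ELSE part is an
// Option, so no more grouped(2) / sliding-window reads of a flat sequence.
case class CaseWhen(
    branches: Seq[(Expression, Expression)],
    elseValue: Option[Expression] = None)
```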
* [SPARK-12642][SQL] improve the hash expression to be decoupled from unsafe row | Wenchen Fan | 2016-01-13 | 6 | -29/+288
https://issues.apache.org/jira/browse/SPARK-12642
Author: Wenchen Fan <wenchen@databricks.com> Closes #10694 from cloud-fan/hash-expr.
* [SPARK-12268][PYSPARK] Make pyspark shell pythonstartup work under python3 | Erik Selin | 2016-01-13 | 1 | -1/+3
This replaces the `execfile` used for running custom python shell scripts with explicit open, compile and exec (as recommended by 2to3). The reason for this change is to make the pythonstartup option compatible with python3.
Author: Erik Selin <erik.selin@gmail.com> Closes #10255 from tyro89/pythonstartup-python3.
* [SPARK-9383][PROJECT-INFRA] PR merge script should reset back to previous branch when possible | Josh Rosen | 2016-01-13 | 1 | -3/+16
This patch modifies our PR merge script to reset back to a named branch when restoring the original checkout upon exit. When the committer is originally checked out to a detached head, then they will be restored back to that same ref (the same as today's behavior). This is a slightly updated version of #7569, with an extra fix to handle the detached head corner-case.
Author: Josh Rosen <joshrosen@databricks.com> Closes #10709 from JoshRosen/SPARK-9383.
* [SPARK-12761][CORE] Remove duplicated code | Jakob Odersky | 2016-01-13 | 1 | -5/+1
Removes some duplicated code that was reintroduced during a merge.
Author: Jakob Odersky <jodersky@gmail.com> Closes #10711 from jodersky/repl-2.11-duplicate.
* [SPARK-12805][MESOS] Fixes documentation on Mesos run modes | Luc Bourlier | 2016-01-13 | 1 | -7/+5
The default run has changed, but the documentation didn't fully reflect the change.
Author: Luc Bourlier <luc.bourlier@typesafe.com> Closes #10740 from skyluc/issue/mesos-modes-doc.
* [SPARK-9297] [SQL] Add covar_pop and covar_samp | Liang-Chi Hsieh | 2016-01-13 | 4 | -0/+272
JIRA: https://issues.apache.org/jira/browse/SPARK-9297 Add two aggregation functions: covar_pop and covar_samp.
Author: Liang-Chi Hsieh <viirya@gmail.com> Author: Liang-Chi Hsieh <viirya@appier.com> Closes #10029 from viirya/covar-funcs.
* [SPARK-12692][BUILD][HOT-FIX] Fix the scala style of KinesisBackedBlockRDDSuite.scala. | Yin Huai | 2016-01-13 | 1 | -2/+2
https://github.com/apache/spark/pull/10736 was merged yesterday and caused the master build to start failing because of the style issue.
Author: Yin Huai <yhuai@databricks.com> Closes #10742 from yhuai/fixStyle.
* [SPARK-12692][BUILD] Enforce style checking about white space before comma | Kousuke Saruta | 2016-01-13 | 1 | -7/+6
This is the final PR about SPARK-12692. We have removed all white space before commas from the code, so let's enforce style checking.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10736 from sarutak/SPARK-12692-followup-enforce-checking.
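For reference, a trivial illustration of the violation the checker now rejects (not code from the patch):
```scala
val bad  = Seq(1 , 2)   // flagged: white space before the comma
val good = Seq(1, 2)    // accepted
```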
* [SPARK-12692][BUILD][SQL] Scala style: Fix the style violation (Space before ",") | Kousuke Saruta | 2016-01-12 | 10 | -22/+22
Fix the style violation (space before , and :). This PR is a followup for #10643 and rework of #10685.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10732 from sarutak/SPARK-12692-followup-sql.
* [SPARK-12558][SQL] AnalysisException when multiple functions applied in GROUP BY clause | Dilip Biswal | 2016-01-12 | 2 | -0/+30
cloud-fan Can you please take a look? In this case, we are failing during check analysis while validating the aggregation expression. I have added a semanticEquals for HiveGenericUDF to fix this. Please let me know if this is the right way to address this issue.
Author: Dilip Biswal <dbiswal@us.ibm.com> Closes #10520 from dilipbiswal/spark-12558.
* [SPARK-12692][BUILD][CORE] Scala style: Fix the style violation (Space before ",") | Kousuke Saruta | 2016-01-12 | 6 | -6/+6
Fix the style violation (space before , and :). This PR is a followup for #10643.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10719 from sarutak/SPARK-12692-followup-core.