Commit message | Author | Age | Files | Lines
* [SPARK-13324][CORE][BUILD] Update plugin, test, example dependencies for 2.x | Sean Owen | 2016-02-17 | 9 | -42/+42
| | | | | | | | Phase 1: update plugin versions, test dependencies, some example and third-party versions Author: Sean Owen <sowen@cloudera.com> Closes #11206 from srowen/SPARK-13324.
* [MINOR][MLLIB] fix mllib compile warnings | Xiangrui Meng | 2016-02-17 | 2 | -0/+6
| | | | | | | | This PR fixes some warnings found by `build/sbt mllib/test:compile`. Author: Xiangrui Meng <meng@databricks.com> Closes #11227 from mengxr/fix-mllib-warnings-201602.
* [SPARK-13344][TEST] Fix harmless accumulator not found exceptions | Andrew Or | 2016-02-17 | 3 | -4/+30
| | | | | | | | See [JIRA](https://issues.apache.org/jira/browse/SPARK-13344) for more detail. This was caused by #10835. Author: Andrew Or <andrew@databricks.com> Closes #11222 from andrewor14/fix-test-accum-exceptions.
* [SPARK-12953][EXAMPLES] RDDRelation writer set overwrite mode | shijinkui | 2016-02-17 | 1 | -4/+3
| | | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-12953 Fix the error seen when running RDDRelation.main(): "path file:/Users/sjk/pair.parquet already exists", by setting DataFrameWriter's mode to SaveMode.Overwrite. Author: shijinkui <shijinkui666@163.com> Closes #10864 from shijinkui/set_mode.
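A minimal sketch of the DataFrameWriter pattern this fix applies; the DataFrame `df` and the output path below are placeholders, not values taken from the patch:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Sketch: write Parquet output in overwrite mode so a pre-existing path does
// not make the example fail on a second run. `df` is an assumed DataFrame.
def writePairs(df: DataFrame): Unit = {
  df.write
    .mode(SaveMode.Overwrite)      // replace any existing output
    .parquet("/tmp/pair.parquet")  // placeholder path
}
```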
* [SPARK-13109][BUILD] Fix SBT publishLocal issue | jerryshao | 2016-02-17 | 3 | -1/+4
| | | | | | | | | | Add the local ivy repo to the SBT build file to fix this. The Scaladoc compile error is also fixed. Author: jerryshao <sshao@hortonworks.com> Closes #11001 from jerryshao/SPARK-13109.
* [SPARK-13350][DOCS] Config doc updated to state that PYSPARK_PYTHON's ↵ | Christopher C. Aycock | 2016-02-17 | 1 | -1/+1
| | | | | | | | default is "python2.7" Author: Christopher C. Aycock <chris@chrisaycock.com> Closes #11239 from chrisaycock/master.
* [SPARK-13357][SQL] Use generated projection and ordering for ↵ | Takuya UESHIN | 2016-02-17 | 1 | -3/+4
| | | | | | | | | | TakeOrderedAndProjectNode `TakeOrderedAndProjectNode` should use generated projection and ordering like other `LocalNode`s. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #11230 from ueshin/issues/SPARK-13357.
* [SPARK-13279] Remove O(n^2) operation from scheduler. | Sital Kedia | 2016-02-16 | 1 | -15/+13
| | | | | | | | | This commit removes an unnecessary duplicate check in addPendingTask that meant that scheduling a task set took time proportional to (# tasks)^2. Author: Sital Kedia <skedia@fb.com> Closes #11175 from sitalkedia/fix_stuck_driver.
* [SPARK-11627] Add initial input rate limit for spark streaming backpressure ↵ | junhao | 2016-02-16 | 2 | -1/+16
| | | | | | | | | | | | | mechanism. https://issues.apache.org/jira/browse/SPARK-11627 The Spark Streaming backpressure mechanism has no initial input rate limit, which might cause an OOM exception. In the first batch, receivers receive data at the maximum speed they can reach, which might exhaust executor memory. Adding an initial input rate limit ensures the streaming job succeeds in the first batch; after that, the backpressure mechanism can adjust the receiving rate adaptively. Author: junhao <junhao@mogujie.com> Closes #9593 from junhaoMg/junhao-dev.
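A minimal configuration sketch of how such a limit is typically wired up, assuming the setting is exposed as `spark.streaming.backpressure.initialRate` (verify the key name against the configuration guide for your Spark version):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: enable backpressure and cap the receiving rate for the first batch.
// The initialRate key name is an assumption based on this change.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("BackpressureInitialRateSketch")
  .set("spark.streaming.backpressure.enabled", "true")
  // Records per second per receiver for the first batch only; afterwards the
  // backpressure rate estimator takes over.
  .set("spark.streaming.backpressure.initialRate", "1000")

val ssc = new StreamingContext(conf, Seconds(5))
```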
* [SPARK-13308] ManagedBuffers passed to OneToOneStreamManager need to be ↵ | Josh Rosen | 2016-02-16 | 7 | -9/+119
| | | | | | | | | | | | | | freed in non-error cases ManagedBuffers that are passed to `OneToOneStreamManager.registerStream` need to be freed by the manager once it's done using them. However, the current code only frees them in certain error-cases and not during typical operation. This isn't a major problem today, but it will cause memory leaks after we implement better locking / pinning in the BlockManager (see #10705). This patch modifies the relevant network code so that the ManagedBuffers are freed as soon as the messages containing them are processed by the lower-level Netty message sending code. /cc zsxwing for review. Author: Josh Rosen <joshrosen@databricks.com> Closes #11193 from JoshRosen/add-missing-release-calls-in-network-layer.
* [SPARK-13280][STREAMING] Use a better logger name for FileBasedWriteAheadLog. | Marcelo Vanzin | 2016-02-16 | 1 | -5/+15
| | | | | | | | | | The new logger name is under the org.apache.spark namespace. The detection of the caller name was also enhanced a bit to ignore some common things that show up in the call stack. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11165 from vanzin/SPARK-13280.
* [SPARK-12976][SQL] Add LazilyGenerateOrdering and use it for ↵ | Takuya UESHIN | 2016-02-16 | 3 | -8/+42
| | | | | | | | | | RangePartitioner of Exchange. Add `LazilyGenerateOrdering` to support generated ordering for `RangePartitioner` of `Exchange` instead of `InterpretedOrdering`. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #10894 from ueshin/issues/SPARK-12976.
* [SPARK-12247][ML][DOC] Documentation for spark.ml's ALS and collaborative ↵ | BenFradet | 2016-02-16 | 10 | -298/+431
| | | | | | | | | | filtering in general This documents the implementation of ALS in `spark.ml` with example code in scala, java and python. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10411 from BenFradet/SPARK-12247.
* Correct SparseVector.parse documentation | Miles Yucht | 2016-02-16 | 1 | -1/+1
| | | | | | | | There's a small typo in the SparseVector.parse docstring: it says the method returns a DenseVector when it actually returns a SparseVector. Author: Miles Yucht <miles@databricks.com> Closes #11213 from mgyucht/fix-sparsevector-docs.
* [SPARK-13221] [SQL] Fixing GroupingSets when Aggregate Functions Containing ↵ | gatorsmile | 2016-02-15 | 10 | -107/+155
GroupBy Columns Using GroupingSets generates a wrong result when aggregate functions contain GroupBy columns. This PR fixes it. Since the code changes are very small, maybe we can also merge it to 1.6. For example, the following query returns a wrong result:
```scala
sql("select course, sum(earnings) as sum from courseSales group by course, earnings" +
  " grouping sets((), (course), (course, earnings))" +
  " order by course, sum").show()
```
Before the fix, the results are like:
```
[null,null]
[Java,null]
[Java,20000.0]
[Java,30000.0]
[dotNET,null]
[dotNET,5000.0]
[dotNET,10000.0]
[dotNET,48000.0]
```
After the fix, the results become correct:
```
[null,113000.0]
[Java,20000.0]
[Java,30000.0]
[Java,50000.0]
[dotNET,5000.0]
[dotNET,10000.0]
[dotNET,48000.0]
[dotNET,63000.0]
```
UPDATE: This PR also deprecated the external column GROUPING__ID. Author: gatorsmile <gatorsmile@gmail.com> Closes #11100 from gatorsmile/groupingSets.
* [SPARK-13018][DOCS] Replace example code in mllib-pmml-model-export.md using ↵ | Xin Ren | 2016-02-15 | 2 | -32/+62
| | | | | | | | | | | | | | | | | | | | | include_example Replace example code in mllib-pmml-model-export.md using include_example https://issues.apache.org/jira/browse/SPARK-13018 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11126 from keypointt/SPARK-13018.
* [SPARK-13097][ML] Binarizer allowing Double AND Vector input types | seddonm1 | 2016-02-15 | 2 | -17/+81
| | | | | | | | | | | | | | This enhancement extends the existing SparkML Binarizer [SPARK-5891] to allow Vector input columns in addition to the existing Double input column type. A use case for this enhancement is when a user wants to binarize many similar feature columns at once using the same threshold value (for example a binary threshold applied to many pixels in an image). This contribution is my original work and I license the work to the project under the project's open source license. viirya mengxr Author: seddonm1 <seddonm1@gmail.com> Closes #10976 from seddonm1/master.
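For illustration, a short sketch of the `Binarizer` API this change extends; the column names are hypothetical, and the Vector-typed input column is exactly what the enhancement enables:

```scala
import org.apache.spark.ml.feature.Binarizer

// Sketch: the same threshold is applied element-wise when the input column
// holds Vectors (e.g. pixel features). Column names below are made up.
val binarizer = new Binarizer()
  .setInputCol("pixelFeatures")            // previously Double only; now a Vector column is also accepted
  .setOutputCol("binarizedPixelFeatures")
  .setThreshold(0.5)

// val result = binarizer.transform(imagesDF)  // imagesDF is an assumed DataFrame
```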
* [SPARK-13312][MLLIB] Update java train-validation-split example in ml-guide | JeremyNixon | 2016-02-15 | 1 | -2/+2
| | | | | | | | | | Response to JIRA https://issues.apache.org/jira/browse/SPARK-13312. This contribution is my original work and I license the work to this project. Author: JeremyNixon <jnixon2@gmail.com> Closes #11199 from JeremyNixon/update_train_val_split_example.
* [SPARK-12995][GRAPHX] Remove deprecated APIs from Pregel | Takeshi YAMAMURO | 2016-02-15 | 7 | -134/+42
| | | | | | Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #10918 from maropu/RemoveDeprecateInPregel.
* [SPARK-12503][SPARK-12505] Limit pushdown in UNION ALL and OUTER JOIN | Josh Rosen | 2016-02-14 | 6 | -9/+294
This patch adds a new optimizer rule for performing limit pushdown. Limits will now be pushed down in two cases:

- If a limit is on top of a `UNION ALL` operator, then a partition-local limit operator will be pushed to each of the union operator's children.
- If a limit is on top of an `OUTER JOIN` then a partition-local limit will be pushed to one side of the join. For `LEFT OUTER` and `RIGHT OUTER` joins, the limit will be pushed to the left and right side, respectively. For `FULL OUTER` join, we will only push limits when at most one of the inputs is already limited: if one input is limited we will push a smaller limit on top of it and if neither input is limited then we will limit the input which is estimated to be larger.

These optimizations were proposed previously by gatorsmile in #10451 and #10454, but those earlier PRs were closed and deferred for later because at that time Spark's physical `Limit` operator would trigger a full shuffle to perform global limits, so there was a chance that pushdowns could actually harm performance by causing additional shuffles/stages. In #7334, we split the `Limit` operator into separate `LocalLimit` and `GlobalLimit` operators, so we can now push down only local limits (which don't require extra shuffles). This patch is based on both of gatorsmile's patches, with changes and simplifications due to partition-local-limiting.

When we push down the limit, we still keep the original limit in place, so we need a mechanism to ensure that the optimizer rule doesn't keep pattern-matching once the limit has been pushed down. In order to handle this, this patch adds a `maxRows` method to `SparkPlan` which returns the maximum number of rows that the plan can compute, then defines the pushdown rules to only push limits to children if the children's maxRows are greater than the limit's maxRows. This idea is carried over from #10451; see that patch for additional discussion.

Author: Josh Rosen <joshrosen@databricks.com> Closes #11121 from JoshRosen/limit-pushdown-2.
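A rough illustration (not code from the patch) of the `UNION ALL` case, assuming a `sqlContext` and two registered tables named `table_a` and `table_b`:

```scala
// Sketch: with limit pushdown, a partition-local limit of 10 can be planned
// on each side of the UNION ALL, so neither child produces more rows than the
// final global limit actually needs. Table names here are placeholders.
val limited = sqlContext.sql(
  """
    |SELECT * FROM (
    |  SELECT id FROM table_a
    |  UNION ALL
    |  SELECT id FROM table_b
    |) unioned
    |LIMIT 10
  """.stripMargin)
```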
* [SPARK-13185][SQL] Reuse Calendar object in DateTimeUtils.StringToDate ↵ | Carson Wang | 2016-02-14 | 1 | -1/+9
| | | | | | | | | | | | | method to improve performance The java `Calendar` object is expensive to create. I have a subquery like this: `SELECT a, b, c FROM table UV WHERE (datediff(UV.visitDate, '1997-01-01')>=0 AND datediff(UV.visitDate, '2015-01-01')<=0))` The table stores `visitDate` as String type and has 3 billion records. A `Calendar` object is created every time `DateTimeUtils.stringToDate` is called. By reusing the `Calendar` object, I saw about a 20-second performance improvement for this stage. Author: Carson Wang <carson.wang@intel.com> Closes #11090 from carsonwang/SPARK-13185.
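The general pattern for avoiding repeated `Calendar` allocation, shown here as a sketch rather than the actual patch (which caches the instance inside `DateTimeUtils`), is to keep one mutable instance per thread and reset it before each use:

```scala
import java.util.{Calendar, TimeZone}

// Sketch: Calendar.getInstance is comparatively expensive, so reuse one
// Calendar per thread instead of allocating a new one per stringToDate call.
object CachedCalendar {
  private val local = new ThreadLocal[Calendar] {
    override def initialValue(): Calendar =
      Calendar.getInstance(TimeZone.getTimeZone("GMT"))
  }

  def withCalendar[T](f: Calendar => T): T = {
    val cal = local.get()
    cal.clear() // drop fields left over from the previous use
    f(cal)
  }
}

// Usage sketch: days since epoch for 1997-01-01 without a fresh Calendar.
val days = CachedCalendar.withCalendar { cal =>
  cal.set(1997, Calendar.JANUARY, 1)
  (cal.getTimeInMillis / (24L * 60 * 60 * 1000)).toInt
}
```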
* [SPARK-13278][CORE] Launcher fails to start with JDK 9 EA | Claes Redestad | 2016-02-14 | 4 | -7/+37
| | | | | | | | See http://openjdk.java.net/jeps/223 for more information about the JDK 9 version string scheme. Author: Claes Redestad <claes.redestad@gmail.com> Closes #11160 from cl4es/master.
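For context, a small sketch (not the launcher's actual parsing code) of why the old scheme breaks: legacy version strings look like "1.8.0_66", while JEP 223 strings look like "9-ea" or "9.0.1", so code that assumes a leading "1." fails on JDK 9:

```scala
// Sketch: derive the major Java version from a java.version-style string under
// both the legacy ("1.8.0_66") and JEP 223 ("9-ea", "9.0.1") schemes.
def majorJavaVersion(version: String): Int = {
  val numericPrefix = version.takeWhile(c => c.isDigit || c == '.')
  val parts = numericPrefix.split("\\.")
  if (parts(0) == "1") parts(1).toInt // legacy scheme: 1.<major>.<minor>_<update>
  else parts(0).toInt                 // JEP 223 scheme: <major>[.<minor>[.<security>]]
}

assert(majorJavaVersion("1.8.0_66") == 8)
assert(majorJavaVersion("9-ea") == 9)
assert(majorJavaVersion("9.0.1") == 9)
```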
* [SPARK-13300][DOCUMENTATION] Added pygments.rb dependency | Amit Dev | 2016-02-14 | 1 | -8/+13
| | | | | | | | Looks like the pygments.rb gem is also required for the Jekyll build to work. At least on Ubuntu/RHEL I could not build without this dependency, so it has been added to the steps. Author: Amit Dev <amitdev@gmail.com> Closes #11180 from amitdev/master.
* [SPARK-13296][SQL] Move UserDefinedFunction into sql.expressions. | Reynold Xin | 2016-02-13 | 15 | -217/+320
| | | | | | | | | | | | | | | | This pull request has the following changes: 1. Moved UserDefinedFunction into the expressions package. This is more consistent with how we structure the packages for window functions and UDAFs. 2. Moved UserDefinedPythonFunction into the execution.python package, so we don't have a random private class in the top level sql package. 3. Moved everything in execution/python.scala into the newly created execution.python package. Most of the diffs are just straight copy-paste. Author: Reynold Xin <rxin@databricks.com> Closes #11181 from rxin/SPARK-13296.
* [SPARK-13172][CORE][SQL] Stop using RichException.getStackTrace; it is deprecated | Sean Owen | 2016-02-13 | 7 | -12/+14
| | | | | | | | Replace `getStackTraceString` with `Utils.exceptionString` Author: Sean Owen <sowen@cloudera.com> Closes #11182 from srowen/SPARK-13172.
* Closes #11185 | Reynold Xin | 2016-02-13 | 0 | -0/+0
|
* [SPARK-12363][MLLIB] Remove setRun and fix PowerIterationClustering failed test | Liang-Chi Hsieh | 2016-02-13 | 4 | -85/+96
| | | | | | | | | | | JIRA: https://issues.apache.org/jira/browse/SPARK-12363 This issue was pointed out by yanboliang. When `setRuns` is removed from PowerIterationClustering, one of the tests fails. I found that some `dstAttr`s of the normalized graph are not correct values but 0.0. Setting `TripletFields.All` in `mapTriplets` makes it work. Author: Liang-Chi Hsieh <viirya@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #10539 from viirya/fix-poweriter.
* [SPARK-13142][WEB UI] Problem accessing Web UI /logPage/ on Microsoft Windows | markpavey | 2016-02-13 | 1 | -2/+2
| | | | | | | | | | | | Due to being on a Windows platform I have been unable to run the tests as described in the "Contributing to Spark" instructions. As the change is only to two lines of code in the Web UI, which I have manually built and tested, I am submitting this pull request anyway. I hope this is OK. Is it worth considering also including this fix in any future 1.5.x releases (if any)? I confirm this is my own original work and license it to the Spark project under its open source license. Author: markpavey <mark.pavey@thefilter.com> Closes #11135 from markpavey/JIRA_SPARK-13142_WindowsWebUILogFix.
* [SPARK-13293][SQL] generate Expand | Davies Liu | 2016-02-12 | 2 | -1/+140
| | | | | | | | | | | | Expand suffers from creating the UnsafeRow from the same input multiple times; with codegen, it only needs to copy some of the columns. After this, we can see 3X improvements (from 43 seconds to 13 seconds) on a TPCDS query (Q67) that has eight columns in Rollup. Ideally, we could mask some of the columns based on a bitmask; I'd leave that for the future, because currently Aggregation (50 ns) is much slower than just copying the variables (1-2 ns). Author: Davies Liu <davies@databricks.com> Closes #11177 from davies/gen_expand.
* [SPARK-5095] remove flaky test | Michael Gummelt | 2016-02-12 | 1 | -0/+5
| | | | | | | | Overrode the start() method, which was previously starting a thread causing a race condition. I believe this should fix the flaky test. Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #11164 from mgummelt/fix_mesos_tests.
* [SPARK-5095] Fix style in mesos coarse grained scheduler code | Michael Gummelt | 2016-02-12 | 2 | -10/+12
| | | | | | | | andrewor14 This addressed your style comments from #10993 Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #11187 from mgummelt/fix_mesos_style.
* [SPARK-12630][PYSPARK] [DOC] PySpark classification parameter desc to ↵ | vijaykiran | 2016-02-12 | 1 | -118/+143
| | | | | | | | | | | consistent format Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the classification module. Author: vijaykiran <mail@vijaykiran.com> Author: Bryan Cutler <cutlerb@gmail.com> Closes #11183 from BryanCutler/pyspark-consistent-param-classification-SPARK-12630.
* [SPARK-12962] [SQL] [PySpark] PySpark support covar_samp and covar_pop | Yanbo Liang | 2016-02-12 | 1 | -6/+35
| | | | | | | | | | PySpark now supports ```covar_samp``` and ```covar_pop```. cc rxin davies marmbrus Author: Yanbo Liang <ybliang8@gmail.com> Closes #10876 from yanboliang/spark-12962.
* [SPARK-13260][SQL] count(*) does not work with CSV data source | hyukjinkwon | 2016-02-12 | 2 | -45/+41
| | | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-13260 This is a quick fix for `count(*)`. When `requiredColumns` is empty, it currently returns `sqlContext.sparkContext.emptyRDD[Row]`, which does not have the count. Just like the JSON datasource, this PR lets the CSV datasource count the rows but not parse each set of tokens. Author: hyukjinkwon <gurwls223@gmail.com> Closes #11169 from HyukjinKwon/SPARK-13260.
* [SPARK-13282][SQL] LogicalPlan toSql should just return a String | Reynold Xin | 2016-02-12 | 6 | -156/+141
| | | | | | | | Previously we were using Option[String] and None to indicate the case when Spark fails to generate SQL. It is easier to just use exceptions to propagate error cases, rather than having for comprehensions everywhere. I also introduced a "build" function that simplifies string concatenation (i.e. no need to reason about whether we have an extra space or not). Author: Reynold Xin <rxin@databricks.com> Closes #11171 from rxin/SPARK-13282.
* [SPARK-12705] [SQL] push missing attributes for Sort | Davies Liu | 2016-02-12 | 3 | -83/+67
| | | | | | | | The current implementation of ResolveSortReferences can only push one missing attribute into its child; it failed to analyze TPCDS Q98 because there are two missing attributes there (one from Window, another from Aggregate). Author: Davies Liu <davies@databricks.com> Closes #11153 from davies/resolve_sort.
* [SPARK-13154][PYTHON] Add linting for pydocs | Holden Karau | 2016-02-12 | 2 | -0/+27
| | | | | | | | | | We should have lint rules using sphinx to automatically catch the pydoc issues that are sometimes introduced. Right now ./dev/lint-python will skip building the docs if sphinx isn't present, but it might make sense to fail hard; it's just a matter of whether we want to insist that all PySpark developers have sphinx present. Author: Holden Karau <holden@us.ibm.com> Closes #11109 from holdenk/SPARK-13154-add-pydoc-lint-for-docs.
* [SPARK-12974][ML][PYSPARK] Add Python API for spark.ml bisecting k-means | Yanbo Liang | 2016-02-12 | 1 | -1/+124
| | | | | | | | Add Python API for spark.ml bisecting k-means. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10889 from yanboliang/spark-12974.
* [SPARK-6166] Limit number of in flight outbound requests | Sanket | 2016-02-11 | 5 | -15/+49
| | | | | | | | | | | This JIRA is related to https://github.com/apache/spark/pull/5852 Had to do some minor rework and testing to make sure it works with the current version of Spark. Author: Sanket <schintap@untilservice-lm> Closes #10838 from redsanket/limit-outbound-connections.
* [SPARK-7889][WEBUI] HistoryServer updates UI for incomplete apps | Steve Loughran | 2016-02-11 | 10 | -74/+1654
| | | | | | | | | | | When the HistoryServer is showing an incomplete app, it needs to check if there is a newer version of the app available. It does this by checking if a version of the app has been loaded with a larger *filesize*. If so, it detaches the current UI, attaches the new one, and redirects back to the same URL to show the new UI. https://issues.apache.org/jira/browse/SPARK-7889 Author: Steve Loughran <stevel@hortonworks.com> Author: Imran Rashid <irashid@cloudera.com> Closes #11118 from squito/SPARK-7889-alternate.
* [SPARK-13153][PYSPARK] ML persistence failed when handle no default value ↵ | Tommy YU | 2016-02-11 | 1 | -2/+3
| | | | | | | | | | | | parameter Fix this defect by checking whether the default value exists or not. yanboliang Please help to review. Author: Tommy YU <tummyyu@163.com> Closes #11043 from Wenpei/spark-13153-handle-param-withnodefaultvalue.
* [SPARK-12746][ML] ArrayType(_, true) should also accept ArrayType(_, false) | Earthson Lu | 2016-02-11 | 1 | -1/+2
| | | | | | | | https://issues.apache.org/jira/browse/SPARK-12746 Author: Earthson Lu <Earthson.Lu@gmail.com> Closes #10697 from Earthson/SPARK-12746.
* [SPARK-13277][BUILD] Follow-up ANTLR warnings are treated as build errors | Herman van Hovell | 2016-02-11 | 1 | -3/+6
| | | | | | | | | | It is possible to create faulty but legal ANTLR grammars. ANTLR will produce warnings but also a valid compilable parser. This PR makes sure we treat such warnings as build errors. cc rxin / viirya Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #11174 from hvanhovell/ANTLR-warnings-as-errors.
* [SPARK-12915][SQL] add SQL metrics of numOutputRows for whole stage codegen | Davies Liu | 2016-02-11 | 9 | -31/+71
| | | | | | | | | | This PR adds SQL metrics (numOutputRows) for generated operators (same as non-generated); the cost is about 0.2 nanoseconds per row. <img width="806" alt="gen metrics" src="https://cloud.githubusercontent.com/assets/40902/12994694/47f5881e-d0d7-11e5-9d47-78229f559ab0.png"> Author: Davies Liu <davies@databricks.com> Closes #11170 from davies/gen_metric.
* [SPARK-12765][ML][COUNTVECTORIZER] fix CountVectorizer.transform's lost ↵ | Liu Xiang | 2016-02-11 | 1 | -0/+1
| | | | | | | | | | transformSchema https://issues.apache.org/jira/browse/SPARK-12765 Author: Liu Xiang <lxmtlab@gmail.com> Closes #10720 from sloth2012/sloth.
* [SPARK-13047][PYSPARK][ML] Pyspark Params.hasParam should not throw an error | sethah | 2016-02-11 | 2 | -4/+12
Pyspark Params class has a method `hasParam(paramName)` which returns `True` if the class has a parameter by that name, but throws an `AttributeError` otherwise. There is not currently a way of getting a Boolean to indicate if a class has a parameter. With Spark 2.0 we could modify the existing behavior of `hasParam` or add an additional method with this functionality.

In Python:
```python
from pyspark.ml.classification import NaiveBayes
nb = NaiveBayes()
print nb.hasParam("smoothing")
print nb.hasParam("notAParam")
```
produces:
> True
> AttributeError: 'NaiveBayes' object has no attribute 'notAParam'

However, in Scala:
```scala
import org.apache.spark.ml.classification.NaiveBayes
val nb = new NaiveBayes()
nb.hasParam("smoothing")
nb.hasParam("notAParam")
```
produces:
> true
> false

cc holdenk Author: sethah <seth.hendrickson16@gmail.com> Closes #10962 from sethah/SPARK-13047.
* [SPARK-13035][ML][PYSPARK] PySpark ml.clustering support export/import | Yanbo Liang | 2016-02-11 | 1 | -4/+25
| | | | | | | | PySpark ml.clustering support export/import. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10999 from yanboliang/spark-13035.
* [MINOR][ML][PYSPARK] Cleanup test cases of clustering.py | Yanbo Liang | 2016-02-11 | 2 | -15/+9
| | | | | | | | | Test cases should be removed from the annotations of the ```setXXX``` functions, otherwise they become part of the [Python API docs](https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.clustering.KMeans.setInitMode). cc mengxr jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #10975 from yanboliang/clustering-cleanup.
* [SPARK-13037][ML][PYSPARK] PySpark ml.recommendation support export/import | Kai Jiang | 2016-02-11 | 1 | -4/+27
| | | | | | | | PySpark ml.recommendation support export/import. Author: Kai Jiang <jiangkai@gmail.com> Closes #11044 from vectorijk/spark-13037.
* [SPARK-11515][ML] QuantileDiscretizer should take random seed | Yu ISHIKAWA | 2016-02-11 | 2 | -6/+11
| | | | | | | | cc jkbradley Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9535 from yu-iskw/SPARK-11515.