path: root/python
Commit message (Author, Date; Files changed, Lines changed)
* [SPARK-11463] [PYSPARK] Only install signal in main thread (Davies Liu, 2015-11-10; 1 file, -1/+4)
  Only install the signal handler in the main thread, or it will fail to create a context in a non-main thread. Author: Davies Liu <davies@databricks.com> Closes #9574 from davies/python_signal.
* [SPARK-11566] [MLLIB] [PYTHON] Refactoring GaussianMixtureModel.gaussians in Python (Yu ISHIKAWA, 2015-11-10; 1 file, -1/+1)
  cc jkbradley Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9534 from yu-iskw/SPARK-11566.
* [SPARK-11567] [PYTHON] Add Python API for corr Aggregate function (felixcheung, 2015-11-10; 1 file, -0/+16)
  Like `df.agg(corr("col1", "col2"))`. davies Author: felixcheung <felixcheung_m@hotmail.com> Closes #9536 from felixcheung/pyfunc.
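  A minimal usage sketch (not part of the commit message), assuming a PySpark shell where `sqlContext` already exists; the data and column names are made up:
  ```python
  from pyspark.sql.functions import corr

  df = sqlContext.createDataFrame(
      [(1.0, 2.0), (2.0, 4.1), (3.0, 6.2)], ["col1", "col2"])

  # Pearson correlation of the two columns, computed as an aggregate
  df.agg(corr("col1", "col2").alias("pearson")).show()
  ```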
* [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s (Yin Huai, 2015-11-10; 3 files, -3/+3)
  https://issues.apache.org/jira/browse/SPARK-9830 This PR contains the following main changes.
  - Removing `AggregateExpression1`.
  - Removing `Aggregate` operator, which is used to evaluate `AggregateExpression1`.
  - Removing planner rule used to plan `Aggregate`.
  - Linking `MultipleDistinctRewriter` to analyzer.
  - Renaming `AggregateExpression2` to `AggregateExpression` and `AggregateFunction2` to `AggregateFunction`.
  - Updating places where we create aggregate expression. The way to create aggregate expressions is `AggregateExpression(aggregateFunction, mode, isDistinct)`.
  - Changing `val`s in `DeclarativeAggregate`s that touch children of this function to `lazy val`s (when we create aggregate expression in DataFrame API, children of an aggregate function can be unresolved).
  Author: Yin Huai <yhuai@databricks.com> Closes #9556 from yhuai/removeAgg1.
* [SPARK-11610][MLLIB][PYTHON][DOCS] Make the docs of LDAModel.describeTopics in Python more specific (Yu ISHIKAWA, 2015-11-09; 1 file, -0/+6)
  cc jkbradley Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9577 from yu-iskw/SPARK-11610.
* [SPARK-9301][SQL] Add collect_set and collect_list aggregate functions (Nick Buroojy, 2015-11-09; 2 files, -11/+31)
  For now they are thin wrappers around the corresponding Hive UDAFs. One limitation with these in Hive 0.13.0 is that they only support aggregating primitive types. I chose snake_case here instead of camelCase because it seems to be used in the majority of the multi-word functions. Do we also want to add these to `functions.py`? This approach was recommended here: https://github.com/apache/spark/pull/8592#issuecomment-154247089 marmbrus rxin Author: Nick Buroojy <nick.buroojy@civitaslearning.com> Closes #9526 from nburoojy/nick/udaf-alias. (cherry picked from commit a6ee4f989d020420dd08b97abb24802200ff23b2) Signed-off-by: Michael Armbrust <michael@databricks.com>
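  A rough sketch of the new aggregates (not from the commit); it assumes a shell with `sqlContext` available and, for Spark versions of this vintage, a Hive-enabled context since the functions wrap Hive UDAFs; data and column names are invented:
  ```python
  from pyspark.sql.functions import collect_list, collect_set

  df = sqlContext.createDataFrame(
      [("a", 1), ("a", 1), ("a", 2), ("b", 3)], ["key", "value"])

  # collect_list keeps duplicates, collect_set drops them
  df.groupBy("key").agg(
      collect_list("value").alias("values"),
      collect_set("value").alias("distinct_values")).show()
  ```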
* [SPARK-10280][MLLIB][PYSPARK][DOCS] Add @since annotation to pyspark.ml.classification (Yu ISHIKAWA, 2015-11-09; 1 file, -0/+56)
  Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8690 from yu-iskw/SPARK-10280.
* [SPARK-8467] [MLLIB] [PYSPARK] Add LDAModel.describeTopics() in Python (Yu ISHIKAWA, 2015-11-06; 1 file, -15/+18)
  Could jkbradley and davies review it?
  - Create a wrapper class `LDAModelWrapper` for `LDAModel`, because we can't deal with the return value of `describeTopics` in Scala from pyspark directly; `Array[(Array[Int], Array[Double])]` is too complicated to convert directly.
  - Add `loadLDAModel` in `PythonMLlibAPI`, since `LDAModel` in Scala is an abstract class and we need to call `load` of `DistributedLDAModel`.
  [[SPARK-8467] Add LDAModel.describeTopics() in Python - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8467) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8643 from yu-iskw/SPARK-8467-2.
* [HOTFIX] Fix python tests after #9527 (Michael Armbrust, 2015-11-06; 1 file, -1/+1)
  #9527 missed updating the python tests. Author: Michael Armbrust <michael@databricks.com> Closes #9533 from marmbrus/hotfixTextValue.
* [SPARK-11410] [PYSPARK] Add python bindings for repartition and sortWithinPartitions (Nong Li, 2015-11-06; 1 file, -16/+101)
  Author: Nong Li <nong@databricks.com> Closes #9504 from nongli/spark-11410.
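  A hedged sketch of the new bindings, assuming a shell with `sqlContext`; the data and column names are illustrative only:
  ```python
  df = sqlContext.createDataFrame(
      [(1, "x"), (2, "y"), (1, "z")], ["key", "value"])

  # hash-partition by "key" into 4 partitions, then sort rows within each partition
  out = df.repartition(4, "key").sortWithinPartitions("value")
  print(out.rdd.getNumPartitions())  # 4
  ```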
* [SPARK-10116][CORE] XORShiftRandom.hashSeed is random in high bits (Imran Rashid, 2015-11-06; 4 files, -18/+18)
  https://issues.apache.org/jira/browse/SPARK-10116 This is really trivial, just happened to notice it -- if `XORShiftRandom.hashSeed` is really supposed to have random bits throughout (as the comment implies), it needs to do something for the conversion to `long`. mengxr mkolod Author: Imran Rashid <irashid@cloudera.com> Closes #8314 from squito/SPARK-10116.
* [SPARK-11473][ML] R-like summary statistics with intercept for OLS via normal equation solver (Yanbo Liang, 2015-11-05; 1 file, -8/+8)
  Follow-up to [SPARK-9836](https://issues.apache.org/jira/browse/SPARK-9836): we should also support summary statistics for `intercept`. Author: Yanbo Liang <ybliang8@gmail.com> Closes #9485 from yanboliang/spark-11473.
* [SPARK-11527][ML][PYSPARK] PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale (Yanbo Liang, 2015-11-05; 1 file, -0/+24)
  PySpark `AFTSurvivalRegressionModel` should expose coefficients/intercept/scale. mengxr vectorijk Author: Yanbo Liang <ybliang8@gmail.com> Closes #9492 from yanboliang/spark-11527.
* [SPARK-11378][STREAMING] make StreamingContext.awaitTerminationOrTimeout return properly (Nick Evans, 2015-11-05; 2 files, -1/+8)
  This adds a failing test checking that `awaitTerminationOrTimeout` returns the expected value, and then fixes that failing test with the addition of a `return`. tdas zsxwing Author: Nick Evans <me@nicolasevans.org> Closes #9336 from manygrams/fix_await_termination_or_timeout.
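  With the fix, the boolean return value can be checked from Python; a sketch only, assuming a shell with `sc` and with the DStream setup elided:
  ```python
  from pyspark.streaming import StreamingContext

  ssc = StreamingContext(sc, 1)  # 1-second batch interval
  # ... define DStreams and call ssc.start() here ...

  # True if the context terminated within 5 seconds, False if the call timed out
  stopped = ssc.awaitTerminationOrTimeout(5)
  if not stopped:
      ssc.stop(stopSparkContext=False)
  ```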
* [SPARK-10028][MLLIB][PYTHON] Add Python API for PrefixSpan (Yu ISHIKAWA, 2015-11-04; 1 file, -1/+68)
  Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9469 from yu-iskw/SPARK-10028.
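  A short sketch of the new API (assumes a shell with `sc`); the sequence data and parameter values are illustrative:
  ```python
  from pyspark.mllib.fpm import PrefixSpan

  # each sequence is a list of itemsets
  sequences = sc.parallelize([
      [["a", "b"], ["c"]],
      [["a"], ["c", "b"], ["a", "b"]],
      [["a", "b"], ["e"]],
      [["f"]]], 2)

  model = PrefixSpan.train(sequences, minSupport=0.5, maxPatternLength=5)
  for fs in model.freqSequences().collect():
      print(fs.sequence, fs.freq)
  ```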
* [SPARK-11489][SQL] Only include common first order statistics in GroupedData (Reynold Xin, 2015-11-03; 1 file, -88/+0)
  We added a bunch of higher order statistics such as skewness and kurtosis to GroupedData. I don't think they are common enough to justify being listed, since users can always use the normal statistics aggregate functions. That is to say, after this change, we won't support
  ```scala
  df.groupBy("key").kurtosis("colA", "colB")
  ```
  However, we will still support
  ```scala
  df.groupBy("key").agg(kurtosis(col("colA")), kurtosis(col("colB")))
  ```
  Author: Reynold Xin <rxin@databricks.com> Closes #9446 from rxin/SPARK-11489.
* [SPARK-11467][SQL] add Python API for stddev/variance (Davies Liu, 2015-11-03; 2 files, -0/+105)
  Add Python API for stddev/stddev_pop/stddev_samp/variance/var_pop/var_samp/skewness/kurtosis. Author: Davies Liu <davies@databricks.com> Closes #9424 from davies/py_var.
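  A minimal sketch of the new aggregate functions (assumes `sqlContext` in a shell; the data is made up):
  ```python
  from pyspark.sql.functions import stddev, stddev_pop, variance, skewness, kurtosis

  df = sqlContext.createDataFrame([(1.0,), (2.0,), (3.0,), (10.0,)], ["x"])
  df.agg(stddev("x"),      # sample standard deviation
         stddev_pop("x"),  # population standard deviation
         variance("x"),
         skewness("x"),
         kurtosis("x")).show()
  ```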
* [SPARK-10592] [ML] [PySpark] Deprecate weights and use coefficients instead in ML models (vectorijk, 2015-11-02; 2 files, -0/+25)
  Deprecated in `LogisticRegression` and `LinearRegression`. Author: vectorijk <jiangkai@gmail.com> Closes #9311 from vectorijk/spark-10592.
* [SPARK-10286][ML][PYSPARK][DOCS] Add @since annotation to pyspark.ml.param and pyspark.ml.* (lihao, 2015-11-02; 4 files, -0/+230)
  Author: lihao <lihaowhu@gmail.com> Closes #9275 from lidinghao/SPARK-10286.
* [SPARK-11358][MLLIB] deprecate runs in k-means (Xiangrui Meng, 2015-11-02; 1 file, -0/+4)
  This PR deprecates `runs` in k-means. `runs` introduces extra complexity and overhead in MLlib's k-means implementation. I haven't seen much usage with `runs` not equal to `1`. We don't have a unit test for it either. We can deprecate this method in 1.6, and void it in 1.7. It helps us simplify the implementation. cc: srowen Author: Xiangrui Meng <meng@databricks.com> Closes #9322 from mengxr/SPARK-11358.
* [SPARK-11437] [PYSPARK] Don't .take when converting RDD to DataFrame with provided schema (Jason White, 2015-11-02; 1 file, -7/+1)
  When creating a DataFrame from an RDD in PySpark, `createDataFrame` calls `.take(10)` to verify the first 10 rows of the RDD match the provided schema. Similar to https://issues.apache.org/jira/browse/SPARK-8070, but that issue affected cases where a schema was not provided. Verifying the first 10 rows is of limited utility and causes the DAG to be executed non-lazily. If necessary, I believe this verification should be done lazily on all rows. However, since the caller is providing a schema to follow, I think it's acceptable to simply fail if the schema is incorrect. marmbrus We chatted about this at SparkSummitEU. davies you made a similar change for the infer-schema path in https://github.com/apache/spark/pull/6606 Author: Jason White <jason.white@shopify.com> Closes #9392 from JasonMWhite/createDataFrame_without_take.
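  A sketch of the now-lazy path (assumes `sc` and `sqlContext`; the schema and rows are hypothetical):
  ```python
  from pyspark.sql.types import StructType, StructField, StringType, IntegerType

  rdd = sc.parallelize([("alice", 1), ("bob", 2)])
  schema = StructType([
      StructField("name", StringType(), True),
      StructField("age", IntegerType(), True)])

  # with an explicit schema, no .take(10) verification runs here; mismatched
  # rows only surface when an action is executed
  df = sqlContext.createDataFrame(rdd, schema)
  df.count()
  ```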
* [SPARK-11322] [PYSPARK] Keep full stack trace in captured exception (Liang-Chi Hsieh, 2015-10-28; 2 files, -4/+21)
  JIRA: https://issues.apache.org/jira/browse/SPARK-11322 As reported by JoshRosen in [databricks/spark-redshift/issues/89](https://github.com/databricks/spark-redshift/issues/89#issuecomment-149828308), the exception-masking behavior sometimes makes debugging harder. To deal with this issue, we should keep the full stack trace in the captured exception. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #9283 from viirya/py-exception-stacktrace.
* [SPARK-11292] [SQL] Python API for text data source (Reynold Xin, 2015-10-28; 2 files, -2/+27)
  Adds DataFrameReader.text and DataFrameWriter.text. Author: Reynold Xin <rxin@databricks.com> Closes #9259 from rxin/SPARK-11292.
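  A hedged sketch of the new reader/writer pair (assumes `sqlContext`; the path and DataFrame are hypothetical, and the single string column is named "value" in released versions):
  ```python
  df = sqlContext.createDataFrame([("hello",), ("world",)], ["value"])

  # one line of text per row; the DataFrame must have a single string column
  df.write.text("/tmp/text-demo")

  lines = sqlContext.read.text("/tmp/text-demo")
  lines.show()
  ```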
* [SPARK-11367][ML][PYSPARK] Python LinearRegression should support setting solver (Yanbo Liang, 2015-10-28; 3 files, -22/+37)
  [SPARK-10668](https://issues.apache.org/jira/browse/SPARK-10668) has provided the `WeightedLeastSquares` solver ("normal") in `LinearRegression` with L2 regularization in Scala and R; Python ML `LinearRegression` should also support setting solver ("auto", "normal", "l-bfgs"). Author: Yanbo Liang <ybliang8@gmail.com> Closes #9328 from yanboliang/spark-11367.
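  A sketch of the new parameter; the training DataFrame is hypothetical, so the fit call is left commented:
  ```python
  from pyspark.ml.regression import LinearRegression

  # "normal" selects the WeightedLeastSquares / normal-equation path,
  # "l-bfgs" forces iterative optimization, "auto" lets Spark choose
  lr = LinearRegression(featuresCol="features", labelCol="label",
                        regParam=0.1, solver="normal")
  # model = lr.fit(training_df)
  ```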
* [SPARK-11302][MLLIB] 2) Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases (Sean Owen, 2015-10-27; 1 file, -2/+2)
  Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix the related Python mixture model test. Supersedes https://github.com/apache/spark/pull/9293 Author: Sean Owen <sowen@cloudera.com> Closes #9309 from srowen/SPARK-11302.2.
* [SPARK-10024][PYSPARK] Python API RF and GBT related params cleanup (vectorijk, 2015-10-27; 2 files, -338/+168)
  Implement {RandomForest, GBT, TreeEnsemble, TreeClassifier, TreeRegressor}Params for the Python API in pyspark/ml/{classification, regression}.py. Author: vectorijk <jiangkai@gmail.com> Closes #9233 from vectorijk/spark-10024.
* [SPARK-6488][MLLIB][PYTHON] Support addition/multiplication in PySpark's BlockMatrix (Mike Dusenberry, 2015-10-27; 1 file, -0/+68)
  This PR adds addition and multiplication to PySpark's `BlockMatrix` class via `add` and `multiply` functions. Author: Mike Dusenberry <mwdusenb@us.ibm.com> Closes #9139 from dusenberrymw/SPARK-6488_Add_Addition_and_Multiplication_to_PySpark_BlockMatrix.
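  A small sketch of the added operations (assumes `sc` in a shell; the block layout is invented):
  ```python
  from pyspark.mllib.linalg import Matrices
  from pyspark.mllib.linalg.distributed import BlockMatrix

  blocks = sc.parallelize([
      ((0, 0), Matrices.dense(2, 2, [1.0, 2.0, 3.0, 4.0])),
      ((1, 1), Matrices.dense(2, 2, [5.0, 6.0, 7.0, 8.0]))])
  mat = BlockMatrix(blocks, 2, 2)  # a 4x4 block-diagonal matrix

  summed = mat.add(mat)        # element-wise addition
  product = mat.multiply(mat)  # block matrix multiplication
  print(product.toLocalMatrix())
  ```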
* [SPARK-11270][STREAMING] Add improved equality testing for TopicAndPartition from the Kafka Streaming API (Nick Evans, 2015-10-27; 2 files, -0/+20)
  jerryshao tdas I know this is kind of minor, and I know you all are busy, but this brings this class in line with the `OffsetRange` class, and makes tests a little more concise. Instead of doing something like:
  ```
  assert topic_and_partition_instance._topic == "foo"
  assert topic_and_partition_instance._partition == 0
  ```
  You can do something like:
  ```
  assert topic_and_partition_instance == TopicAndPartition("foo", 0)
  ```
  Before:
  ```
  >>> from pyspark.streaming.kafka import TopicAndPartition
  >>> TopicAndPartition("foo", 0) == TopicAndPartition("foo", 0)
  False
  ```
  After:
  ```
  >>> from pyspark.streaming.kafka import TopicAndPartition
  >>> TopicAndPartition("foo", 0) == TopicAndPartition("foo", 0)
  True
  ```
  I couldn't find any tests - am I missing something? Author: Nick Evans <me@nicolasevans.org> Closes #9236 from manygrams/topic_and_partition_equality.
* [SPARK-10271][PYSPARK][MLLIB] Added @since tags to pyspark.mllib.clustering (noelsmith, 2015-10-26; 1 file, -1/+68)
  Duplicated the since decorator from pyspark.sql into pyspark (also tweaked to handle functions without docstrings). Added since to methods + "versionadded::" to classes (derived from the git file history in pyspark). Author: noelsmith <mail@noelsmith.com> Closes #8627 from noel-smith/SPARK-10271-since-mllib-clustering.
* [SPARK-11279][PYSPARK] Add DataFrame#toDF in PySpark (Jeff Zhang, 2015-10-26; 1 file, -0/+12)
  Author: Jeff Zhang <zjffdu@apache.org> Closes #9248 from zjffdu/SPARK-11279.
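  A minimal sketch (assumes `sqlContext`; the column names are illustrative):
  ```python
  df = sqlContext.createDataFrame([(1, "a"), (2, "b")])  # default columns _1, _2
  renamed = df.toDF("id", "letter")  # new DataFrame with renamed columns
  renamed.printSchema()
  ```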
* [SPARK-10277] [MLLIB] [PYSPARK] Add @since annotation to pyspark.mllib.regression (Yu ISHIKAWA, 2015-10-23; 1 file, -1/+101)
  Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8684 from yu-iskw/SPARK-10277.
* [SPARK-7021] Add JUnit output for Python unit tests (Gábor Lipták, 2015-10-22; 5 files, -9/+48)
  WIP Author: Gábor Lipták <gliptak@gmail.com> Closes #8323 from gliptak/SPARK-7021.
* [SPARK-11205][PYSPARK] Delegate to scala DataFrame API rather than print in python (Jeff Zhang, 2015-10-20; 1 file, -1/+2)
  No test needed; verified manually in the pyspark shell. Author: Jeff Zhang <zjffdu@apache.org> Closes #9177 from zjffdu/SPARK-11205.
* [MINOR][ML] fix doc warnings (Xiangrui Meng, 2015-10-20; 1 file, -0/+1)
  Without an empty line, sphinx will treat the doctest as part of the docstring. holdenk
  ~~~
  /Users/meng/src/spark/python/pyspark/ml/feature.py:docstring of pyspark.ml.feature.CountVectorizer:3: ERROR: Undefined substitution referenced: "label|raw |vectors | +-----+---------------+-------------------------+ |0 |[a, b, c] |(3,[0,1,2],[1.0,1.0,1.0])".
  /Users/meng/src/spark/python/pyspark/ml/feature.py:docstring of pyspark.ml.feature.CountVectorizer:3: ERROR: Undefined substitution referenced: "1 |[a, b, b, c, a]|(3,[0,1,2],[2.0,2.0,1.0])".
  ~~~
  Author: Xiangrui Meng <meng@databricks.com> Closes #9188 from mengxr/py-count-vec-doc-fix.
* [SPARK-10767][PYSPARK] Make pyspark shared params codegen more consistent (Holden Karau, 2015-10-20; 3 files, -65/+65)
  Namely, "." shows up in some places in the template when using the param docstring but not in others. Author: Holden Karau <holden@pigscanfly.ca> Closes #9017 from holdenk/SPARK-10767-Make-pyspark-shared-params-codegen-more-consistent.
* [SPARK-10269][PYSPARK][MLLIB] Add @since annotation to pyspark.mllib.classification (noelsmith, 2015-10-20; 1 file, -4/+66)
  Duplicated the since decorator from pyspark.sql into pyspark (also tweaked to handle functions without docstrings). Added since to methods + "versionadded::" to classes derived from the file history. Note - some methods are inherited from the regression module (i.e. LinearModel.intercept) so these won't have version numbers in the API docs until that model is updated. Author: noelsmith <mail@noelsmith.com> Closes #8626 from noel-smith/SPARK-10269-since-mlib-classification.
* [SPARK-10272][PYSPARK][MLLIB] Added @since tags to pyspark.mllib.evaluation (noelsmith, 2015-10-20; 1 file, -0/+41)
  Duplicated the since decorator from pyspark.sql into pyspark (also tweaked to handle functions without docstrings). Added since to public methods + "versionadded::" to classes (derived from the git file history in pyspark). Note - I also added the tags to MultilabelMetrics even though it isn't declared as public in the __all__ statement... if that's incorrect - I'll remove. Author: noelsmith <mail@noelsmith.com> Closes #8628 from noel-smith/SPARK-10272-since-mllib-evalutation.
* [SPARK-10447][SPARK-3842][PYSPARK] upgrade pyspark to py4j0.9 (Holden Karau, 2015-10-20; 9 files, -57/+25)
  Upgrade to Py4J 0.9. Author: Holden Karau <holden@pigscanfly.ca> Author: Holden Karau <holden@us.ibm.com> Closes #8615 from holdenk/SPARK-10447-upgrade-pyspark-to-py4j0.9.
* [SPARK-11114][PYSPARK] add getOrCreate for SparkContext/SQLContext in Python (Davies Liu, 2015-10-19; 4 files, -2/+59)
  Also added SQLContext.newSession(). Author: Davies Liu <davies@databricks.com> Closes #9122 from davies/py_create.
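  A short sketch of how the new entry points can be used:
  ```python
  from pyspark import SparkContext
  from pyspark.sql import SQLContext

  sc = SparkContext.getOrCreate()          # reuse the active context if one exists
  sqlContext = SQLContext.getOrCreate(sc)  # singleton SQLContext for that context
  session = sqlContext.newSession()        # isolated temp tables and UDF registrations
  ```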
* [SPARK-7018][BUILD] Refactor dev/run-tests-jenkins into Python (Brennon York, 2015-10-18; 1 file, -18/+1)
  This commit refactors the `run-tests-jenkins` script into Python. This refactoring was done by brennonyork in #7401; this PR contains a few minor edits from joshrosen in order to bring it up to date with other recent changes. From the original PR description (by brennonyork): Currently a few things are left out that could, and I think should, be smaller JIRAs after this.
  1. There are still a few areas where we use environment variables where we don't need to (like `CURRENT_BLOCK`). I might get around to fixing this one in lieu of everything else, but wanted to point that out.
  2. The PR tests are still written in bash. I opted to not change those and just rewrite the runner into Python. This is a great follow-on JIRA IMO.
  3. All of the linting scripts are still in bash as well and would likely do to just add those in as follow-on JIRAs as well.
  Closes #7401. Author: Brennon York <brennon.york@capitalone.com> Closes #9161 from JoshRosen/run-tests-jenkins-refactoring.
* [SPARK-11158][SQL] Modified _verify_type() to be more informative on Errors by presenting the Object (Mahmoud Lababidi, 2015-10-18; 1 file, -3/+3)
  The _verify_type() function raised errors when there were type-conversion issues but left out the object in question. The object is now included in the error to reduce the strain on the user when debugging to figure out which object failed the type conversion. The use case for me was a Pandas DataFrame that contained 'nan' as values for columns of Strings. Author: Mahmoud Lababidi <mahmoud@thehumangeo.com> Author: Mahmoud Lababidi <lababidi@gmail.com> Closes #9149 from lababidi/master.
* [SPARK-10185] [SQL] Feat sql comma separated paths (Koert Kuipers, 2015-10-17; 2 files, -1/+15)
  Make sure comma-separated paths get processed correctly in ResolvedDataSource for a HadoopFsRelationProvider. Author: Koert Kuipers <koert@tresata.com> Closes #8416 from koertkuipers/feat-sql-comma-separated-paths.
* [SPARK-11084] [ML] [PYTHON] Check if index can contain non-zero value before binary search (zero323, 2015-10-16; 2 files, -2/+12)
  At the moment `SparseVector.__getitem__` executes `np.searchsorted` first and checks whether the result is in the expected range afterwards. It is possible to check whether the index can contain a non-zero value before executing `np.searchsorted`. Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9098 from zero323/sparse_vector_getitem_improved.
* [SPARK-11050] [MLLIB] PySpark SparseVector can return wrong index in error message (Bhargav Mangipudi, 2015-10-16; 1 file, -2/+3)
  For negative indices in the SparseVector, we update the index value. If we have an incorrect index at this point, the error message reports the incorrect *updated* index instead of the original one. This change fixes that. Author: Bhargav Mangipudi <bhargav.mangipudi@gmail.com> Closes #9069 from bhargav/spark-10759.
* [PYTHON] [MINOR] List modules in PySpark tests when given bad name (Joseph K. Bradley, 2015-10-13; 1 file, -1/+2)
  Output the list of supported modules for python tests in the error message when given a bad module name. CC: davies Author: Joseph K. Bradley <joseph@databricks.com> Closes #9088 from jkbradley/python-tests-modules.
* [SPARK-8170] [PYTHON] Add signal handler to trap Ctrl-C in pyspark and cancel all running jobs (Ashwin Shankar, 2015-10-12; 1 file, -0/+7)
  This patch adds a signal handler to trap Ctrl-C and cancel running jobs. Author: Ashwin Shankar <ashankar@netflix.com> Closes #9033 from ashwinshankar77/master.
* [SPARK-10535] Sync up API for matrix factorization model between Scala and PySpark (Vladimir Vladimirov, 2015-10-09; 1 file, -4/+28)
  Support for recommendUsersForProducts and recommendProductsForUsers in the matrix factorization model for PySpark. Author: Vladimir Vladimirov <vladimir.vladimirov@magnetic.com> Closes #8700 from smartkiwi/SPARK-10535_.
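  A sketch of the synced-up methods (assumes `sc`; the ratings are invented):
  ```python
  from pyspark.mllib.recommendation import ALS, Rating

  ratings = sc.parallelize([Rating(1, 1, 5.0), Rating(1, 2, 1.0), Rating(2, 1, 4.0)])
  model = ALS.train(ratings, rank=10, iterations=5)

  # top-2 recommendations for every user, and for every product
  by_user = model.recommendProductsForUsers(2).collect()
  by_product = model.recommendUsersForProducts(2).collect()
  ```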
* [SPARK-10959] [PYSPARK] StreamingLogisticRegressionWithSGD does not train with given regParam and convergenceTol parameters (Bryan Cutler, 2015-10-08; 2 files, -2/+3)
  These params were being passed into the StreamingLogisticRegressionWithSGD constructor, but not transferred to the call for model training. Same with StreamingLinearRegressionWithSGD. I added the params as named arguments to the call and also fixed the intercept parameter, which was being passed as the regularization value. Author: Bryan Cutler <bjcutler@us.ibm.com> Closes #9002 from BryanCutler/StreamingSGD-convergenceTol-bug-10959.
* [SPARK-10973] [ML] [PYTHON] __getitem__ method throws IndexError exception when we try to access an index after the last non-zero entry (zero323, 2015-10-08; 2 files, -5/+10)
  The __getitem__ method throws an IndexError exception when we try to access an index after the last non-zero entry:
  ```
  from pyspark.mllib.linalg import Vectors
  sv = Vectors.sparse(5, {1: 3})
  sv[0]
  ## 0.0
  sv[1]
  ## 3.0
  sv[2]
  ## Traceback (most recent call last):
  ##   File "<stdin>", line 1, in <module>
  ##   File "/python/pyspark/mllib/linalg/__init__.py", line 734, in __getitem__
  ##     row_ind = inds[insert_index]
  ## IndexError: index out of bounds
  ```
  Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9009 from zero323/sparse_vector_index_error.
* [SPARK-9774] [ML] [PYSPARK] Add python api for ml regression isotonicregression (Holden Karau, 2015-10-07; 3 files, -1/+149)
  Add the Python API for isotonic regression. Author: Holden Karau <holden@pigscanfly.ca> Closes #8214 from holdenk/SPARK-9774-add-python-api-for-ml-regression-isotonicregression.
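  A minimal sketch of the new estimator (assumes `sqlContext` and the MLlib Vectors of this era; the toy data is made up):
  ```python
  from pyspark.ml.regression import IsotonicRegression
  from pyspark.mllib.linalg import Vectors

  df = sqlContext.createDataFrame(
      [(0.0, Vectors.dense(0.0)), (1.0, Vectors.dense(1.0)), (4.0, Vectors.dense(2.0))],
      ["label", "features"])

  ir = IsotonicRegression(isotonic=True)  # fit a monotonically non-decreasing function
  model = ir.fit(df)
  model.transform(df).show()
  ```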