aboutsummaryrefslogtreecommitdiff
path: root/python/pyspark/ml
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-9690][ML][PYTHON] pyspark CrossValidator random seedMartin Menestret2015-12-161-7/+13
| | | | | | | | | | | | | Extend CrossValidator with HasSeed in PySpark. This PR replaces [https://github.com/apache/spark/pull/7997] CC: yanboliang thunterdb mmenestret Would one of you mind taking a look? Thanks! Author: Joseph K. Bradley <joseph@databricks.com> Author: Martin MENESTRET <mmenestret@ippon.fr> Closes #10268 from jkbradley/pyspark-cv-seed.
* [MINOR][ML] Use coefficients replace weightsYanbo Liang2015-12-032-2/+2
| | | | | | | | | Use ```coefficients``` replace ```weights```, I wish they are the last two. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #10065 from yanboliang/coefficients.
* [SPARK-11875][ML][PYSPARK] Update doc for PySpark HasCheckpointIntervalYanbo Liang2015-11-192-9/+11
| | | | | | | | | * Update doc for PySpark ```HasCheckpointInterval``` that users can understand how to disable checkpoint. * Update doc for PySpark ```cacheNodeIds``` of ```DecisionTreeParams``` to notify the relationship between ```cacheNodeIds``` and ```checkpointInterval```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #9856 from yanboliang/spark-11875.
* [SPARK-11820][ML][PYSPARK] PySpark LiR & LoR should support weightColYanbo Liang2015-11-182-16/+17
| | | | | | | | [SPARK-7685](https://issues.apache.org/jira/browse/SPARK-7685) and [SPARK-9642](https://issues.apache.org/jira/browse/SPARK-9642) have already supported setting weight column for ```LogisticRegression``` and ```LinearRegression```. It's a very important feature, PySpark should also support. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9811 from yanboliang/spark-11820.
* [SPARK-10280][MLLIB][PYSPARK][DOCS] Add @since annotation to ↵Yu ISHIKAWA2015-11-091-0/+56
| | | | | | | | pyspark.ml.classification Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8690 from yu-iskw/SPARK-10280.
* [SPARK-10116][CORE] XORShiftRandom.hashSeed is random in high bitsImran Rashid2015-11-062-13/+13
| | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-10116 This is really trivial, just happened to notice it -- if `XORShiftRandom.hashSeed` is really supposed to have random bits throughout (as the comment implies), it needs to do something for the conversion to `long`. mengxr mkolod Author: Imran Rashid <irashid@cloudera.com> Closes #8314 from squito/SPARK-10116.
* [SPARK-11473][ML] R-like summary statistics with intercept for OLS via ↵Yanbo Liang2015-11-051-8/+8
| | | | | | | | | | normal equation solver Follow up [SPARK-9836](https://issues.apache.org/jira/browse/SPARK-9836), we should also support summary statistics for ```intercept```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #9485 from yanboliang/spark-11473.
* [SPARK-11527][ML][PYSPARK] PySpark AFTSurvivalRegressionModel should expose ↵Yanbo Liang2015-11-051-0/+24
| | | | | | | | | | coefficients/intercept/scale PySpark ```AFTSurvivalRegressionModel``` should expose coefficients/intercept/scale. mengxr vectorijk Author: Yanbo Liang <ybliang8@gmail.com> Closes #9492 from yanboliang/spark-11527.
* [SPARK-10592] [ML] [PySpark] Deprecate weights and use coefficients instead ↵vectorijk2015-11-022-0/+25
| | | | | | | | | | in ML models Deprecated in `LogisticRegression` and `LinearRegression` Author: vectorijk <jiangkai@gmail.com> Closes #9311 from vectorijk/spark-10592.
* [SPARK-10286][ML][PYSPARK][DOCS] Add @since annotation to pyspark.ml.param ↵lihao2015-11-024-0/+230
| | | | | | | | and pyspark.ml.* Author: lihao <lihaowhu@gmail.com> Closes #9275 from lidinghao/SPARK-10286.
* [SPARK-11367][ML][PYSPARK] Python LinearRegression should support setting solverYanbo Liang2015-10-283-22/+37
| | | | | | | | [SPARK-10668](https://issues.apache.org/jira/browse/SPARK-10668) has provided ```WeightedLeastSquares``` solver("normal") in ```LinearRegression``` with L2 regularization in Scala and R, Python ML ```LinearRegression``` should also support setting solver("auto", "normal", "l-bfgs") Author: Yanbo Liang <ybliang8@gmail.com> Closes #9328 from yanboliang/spark-11367.
* [SPARK-10024][PYSPARK] Python API RF and GBT related params clear upvectorijk2015-10-272-338/+168
| | | | | | | | | implement {RandomForest, GBT, TreeEnsemble, TreeClassifier, TreeRegressor}Params for Python API in pyspark/ml/{classification, regression}.py Author: vectorijk <jiangkai@gmail.com> Closes #9233 from vectorijk/spark-10024.
* [SPARK-7021] Add JUnit output for Python unit testsGábor Lipták2015-10-221-1/+8
| | | | | | | | WIP Author: Gábor Lipták <gliptak@gmail.com> Closes #8323 from gliptak/SPARK-7021.
* [MINOR][ML] fix doc warningsXiangrui Meng2015-10-201-0/+1
| | | | | | | | | | | | | Without an empty line, sphinx will treat doctest as docstring. holdenk ~~~ /Users/meng/src/spark/python/pyspark/ml/feature.py:docstring of pyspark.ml.feature.CountVectorizer:3: ERROR: Undefined substitution referenced: "label|raw |vectors | +-----+---------------+-------------------------+ |0 |[a, b, c] |(3,[0,1,2],[1.0,1.0,1.0])". /Users/meng/src/spark/python/pyspark/ml/feature.py:docstring of pyspark.ml.feature.CountVectorizer:3: ERROR: Undefined substitution referenced: "1 |[a, b, b, c, a]|(3,[0,1,2],[2.0,2.0,1.0])". ~~~ Author: Xiangrui Meng <meng@databricks.com> Closes #9188 from mengxr/py-count-vec-doc-fix.
* [SPARK-10767][PYSPARK] Make pyspark shared params codegen more consistentHolden Karau2015-10-203-65/+65
| | | | | | | | Namely "." shows up in some places in the template when using the param docstring and not in others Author: Holden Karau <holden@pigscanfly.ca> Closes #9017 from holdenk/SPARK-10767-Make-pyspark-shared-params-codegen-more-consistent.
* [SPARK-9774] [ML] [PYSPARK] Add python api for ml regression isotonicregressionHolden Karau2015-10-073-1/+149
| | | | | | | | Add the Python API for isotonicregression. Author: Holden Karau <holden@pigscanfly.ca> Closes #8214 from holdenk/SPARK-9774-add-python-api-for-ml-regression-isotonicregression.
* [SPARK-10957] [ML] setParams changes quantileProbabilities unexpectly in ↵Xiangrui Meng2015-10-061-5/+1
| | | | | | | | | | PySpark's AFTSurvivalRegression If user doesn't specify `quantileProbs` in `setParams`, it will get reset to the default value. We don't need special handling here. vectorijk yanboliang Author: Xiangrui Meng <meng@databricks.com> Closes #9001 from mengxr/SPARK-10957.
* [SPARK-10688] [ML] [PYSPARK] Python API for AFTSurvivalRegressionvectorijk2015-10-061-2/+169
| | | | | | | | Implement Python API for AFTSurvivalRegression Author: vectorijk <jiangkai@gmail.com> Closes #8926 from vectorijk/spark-10688.
* [SPARK-9681] [ML] Support R feature interactions in RFormulaEric Liang2015-09-251-1/+1
| | | | | | | | | | | | This integrates the Interaction feature transformer with SparkR R formula support (i.e. support `:`). To generate reasonable ML attribute names for feature interactions, it was necessary to add the ability to read attribute the original attribute names back from `StructField`, and also to specify custom group prefixes in `VectorAssembler`. This also has the side-benefit of cleaning up the double-underscores in the attributes generated for non-interaction terms. mengxr Author: Eric Liang <ekl@databricks.com> Closes #8830 from ericl/interaction-2.
* [DOC] [PYSPARK] [MLLIB] Added newlines to docstrings to fix parameter formattingnoelsmith2015-09-214-0/+9
| | | | | | | | | | | | | | Added newlines before `:param ...:` and `:return:` markup. Without these, parameter lists aren't formatted correctly in the API docs. I.e: ![screen shot 2015-09-21 at 21 49 26](https://cloud.githubusercontent.com/assets/11915197/10004686/de3c41d4-60aa-11e5-9c50-a46dcb51243f.png) .. looks like this once newline is added: ![screen shot 2015-09-21 at 21 50 14](https://cloud.githubusercontent.com/assets/11915197/10004706/f86bfb08-60aa-11e5-8524-ae4436713502.png) Author: noelsmith <mail@noelsmith.com> Closes #8851 from noel-smith/docstring-missing-newline-fix.
* [SPARK-9769] [ML] [PY] add python api for countvectorizermodelHolden Karau2015-09-211-6/+142
| | | | | | | | From JIRA: Add Python API, user guide and example for ml.feature.CountVectorizerModel Author: Holden Karau <holden@pigscanfly.ca> Closes #8561 from holdenk/SPARK-9769-add-python-api-for-countvectorizermodel.
* [SPARK-10615] [PYSPARK] change assertEquals to assertEqualYanbo Liang2015-09-181-8/+8
| | | | | | | | As ```assertEquals``` is deprecated, so we need to change ```assertEquals``` to ```assertEqual``` for existing python unit tests. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8814 from yanboliang/spark-10615.
* [SPARK-10282] [ML] [PYSPARK] [DOCS] Add @since annotation to ↵Yu ISHIKAWA2015-09-171-0/+28
| | | | | | | | pyspark.ml.recommendation Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8692 from yu-iskw/SPARK-10282.
* [SPARK-10281] [ML] [PYSPARK] [DOCS] Add @since annotation to ↵Yu ISHIKAWA2015-09-171-0/+13
| | | | | | | | pyspark.ml.clustering Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8691 from yu-iskw/SPARK-10281.
* [SPARK-10283] [ML] [PYSPARK] [DOCS] Add @since annotation to ↵Yu ISHIKAWA2015-09-171-0/+65
| | | | | | | | pyspark.ml.regression Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8693 from yu-iskw/SPARK-10283.
* [SPARK-10284] [ML] [PYSPARK] [DOCS] Add @since annotation to pyspark.ml.tuningYu ISHIKAWA2015-09-171-0/+28
| | | | | | Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8694 from yu-iskw/SPARK-10284.
* [SPARK-8530] [ML] add python API for MinMaxScalerYuhao Yang2015-09-111-5/+99
| | | | | | | | | | | jira: https://issues.apache.org/jira/browse/SPARK-8530 add python API for MinMaxScaler jira for MinMaxScaler: https://issues.apache.org/jira/browse/SPARK-7514 Author: Yuhao Yang <hhbyyh@gmail.com> Closes #7150 from hhbyyh/pythonMinMax.
* [MINOR] [MLLIB] [ML] [DOC] Minor doc fixes for StringIndexer and MetadataUtilsJoseph K. Bradley2015-09-111-8/+8
| | | | | | | | | | | | Changes: * Make Scala doc for StringIndexerInverse clearer. Also remove Scala doc from transformSchema, so that the doc is inherited. * MetadataUtils.scala: “ Helper utilities for tree-based algorithms” —> not just trees anymore CC: holdenk mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #8679 from jkbradley/doc-fixes-1.5.
* [SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifierYanbo Liang2015-09-111-1/+131
| | | | | | | | Add Python API for ```MultilayerPerceptronClassifier```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8067 from yanboliang/SPARK-9773.
* [SPARK-10026] [ML] [PySpark] Implement some common Params for regression in ↵Yanbo Liang2015-09-114-96/+143
| | | | | | | | | | | | | | | | | PySpark LinearRegression and LogisticRegression lack of some Params for Python, and some Params are not shared classes which lead we need to write them for each class. These kinds of Params are list here: ```scala HasElasticNetParam HasFitIntercept HasStandardization HasThresholds ``` Here we implement them in shared params at Python side and make LinearRegression/LogisticRegression parameters peer with Scala one. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8508 from yanboliang/spark-10026.
* [SPARK-10027] [ML] [PySpark] Add Python API missing methods for ml.featureYanbo Liang2015-09-103-8/+59
| | | | | | | | | | | Missing method of ml.feature are listed here: ```StringIndexer``` lacks of parameter ```handleInvalid```. ```StringIndexerModel``` lacks of method ```labels```. ```VectorIndexerModel``` lacks of methods ```numFeatures``` and ```categoryMaps```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8313 from yanboliang/spark-10027.
* [SPARK-9772] [PYSPARK] [ML] Add Python API for ml.feature.VectorSlicerYanbo Liang2015-09-091-5/+90
| | | | | | | | Add Python API for ml.feature.VectorSlicer. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8102 from yanboliang/SPARK-9772.
* [SPARK-9654] [ML] [PYSPARK] Add IndexToString to PySparkHolden Karau2015-09-082-5/+72
| | | | | | | | Adds IndexToString to PySpark. Author: Holden Karau <holden@pigscanfly.ca> Closes #7976 from holdenk/SPARK-9654-add-string-indexer-inverse-in-pyspark.
* [SPARK-10094] Pyspark ML Feature transformers marked as experimentalnoelsmith2015-09-081-0/+52
| | | | | | | | Modified class-level docstrings to mark all feature transformers in pyspark.ml as experimental. Author: noelsmith <mail@noelsmith.com> Closes #8623 from noel-smith/SPARK-10094-mark-pyspark-ml-trans-exp.
* [SPARK-9679] [ML] [PYSPARK] Add Python API for Stop Words RemoverHolden Karau2015-09-012-4/+89
| | | | | | | | Add a python API for the Stop Words Remover. Author: Holden Karau <holden@pigscanfly.ca> Closes #8118 from holdenk/SPARK-9679-python-StopWordsRemover.
* [SPARK-10355] [ML] [PySpark] Add Python API for SQLTransformerYanbo Liang2015-08-311-3/+54
| | | | | | | | Add Python API for SQLTransformer Author: Yanbo Liang <ybliang8@gmail.com> Closes #8527 from yanboliang/spark-10355.
* [SPARK-8472] [ML] [PySpark] Python API for DCTYanbo Liang2015-08-311-1/+64
| | | | | | | | Add Python API for ml.feature.DCT. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8485 from yanboliang/spark-8472.
* [SPARK-10188] [PYSPARK] Pyspark CrossValidator with RMSE selects incorrect modelnoelsmith2015-08-273-1/+104
| | | | | | | | | | | | | * Added isLargerBetter() method to Pyspark Evaluator to match the Scala version. * JavaEvaluator delegates isLargerBetter() to underlying Scala object. * Added check for isLargerBetter() in CrossValidator to determine whether to use argmin or argmax. * Added test cases for where smaller is better (RMSE) and larger is better (R-Squared). (This contribution is my original work and that I license the work to the project under Sparks' open source license) Author: noelsmith <mail@noelsmith.com> Closes #8399 from noel-smith/pyspark-rmse-xval-fix.
* [SPARK-10097] Adds `shouldMaximize` flag to `ml.evaluation.Evaluator`Feynman Liang2015-08-191-2/+2
| | | | | | | | | | | | | Previously, users of evaluator (`CrossValidator` and `TrainValidationSplit`) would only maximize the metric in evaluator, leading to a hacky solution which negated metrics to be minimized and caused erroneous negative values to be reported to the user. This PR adds a `isLargerBetter` attribute to the `Evaluator` base class, instructing users of `Evaluator` on whether the chosen metric should be maximized or minimized. CC jkbradley Author: Feynman Liang <fliang@databricks.com> Author: Joseph K. Bradley <joseph@databricks.com> Closes #8290 from feynmanliang/SPARK-10097.
* [SPARK-9768] [PYSPARK] [ML] Add Python API and user guide for ↵Yanbo Liang2015-08-171-5/+62
| | | | | | | | | | ml.feature.ElementwiseProduct Add Python API, user guide and example for ml.feature.ElementwiseProduct. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8061 from yanboliang/SPARK-9768.
* [SPARK-9828] [PYSPARK] Mutable values should not be default argumentsMechCoder2015-08-144-14/+28
| | | | | | Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #8110 from MechCoder/spark-9828.
* [SPARK-9918] [MLLIB] remove runs from k-means and rename epsilon to tolXiangrui Meng2015-08-121-50/+13
| | | | | | | | | | | | | | | | This requires some discussion. I'm not sure whether `runs` is a useful parameter. It certainly complicates the implementation. We might want to optimize the k-means implementation with block matrix operations. In this case, having `runs` may not be worth the trade-off. Also it increases the communication cost in a single job, which might cause other issues. This PR also renames `epsilon` to `tol` to have consistent naming among algorithms. The Python constructor is updated to include all parameters. jkbradley yu-iskw Author: Xiangrui Meng <meng@databricks.com> Closes #8148 from mengxr/SPARK-9918 and squashes the following commits: 149b9e5 [Xiangrui Meng] fix constructor in Python and rename epsilon to tol 3cc15b3 [Xiangrui Meng] fix test and change initStep to initSteps in python a0a0274 [Xiangrui Meng] remove runs from k-means in the pipeline API
* [SPARK-9789] [ML] Added logreg threshold param backJoseph K. Bradley2015-08-121-36/+62
| | | | | | | | | | Reinstated LogisticRegression.threshold Param for binary compatibility. Param thresholds overrides threshold, if set. CC: mengxr dbtsai feynmanliang Author: Joseph K. Bradley <joseph@databricks.com> Closes #8079 from jkbradley/logreg-reinstate-threshold.
* [SPARK-9766] [ML] [PySpark] check and add miss docs for PySpark MLYanbo Liang2015-08-124-8/+20
| | | | | | | | Check and add miss docs for PySpark ML (this issue only check miss docs for o.a.s.ml not o.a.s.mllib). Author: Yanbo Liang <ybliang8@gmail.com> Closes #8059 from yanboliang/SPARK-9766.
* [SPARK-9533] [PYSPARK] [ML] Add missing methods in Word2Vec MLMechCoder2015-08-061-0/+40
| | | | | | | | | | | After https://github.com/apache/spark/pull/7263 it is pretty straightforward to Python wrappers. Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #7930 from MechCoder/spark-9533 and squashes the following commits: 1bea394 [MechCoder] make getVectors a lazy val 5522756 [MechCoder] [SPARK-9533] [PySpark] [ML] Add missing methods in Word2Vec ML
* [SPARK-9447] [ML] [PYTHON] Added HasRawPredictionCol, HasProbabilityCol to ↵Joseph K. Bradley2015-08-041-1/+12
| | | | | | | | | | | | | | | RandomForestClassifier Added HasRawPredictionCol, HasProbabilityCol to RandomForestClassifier, plus doc tests for those columns. CC: holdenk yanboliang Author: Joseph K. Bradley <joseph@databricks.com> Closes #7903 from jkbradley/rf-prob-python and squashes the following commits: c62a83f [Joseph K. Bradley] made unit test more robust 14eeba2 [Joseph K. Bradley] added HasRawPredictionCol, HasProbabilityCol to RandomForestClassifier in PySpark
* [SPARK-8069] [ML] Add multiclass thresholds for ProbabilisticClassifierHolden Karau2015-08-041-12/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This PR replaces the old "threshold" with a generalized "thresholds" Param. We keep getThreshold,setThreshold for backwards compatibility for binary classification. Note that the primary author of this PR is holdenk Author: Holden Karau <holden@pigscanfly.ca> Author: Joseph K. Bradley <joseph@databricks.com> Closes #7909 from jkbradley/holdenk-SPARK-8069-add-cutoff-aka-threshold-to-random-forest and squashes the following commits: 3952977 [Joseph K. Bradley] fixed pyspark doc test 85febc8 [Joseph K. Bradley] made python unit tests a little more robust 7eb1d86 [Joseph K. Bradley] small cleanups 6cc2ed8 [Joseph K. Bradley] Fixed remaining merge issues. 0255e44 [Joseph K. Bradley] Many cleanups for thresholds, some more tests 7565a60 [Holden Karau] fix pep8 style checks, add a getThreshold method similar to our LogisticRegression.scala one for API compat be87f26 [Holden Karau] Convert threshold to thresholds in the python code, add specialized support for Array[Double] to shared parems codegen, etc. 6747dad [Holden Karau] Override raw2prediction for ProbabilisticClassifier, fix some tests 25df168 [Holden Karau] Fix handling of thresholds in LogisticRegression c02d6c0 [Holden Karau] No default for thresholds 5e43628 [Holden Karau] CR feedback and fixed the renamed test f3fbbd1 [Holden Karau] revert the changes to random forest :( 51f581c [Holden Karau] Add explicit types to public methods, fix long line f7032eb [Holden Karau] Fix a java test bug, remove some unecessary changes adf15b4 [Holden Karau] rename the classifier suite test to ProbabilisticClassifierSuite now that we only have it in Probabilistic 398078a [Holden Karau] move the thresholding around a bunch based on the design doc 4893bdc [Holden Karau] Use numtrees of 3 since previous result was tied (one tree for each) and the switch from different max methods picked a different element (since they were equal I think this is ok) 638854c [Holden Karau] Add a scala RandomForestClassifierSuite test based on corresponding python test e09919c [Holden Karau] Fix return type, I need more coffee.... 8d92cac [Holden Karau] Use ClassifierParams as the head 3456ed3 [Holden Karau] Add explicit return types even though just test a0f3b0c [Holden Karau] scala style fixes 6f14314 [Holden Karau] Since hasthreshold/hasthresholds is in root classifier now ffc8dab [Holden Karau] Update the sharedParams 0420290 [Holden Karau] Allow us to override the get methods selectively 978e77a [Holden Karau] Move HasThreshold into classifier params and start defining the overloaded getThreshold/getThresholds functions 1433e52 [Holden Karau] Revert "try and hide threshold but chainges the API so no dice there" 1f09a2e [Holden Karau] try and hide threshold but chainges the API so no dice there efb9084 [Holden Karau] move setThresholds only to where its used 6b34809 [Holden Karau] Add a test with thresholding for the RFCS 74f54c3 [Holden Karau] Fix creation of vote array 1986fa8 [Holden Karau] Setting the thresholds only makes sense if the underlying class hasn't overridden predict, so lets push it down. 2f44b18 [Holden Karau] Add a global default of null for thresholds param f338cfc [Holden Karau] Wait that wasn't a good idea, Revert "Some progress towards unifying threshold and thresholds" 634b06f [Holden Karau] Some progress towards unifying threshold and thresholds 85c9e01 [Holden Karau] Test passes again... little fnur 099c0f3 [Holden Karau] Move thresholds around some more (set on model not trainer) 0f46836 [Holden Karau] Start adding a classifiersuite f70eb5e [Holden Karau] Fix test compile issues a7d59c8 [Holden Karau] Move thresholding into Classifier trait 5d999d2 [Holden Karau] Some more progress, start adding a test (maybe try and see if we can find a better thing to use for the base of the test) 1fed644 [Holden Karau] Use thresholds to scale scores in random forest classifcation 31d6bf2 [Holden Karau] Start threading the threshold info through 0ef228c [Holden Karau] Add hasthresholds
* [SPARK-9544] [MLLIB] add Python API for RFormulaXiangrui Meng2015-08-031-1/+84
| | | | | | | | | | | | | Add Python API for RFormula. Similar to other feature transformers in Python. This is just a thin wrapper over the Scala implementation. ericl MechCoder Author: Xiangrui Meng <meng@databricks.com> Closes #7879 from mengxr/SPARK-9544 and squashes the following commits: 3d5ff03 [Xiangrui Meng] add an doctest for . and - 5e969a5 [Xiangrui Meng] fix pydoc 1cd41f8 [Xiangrui Meng] organize imports 3c18b10 [Xiangrui Meng] add Python API for RFormula
* [SPARK-9536] [SPARK-9537] [SPARK-9538] [ML] [PYSPARK] ml.classification ↵Yanbo Liang2015-08-021-18/+43
| | | | | | | | | | | | | | | | | support raw and probability prediction for PySpark Make the following ml.classification class support raw and probability prediction for PySpark: ```scala NaiveBayesModel DecisionTreeClassifierModel LogisticRegressionModel ``` Author: Yanbo Liang <ybliang8@gmail.com> Closes #7866 from yanboliang/spark-9536-9537 and squashes the following commits: 2934dab [Yanbo Liang] ml.NaiveBayes, ml.DecisionTreeClassifier and ml.LogisticRegression support probability prediction
* [SPARK-9214] [ML] [PySpark] support ml.NaiveBayes for PythonYanbo Liang2015-07-301-1/+115
| | | | | | | | | | | | | | support ml.NaiveBayes for Python Author: Yanbo Liang <ybliang8@gmail.com> Closes #7568 from yanboliang/spark-9214 and squashes the following commits: 5ee3fd6 [Yanbo Liang] fix typos 3ecd046 [Yanbo Liang] fix typos f9c94d1 [Yanbo Liang] change lambda_ to smoothing and fix other issues 180452a [Yanbo Liang] fix typos 7dda1f4 [Yanbo Liang] support ml.NaiveBayes for Python