path: root/python/pyspark/ml/pipeline.py
author    Xiangrui Meng <meng@databricks.com>    2015-02-15 20:29:26 -0800
committer Xiangrui Meng <meng@databricks.com>    2015-02-15 20:29:26 -0800
commit   cd4a15366244657c4b7936abe5054754534366f2 (patch)
tree     fbee98a5031440c879705f2c7f9717b5d815c66e /python/pyspark/ml/pipeline.py
parent   836577b382695558f5c97d94ee725d0156ebfad2 (diff)
download spark-cd4a15366244657c4b7936abe5054754534366f2.tar.gz
         spark-cd4a15366244657c4b7936abe5054754534366f2.tar.bz2
         spark-cd4a15366244657c4b7936abe5054754534366f2.zip
[SPARK-5769] Set params in constructors and in setParams in Python ML pipelines
This PR allows Python users to set params in constructors and in setParams, where we use the decorator `keyword_only` to force keyword arguments. The trade-off is discussed in the design doc of SPARK-4586.

Generated doc: ![screen shot 2015-02-12 at 3 06 58 am](https://cloud.githubusercontent.com/assets/829644/6166491/9cfcd06a-b265-11e4-99ea-473d866634fc.png)

CC: davies rxin

Author: Xiangrui Meng <meng@databricks.com>

Closes #4564 from mengxr/py-pipeline-kw and squashes the following commits:

fedf720 [Xiangrui Meng] use toDF
d565f2c [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into py-pipeline-kw
cbc15d3 [Xiangrui Meng] fix style
5032097 [Xiangrui Meng] update pipeline signature
950774e [Xiangrui Meng] simplify keyword_only and update constructor/setParams signatures
fdde5fc [Xiangrui Meng] fix style
c9384b8 [Xiangrui Meng] fix sphinx doc
8e59180 [Xiangrui Meng] add setParams and make constructors take params, where we force keyword args
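The mechanism behind this change can be sketched as follows. This is a simplified stand-in for `pyspark.ml.util.keyword_only`, not the actual Spark source: the decorator rejects positional arguments and records the keyword arguments the caller actually passed on the wrapper as `_input_kwargs`, so defaults are never mistaken for user-supplied values.

```python
import functools

def keyword_only(func):
    # Simplified sketch (assumption: mirrors the spirit of
    # pyspark.ml.util.keyword_only, not its exact source).
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError(
                "Method %s only takes keyword arguments." % func.__name__)
        # Record only what the caller explicitly passed.
        wrapper._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper

class Demo(object):
    @keyword_only
    def __init__(self, stages=None):
        # Attribute lookup on the bound method falls through to the
        # wrapper function, where _input_kwargs was just stored.
        self.received = self.__init__._input_kwargs
```

Because `_input_kwargs` holds only explicit arguments, `Demo()` records an empty dict rather than `{"stages": None}`, which is what lets `setParams` distinguish "not set" from "set to the default".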
Diffstat (limited to 'python/pyspark/ml/pipeline.py')
-rw-r--r-- python/pyspark/ml/pipeline.py | 19
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/python/pyspark/ml/pipeline.py b/python/pyspark/ml/pipeline.py
index 2d239f8c80..18d8a58f35 100644
--- a/python/pyspark/ml/pipeline.py
+++ b/python/pyspark/ml/pipeline.py
@@ -18,7 +18,7 @@
 from abc import ABCMeta, abstractmethod
 
 from pyspark.ml.param import Param, Params
-from pyspark.ml.util import inherit_doc
+from pyspark.ml.util import inherit_doc, keyword_only
 
 
 __all__ = ['Estimator', 'Transformer', 'Pipeline', 'PipelineModel']
@@ -89,10 +89,16 @@ class Pipeline(Estimator):
     identity transformer.
     """
 
-    def __init__(self):
+    @keyword_only
+    def __init__(self, stages=[]):
+        """
+        __init__(self, stages=[])
+        """
         super(Pipeline, self).__init__()
         #: Param for pipeline stages.
         self.stages = Param(self, "stages", "pipeline stages")
+        kwargs = self.__init__._input_kwargs
+        self.setParams(**kwargs)
 
     def setStages(self, value):
         """
@@ -110,6 +116,15 @@ class Pipeline(Estimator):
         if self.stages in self.paramMap:
             return self.paramMap[self.stages]
 
+    @keyword_only
+    def setParams(self, stages=[]):
+        """
+        setParams(self, stages=[])
+        Sets params for Pipeline.
+        """
+        kwargs = self.setParams._input_kwargs
+        return self._set_params(**kwargs)
+
    def fit(self, dataset, params={}):
        paramMap = self._merge_params(params)
        stages = paramMap[self.stages]
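The pattern added by this diff, where `__init__` forwards its recorded keyword arguments to `setParams`, can be illustrated with a minimal self-contained sketch. This is an illustration only: `paramMap` here is a plain dict, and real Spark `Pipeline` objects use `Param` instances and an `Estimator` base class.

```python
import functools

def keyword_only(func):
    # Same simplified decorator idea as in pyspark.ml.util (assumption:
    # illustrative, not the actual Spark implementation).
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError(
                "Method %s only takes keyword arguments." % func.__name__)
        wrapper._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper

class Pipeline(object):
    @keyword_only
    def __init__(self, stages=[]):
        self.paramMap = {}
        # Forward exactly the kwargs the caller passed, so the
        # constructor and setParams share one code path.
        kwargs = self.__init__._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, stages=[]):
        # Only explicitly passed params land in paramMap.
        kwargs = self.setParams._input_kwargs
        self.paramMap.update(kwargs)
        return self
```

Returning `self` from `setParams` allows chained calls such as `Pipeline().setParams(stages=[...])`, matching the fluent setter style used elsewhere in the ML API.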