author Eric Liang <ekl@databricks.com> 2015-09-25 00:43:22 -0700
committer Xiangrui Meng <meng@databricks.com> 2015-09-25 00:43:22 -0700
commit 922338812c03eba43f2f1a6c414d1b6b049811cf
tree 2df940a08de0645e2b88ba69d0c63931f9ec1f2f /python/pyspark/ml/feature.py
parent 21fd12cb17b9e08a0cc49b4fda801af947a4183b
[SPARK-9681] [ML] Support R feature interactions in RFormula
This integrates the Interaction feature transformer with SparkR R formula support (i.e. support for `:`). To generate reasonable ML attribute names for feature interactions, it was necessary to add the ability to read the original attribute names back from `StructField`, and also to specify custom group prefixes in `VectorAssembler`. This also has the side benefit of cleaning up the double underscores in the attributes generated for non-interaction terms.

mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #8830 from ericl/interaction-2.
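The commit message mentions the Interaction transformer only in passing. As a rough illustration, assuming the usual definition of a feature interaction (nominal features are one-hot encoded, then the flattened cross-product of all inputs is produced), the plain-Python sketch below computes interacted values and matching `:`-joined names. The helpers `one_hot`, `interact`, and `interaction_names` are hypothetical and are not part of the pyspark API. The naming scheme in particular is only an assumption about how `:` terms might be labeled.

```python
from itertools import product
from typing import List, Sequence


def one_hot(value: str, levels: Sequence[str]) -> List[float]:
    """Encode a nominal value as a 0/1 indicator vector over `levels`."""
    return [1.0 if value == level else 0.0 for level in levels]


def interact(*features: Sequence[float]) -> List[float]:
    """Flattened cross-product of the input vectors (first input varies slowest).

    A scalar numeric feature is passed as a length-1 vector.
    """
    result = []
    for combo in product(*features):
        # Multiply one element taken from each input vector.
        value = 1.0
        for x in combo:
            value *= x
        result.append(value)
    return result


def interaction_names(*groups: Sequence[str]) -> List[str]:
    """Hypothetical ':'-joined names, aligned element-wise with interact()."""
    return [":".join(combo) for combo in product(*groups)]
```

For example, `interact([1.0, 2.0, 3.0], [8.0, 4.0, 5.0])` returns `[8.0, 4.0, 5.0, 16.0, 8.0, 10.0, 24.0, 12.0, 15.0]`. For brevity this sketch encodes every level of a nominal feature; R and RFormula may drop a reference level for categorical terms, which is not modeled here.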
Diffstat (limited to 'python/pyspark/ml/feature.py')
-rw-r--r-- python/pyspark/ml/feature.py 2
1 files changed, 1 insertions, 1 deletions
diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index f41d72f877..a4e60f916b 100644
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -1850,7 +1850,7 @@ class RFormula(JavaEstimator, HasFeaturesCol, HasLabelCol):
Implements the transforms required for fitting a dataset against an
R model formula. Currently we support a limited subset of the R
- operators, including '~', '+', '-', and '.'. Also see the R formula
+ operators, including '~', '.', ':', '+', and '-'. Also see the R formula
docs:
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/formula.html
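The operator list in this docstring can be illustrated with a toy term expander. This is a simplified sketch, not Spark's parser: `RFormula` is a `JavaEstimator`, so the real formula handling runs on the JVM side. The function `expand_formula` and its `(label, terms)` return shape are hypothetical, and it assumes column names contain no operator characters or whitespace.

```python
from typing import List, Sequence, Set, Tuple

# A term is a tuple of column names; ("a", "b") stands for the interaction a:b.
Term = Tuple[str, ...]


def expand_formula(formula: str, columns: Sequence[str]) -> Tuple[str, List[Term]]:
    """Toy expansion of '~', '.', ':', '+', and '-' (not Spark's parser)."""
    label, rhs = (part.strip() for part in formula.split("~", 1))
    # Pad '+' and '-' with spaces so a plain split() yields operator tokens.
    tokens = rhs.replace("+", " + ").replace("-", " - ").split()
    added: List[Term] = []
    removed: Set[Term] = set()
    sign = "+"
    for token in tokens:
        if token in ("+", "-"):
            sign = token
            continue
        if token == ".":
            # '.' means every column except the label.
            expanded = [(col,) for col in columns if col != label]
        else:
            # 'a:b' becomes the interaction term ("a", "b").
            expanded = [tuple(token.split(":"))]
        for term in expanded:
            if sign == "-":
                removed.add(term)
            elif term not in added:
                added.append(term)
    # In this sketch, removals apply regardless of where they appear.
    return label, [term for term in added if term not in removed]
```

For example, `expand_formula("y ~ . - c + a:b", ["y", "a", "b", "c"])` returns `("y", [("a",), ("b",), ("a", "b")])`: `.` expands to `a`, `b`, `c`, then `- c` drops `c` and `+ a:b` adds one interaction term.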