author | Peter Rudenko <petro.rudenko@gmail.com> | 2015-02-15 20:51:32 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-02-15 20:51:32 -0800 |
commit | c78a12c4cc4d4312c4ee1069d3b218882d32d678 (patch) | |
tree | ba52dd6941c4c3c561b5a7726bfb52ef110a6507 /mllib | |
parent | acf2558dc92901c342262c35eebb95f2a9b7a9ae (diff) | |
[Ml] SPARK-5796 Don't transform data on a last estimator in Pipeline
If an estimator is the last one in a Pipeline, there is no need to transform the data with its fitted model, since no later stage would consume the result.
Author: Peter Rudenko <petro.rudenko@gmail.com>
Closes #4590 from petro-rudenko/patch-1 and squashes the following commits:
d13ec33 [Peter Rudenko] [Ml] SPARK-5796 Don't transform data on a last estimator in Pipeline
Diffstat (limited to 'mllib')
-rw-r--r-- | mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala | 4 |
1 files changed, 3 insertions, 1 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala b/mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala
index bb291e6e1f..5607ed21af 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala
@@ -114,7 +114,9 @@ class Pipeline extends Estimator[PipelineModel] {
             throw new IllegalArgumentException(
               s"Do not support stage $stage of type ${stage.getClass}")
         }
-        curDataset = transformer.transform(curDataset, paramMap)
+        if (index < indexOfLastEstimator) {
+          curDataset = transformer.transform(curDataset, paramMap)
+        }
         transformers += transformer
       } else {
         transformers += stage.asInstanceOf[Transformer]
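The effect of the guard can be illustrated outside Spark with a minimal sketch. Everything below is a hypothetical stand-in, not the real `org.apache.spark.ml` API: `ToyTransformer`, `ToyEstimator` and `PipelineSketch` are invented for illustration, the dataset is a plain `Vector[Double]`, and the `transformCalls` counter exists only to make the skipped work visible. The loop mirrors the shape of the patched code as it appears in the diff.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-ins for pipeline stages (assumption: NOT Spark's classes).
sealed trait Stage

final class ToyTransformer(val name: String) extends Stage {
  // Counts how often this transformer was applied while fitting.
  var transformCalls = 0
  def transform(data: Vector[Double]): Vector[Double] = {
    transformCalls += 1
    data.map(_ + 1.0)
  }
}

final class ToyEstimator(val name: String) extends Stage {
  // "Fitting" simply returns a fresh transformer named after the estimator.
  def fit(data: Vector[Double]): ToyTransformer = new ToyTransformer(name + "Model")
}

object PipelineSketch {
  def fit(stages: Seq[Stage], data: Vector[Double]): List[ToyTransformer] = {
    // -1 when the pipeline contains no estimator at all.
    val indexOfLastEstimator = stages.lastIndexWhere(_.isInstanceOf[ToyEstimator])
    var curDataset = data
    val transformers = ListBuffer.empty[ToyTransformer]
    stages.zipWithIndex.foreach { case (stage, index) =>
      if (index <= indexOfLastEstimator) {
        val transformer = stage match {
          case e: ToyEstimator   => e.fit(curDataset)
          case t: ToyTransformer => t
        }
        // The guard from the patch: only transform when a later estimator
        // still needs the transformed data as its input.
        if (index < indexOfLastEstimator) {
          curDataset = transformer.transform(curDataset)
        }
        transformers += transformer
      } else {
        // Stages after the last estimator are collected without being applied.
        transformers += stage.asInstanceOf[ToyTransformer]
      }
    }
    transformers.toList
  }
}
```

With the stages `ToyTransformer("scale")`, `ToyEstimator("lr")`, `ToyTransformer("post")`, only `scale` is applied during `PipelineSketch.fit`. The model fitted from `lr` is returned without ever transforming the data, which is the pass the patch removes.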