author Liu Xiang <lxmtlab@gmail.com> 2016-02-11 17:28:37 -0800
committer Xiangrui Meng <meng@databricks.com> 2016-02-11 17:28:37 -0800
commit a5257048d74359c3fa7810009be1d60d370e2896
tree 321a06757666d455bf58564eb87c0c7c71ddc4e7 /mllib
parent b35467388612167f0bc3d17142c21a406f6c620d
[SPARK-12765][ML][COUNTVECTORIZER] fix CountVectorizer.transform's lost transformSchema
https://issues.apache.org/jira/browse/SPARK-12765

Author: Liu Xiang <lxmtlab@gmail.com>

Closes #10720 from sloth2012/sloth.
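
As an illustration of why the added call matters, here is a minimal sketch in plain Scala with no Spark dependency. `Field`, `Schema`, `ToyModel`, and the type strings are hypothetical stand-ins invented for this example, not Spark classes or the actual `CountVectorizerModel` code. The sketch only shows the pattern the commit restores: run the schema check at the top of `transform`, so a wrongly typed input column is rejected before any work is done.

```scala
// Hypothetical stand-ins for a DataFrame schema; these are NOT Spark classes.
final case class Field(name: String, dataType: String)
final case class Schema(fields: Seq[Field]) {
  def apply(name: String): Field =
    fields.find(_.name == name).getOrElse(
      throw new IllegalArgumentException(s"Field $name does not exist."))
}

// Toy model mirroring the shape of the fix: validate first, then transform.
class ToyModel(inputCol: String, outputCol: String, vocabulary: Array[String]) {

  // Checks the input column type and returns the output schema.
  def transformSchema(schema: Schema): Schema = {
    val actual = schema(inputCol).dataType
    require(actual == "array<string>",
      s"Column $inputCol must be of type array<string> but was actually $actual.")
    Schema(schema.fields :+ Field(outputCol, "vector"))
  }

  // Counts vocabulary terms per row; the schema check runs before any work.
  def transform(schema: Schema, rows: Seq[Seq[String]]): Seq[Map[Int, Double]] = {
    transformSchema(schema) // analogous to the line this commit adds
    val dict = vocabulary.zipWithIndex.toMap
    rows.map { tokens =>
      tokens.flatMap(dict.get).groupBy(identity).map { case (i, xs) => i -> xs.size.toDouble }
    }
  }
}

object Demo extends App {
  val model = new ToyModel("words", "features", Array("a", "b"))
  val good = Schema(Seq(Field("words", "array<string>")))
  println(model.transform(good, Seq(Seq("a", "b", "a"))))

  val bad = Schema(Seq(Field("words", "string")))
  // `require` throws IllegalArgumentException, so the Try is a Failure.
  println(scala.util.Try(model.transform(bad, Seq.empty)).isFailure)
}
```

Without the `transformSchema` call inside `transform`, the toy model would silently accept the badly typed schema; with it, the failure surfaces immediately and names the offending column.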
Diffstat (limited to 'mllib')
-rw-r--r-- mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala | 1
1 files changed, 1 insertions, 0 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala
index 10dcda2382..d5cb05f29b 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala
@@ -210,6 +210,7 @@ class CountVectorizerModel(override val uid: String, val vocabulary: Array[Strin
   private var broadcastDict: Option[Broadcast[Map[String, Int]]] = None
 
   override def transform(dataset: DataFrame): DataFrame = {
+    transformSchema(dataset.schema, logging = true)
     if (broadcastDict.isEmpty) {
       val dict = vocabulary.zipWithIndex.toMap
       broadcastDict = Some(dataset.sqlContext.sparkContext.broadcast(dict))