aboutsummaryrefslogtreecommitdiff
path: root/mllib/src/test
diff options
context:
space:
mode:
authorHolden Karau <holden@pigscanfly.ca>2015-08-01 01:09:38 -0700
committerJoseph K. Bradley <joseph@databricks.com>2015-08-01 01:09:38 -0700
commit65038973a17904e0e04d453799ec108af240fbab (patch)
treee90123661088eb27645dcf0f9f684b9f8ab344b2 /mllib/src/test
parent60ea7ab4bbfaea29a6cdf4e0e71ddc56afd04de6 (diff)
downloadspark-65038973a17904e0e04d453799ec108af240fbab.tar.gz
spark-65038973a17904e0e04d453799ec108af240fbab.tar.bz2
spark-65038973a17904e0e04d453799ec108af240fbab.zip
[SPARK-7446] [MLLIB] Add inverse transform for string indexer
It is useful to convert the encoded indices back to their string representation for result inspection. We can add a function which creates an inverse transformation. Author: Holden Karau <holden@pigscanfly.ca> Closes #6339 from holdenk/SPARK-7446-inverse-transform-for-string-indexer and squashes the following commits: 7cdf915 [Holden Karau] scala style comment fix b9cffb6 [Holden Karau] Update the labels param to have the metadata note 6a38edb [Holden Karau] Setting the default needs to come after the value gets defined 9e241d8 [Holden Karau] use Array.empty 21c8cfa [Holden Karau] Merge branch 'master' into SPARK-7446-inverse-transform-for-string-indexer 64dd3a3 [Holden Karau] Merge branch 'master' into SPARK-7446-inverse-transform-for-string-indexer 4f06c59 [Holden Karau] Fix comment styles, use empty array as the default, etc. a60c0e3 [Holden Karau] CR feedback (remove old constructor, add a note about use of setLabels) 1987b95 [Holden Karau] Use default copy 71e8d66 [Holden Karau] Make labels a local param for StringIndexerInverse 8450d0b [Holden Karau] Use the labels param in StringIndexerInverse 7464019 [Holden Karau] Add a labels param 868b1a9 [Holden Karau] Update scaladoc since we don't have labelsCol anymore 5aa38bf [Holden Karau] Add an inverse test using only meta data, pass labels when calling inverse method f3e0c64 [Holden Karau] CR feedback ebed932 [Holden Karau] Add Experimental tag and some scaladocs. Also don't require that the inputCol has the metadata on it, instead have the labelsCol specified when creating the inverse. 03ebf95 [Holden Karau] Add explicit type for invert function ecc65e0 [Holden Karau] Read the metadata correctly, use the array, pass the test a42d773 [Holden Karau] Fix test to supply cols as per new invert method 16cc3c3 [Holden Karau] Add an invert method d4bcb20 [Holden Karau] Make the inverse string indexer into a transformer (still needs test updates but compiles) e8bf3ad [Holden Karau] Merge branch 'master' into SPARK-7446-inverse-transform-for-string-indexer c3fdee1 [Holden Karau] Some WIP refactoring based on jkbradley's CR feedback. Definite work-in-progress 557bef8 [Holden Karau] Instead of using a private inverse transform, add an invert function so we can use it in a pipeline 88779c1 [Holden Karau] fix long line 78b28c1 [Holden Karau] Finish reverse part and add a test :) bb16a6a [Holden Karau] Some progress
Diffstat (limited to 'mllib/src/test')
-rw-r--r--mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala13
1 files changed, 13 insertions, 0 deletions
diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala
index 99f82bea42..d0295a0fe2 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala
@@ -47,6 +47,19 @@ class StringIndexerSuite extends SparkFunSuite with MLlibTestSparkContext {
// a -> 0, b -> 2, c -> 1
val expected = Set((0, 0.0), (1, 2.0), (2, 1.0), (3, 0.0), (4, 0.0), (5, 1.0))
assert(output === expected)
+ // convert reverse our transform
+ val reversed = indexer.invert("labelIndex", "label2")
+ .transform(transformed)
+ .select("id", "label2")
+ assert(df.collect().map(r => (r.getInt(0), r.getString(1))).toSet ===
+ reversed.collect().map(r => (r.getInt(0), r.getString(1))).toSet)
+ // Check invert using only metadata
+ val inverse2 = new StringIndexerInverse()
+ .setInputCol("labelIndex")
+ .setOutputCol("label2")
+ val reversed2 = inverse2.transform(transformed).select("id", "label2")
+ assert(df.collect().map(r => (r.getInt(0), r.getString(1))).toSet ===
+ reversed2.collect().map(r => (r.getInt(0), r.getString(1))).toSet)
}
test("StringIndexer with a numeric input column") {