author    Feynman Liang <fliang@databricks.com>  2015-08-18 17:54:49 -0700
committer Joseph K. Bradley <joseph@databricks.com>  2015-08-18 17:54:49 -0700
commit    badf7fa650f9801c70515907fcc26b58d7ec3143 (patch)
tree      66a6c08a5722e75523802e0bc398b345158fd2d2 /docs/ml-features.md
parent    9108eff74a2815986fd067b273c2a344b6315405 (diff)
[SPARK-8473] [SPARK-9889] [ML] User guide and example code for DCT
mengxr jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8184 from feynmanliang/SPARK-9889-DCT-docs.
Diffstat (limited to 'docs/ml-features.md')
-rw-r--r--  docs/ml-features.md  71
1 file changed, 71 insertions, 0 deletions
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 6b2e36b353..28a61933f8 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -649,6 +649,77 @@ for expanded in polyDF.select("polyFeatures").take(3):
</div>
</div>
+## Discrete Cosine Transform (DCT)
+
+The [Discrete Cosine
+Transform](https://en.wikipedia.org/wiki/Discrete_cosine_transform)
+transforms a length-$N$ real-valued sequence in the time domain into
+another length-$N$ real-valued sequence in the frequency domain. A
+[DCT](api/scala/index.html#org.apache.spark.ml.feature.DCT) class
+provides this functionality, implementing the
+[DCT-II](https://en.wikipedia.org/wiki/Discrete_cosine_transform#DCT-II)
+and scaling the result by $1/\sqrt{2}$ so that the matrix representing the
+transform is unitary. No shift is applied to the transformed
+sequence (i.e. the $0$th element of the transformed sequence is the
+$0$th DCT coefficient and _not_ the $N/2$th).
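+
+Written out (this is the standard orthonormal form of the DCT-II, shown here
+for reference), the $k$th output coefficient for an input sequence
+$x_0, \ldots, x_{N-1}$ is
+$X_k = \sqrt{2/N} \, c_k \sum_{n=0}^{N-1} x_n \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right)$,
+where $c_0 = 1/\sqrt{2}$ and $c_k = 1$ for $k > 0$.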
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% highlight scala %}
+import org.apache.spark.ml.feature.DCT
+import org.apache.spark.mllib.linalg.Vectors
+
+// Example feature vectors in the time domain.
+val data = Seq(
+ Vectors.dense(0.0, 1.0, -2.0, 3.0),
+ Vectors.dense(-1.0, 2.0, 4.0, -7.0),
+ Vectors.dense(14.0, -2.0, -5.0, 1.0))
+val df = sqlContext.createDataFrame(data.map(Tuple1.apply)).toDF("features")
+// Apply the forward DCT; setInverse(true) would apply the inverse transform instead.
+val dct = new DCT()
+ .setInputCol("features")
+ .setOutputCol("featuresDCT")
+ .setInverse(false)
+val dctDf = dct.transform(df)
+dctDf.select("featuresDCT").show(3)
+{% endhighlight %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% highlight java %}
+import java.util.Arrays;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.ml.feature.DCT;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.VectorUDT;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+// Example feature vectors in the time domain.
+JavaRDD<Row> data = jsc.parallelize(Arrays.asList(
+ RowFactory.create(Vectors.dense(0.0, 1.0, -2.0, 3.0)),
+ RowFactory.create(Vectors.dense(-1.0, 2.0, 4.0, -7.0)),
+ RowFactory.create(Vectors.dense(14.0, -2.0, -5.0, 1.0))
+));
+StructType schema = new StructType(new StructField[] {
+ new StructField("features", new VectorUDT(), false, Metadata.empty()),
+});
+DataFrame df = jsql.createDataFrame(data, schema);
+// Apply the forward DCT; setInverse(true) would apply the inverse transform instead.
+DCT dct = new DCT()
+ .setInputCol("features")
+ .setOutputCol("featuresDCT")
+ .setInverse(false);
+DataFrame dctDf = dct.transform(df);
+dctDf.select("featuresDCT").show(3);
+{% endhighlight %}
+</div>
+</div>
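+
+Because the scaled transform is unitary, it can be undone with the same `DCT`
+class by setting the `inverse` parameter to `true`. A minimal sketch continuing
+the Scala example above (the output column name `featuresReconstructed` is just
+illustrative):
+
+{% highlight scala %}
+// Inverse DCT: map the frequency-domain column back to the time domain,
+// recovering the original vectors up to floating-point error.
+val idct = new DCT()
+ .setInputCol("featuresDCT")
+ .setOutputCol("featuresReconstructed")
+ .setInverse(true)
+idct.transform(dctDf).select("featuresReconstructed").show(3)
+{% endhighlight %}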
+
## StringIndexer
`StringIndexer` encodes a string column of labels to a column of label indices.