author | Xiangrui Meng <meng@databricks.com> | 2015-02-18 10:09:56 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-02-18 10:09:56 -0800 |
commit | 85e9d091d5d785d412e91038c2490131e64f5634 (patch) | |
tree | 79d956ee9eb4859e32e0dfc46531a346389cb581 /examples/src/main/scala | |
parent | 5aecdcf1f23a826f6236096001de1dd811dbc443 (diff) | |
[SPARK-5519][MLLIB] add user guide with example code for fp-growth
The API is still not very Java-friendly because `Array[Item]` in `freqItemsets` is recognized as `Object` in Java. We might want to define a case class to wrap the returned pair to make it Java-friendly.
Author: Xiangrui Meng <meng@databricks.com>
Closes #4661 from mengxr/SPARK-5519 and squashes the following commits:
58ccc25 [Xiangrui Meng] add user guide with example code for fp-growth
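The commit message suggests wrapping the `(itemset, freq)` pair in a case class so that Java callers see a typed object instead of `Object`. A minimal sketch of what such a wrapper could look like follows. The names `FreqItemsetSketch`, `FreqItemset`, and `javaItems` are assumptions chosen for illustration and are not part of this commit, and the sample pairs are illustrative values rather than model output.

```scala
object FreqItemsetSketch {

  // Hypothetical wrapper for one frequent itemset and its count.
  // Not defined by this commit; shown only to illustrate the idea.
  final case class FreqItemset[Item](items: Array[Item], freq: Long) {

    // Copy the items into a java.util.List so Java code gets a typed
    // collection instead of an erased array seen as Object.
    def javaItems: java.util.List[Item] = {
      val list = new java.util.ArrayList[Item](items.length)
      items.foreach(item => list.add(item))
      list
    }
  }

  def main(args: Array[String]): Unit = {
    // Illustrative (itemset, freq) pairs in the shape the example prints.
    val pairs: Seq[(Array[String], Long)] =
      Seq((Array("z"), 5L), (Array("x", "z"), 3L))

    val wrapped = pairs.map { case (items, freq) => FreqItemset(items, freq) }
    wrapped.foreach(fi => println(s"${fi.javaItems}, ${fi.freq}"))
  }
}
```

From Java, a caller would then work with `javaItems()` and `freq()` on each element rather than casting an `Object`.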
Diffstat (limited to 'examples/src/main/scala')
-rw-r--r-- | examples/src/main/scala/org/apache/spark/examples/mllib/FPGrowthExample.scala | 51 |
1 files changed, 51 insertions, 0 deletions
diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/FPGrowthExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/FPGrowthExample.scala
new file mode 100644
index 0000000000..ae66107d70
--- /dev/null
+++ b/examples/src/main/scala/org/apache/spark/examples/mllib/FPGrowthExample.scala
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.mllib
+
+import org.apache.spark.mllib.fpm.FPGrowth
+import org.apache.spark.{SparkContext, SparkConf}
+
+/**
+ * Example for mining frequent itemsets using FP-growth.
+ */
+object FPGrowthExample {
+
+  def main(args: Array[String]) {
+    val conf = new SparkConf().setAppName("FPGrowthExample")
+    val sc = new SparkContext(conf)
+
+    // TODO: Read a user-specified input file.
+    val transactions = sc.parallelize(Seq(
+      "r z h k p",
+      "z y x w v u t s",
+      "s x o n r",
+      "x z y m t s q e",
+      "z",
+      "x z y r q t p").map(_.split(" ")), numSlices = 2)
+
+    val fpg = new FPGrowth()
+      .setMinSupport(0.3)
+    val model = fpg.run(transactions)
+
+    model.freqItemsets.collect().foreach { case (itemset, freq) =>
+      println(itemset.mkString("[", ",", "]") + ", " + freq)
+    }
+
+    sc.stop()
+  }
+}
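To make the effect of `setMinSupport(0.3)` on the six hard-coded transactions concrete, the sketch below counts single-item support in plain Scala, without Spark. It assumes the threshold is applied as a minimum count of `ceil(minSupport * numTransactions)`, which gives 2 here. It covers only itemsets of size one and does not implement FP-growth itself. The names `SupportCountSketch` and `frequentItems` are hypothetical.

```scala
object SupportCountSketch {

  // The same six transactions that FPGrowthExample hard-codes.
  val transactions: Seq[Array[String]] = Seq(
    "r z h k p",
    "z y x w v u t s",
    "s x o n r",
    "x z y m t s q e",
    "z",
    "x z y r q t p").map(_.split(" "))

  // Items whose transaction count reaches the assumed threshold
  // ceil(minSupport * numTransactions).
  def frequentItems(minSupport: Double): Map[String, Int] = {
    val minCount = math.ceil(minSupport * transactions.size).toLong
    transactions
      .flatMap(_.distinct.toSeq) // count each item once per transaction
      .groupBy(identity)
      .map { case (item, occurrences) => item -> occurrences.size }
      .filter { case (_, count) => count >= minCount }
  }

  def main(args: Array[String]): Unit = {
    // Print in the same "[item], count" shape as the example, most frequent first.
    frequentItems(0.3).toSeq
      .sortBy { case (item, count) => (-count, item) }
      .foreach { case (item, count) => println(s"[$item], $count") }
  }
}
```

Under that assumption, an item must appear in at least 2 of the 6 transactions to be reported, so items that occur only once (such as `h` or `k`) are dropped.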