author | Vincenzo Selvaggio <vselvaggio@hotmail.it> | 2015-05-18 08:46:33 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-05-18 08:46:40 -0700 |
commit | a95d4e18f232c2941c203b9e93fef58737d42b40 (patch) | |
tree | 8995b7b41a1d90cba0114ac081a411e3efa7a717 | |
parent | 2c94ffe7e87e00f76883fd0c8b052ba5352da20a (diff) | |
[SPARK-7272] [MLLIB] User guide for PMML model export
https://issues.apache.org/jira/browse/SPARK-7272
Author: Vincenzo Selvaggio <vselvaggio@hotmail.it>
Closes #6219 from selvinsource/mllib_pmml_model_export_SPARK-7272 and squashes the following commits:
c866fb8 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
1beda98 [Vincenzo Selvaggio] [SPARK-7272] Initial user guide for pmml export
d670662 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
2731375 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
680dc33 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
2e298b5 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
a932f51 [Vincenzo Selvaggio] Create mllib-pmml-model-export.md
(cherry picked from commit 814b3dabdf01abc7a2f25aa32284caccadeb7798)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
-rw-r--r-- | docs/mllib-guide.md | 1 |
-rw-r--r-- | docs/mllib-pmml-model-export.md | 86 |
2 files changed, 87 insertions, 0 deletions
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index f8e879496c..de7d66fb2d 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -39,6 +39,7 @@ filtering, dimensionality reduction, as well as underlying optimization primitiv
 * [Optimization (developer)](mllib-optimization.html)
   * stochastic gradient descent
   * limited-memory BFGS (L-BFGS)
+* [PMML model export](mllib-pmml-model-export.html)
 
 MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
diff --git a/docs/mllib-pmml-model-export.md b/docs/mllib-pmml-model-export.md
new file mode 100644
index 0000000000..42ea2ca81f
--- /dev/null
+++ b/docs/mllib-pmml-model-export.md
@@ -0,0 +1,86 @@
+---
+layout: global
+title: PMML model export - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - PMML model export
+---
+
+* Table of contents
+{:toc}
+
+## MLlib supported models
+
+MLlib supports model export to Predictive Model Markup Language ([PMML](http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)).
+
+The table below outlines the MLlib models that can be exported to PMML and their equivalent PMML model.
+
+<table class="table">
+  <thead>
+    <tr><th>MLlib model</th><th>PMML model</th></tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>KMeansModel</td><td>ClusteringModel</td>
+    </tr>
+    <tr>
+      <td>LinearRegressionModel</td><td>RegressionModel (functionName="regression")</td>
+    </tr>
+    <tr>
+      <td>RidgeRegressionModel</td><td>RegressionModel (functionName="regression")</td>
+    </tr>
+    <tr>
+      <td>LassoModel</td><td>RegressionModel (functionName="regression")</td>
+    </tr>
+    <tr>
+      <td>SVMModel</td><td>RegressionModel (functionName="classification" normalizationMethod="none")</td>
+    </tr>
+    <tr>
+      <td>Binary LogisticRegressionModel</td><td>RegressionModel (functionName="classification" normalizationMethod="logit")</td>
+    </tr>
+  </tbody>
+</table>
+
+## Examples
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
+To export a supported `model` (see table above) to PMML, simply call `model.toPMML`.
+
+Here a complete example of building a KMeansModel and print it out in PMML format:
+{% highlight scala %}
+import org.apache.spark.mllib.clustering.KMeans
+import org.apache.spark.mllib.linalg.Vectors
+
+// Load and parse the data
+val data = sc.textFile("data/mllib/kmeans_data.txt")
+val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
+
+// Cluster the data into two classes using KMeans
+val numClusters = 2
+val numIterations = 20
+val clusters = KMeans.train(parsedData, numClusters, numIterations)
+
+// Export to PMML
+println("PMML Model:\n" + clusters.toPMML)
+{% endhighlight %}
+
+As well as exporting the PMML model to a String (`model.toPMML` as in the example above), you can export the PMML model to other formats:
+
+{% highlight scala %}
+// Export the model to a String in PMML format
+clusters.toPMML
+
+// Export the model to a local file in PMML format
+clusters.toPMML("/tmp/kmeans.xml")
+
+// Export the model to a directory on a distributed file system in PMML format
+clusters.toPMML(sc,"/tmp/kmeans")
+
+// Export the model to the OutputStream in PMML format
+clusters.toPMML(System.out)
+{% endhighlight %}
+
+For unsupported models, either you will not find a `.toPMML` method or an `IllegalArgumentException` will be thrown.
+
+</div>
+
+</div>
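Reviewer sketch, not part of the patch: the guide's closing note says an unsupported model may throw an `IllegalArgumentException` on export. The snippet below illustrates one way a caller might guard the `OutputStream` variant of `toPMML` against that. It assumes the Spark 1.4-era MLlib API that this commit documents and an existing SparkContext `sc` (for example, the one provided by spark-shell). The toy training data is hypothetical and only there to produce a model from the supported-models table.

```scala
import java.io.ByteArrayOutputStream
import scala.util.{Failure, Success, Try}

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

// Assumption: `sc` is an existing SparkContext (e.g. from spark-shell).
// Hypothetical toy data, roughly label = 2 * feature.
val training = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(0.0)),
  LabeledPoint(1.0, Vectors.dense(0.5)),
  LabeledPoint(2.0, Vectors.dense(1.0))
)).cache()

// LinearRegressionModel is listed as exportable in the table above.
val numIterations = 50
val model = LinearRegressionWithSGD.train(training, numIterations)

// Export into an in-memory buffer instead of System.out, and handle the
// IllegalArgumentException that the guide mentions for unsupported models.
val buffer = new ByteArrayOutputStream()
Try(model.toPMML(buffer)) match {
  case Success(_) =>
    println(buffer.toString("UTF-8"))
  case Failure(e: IllegalArgumentException) =>
    println(s"Model cannot be exported to PMML: ${e.getMessage}")
  case Failure(e) =>
    throw e
}
```

Writing to a `ByteArrayOutputStream` keeps the PMML document available as bytes, which may be convenient when the result is passed to another system rather than printed.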