[SPARK-11964][DOCS][ML] Add in Pipeline Import/Export Documentation

Adding in Pipeline Import and Export Documentation.

Author: anabranch <wac.chambers@gmail.com>
Author: Bill Chambers <wchambers@ischool.berkeley.edu>

Closes #10179 from anabranch/master.
author: anabranch <wac.chambers@gmail.com> 2015-12-11 12:55:56 -0800
committer: Joseph K. Bradley <joseph@databricks.com> 2015-12-11 12:55:56 -0800
commit: aa305dcaf5b4148aba9e669e081d0b9235f50857 (patch)
tree: 5a5312b913a680c90cea914f7d8b7f3238e1b87c /docs/ml-guide.md
parent: 0fb9825556dbbcc98d7eafe9ddea8676301e09bb (diff)
download: spark-aa305dcaf5b4148aba9e669e081d0b9235f50857.tar.gz
spark-aa305dcaf5b4148aba9e669e081d0b9235f50857.tar.bz2
spark-aa305dcaf5b4148aba9e669e081d0b9235f50857.zip
1 files changed, 13 insertions, 0 deletions
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index 5c96c2b7d5..44a316a07d 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -192,6 +192,10 @@ Parameters belong to specific instances of `Estimator`s and `Transformer`s.
 For example, if we have two `LogisticRegression` instances `lr1` and `lr2`, then we can build a `ParamMap` with both `maxIter` parameters specified: `ParamMap(lr1.maxIter -> 10, lr2.maxIter -> 20)`.
 This is useful if there are two algorithms with the `maxIter` parameter in a `Pipeline`.
 
+## Saving and Loading Pipelines
+
+It is often worthwhile to save a model or a pipeline to disk for later use. In Spark 1.6, model import/export functionality was added to the Pipeline API. Most basic transformers are supported, as well as some of the more basic ML models. Refer to each algorithm's API documentation to see whether saving and loading are supported.
+
 # Code examples
 
 This section gives code examples illustrating the functionality discussed above.
@@ -455,6 +459,15 @@ val pipeline = new Pipeline()
 // Fit the pipeline to training documents.
 val model = pipeline.fit(training)
 
+// Now we can optionally save the fitted pipeline to disk.
+model.save("/tmp/spark-logistic-regression-model")
+
+// We can also save this unfitted pipeline to disk.
+pipeline.save("/tmp/unfit-lr-model")
+
+// And load the fitted pipeline back in during production. A fitted pipeline is a PipelineModel, so it is loaded with PipelineModel.load rather than Pipeline.load.
+val sameModel = org.apache.spark.ml.PipelineModel.load("/tmp/spark-logistic-regression-model")
+
 // Prepare test documents, which are unlabeled (id, text) tuples.
 val test = sqlContext.createDataFrame(Seq(
   (4L, "spark i j k"),