author: Joseph K. Bradley <joseph@databricks.com> 2016-07-13 12:33:39 -0700
committer: Joseph K. Bradley <joseph@databricks.com> 2016-07-13 12:33:39 -0700
commit: 01f09b161217193b797c8c85969d17054c958615
tree: 40d7d4f5932157f8e0f0c13228dd18063728d4d3 /python/pyspark/mllib/clustering.py
parent: d8220c1e5e94abbdb9643672b918f0d748206db9
[SPARK-14812][ML][MLLIB][PYTHON] Experimental, DeveloperApi annotation audit for ML
## What changes were proposed in this pull request?

General decisions to follow, except where noted:
* spark.mllib, pyspark.mllib: Remove all Experimental annotations. Leave DeveloperApi annotations alone.
* spark.ml, pyspark.ml
** Annotate Estimator-Model pairs of classes and companion objects the same way.
** For all algorithms marked Experimental with Since tag <= 1.6, remove Experimental annotation.
** For all algorithms marked Experimental with Since tag = 2.0, leave Experimental annotation.
* DeveloperApi annotations are left alone, except where noted.
* No changes to which types are sealed.

Exceptions where I am leaving items Experimental in spark.ml, pyspark.ml, mainly because the items are new:
* Model Summary classes
* MLWriter, MLReader, MLWritable, MLReadable
* Evaluator and subclasses: There is discussion of changes around evaluating multiple metrics at once for efficiency.
* RFormula: Its behavior may need to change slightly to match R in edge cases.
* AFTSurvivalRegression
* MultilayerPerceptronClassifier

DeveloperApi changes:
* ml.tree.Node, ml.tree.Split, and subclasses should no longer be DeveloperApi

## How was this patch tested?

N/A

Note to reviewers:
* spark.ml.clustering.LDA underwent significant changes (additional methods), so let me know if you want me to leave it Experimental.
* Be careful to check for cases where a class should no longer be Experimental but has an Experimental method, val, or other feature. I did not find such cases, but please verify.

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #14147 from jkbradley/experimental-audit.
Diffstat (limited to 'python/pyspark/mllib/clustering.py')
-rw-r--r-- python/pyspark/mllib/clustering.py 16
1 files changed, 0 insertions, 16 deletions
diff --git a/python/pyspark/mllib/clustering.py b/python/pyspark/mllib/clustering.py
index c38c543972..c8c3c42774 100644
--- a/python/pyspark/mllib/clustering.py
+++ b/python/pyspark/mllib/clustering.py
@@ -47,8 +47,6 @@ __all__ = ['BisectingKMeansModel', 'BisectingKMeans', 'KMeansModel', 'KMeans',
@inherit_doc
class BisectingKMeansModel(JavaModelWrapper):
"""
- .. note:: Experimental
-
A clustering model derived from the bisecting k-means method.
>>> data = array([0.0,0.0, 1.0,1.0, 9.0,8.0, 8.0,9.0]).reshape(4, 2)
@@ -120,8 +118,6 @@ class BisectingKMeansModel(JavaModelWrapper):
class BisectingKMeans(object):
"""
- .. note:: Experimental
-
A bisecting k-means algorithm based on the paper "A comparison of
document clustering techniques" by Steinbach, Karypis, and Kumar,
with modification to fit Spark.
@@ -366,8 +362,6 @@ class KMeans(object):
class GaussianMixtureModel(JavaModelWrapper, JavaSaveable, JavaLoader):
"""
- .. note:: Experimental
-
A clustering model derived from the Gaussian Mixture Model method.
>>> from pyspark.mllib.linalg import Vectors, DenseMatrix
@@ -513,8 +507,6 @@ class GaussianMixtureModel(JavaModelWrapper, JavaSaveable, JavaLoader):
class GaussianMixture(object):
"""
- .. note:: Experimental
-
Learning algorithm for Gaussian Mixtures using the expectation-maximization algorithm.
.. versionadded:: 1.3.0
@@ -565,8 +557,6 @@ class GaussianMixture(object):
class PowerIterationClusteringModel(JavaModelWrapper, JavaSaveable, JavaLoader):
"""
- .. note:: Experimental
-
Model produced by [[PowerIterationClustering]].
>>> import math
@@ -645,8 +635,6 @@ class PowerIterationClusteringModel(JavaModelWrapper, JavaSaveable, JavaLoader):
class PowerIterationClustering(object):
"""
- .. note:: Experimental
-
Power Iteration Clustering (PIC), a scalable graph clustering algorithm
developed by [[http://www.icml2010.org/papers/387.pdf Lin and Cohen]].
From the abstract: PIC finds a very low-dimensional embedding of a
@@ -693,8 +681,6 @@ class PowerIterationClustering(object):
class StreamingKMeansModel(KMeansModel):
"""
- .. note:: Experimental
-
Clustering model which can perform an online update of the centroids.
The update formula for each centroid is given by
@@ -794,8 +780,6 @@ class StreamingKMeansModel(KMeansModel):
class StreamingKMeans(object):
"""
- .. note:: Experimental
-
Provides methods to set k, decayFactor, timeUnit to configure the
KMeans algorithm for fitting and predicting on incoming dstreams.
More details on how the centroids are updated are provided under the