[SPARK-2434][MLlib]: Warning messages that point users to original MLlib implementations added to Examples

[SPARK-2434][MLlib]: Warning messages that refer users to the original MLlib implementations of some popular example machine learning algorithms added both in the comments and the code. The following examples have been modified:

Scala:
* LocalALS
* LocalFileLR
* LocalKMeans
* LocalLP
* SparkALS
* SparkHdfsLR
* SparkKMeans
* SparkLR

Python:
* kmeans.py
* als.py
* logistic_regression.py

Author: Burak <brkyvz@gmail.com>

Closes #1515 from brkyvz/SPARK-2434 and squashes the following commits:

7505da9 [Burak] [SPARK-2434][MLlib]: Warning messages added, scalastyle errors fixed, and added missing punctuation
b96b522 [Burak] [SPARK-2434][MLlib]: Warning messages added and scalastyle errors fixed
4762f39 [Burak] [SPARK-2434]: Warning messages added
17d3d83 [Burak] SPARK-2434: Added warning messages to the naive implementations of the example algorithms
2cb5301 [Burak] SPARK-2434: Warning messages redirecting to original implementaions added.
author: Burak <brkyvz@gmail.com> 2014-07-21 17:03:40 -0700
committer: Xiangrui Meng <meng@databricks.com> 2014-07-21 17:03:40 -0700
commit: a4d60208ec7995146541451849c51670cdc56451 (patch)
tree: 97bb3b039136994ca210ade8f0436f1923a294b9 /examples/src/main/python
parent: abeacffb7bcdfa3eeb1e969aa546029a7b464eaa (diff)
3 files changed, 21 insertions, 0 deletions
diff --git a/examples/src/main/python/als.py b/examples/src/main/python/als.py
index 1a7c4c51f4..c862650b0a 100755
--- a/examples/src/main/python/als.py
+++ b/examples/src/main/python/als.py
@@ -16,6 +16,9 @@
 #
 
 """
+This is an example implementation of ALS for learning how to use Spark. Please refer to
+ALS in pyspark.mllib.recommendation for more conventional use.
+
 This example requires numpy (http://www.numpy.org/)
 """
 from os.path import realpath
@@ -49,9 +52,15 @@ def update(i, vec, mat, ratings):
 
 
 if __name__ == "__main__":
+
     """
     Usage: als [M] [U] [F] [iterations] [slices]"
     """
+
+    print >> sys.stderr, """WARN: This is a naive implementation of ALS and is given as an
+      example. Please use the ALS method found in pyspark.mllib.recommendation for more
+      conventional use."""
+
     sc = SparkContext(appName="PythonALS")
     M = int(sys.argv[1]) if len(sys.argv) > 1 else 100
     U = int(sys.argv[2]) if len(sys.argv) > 2 else 500
diff --git a/examples/src/main/python/kmeans.py b/examples/src/main/python/kmeans.py
index 988fc45baf..036bdf4c4f 100755
--- a/examples/src/main/python/kmeans.py
+++ b/examples/src/main/python/kmeans.py
@@ -45,9 +45,15 @@ def closestPoint(p, centers):
 
 
 if __name__ == "__main__":
+
     if len(sys.argv) != 4:
         print >> sys.stderr, "Usage: kmeans <file> <k> <convergeDist>"
         exit(-1)
+
+    print >> sys.stderr, """WARN: This is a naive implementation of KMeans Clustering and is given
+       as an example! Please refer to examples/src/main/python/mllib/kmeans.py for an example on
+       how to use MLlib's KMeans implementation."""
+
     sc = SparkContext(appName="PythonKMeans")
     lines = sc.textFile(sys.argv[1])
     data = lines.map(parseVector).cache()
diff --git a/examples/src/main/python/logistic_regression.py b/examples/src/main/python/logistic_regression.py
index 6c33deabfd..8456b272f9 100755
--- a/examples/src/main/python/logistic_regression.py
+++ b/examples/src/main/python/logistic_regression.py
@@ -47,9 +47,15 @@ def readPointBatch(iterator):
     return [matrix]
 
 if __name__ == "__main__":
+
     if len(sys.argv) != 3:
         print >> sys.stderr, "Usage: logistic_regression <file> <iterations>"
         exit(-1)
+
+    print >> sys.stderr,  """WARN: This is a naive implementation of Logistic Regression and is
+      given as an example! Please refer to examples/src/main/python/mllib/logistic_regression.py
+      to see how MLlib's implementation is used."""
+
     sc = SparkContext(appName="PythonLR")
     points = sc.textFile(sys.argv[1]).mapPartitions(readPointBatch).cache()
     iterations = int(sys.argv[2])
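
Note on the warning style used in this patch: the added lines rely on the Python 2 form `print >> sys.stderr, """..."""`, and the triple-quoted strings carry their line breaks and leading indentation into the printed message. A minimal sketch of an equivalent single-line warning for Python 3 is shown below. The helper name `warn_naive_example` and its parameters are hypothetical and are not part of this commit or of Spark.

```python
import sys


def warn_naive_example(algorithm, alternative, stream=None):
    """Write a one-line WARN message that points to the MLlib alternative.

    Hypothetical helper for illustration only; not part of Spark.
    """
    if stream is None:
        # Resolve sys.stderr at call time so a redirected stderr is respected.
        stream = sys.stderr
    # Adjacent string literals are joined, so no stray newlines or
    # indentation end up inside the message.
    message = (
        "WARN: This is a naive implementation of {0} and is given as an example. "
        "Please refer to {1} for more conventional use."
    ).format(algorithm, alternative)
    print(message, file=stream)
    return message


if __name__ == "__main__":
    warn_naive_example("ALS", "pyspark.mllib.recommendation")
```

Building the message from adjacent string literals keeps the output on one line, which may be easier to grep for in driver logs than the multi-line form.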