author     Matei Zaharia <matei@databricks.com>  2014-01-10 00:12:43 -0800
committer  Matei Zaharia <matei@databricks.com>  2014-01-11 22:30:48 -0800
commit     4c28a2bad8a6d64ee69213eede440837636fe58b
tree       ec33a07ead7ec3bd120c94594a42e2d19b556c79 /docs/mllib-guide.md
parent     9a0dfdf868187fb9a2e1656e4cf5f29d952ce5db
Update some Python MLlib parameters to use camelCase, and tweak docs
We've used camelCase in other Spark methods, so it felt reasonable to keep using it here and make the code match Scala/Java as much as possible. Note that parameter names matter in Python because Python allows passing optional parameters by name.
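A minimal sketch of why the rename is visible to callers, assuming a hypothetical `train` function whose optional parameters follow the camelCase style (the names `train`, `stepSize` and `miniBatchFraction` here are illustrative and not the exact PySpark signature). Because Python lets callers skip to an optional argument by name, the parameter name becomes part of the public API, and a call that still uses an old snake_case name fails with a `TypeError`.

```python
# Hypothetical trainer whose optional parameters use camelCase,
# mirroring the naming style of the Scala/Java MLlib API.
def train(data, iterations=100, stepSize=1.0, miniBatchFraction=1.0):
    """Return the settings a real trainer would use (illustration only)."""
    return {
        "n": len(data),
        "iterations": iterations,
        "stepSize": stepSize,
        "miniBatchFraction": miniBatchFraction,
    }


# Optional arguments can be passed by name, skipping the ones in between.
settings = train([1, 2, 3], miniBatchFraction=0.5)
print(settings["miniBatchFraction"])  # 0.5

# A caller still using an old snake_case name breaks after the rename.
try:
    train([1, 2, 3], mini_batch_fraction=0.5)
except TypeError as err:
    print("TypeError:", err)
```

Keeping the Python names identical to the Scala/Java ones means a keyword argument copied from the Scala documentation works unchanged in PySpark.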
Diffstat (limited to 'docs/mllib-guide.md')
-rw-r--r--  docs/mllib-guide.md  9
1 files changed, 9 insertions, 0 deletions
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index c977bc4f35..1a5c640d10 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -21,6 +21,8 @@ depends on native Fortran routines. You may need to install the
if it is not already present on your nodes. MLlib will throw a linking error if it cannot
detect these libraries automatically.
+To use MLlib in Python, you will also need [NumPy](http://www.numpy.org) version 1.7 or newer.
+
# Binary Classification
Binary classification is a supervised learning problem in which we want to
@@ -316,6 +318,13 @@ other signals), you can use the trainImplicit method to get better results.
val model = ALS.trainImplicit(ratings, 1, 20, 0.01)
{% endhighlight %}
+# Using MLLib in Java
+
+All of MLlib's methods use Java-friendly types, so you can import and call them there the same
+way you do in Scala. The only caveat is that the methods take Scala RDD objects, while the
+Spark Java API uses a separate `JavaRDD` class. You can convert a Java RDD to a Scala one by
+calling `.rdd()` on your `JavaRDD` object.
+
# Using MLLib in Python
Following examples can be tested in the PySpark shell.
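The documentation change above adds a requirement of NumPy 1.7 or newer for MLlib in Python. One way to check this before starting PySpark is sketched below, assuming a plain dotted version string; the helper name `numpy_is_recent_enough` is hypothetical, and pre-release suffixes such as `1.7.0rc1` are handled loosely (only the leading digits of each component are compared).

```python
import re


def numpy_is_recent_enough(version, minimum=(1, 7)):
    """Return True if a dotted version string is at least `minimum`.

    Only the leading digits of each component are compared, so a
    suffix like "rc1" in "1.7.0rc1" is ignored (a simplification).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Tuples compare element by element, so (1, 10) >= (1, 7) is True.
    return tuple(parts[: len(minimum)]) >= minimum


try:
    import numpy
except ImportError:
    print("NumPy is not installed")
else:
    if not numpy_is_recent_enough(numpy.__version__):
        print("NumPy", numpy.__version__, "is older than 1.7")
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where `"1.10"` sorts before `"1.7"` lexicographically.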