path: root/python/pyspark/statcounter.py
author: Matei Zaharia <matei@databricks.com> 2013-12-26 01:31:06 -0500
committer: Matei Zaharia <matei@databricks.com> 2013-12-26 01:31:06 -0500
commit: c344ed04c7d65d64e87bb50ad6eba57534945398
tree: 593274571089bd6cc2ff5d2c6d16e7109f6dec3d /python/pyspark/statcounter.py
parent: 56094bcd8d3ba3442b88af01393d06fd7cd79bde
parent: 9cbcf81453a9afca58645969c1bc3ff366392734
Merge pull request #283 from tmyklebu/master
Python bindings for mllib

This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib.

For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py. The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub. The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.

ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside. The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.

I have tested these bindings on an x86_64 machine running Linux. There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's endian differs from java.nio.ByteBuffer's idea of the native byte order.
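
To illustrate the byte-order concern raised above, here is a minimal sketch of serialising a vector of doubles with an explicit byte order using Python's standard struct module. It is not the format used in mllib.py: the layout (an 8-byte length followed by 8-byte doubles) and the names serialize_vector and deserialize_vector are assumptions made up for this example.

```python
import struct
import sys

# Hypothetical wire layout (an assumption for illustration, not the
# format used by mllib.py): 8-byte signed length, then that many
# 8-byte IEEE 754 doubles, all in one explicitly chosen byte order.

def serialize_vector(values, byteorder="<"):
    """Pack a sequence of floats; '<' is little-endian, '>' is big-endian."""
    header = struct.pack(byteorder + "q", len(values))
    body = struct.pack(byteorder + "%dd" % len(values), *values)
    return header + body

def deserialize_vector(data, byteorder="<"):
    """Inverse of serialize_vector; must be given the same byte order."""
    (n,) = struct.unpack_from(byteorder + "q", data, 0)
    return list(struct.unpack_from(byteorder + "%dd" % n, data, 8))

payload = serialize_vector([1.0, 2.5, -3.0])
roundtrip = deserialize_vector(payload)

# The same vector produces different bytes under the two byte orders,
# which is why both sides have to agree on one.
little = serialize_vector([1.0], "<")
big = serialize_vector([1.0], ">")
orders_differ = little != big

# The interpreter's native order, for comparison with the other side.
native = sys.byteorder  # 'little' or 'big'
```

Fixing the byte order in the format string, rather than relying on each side's native order, means the reader and writer agree regardless of platform. On the JVM side a reader would need to set the matching order on its ByteBuffer.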
Diffstat (limited to 'python/pyspark/statcounter.py')
0 files changed, 0 insertions, 0 deletions