Merge pull request #283 from tmyklebu/master - spark

author	Matei Zaharia <matei@databricks.com>	2013-12-26 01:31:06 -0500
committer	Matei Zaharia <matei@databricks.com>	2013-12-26 01:31:06 -0500
commit	c344ed04c7d65d64e87bb50ad6eba57534945398 (patch)
tree	593274571089bd6cc2ff5d2c6d16e7109f6dec3d /python/pyspark/statcounter.py
parent	56094bcd8d3ba3442b88af01393d06fd7cd79bde (diff)
parent	9cbcf81453a9afca58645969c1bc3ff366392734 (diff)

Merge pull request #283 from tmyklebu/master

Python bindings for mllib

This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib.

For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py. The Python stub serialises the input RDD and any vector/matrix arguments into a mutually understood format and calls the Scala stub. The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.

ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside. The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.

I have tested these bindings on an x86_64 machine running Linux. There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's endianness differs from java.nio.ByteBuffer's idea of the native byte order.
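To make the serialisation step and the byte-order risk concrete, here is a minimal sketch of how a vector of doubles could be packed into bytes on the Python side. The wire layout (an 8-byte length followed by the doubles) and the names `serialize_vector` and `deserialize_vector` are assumptions made for illustration; they are not the format or the functions used in mllib.py. The sketch only shows that the same vector yields different bytes under different byte orders, which is why both sides need to agree on one.

```python
import struct

# Hypothetical wire format (an assumption, not the one in this pull request):
# one signed 8-byte length, followed by that many 8-byte IEEE 754 doubles.
# byte_order is a struct prefix: "=" native, "<" little-endian, ">" big-endian.


def serialize_vector(values, byte_order="="):
    """Pack a sequence of floats as <int64 length><float64 * length>."""
    values = [float(v) for v in values]
    fmt = "%sq%dd" % (byte_order, len(values))
    return struct.pack(fmt, len(values), *values)


def deserialize_vector(data, byte_order="="):
    """Inverse of serialize_vector; must be given the same byte_order."""
    (length,) = struct.unpack_from(byte_order + "q", data, 0)
    # The doubles start right after the 8-byte length header.
    return list(struct.unpack_from("%s%dd" % (byte_order, length), data, 8))


if __name__ == "__main__":
    vec = [1.0, 2.5, -3.0]
    little = serialize_vector(vec, "<")
    big = serialize_vector(vec, ">")
    # Same logical vector, different bytes on the wire.
    print(little != big)
    # Reading with the matching byte order recovers the vector.
    print(deserialize_vector(big, ">") == vec)
```

Reading `big` with `"<"` would misinterpret the length header and either fail or return garbage. Pinning an explicit order such as `"<"` on the Python side, and setting the same order on the receiving buffer, is one way to avoid depending on the platform's native order.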

Diffstat (limited to 'python/pyspark/statcounter.py')

0 files changed, 0 insertions, 0 deletions

