path: root/python
Commit message  (Author, Age, Files, Lines)
* Initial weights in Scala are ones; do that too. Also fix some errors.  (Tor Myklebust, 2013-12-25, 1 file, -6/+6)
|
* Split the mllib bindings into a whole bunch of modules and rename some things.  (Tor Myklebust, 2013-12-25, 7 files, -183/+409)
|
* Remove useless line from test stub.  (Tor Myklebust, 2013-12-24, 1 file, -1/+0)
|
* Python change for move of PythonMLLibAPI.  (Tor Myklebust, 2013-12-24, 1 file, -1/+1)
|
* Release JVM reference to the ALSModel when done.  (Tor Myklebust, 2013-12-22, 1 file, -2/+2)
|
* Python stubs for ALSModel.  (Tor Myklebust, 2013-12-21, 2 files, -8/+56)
|
* Un-semicolon mllib.py.  (Tor Myklebust, 2013-12-20, 1 file, -11/+11)
|
* Change some docstrings and add some others.  (Tor Myklebust, 2013-12-20, 1 file, -1/+3)
|
* Licence notice.  (Tor Myklebust, 2013-12-20, 1 file, -0/+17)
|
* Whitespace.  (Tor Myklebust, 2013-12-20, 1 file, -1/+1)
|
* Remove gigantic endian-specific test and exception tests.  (Tor Myklebust, 2013-12-20, 1 file, -38/+3)
|
* Tests for the Python side of the mllib bindings.  (Tor Myklebust, 2013-12-20, 1 file, -52/+172)
|
* Python stubs for classification and clustering.  (Tor Myklebust, 2013-12-20, 2 files, -16/+96)
|
* Python side of python bindings for linear, Lasso, and ridge regression  (Tor Myklebust, 2013-12-19, 2 files, -15/+72)
|
* Incorporate most of Josh's style suggestions.  (Tor Myklebust, 2013-12-19, 2 files, -98/+91)
      I don't want to deal with the type and length checking errors until we've got at
      least one working stub that we're all happy with.
* The rest of the Python side of those bindings.  (Tor Myklebust, 2013-12-19, 3 files, -2/+4)
|
* First cut at python mllib bindings. Only LinearRegression is supported.  (Tor Myklebust, 2013-12-19, 1 file, -0/+114)
|
* Merge branch 'master' into akka-bug-fix  (Prashant Sharma, 2013-12-11, 3 files, -1/+36)
|\      Conflicts:
          core/pom.xml
          core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
          pom.xml
          project/SparkBuild.scala
          streaming/pom.xml
          yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
| * License headers  (Patrick Wendell, 2013-12-09, 1 file, -0/+17)
| |
| * Fix UnicodeEncodeError in PySpark saveAsTextFile().  (Josh Rosen, 2013-11-28, 2 files, -1/+19)
      Fixes SPARK-970.
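For context, a minimal sketch of the kind of job that could trigger this error before the fix: saving an RDD whose elements are unicode strings containing non-ASCII characters (Python 2; the app name and output path are made up).

    # -*- coding: utf-8 -*-
    # Sketch of the failure scenario: unicode elements written out as text.
    from pyspark import SparkContext

    sc = SparkContext("local", "unicode-save-sketch")
    rdd = sc.parallelize([u"plain ascii", u"caf\u00e9", u"\u00fcml\u00e4ut"])
    rdd.saveAsTextFile("/tmp/unicode-output")   # hypothetical output directory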
* | Merge branch 'master' into wip-scala-2.10  (Prashant Sharma, 2013-11-27, 8 files, -142/+383)
|\|     Conflicts:
          core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
          core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala
          core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithContextRDD.scala
          core/src/main/scala/org/apache/spark/rdd/RDD.scala
          python/pyspark/rdd.py
| * Removed unused basestring case from dump_stream.  (Josh Rosen, 2013-11-26, 1 file, -2/+0)
| |
| * FramedSerializer: _dumps => dumps, _loads => loads.  (Josh Rosen, 2013-11-10, 4 files, -18/+18)
| |
| * Send PySpark commands as bytes instead of strings.  (Josh Rosen, 2013-11-10, 3 files, -16/+13)
| |
| * Add custom serializer support to PySpark.  (Josh Rosen, 2013-11-10, 8 files, -148/+362)
      For now, this only adds MarshalSerializer, but it lays the groundwork for supporting
      other custom serializers. Many of these mechanisms can also be used to support
      deserialization of different data formats sent by Java, such as data encoded by
      MsgPack. This also fixes a bug in SparkContext.union().
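A minimal usage sketch of the serializer hook described above, assuming MarshalSerializer is exposed in pyspark.serializers and that SparkContext accepts a serializer argument, as it does in later PySpark releases:

    # Use the marshal-based serializer instead of the default pickle-based one;
    # marshal is faster but supports fewer Python types.
    from pyspark import SparkContext
    from pyspark.serializers import MarshalSerializer

    sc = SparkContext("local", "marshal-sketch", serializer=MarshalSerializer())
    print sc.parallelize(range(10)).map(lambda x: x * 2).collect()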
| * Remove Pickle-wrapping of Java objects in PySpark.  (Josh Rosen, 2013-11-03, 4 files, -14/+39)
      If we support custom serializers, the Python worker will know what type of input to
      expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings.
| * Replace magic lengths with constants in PySpark.  (Josh Rosen, 2013-11-03, 2 files, -6/+13)
      Write the length of the accumulators section up-front rather than terminating it with
      a negative length. I find this easier to read.
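A generic sketch of the framing idea described above: write a count up-front instead of ending the stream with a negative sentinel length. The function names are illustrative, not PySpark internals.

    import struct

    def write_section(stream, items):
        stream.write(struct.pack("!i", len(items)))      # item count, written up-front
        for item in items:
            stream.write(struct.pack("!i", len(item)))   # length prefix for each item
            stream.write(item)

    def read_section(stream):
        (count,) = struct.unpack("!i", stream.read(4))
        items = []
        for _ in range(count):
            (length,) = struct.unpack("!i", stream.read(4))
            items.append(stream.read(length))
        return items

    if __name__ == "__main__":
        import io
        buf = io.BytesIO()
        write_section(buf, [b"alpha", b"beta"])
        buf.seek(0)
        print read_section(buf)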
* | Merge branch 'master' into scala-2.10  (Raymond Liu, 2013-11-13, 2 files, -13/+50)
|\|
| * Pass self to SparkContext._ensure_initialized.  (Ewen Cheslack-Postava, 2013-10-22, 1 file, -1/+10)
      The constructor for SparkContext should pass in self so that we track the current
      context and produce errors if another one is created. Add a doctest to make sure
      creating multiple contexts triggers the exception.
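A sketch of the behaviour that doctest guards against; the exact exception type and message are assumptions based on later PySpark behaviour.

    from pyspark import SparkContext

    sc = SparkContext("local", "first-context")
    try:
        sc2 = SparkContext("local", "second-context")   # expected to be rejected
    except ValueError as e:
        print "second context rejected:", e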
| * Add classmethod to SparkContext to set system properties.  (Ewen Cheslack-Postava, 2013-10-22, 1 file, -12/+29)
      Add a new classmethod to SparkContext to set system properties, as is possible in
      Scala/Java. Unlike the Java/Scala implementations, there's no access to System until
      the JVM bridge is created. Since SparkContext handles that, move the initialization
      of the JVM connection to a separate classmethod that can safely be called repeatedly
      as long as the same instance (or no instance) is provided.
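A usage sketch of the classmethod described above, assuming it is named SparkContext.setSystemProperty as in later PySpark releases and is called before the context is created:

    # Set a JVM system property through the gateway before creating the context.
    from pyspark import SparkContext

    SparkContext.setSystemProperty("spark.executor.memory", "2g")
    sc = SparkContext("local", "sysprop-sketch")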
| * Add an add() method to pyspark accumulators.  (Ewen Cheslack-Postava, 2013-10-19, 1 file, -1/+12)
      Add a regular method for adding a term to accumulators in pyspark. Currently, if you
      have a non-global accumulator, adding to it is awkward: the += operator can't be used
      for non-global accumulators captured via closure because it involves an assignment,
      so the only way to do it is to call __iadd__ directly. Adding this method lets you
      write code like this:

          def main():
              sc = SparkContext()
              accum = sc.accumulator(0)
              rdd = sc.parallelize([1, 2, 3])

              def f(x):
                  accum.add(x)

              rdd.foreach(f)
              print accum.value

      where using accum += x instead would have caused UnboundLocalError exceptions in
      workers. Currently it would have to be written as accum.__iadd__(x).
* | Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10  (Prashant Sharma, 2013-10-10, 1 file, -7/+53)
|\|
| * Fix PySpark docs and an overly long line of code after fdbae41e  (Matei Zaharia, 2013-10-09, 1 file, -8/+8)
| |
| * SPARK-705: implement sortByKey() in PySpark  (Andre Schumacher, 2013-10-07, 1 file, -1/+47)
| |
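A quick usage sketch of the sortByKey() added above (ascending order by default, mirroring the Scala API):

    from pyspark import SparkContext

    sc = SparkContext("local", "sortbykey-sketch")
    pairs = sc.parallelize([(3, "c"), (1, "a"), (2, "b")])
    print pairs.sortByKey().collect()                  # [(1, 'a'), (2, 'b'), (3, 'c')]
    print pairs.sortByKey(ascending=False).collect()   # [(3, 'c'), (2, 'b'), (1, 'a')]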
* | Merge branch 'master' into wip-merge-master  (Prashant Sharma, 2013-10-08, 2 files, -4/+10)
|\|     Conflicts:
          bagel/pom.xml
          core/pom.xml
          core/src/test/scala/org/apache/spark/ui/UISuite.scala
          examples/pom.xml
          mllib/pom.xml
          pom.xml
          project/SparkBuild.scala
          repl/pom.xml
          streaming/pom.xml
          tools/pom.xml
        In Scala 2.10 a shorter representation is used for naming artifacts, so the artifact
        names were changed to the shorter Scala version and made a property in the pom.
| * Fixing SPARK-602: PythonPartitioner  (Andre Schumacher, 2013-10-04, 2 files, -4/+10)
      Currently PythonPartitioner determines partition ID by hashing a byte-array
      representation of PySpark's key. This PR lets PythonPartitioner use the actual
      partition ID, which is required e.g. for sorting via PySpark.
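An illustration (plain Python, not PySpark internals) of the distinction the commit message draws: hashing a serialized key scatters keys without regard to order, while a sort needs each partition to hold a contiguous key range.

    import pickle

    NUM_PARTITIONS = 2

    def hash_partition(key):
        # order-oblivious: fine for hash-based shuffles, useless for sorting
        return hash(pickle.dumps(key)) % NUM_PARTITIONS

    def range_partition(key, bounds=(50,)):
        # order-preserving: keys below 50 go to partition 0, the rest to partition 1
        return sum(1 for b in bounds if key >= b)

    keys = [5, 93, 41, 77, 12]
    print [(k, hash_partition(k)) for k in keys]
    print [(k, range_partition(k)) for k in keys]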
* | Merge branch 'master' into scala-2.10  (Prashant Sharma, 2013-10-01, 1 file, -1/+1)
|\|     Conflicts:
          core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
          docs/_config.yml
          project/SparkBuild.scala
          repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
| * Update build version in master  (Patrick Wendell, 2013-09-24, 1 file, -1/+1)
| |
* | Merge branch 'master' of git://github.com/mesos/spark into scala-2.10  (Prashant Sharma, 2013-09-15, 5 files, -1/+78)
|\|     Conflicts:
          core/src/main/scala/org/apache/spark/SparkContext.scala
          project/SparkBuild.scala
| * Whoopsy daisy  (Aaron Davidson, 2013-09-08, 1 file, -1/+0)
| |
| * Export StorageLevel and refactor  (Aaron Davidson, 2013-09-07, 5 files, -26/+62)
| |
| * Remove reflection, hard-code StorageLevels  (Aaron Davidson, 2013-09-07, 2 files, -24/+26)
      The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise the shell
      would have to call a private method of SparkContext. Having StorageLevel available
      in sc also doesn't seem like the end of the world. There may be a better solution,
      though. As for creating the StorageLevel object itself, this seems to be the best way
      in Python 2 for creating singleton, enum-like objects:
      http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python
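A sketch of that enum-like-singleton pattern: a plain class whose class attributes are pre-built instances. The class name and field names here are illustrative, modelled loosely on Spark's StorageLevel rather than copied from PySpark.

    class StorageLevelSketch(object):
        def __init__(self, use_disk, use_memory, deserialized, replication=1):
            self.use_disk = use_disk
            self.use_memory = use_memory
            self.deserialized = deserialized
            self.replication = replication

    # Each level is a single shared instance, so identity checks behave like enum members.
    StorageLevelSketch.DISK_ONLY = StorageLevelSketch(True, False, False)
    StorageLevelSketch.MEMORY_ONLY = StorageLevelSketch(False, True, True)
    StorageLevelSketch.MEMORY_AND_DISK = StorageLevelSketch(True, True, True)

    print StorageLevelSketch.MEMORY_ONLY is StorageLevelSketch.MEMORY_ONLY   # True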
| * Memoize StorageLevels read from JVM  (Aaron Davidson, 2013-09-06, 1 file, -2/+9)
| |
| * SPARK-660: Add StorageLevel support in Python  (Aaron Davidson, 2013-09-05, 3 files, -1/+34)
      It uses reflection... I am not proud of that fact, but it at least ensures
      compatibility (sans refactoring of the StorageLevel stuff).
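What the feature looks like from user code, assuming StorageLevel ends up importable from the pyspark package (as the "Export StorageLevel and refactor" commit listed above suggests):

    from pyspark import SparkContext, StorageLevel

    sc = SparkContext("local", "persist-sketch")
    rdd = sc.parallelize(range(1000)).persist(StorageLevel.MEMORY_AND_DISK)
    print rdd.count()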
* | Merged with master  (Prashant Sharma, 2013-09-06, 25 files, -98/+948)
|\|
| * Add missing license headers found with RAT  (Matei Zaharia, 2013-09-02, 1 file, -1/+18)
| |
| * Exclude some private modules in epydoc  (Matei Zaharia, 2013-09-02, 1 file, -0/+1)
| |
| * Further fixes to get PySpark to work on Windows  (Matei Zaharia, 2013-09-02, 1 file, -5/+12)
| |
| * Allow PySpark to launch worker.py directly on Windows  (Matei Zaharia, 2013-09-01, 1 file, -4/+7)
| |
| * Move some classes to more appropriate packages:  (Matei Zaharia, 2013-09-01, 1 file, -2/+2)
      * RDD, *RDDFunctions -> org.apache.spark.rdd
      * Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
      * JavaSerializer, KryoSerializer -> org.apache.spark.serializer