path: root/python
Commit message    Author    Age    Files    Lines
...
| * | | Make Python function/line appear in the UI.    Tor Myklebust    2013-12-28    1    -11/+55
| | | |
* | | | Fix Python code after change of getOrElse    Matei Zaharia    2014-01-01    2    -7/+14
| | | |
* | | | Miscellaneous fixes from code review.    Matei Zaharia    2014-01-01    1    -8/+4
| | | |     Also replaced SparkConf.getOrElse with just a "get" that takes a default value, and added getInt, getLong, etc. to make code that uses this simpler later on.
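The accessor pattern described in the commit above (one "get" that takes a default, plus typed helpers) can be sketched roughly as follows. This is an illustration only: `SketchConf`, `get_int`, and `get_float` are hypothetical names for this example, not the SparkConf API, and the property keys in the usage note are placeholders.

```python
class SketchConf:
    """Hypothetical stand-in for a string-keyed configuration object."""

    def __init__(self, settings=None):
        # Values are stored as strings, as they would be in a properties file.
        self._settings = dict(settings or {})

    def get(self, key, default=None):
        # One lookup that falls back to a caller-supplied default.
        return self._settings.get(key, default)

    def get_int(self, key, default):
        # Typed helper: parse the stored string, or return the default untouched.
        value = self._settings.get(key)
        return default if value is None else int(value)

    def get_float(self, key, default):
        # Same idea for floating-point settings.
        value = self._settings.get(key)
        return default if value is None else float(value)
```

With `conf = SketchConf({"example.cores": "4"})`, a call such as `conf.get_int("example.cores", 1)` parses the stored string, while `conf.get("example.name", "demo")` falls back to the default because the key is absent.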
* | | | Merge remote-tracking branch 'apache/master' into conf2    Matei Zaharia    2013-12-31    2    -9/+4
|\ \ \ \
| | | | |     Conflicts:
| | | | |         core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
| | | | |         streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
| | | | |         streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
| * \ \ \ Merge pull request #289 from tdas/filestream-fix    Patrick Wendell    2013-12-31    2    -9/+4
| |\ \ \ \
| | |/ / /
| |/| | |
| | | | |     Bug fixes for file input stream and checkpointing
| | | | |
| | | | |     - Fixed bugs in the file input stream that led the stream to fail due to transient HDFS errors (listing files when a background thread is deleting fails caused errors, etc.)
| | | | |     - Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, to allow checkpoints to be written to any HDFS compatible store requiring special configuration.
| | | | |     - Changed the API of SparkContext.setCheckpointDir() - eliminated the unnecessary 'useExisting' parameter. Now SparkContext will always create a unique subdirectory within the user specified checkpoint directory. This is to ensure that previous checkpoint files are not accidentally overwritten.
| | | | |     - Fixed bug where setting checkpoint directory as a relative local path caused the checkpointing to fail.
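The "unique subdirectory within the user specified checkpoint directory" behaviour described above can be sketched as follows, on a local filesystem only. The function name `make_unique_checkpoint_dir` and the use of a random UUID for uniqueness are assumptions for this example; the log does not say how the commit generates the name.

```python
import os
import tempfile
import uuid


def make_unique_checkpoint_dir(base_dir):
    """Create and return a fresh subdirectory under base_dir (hypothetical helper)."""
    # Resolve relative paths up front so the result does not depend on
    # later changes to the working directory.
    base = os.path.abspath(base_dir)
    # A random UUID is one way to avoid colliding with earlier checkpoints
    # (an assumption of this sketch).
    unique = os.path.join(base, str(uuid.uuid4()))
    os.makedirs(unique)
    return unique


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    print(make_unique_checkpoint_dir(root))
    print(make_unique_checkpoint_dir(root))
```

Calling the helper twice with the same base directory yields two distinct subdirectories, so files from an earlier run are never overwritten.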
| | * | | Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's comments on PR 289.    Tathagata Das    2013-12-24    2    -9/+4
| | | | |
* | | | | Updated docs for SparkConf and handled review comments    Matei Zaharia    2013-12-30    2    -17/+31
| | | | |
* | | | | Properly show Spark properties on web UI, and change app name property    Matei Zaharia    2013-12-29    2    -3/+3
| | | | |
* | | | | Fix some Python docs and make sure to unset SPARK_TESTING in Python    Matei Zaharia    2013-12-29    6    -22/+37
| | | | |     tests so we don't get the test spark.conf on the classpath.
* | | | | Merge remote-tracking branch 'origin/master' into conf2    Matei Zaharia    2013-12-29    9    -2/+599
|\| | | |
| | | | |     Conflicts:
| | | | |         core/src/main/scala/org/apache/spark/SparkContext.scala
| | | | |         core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
| | | | |         core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
| | | | |         core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
| | | | |         core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
| | | | |         core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
| | | | |         core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
| | | | |         core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
| | | | |         new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
| | | | |         streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
| | | | |         streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
| | | | |         streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
| | | | |         streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
| | | | |         streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
| | | | |         streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
| | | | |         streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
| | | | |         streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
| * | | | Merge pull request #283 from tmyklebu/master    Matei Zaharia    2013-12-26    8    -1/+598
| |\ \ \ \
| | | | | |     Python bindings for mllib
| | | | | |
| | | | | |     This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib.
| | | | | |
| | | | | |     For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py. The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub. The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.
| | | | | |
| | | | | |     ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside. The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.
| | | | | |
| | | | | |     I have tested these bindings on an x86_64 machine running Linux. There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's endian differs from java.nio.ByteBuffer's idea of the native byte order.
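The endianness risk mentioned in the pull request above disappears when both sides agree on an explicit byte order. The sketch below shows that idea with Python's `struct` module. The layout (a 32-bit length followed by 64-bit floats, little-endian) and the function names are assumptions for this example and are not the format used by the mllib bindings.

```python
import struct


def serialize_vector(values):
    """Pack floats as: int32 length, then float64 entries, all little-endian."""
    # "<" pins the byte order instead of relying on the platform's native order.
    return struct.pack("<i%dd" % len(values), len(values), *values)


def deserialize_vector(data):
    """Inverse of serialize_vector: read the length, then that many doubles."""
    (length,) = struct.unpack_from("<i", data, 0)
    # The doubles start right after the 4-byte length header.
    return list(struct.unpack_from("<%dd" % length, data, 4))
```

A reader on the JVM side would need to set the same byte order on its buffer explicitly rather than use the native order.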
| | * | | | Remove commented code in __init__.py.    Tor Myklebust    2013-12-25    1    -8/+0
| | | | | |
| | * | | | Fix copypasta in __init__.py. Don't import anything directly into pyspark.mllib.    Tor Myklebust    2013-12-25    1    -26/+8
| | | | | |
| | * | | | Initial weights in Scala are ones; do that too. Also fix some errors.    Tor Myklebust    2013-12-25    1    -6/+6
| | | | | |
| | * | | | Split the mllib bindings into a whole bunch of modules and rename some things.    Tor Myklebust    2013-12-25    7    -183/+409
| | | | | |
| | * | | | Remove useless line from test stub.    Tor Myklebust    2013-12-24    1    -1/+0
| | | | | |
| | * | | | Python change for move of PythonMLLibAPI.    Tor Myklebust    2013-12-24    1    -1/+1
| | | | | |
| | * | | | Release JVM reference to the ALSModel when done.    Tor Myklebust    2013-12-22    1    -2/+2
| | | | | |
| | * | | | Python stubs for ALSModel.    Tor Myklebust    2013-12-21    2    -8/+56
| | | | | |
| | * | | | Un-semicolon mllib.py.    Tor Myklebust    2013-12-20    1    -11/+11
| | | | | |
| | * | | | Change some docstrings and add some others.    Tor Myklebust    2013-12-20    1    -1/+3
| | | | | |
| | * | | | Licence notice.    Tor Myklebust    2013-12-20    1    -0/+17
| | | | | |
| | * | | | Whitespace.    Tor Myklebust    2013-12-20    1    -1/+1
| | | | | |
| | * | | | Remove gigantic endian-specific test and exception tests.    Tor Myklebust    2013-12-20    1    -38/+3
| | | | | |
| | * | | | Tests for the Python side of the mllib bindings.    Tor Myklebust    2013-12-20    1    -52/+172
| | | | | |
| | * | | | Python stubs for classification and clustering.    Tor Myklebust    2013-12-20    2    -16/+96
| | | | | |
| | * | | | Python side of python bindings for linear, Lasso, and ridge regression    Tor Myklebust    2013-12-19    2    -15/+72
| | | | | |
| | * | | | Incorporate most of Josh's style suggestions. I don't want to deal with the type and length checking errors until we've got at least one working stub that we're all happy with.    Tor Myklebust    2013-12-19    2    -98/+91
| | | | | |
| | * | | | The rest of the Python side of those bindings.    Tor Myklebust    2013-12-19    3    -2/+4
| | | | | |
| | * | | | First cut at python mllib bindings. Only LinearRegression is supported.    Tor Myklebust    2013-12-19    1    -0/+114
| | | | | |
| * | | | | Typo: avaiable -> available    Andrew Ash    2013-12-24    1    -1/+1
| | |/ / /
| |/| | |
* | | | | Add Python docs about SparkConf    Matei Zaharia    2013-12-29    2    -1/+44
| | | | |
* | | | | Fix some other Python tests due to initializing JVM in a different way    Matei Zaharia    2013-12-29    3    -10/+19
| | | | |     The test in context.py created two different instances of the SparkContext class by copying "globals", so that some tests can have a global "sc" object and others can try initializing their own contexts. This led to two JVM gateways being created since SparkConf also looked at pyspark.context.SparkContext to get the JVM.
* | | | | Add SparkConf support in Python    Matei Zaharia    2013-12-29    4    -13/+146
| | | | |
* | | | | Fix Python use of getLocalDir    Matei Zaharia    2013-12-29    1    -1/+1
|/ / / /
* | | | Merge pull request #276 from shivaram/collectPartition    Reynold Xin    2013-12-19    2    -4/+6
|\ \ \ \
| | | | |     Add collectPartition to JavaRDD interface.
| | | | |
| | | | |     This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py.
| | | | |
| | | | |     Thanks @concretevitamin for the original change and tests.
| * | | | Make collectPartitions take an array of partitions    Shivaram Venkataraman    2013-12-19    1    -1/+6
| | | | |     Change the implementation to use runJob instead of PartitionPruningRDD. Also update the unit tests and the python take implementation to use the new interface.
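How a frontend might build `take` on top of a "collect these partitions" call, as the two commits above describe, can be sketched like this. Partitions are modelled as plain Python lists, and `collect_partitions` is a hypothetical stand-in for the JVM-side call. The real rdd.py implementation is not reproduced here.

```python
def collect_partitions(partitions, indices):
    """Hypothetical stand-in: return the contents of the requested partitions."""
    return [list(partitions[i]) for i in indices]


def take(partitions, n):
    """Return the first n items, fetching one partition at a time."""
    taken = []
    for index in range(len(partitions)):
        if len(taken) >= n:
            # Stop early: later partitions are never fetched.
            break
        (items,) = collect_partitions(partitions, [index])
        # Keep only as many items as are still needed.
        taken.extend(items[: n - len(taken)])
    return taken
```

For example, `take([[1, 2], [3], [4, 5, 6]], 4)` reads the three partitions in order and stops once four items are gathered. Passing an array of indices, as in the second commit, also leaves room to request several partitions in one call.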
| * | | | Add collectPartition to JavaRDD interface.    Shivaram Venkataraman    2013-12-18    2    -4/+1
| |/ / /
| | | |     Also remove takePartition from PythonRDD and use collectPartition in rdd.py.
* / / / Add toString to Java RDD, and __repr__ to Python RDD    Nick Pentreath    2013-12-19    1    -0/+3
|/ / /
* | | Merge branch 'master' into akka-bug-fix    Prashant Sharma    2013-12-11    3    -1/+36
|\ \ \
| | | |     Conflicts:
| | | |         core/pom.xml
| | | |         core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
| | | |         pom.xml
| | | |         project/SparkBuild.scala
| | | |         streaming/pom.xml
| | | |         yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
| * | | License headers    Patrick Wendell    2013-12-09    1    -0/+17
| | | |
| * | | Fix UnicodeEncodeError in PySpark saveAsTextFile().    Josh Rosen    2013-11-28    2    -1/+19
| | | |     Fixes SPARK-970.
* | | | Merge branch 'master' into wip-scala-2.10    Prashant Sharma    2013-11-27    8    -142/+383
|\| | |
| | | |     Conflicts:
| | | |         core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
| | | |         core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala
| | | |         core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithContextRDD.scala
| | | |         core/src/main/scala/org/apache/spark/rdd/RDD.scala
| | | |         python/pyspark/rdd.py
| * | | Removed unused basestring case from dump_stream.    Josh Rosen    2013-11-26    1    -2/+0
| | | |
| * | | FramedSerializer: _dumps => dumps, _loads => loads.    Josh Rosen    2013-11-10    4    -18/+18
| | | |
| * | | Send PySpark commands as bytes instead of strings.    Josh Rosen    2013-11-10    3    -16/+13
| | | |
| * | | Add custom serializer support to PySpark.    Josh Rosen    2013-11-10    8    -148/+362
| | | |     For now, this only adds MarshalSerializer, but it lays the groundwork for supporting other custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack.
| | | |
| | | |     This also fixes a bug in SparkContext.union().
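A framed serializer of the kind these commits mention can be sketched as a class that writes each object as a length prefix followed by its payload, with `dumps`/`loads` as the pluggable part. The class name `FramedMarshalSketch`, the `load_stream` method, and the 4-byte big-endian length prefix are assumptions for this example. Only the names `dumps`, `loads`, and `dump_stream` appear in the log above, and PySpark's actual code is not reproduced here.

```python
import io
import marshal
import struct


class FramedMarshalSketch:
    """Illustrative length-prefixed serializer backed by the marshal module."""

    def dumps(self, obj):
        return marshal.dumps(obj)

    def loads(self, data):
        return marshal.loads(data)

    def dump_stream(self, iterator, stream):
        # Each frame is a 4-byte big-endian length followed by the payload.
        for obj in iterator:
            payload = self.dumps(obj)
            stream.write(struct.pack("!i", len(payload)))
            stream.write(payload)

    def load_stream(self, stream):
        # Read frames until the stream runs out of complete length headers.
        while True:
            header = stream.read(4)
            if len(header) < 4:
                return
            (length,) = struct.unpack("!i", header)
            yield self.loads(stream.read(length))
```

Swapping in another format would mean overriding only `dumps` and `loads`, which is the kind of extension point the commit message describes.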
| * | | Remove Pickle-wrapping of Java objects in PySpark.    Josh Rosen    2013-11-03    4    -14/+39
| | | |     If we support custom serializers, the Python worker will know what type of input to expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings.
| * | | Replace magic lengths with constants in PySpark.    Josh Rosen    2013-11-03    2    -6/+13
| | | |     Write the length of the accumulators section up-front rather than terminating it with a negative length. I find this easier to read.
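Writing a section's item count up front, rather than ending it with a negative sentinel length, lets the reader loop a known number of times and stop exactly at the section boundary. The sketch below illustrates that choice. The function names and the 4-byte big-endian integers are assumptions for this example and do not reproduce PySpark's wire format.

```python
import io
import struct


def write_section(stream, items):
    """Write the item count first, then each item as length + bytes."""
    stream.write(struct.pack("!i", len(items)))
    for item in items:
        stream.write(struct.pack("!i", len(item)))
        stream.write(item)


def read_section(stream):
    """Read exactly one section; bytes after it stay unread in the stream."""
    (count,) = struct.unpack("!i", stream.read(4))
    items = []
    for _ in range(count):
        (length,) = struct.unpack("!i", stream.read(4))
        items.append(stream.read(length))
    return items
```

Because the count comes first, the reader needs no special case for a terminator value and can hand the rest of the stream to whatever parses the next section.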
* | | | Merge branch 'master' into scala-2.10    Raymond Liu    2013-11-13    2    -13/+50
|\| | |