path: root/python/pyspark/context.py
Commit history (each entry: commit message, author, date, files changed, lines -/+)
* Fix Python code after change of getOrElse (Matei Zaharia, 2014-01-01, 1 file, -6/+8)
* Merge remote-tracking branch 'apache/master' into conf2 (Matei Zaharia, 2013-12-31, 1 file, -7/+2)
    Conflicts:
      core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
      streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
      streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
  * Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's
    comments on PR 289. (Tathagata Das, 2013-12-24, 1 file, -7/+2)
* Updated docs for SparkConf and handled review comments (Matei Zaharia, 2013-12-30, 1 file, -12/+12)
* Properly show Spark properties on web UI, and change app name property (Matei Zaharia, 2013-12-29, 1 file, -2/+2)
* Fix some Python docs and make sure to unset SPARK_TESTING in Python tests so we
  don't get the test spark.conf on the classpath. (Matei Zaharia, 2013-12-29, 1 file, -1/+2)
* Add Python docs about SparkConf (Matei Zaharia, 2013-12-29, 1 file, -1/+2)
* Fix some other Python tests due to initializing JVM in a different way (Matei Zaharia, 2013-12-29, 1 file, -8/+15)
    The test in context.py created two different instances of the SparkContext class
    by copying "globals", so that some tests can have a global "sc" object and others
    can try initializing their own contexts. This led to two JVM gateways being
    created since SparkConf also looked at pyspark.context.SparkContext to get the JVM.
* Add SparkConf support in Python (Matei Zaharia, 2013-12-29, 1 file, -12/+28)
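  A minimal sketch of the SparkConf-style setup this commit enables in PySpark
  (the master URL, app name, and setting below are illustrative, not taken from
  the commit):

      from pyspark import SparkConf, SparkContext

      conf = (SparkConf()
              .setMaster("local")
              .setAppName("conf-example")
              .set("spark.executor.memory", "1g"))
      sc = SparkContext(conf=conf)
      print(conf.get("spark.app.name"))
      sc.stop()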
* Fix Python use of getLocalDir (Matei Zaharia, 2013-12-29, 1 file, -1/+1)
* Add collectPartition to JavaRDD interface. (Shivaram Venkataraman, 2013-12-18, 1 file, -3/+0)
    Also remove takePartition from PythonRDD and use collectPartition in rdd.py.
* FramedSerializer: _dumps => dumps, _loads => loads. (Josh Rosen, 2013-11-10, 1 file, -1/+1)
* Add custom serializer support to PySpark. (Josh Rosen, 2013-11-10, 1 file, -16/+45)
    For now, this only adds MarshalSerializer, but it lays the groundwork for
    supporting other custom serializers. Many of these mechanisms can also be used
    to support deserialization of different data formats sent by Java, such as data
    encoded by MsgPack. This also fixes a bug in SparkContext.union().
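  A short sketch of how a custom serializer is typically passed to a PySpark
  context after this change (the app name and data are illustrative):

      from pyspark import SparkContext
      from pyspark.serializers import MarshalSerializer

      # MarshalSerializer is faster than the default pickle-based serializer,
      # but supports fewer Python types.
      sc = SparkContext("local", "serializer-example",
                        serializer=MarshalSerializer())
      print(sc.parallelize(range(10)).map(lambda x: x * 2).collect())
      sc.stop()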
* Remove Pickle-wrapping of Java objects in PySpark. (Josh Rosen, 2013-11-03, 1 file, -5/+5)
    If we support custom serializers, the Python worker will know what type of input
    to expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and
    strings.
* Pass self to SparkContext._ensure_initialized. (Ewen Cheslack-Postava, 2013-10-22, 1 file, -1/+10)
    The constructor for SparkContext should pass in self so that we track the current
    context and produce errors if another one is created. Add a doctest to make sure
    creating multiple contexts triggers the exception.
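  A rough illustration of the behavior this commit guards against, assuming a local
  master; the exact exception type and message may differ between versions:

      from pyspark import SparkContext

      sc = SparkContext("local", "first-context")
      try:
          SparkContext("local", "second-context")  # a second active context is rejected
      except ValueError as err:
          print("rejected as expected:", err)
      finally:
          sc.stop()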
* Add classmethod to SparkContext to set system properties. (Ewen Cheslack-Postava, 2013-10-22, 1 file, -12/+29)
    Add a new classmethod to SparkContext to set system properties, as is possible in
    Scala/Java. Unlike the Java/Scala implementations, there's no access to System
    until the JVM bridge is created. Since SparkContext handles that, move the
    initialization of the JVM connection to a separate classmethod that can safely be
    called repeatedly as long as the same instance (or no instance) is provided.
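  A brief usage sketch of the classmethod described above (the property name and
  value are illustrative); it has to be called before any SparkContext is created:

      from pyspark import SparkContext

      SparkContext.setSystemProperty("spark.executor.memory", "2g")
      sc = SparkContext("local", "sysprop-example")
      sc.stop()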
* Whoopsy daisy (Aaron Davidson, 2013-09-08, 1 file, -1/+0)
* Export StorageLevel and refactor (Aaron Davidson, 2013-09-07, 1 file, -23/+12)
* Remove reflection, hard-code StorageLevels (Aaron Davidson, 2013-09-07, 1 file, -22/+24)
    The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise the
    shell would have to call a private method of SparkContext. Having StorageLevel
    available in sc also doesn't seem like the end of the world. There may be a
    better solution, though.

    As for creating the StorageLevel object itself, this seems to be the best way in
    Python 2 for creating singleton, enum-like objects:
    http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python
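  A generic sketch of the enum-like singleton pattern the commit message refers to
  (the class body and flag names here are illustrative, not the actual PySpark code):

      class StorageLevel(object):
          """Enum-like holder for storage flags; members are created once below."""

          def __init__(self, use_disk, use_memory, deserialized, replication=1):
              self.use_disk = use_disk
              self.use_memory = use_memory
              self.deserialized = deserialized
              self.replication = replication

      # Hard-coded singletons attached as class attributes, used like enum members.
      StorageLevel.MEMORY_ONLY = StorageLevel(False, True, True)
      StorageLevel.DISK_ONLY = StorageLevel(True, False, False)
      StorageLevel.MEMORY_AND_DISK = StorageLevel(True, True, True)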
* Memoize StorageLevels read from JVM (Aaron Davidson, 2013-09-06, 1 file, -2/+9)
* SPARK-660: Add StorageLevel support in Python (Aaron Davidson, 2013-09-05, 1 file, -0/+14)
    It uses reflection... I am not proud of that fact, but it at least ensures
    compatibility (sans refactoring of the StorageLevel stuff).
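  The user-facing side of this feature is passing a storage level when persisting
  an RDD; a minimal sketch (the storage level choice is illustrative):

      from pyspark import SparkContext, StorageLevel

      sc = SparkContext("local", "persist-example")
      rdd = sc.parallelize(range(1000)).map(lambda x: x * x)
      rdd.persist(StorageLevel.MEMORY_AND_DISK)
      print(rdd.count())
      sc.stop()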
* Move some classes to more appropriate packages: (Matei Zaharia, 2013-09-01, 1 file, -2/+2)
    * RDD, *RDDFunctions -> org.apache.spark.rdd
    * Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
    * JavaSerializer, KryoSerializer -> org.apache.spark.serializer
* Initial work to rename package to org.apache.spark (Matei Zaharia, 2013-09-01, 1 file, -2/+2)
* Implementing SPARK-878 for PySpark: adding zip and egg files to the context and
  passing them down to workers, which add these to their sys.path
  (Andre Schumacher, 2013-08-16, 1 file, -3/+11)
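  A short sketch of shipping a dependency archive to workers as this commit enables
  (the archive path is a placeholder):

      from pyspark import SparkContext

      sc = SparkContext("local", "pyfile-example")
      # .py, .zip, and .egg paths are shipped to executors and added to sys.path
      # there, so tasks can import modules packaged inside the archive.
      sc.addPyFile("/path/to/deps.zip")   # placeholder path
      sc.stop()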
* SPARK-815. Python parallelize() should split lists before batching (Matei Zaharia, 2013-07-29, 1 file, -2/+9)
    One unfortunate consequence of this fix is that we materialize any collections
    that are given to us as generators, but this seems necessary to get reasonable
    behavior on small collections. We could add a batchSize parameter later to bypass
    auto-computation of batch size if this becomes a problem (e.g. if users really
    want to parallelize big generators nicely).
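  A rough, self-contained sketch of the idea (materialize the input, then pick a
  batch size from its length); the exact heuristic Spark uses is not reproduced here:

      def split_then_batch(data, num_slices):
          """Materialize generators, then choose a batch size per slice."""
          items = list(data)                        # generators get materialized
          batch_size = max(1, len(items) // num_slices)
          return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

      print(split_then_batch((x * x for x in range(10)), 3))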
* Add Apache license headers and LICENSE and NOTICE files (Matei Zaharia, 2013-07-16, 1 file, -0/+17)
* Fix reporting of PySpark doctest failures. (Josh Rosen, 2013-02-03, 1 file, -1/+3)
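  A common pattern for making doctest failures visible to a test runner, roughly
  the kind of change a fix like this involves (a generic sketch, not the commit's
  exact diff):

      import doctest
      import sys

      if __name__ == "__main__":
          (failure_count, _test_count) = doctest.testmod()
          if failure_count:
              sys.exit(-1)   # propagate a non-zero exit code so failures are not silent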
* Use spark.local.dir for PySpark temp files (SPARK-580). (Josh Rosen, 2013-02-01, 1 file, -4/+8)
* Do not launch JavaGateways on workers (SPARK-674). (Josh Rosen, 2013-02-01, 1 file, -10/+17)
    The problem was that the gateway was being initialized whenever the
    pyspark.context module was loaded. The fix uses lazy initialization that occurs
    only when SparkContext instances are actually constructed. I also made the
    gateway and jvm variables private. This change results in ~3-4x performance
    improvement when running the PySpark unit tests.
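  A generic sketch of the lazy-initialization pattern described above (the class
  and the gateway factory are stand-ins, not the real PySpark internals):

      class Context(object):
          _gateway = None   # shared across instances, created only on first use

          def __init__(self):
              Context._ensure_initialized()

          @classmethod
          def _ensure_initialized(cls):
              # Launching the gateway is expensive, so defer it from module import
              # time to the moment a context is actually constructed.
              if cls._gateway is None:
                  cls._gateway = cls._launch_gateway()
              return cls._gateway

          @staticmethod
          def _launch_gateway():
              return object()   # placeholder for the real JVM gateway launcher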
* Merge pull request #396 from JoshRosen/spark-653 (Matei Zaharia, 2013-01-24, 1 file, -10/+5)
    Make PySpark AccumulatorParam an abstract base class
  * Make AccumulatorParam an abstract base class. (Josh Rosen, 2013-01-21, 1 file, -10/+5)
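  A sketch of a custom AccumulatorParam subclass as the interface is typically used
  (the ListParam class and the data here are illustrative):

      from pyspark import SparkContext
      from pyspark.accumulators import AccumulatorParam

      class ListParam(AccumulatorParam):
          """Accumulate Python lists by concatenation."""
          def zero(self, value):
              return []
          def addInPlace(self, value1, value2):
              value1.extend(value2)
              return value1

      sc = SparkContext("local", "accum-param-example")
      acc = sc.accumulator([], ListParam())
      sc.parallelize([[1], [2, 3], [4]]).foreach(lambda xs: acc.add(xs))
      print(acc.value)   # e.g. [1, 2, 3, 4] (ordering may vary)
      sc.stop()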
* Allow PySpark's SparkFiles to be used from driver (Josh Rosen, 2013-01-23, 1 file, -6/+21)
    Fix minor documentation formatting issues.
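  A minimal sketch of resolving a distributed file via SparkFiles on the driver,
  which this commit allows (the file path is a placeholder):

      from pyspark import SparkContext, SparkFiles

      sc = SparkContext("local", "sparkfiles-example")
      sc.addFile("/path/to/lookup.txt")          # placeholder path
      # After this change, SparkFiles.get() also resolves paths on the driver,
      # not just inside tasks running on workers.
      print(SparkFiles.get("lookup.txt"))
      sc.stop()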
* Fix sys.path bug in PySpark SparkContext.addPyFile (Josh Rosen, 2013-01-22, 1 file, -2/+0)
* Don't download files to master's working directory. (Josh Rosen, 2013-01-21, 1 file, -4/+36)
    This should avoid exceptions caused by existing files with different contents.
    I also removed some unused code.
* Update checkpointing API docs in Python/Java. (Josh Rosen, 2013-01-20, 1 file, -4/+7)
* Add checkpointFile() and more tests to PySpark. (Josh Rosen, 2013-01-20, 1 file, -1/+5)
* Add RDD checkpointing to Python API. (Josh Rosen, 2013-01-20, 1 file, -0/+9)
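  A short usage sketch of the checkpointing API added here (the checkpoint
  directory is a placeholder):

      from pyspark import SparkContext

      sc = SparkContext("local", "checkpoint-example")
      sc.setCheckpointDir("/tmp/spark-checkpoints")   # placeholder directory

      rdd = sc.parallelize(range(100)).map(lambda x: x * x)
      rdd.checkpoint()          # mark the RDD to be checkpointed
      rdd.count()               # an action forces computation and the checkpoint
      print(rdd.isCheckpointed())
      sc.stop()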
* Added accumulators to PySpark (Matei Zaharia, 2013-01-20, 1 file, -0/+38)
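  A minimal sketch of the accumulator API this commit introduces (names and data
  are illustrative):

      from pyspark import SparkContext

      sc = SparkContext("local", "accumulator-example")
      counter = sc.accumulator(0)                  # numeric accumulator, starts at 0
      sc.parallelize(range(10)).foreach(lambda x: counter.add(x))
      print(counter.value)                         # 45 once the action has finished
      sc.stop()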
* Change PYSPARK_PYTHON_EXEC to PYSPARK_PYTHON. (Josh Rosen, 2013-01-10, 1 file, -1/+1)
* Change PySpark RDD.take() to not call iterator(). (Josh Rosen, 2013-01-03, 1 file, -0/+1)
* Rename top-level 'pyspark' directory to 'python' (Josh Rosen, 2013-01-01, 1 file, -0/+158)