path: root/python/pyspark
Commit message — Author — Date — Files — Lines
* Merge branch 'master' into MatrixFactorizationModel-fix — Hossein Falaki, 2014-01-07 (3 files, -3/+3)
|\
| * Merge remote-tracking branch 'apache-github/master' into remove-binaries — Patrick Wendell, 2014-01-03 (2 files, -2/+2)
| |\
| |     Conflicts:
| |       core/src/test/scala/org/apache/spark/DriverSuite.scala
| |       docs/python-programming-guide.md
| | * Merge pull request #317 from ScrapCodes/spark-915-segregate-scripts — Patrick Wendell, 2014-01-03 (2 files, -2/+2)
| | |\
| | |     Spark-915: segregate scripts
| | | * sbin/spark-class* -> bin/spark-class* — Prashant Sharma, 2014-01-03 (1 file, -1/+1)
| | | |
| | | * pyspark -> bin/pyspark — Prashant Sharma, 2014-01-02 (1 file, -1/+1)
| | | |
| | | * Merge branch 'scripts-reorg' of github.com:shane-huang/incubator-spark into spark-915-segregate-scripts — Prashant Sharma, 2014-01-02 (1 file, -1/+1)
| | | |\
| | | |     Conflicts:
| | | |       bin/spark-shell
| | | |       core/pom.xml
| | | |       core/src/main/scala/org/apache/spark/SparkContext.scala
| | | |       core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
| | | |       core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala
| | | |       core/src/test/scala/org/apache/spark/DriverSuite.scala
| | | |       python/run-tests
| | | |       sbin/compute-classpath.sh
| | | |       sbin/spark-class
| | | |       sbin/stop-slaves.sh
| | | | * Merge branch 'reorgscripts' into scripts-reorg — shane-huang, 2013-09-27 (1 file, -1/+1)
| | | | |\
| | | | | * fix paths and change spark to use APP_MEM as application driver memory instead of SPARK_MEM; user should add application jars to SPARK_CLASSPATH — shane-huang, 2013-09-26 (1 file, -1/+1)
| | | | | |     Signed-off-by: shane-huang <shengsheng.huang@intel.com>
| | | | | * added spark-class and spark-executor to sbin — shane-huang, 2013-09-23 (1 file, -1/+1)
| | | | | |     Signed-off-by: shane-huang <shengsheng.huang@intel.com>
| * | | | | Changes on top of Prashant's patch. — Patrick Wendell, 2014-01-03 (1 file, -1/+1)
| |/ / / /
| |     Closes #316
* | | | | Added predictAll python function to MatrixFactorizationModel — Hossein Falaki, 2014-01-06 (1 file, -4/+6)
| | | | |
* | | | | Added Rating deserializer — Hossein Falaki, 2014-01-06 (1 file, -3/+18)
| | | | |
* | | | | Added python binding for bulk recommendation — Hossein Falaki, 2014-01-04 (2 files, -1/+19)
|/ / / /
* | | | Merge pull request #311 from tmyklebu/master — Matei Zaharia, 2014-01-02 (1 file, -11/+55)
|\ \ \ \
| |/ / /
|/| | |
| | | |     SPARK-991: Report information gleaned from a Python stacktrace in the UI
| | | |
| | | |     Scala:
| | | |     - Added setCallSite/clearCallSite to SparkContext and JavaSparkContext. These functions mutate a LocalProperty called "externalCallSite".
| | | |     - Added a wrapper, getCallSite, that checks for an externalCallSite and, if none is found, falls back to the usual Utils.formatSparkCallSite.
| | | |     - Changed every caller of Utils.formatSparkCallSite (other than getCallSite itself) to call getCallSite instead.
| | | |     - Added setCallSite/clearCallSite wrappers to JavaSparkContext.
| | | |
| | | |     Python:
| | | |     - Added a gruesome hack to rdd.py that inspects the traceback and guesses what should be shown in the UI.
| | | |     - Added a RAII wrapper around said gruesome hack that calls setCallSite/clearCallSite as appropriate.
| | | |     - Wired that RAII wrapper up around three calls into the Scala code.
| | | |
| | | |     I'm not sure that I hit all the spots with the RAII wrapper, or that my gruesome hack does exactly what we want. One could also approach this change by refactoring runJob/submitJob/runApproximateJob to take a call site, then threading that parameter through everything that needs to know it. The pointless-looking wrappers in JavaSparkContext exist because the SparkContext cannot be accessed directly from Python, so everything that matters has to be wrapped in JavaSparkContext.
| | | |
| | | |     Conflicts:
| | | |       core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala
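The "RAII wrapper" this commit describes can be sketched as a Python context manager. The names `first_user_frame` and `CallSiteContext` are illustrative, not PySpark's actual internals; only the `setCallSite`/`clearCallSite` method names mirror the Scala API mentioned above:

```python
import traceback


def first_user_frame():
    """Walk the current stack (outermost first) and report the first frame
    that does not come from the pyspark package itself."""
    for fname, lineno, func, _ in traceback.extract_stack():
        if "pyspark" not in fname:
            return "%s at %s:%d" % (func, fname, lineno)
    return "<unknown>"


class CallSiteContext(object):
    """RAII-style guard: set the external call site on entry, clear it on exit."""

    def __init__(self, ctx):
        self.ctx = ctx  # anything exposing setCallSite/clearCallSite

    def __enter__(self):
        self.ctx.setCallSite(first_user_frame())
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.ctx.clearCallSite()
        return False  # never swallow exceptions
```

Wrapping each call into the Scala code in `with CallSiteContext(sc): ...` guarantees the call site is cleared even if the job raises.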
| * | | Make Python function/line appear in the UI. — Tor Myklebust, 2013-12-28 (1 file, -11/+55)
| | | |
* | | | Fix Python code after change of getOrElse — Matei Zaharia, 2014-01-01 (2 files, -7/+14)
| | | |
* | | | Miscellaneous fixes from code review. — Matei Zaharia, 2014-01-01 (1 file, -8/+4)
| | | |     Also replaced SparkConf.getOrElse with just a "get" that takes a default value, and added getInt, getLong, etc. to make code that uses this simpler later on.
* | | | Merge remote-tracking branch 'apache/master' into conf2 — Matei Zaharia, 2013-12-31 (2 files, -9/+4)
|\ \ \ \
| | | |     Conflicts:
| | | |       core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
| | | |       streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
| | | |       streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
| * \ \ \ Merge pull request #289 from tdas/filestream-fix — Patrick Wendell, 2013-12-31 (2 files, -9/+4)
| |\ \ \ \
| | |/ / /
| |/| | |
| | | |     Bug fixes for file input stream and checkpointing:
| | | |     - Fixed bugs in the file input stream that caused the stream to fail on transient HDFS errors (e.g. listing files while a background thread is deleting them).
| | | |     - Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, so checkpoints can be written to any HDFS-compatible store that requires special configuration.
| | | |     - Changed the API of SparkContext.setCheckpointDir(): eliminated the unnecessary 'useExisting' parameter. SparkContext now always creates a unique subdirectory within the user-specified checkpoint directory, ensuring that previous checkpoint files are not accidentally overwritten.
| | | |     - Fixed a bug where setting the checkpoint directory to a relative local path caused checkpointing to fail.
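The new setCheckpointDir behaviour described in that commit (always create a unique subdirectory, no `useExisting` flag) can be sketched roughly as follows; `set_checkpoint_dir` is a hypothetical stand-in for what SparkContext does internally, not Spark's actual code:

```python
import os
import uuid


def set_checkpoint_dir(base_dir):
    """Create and return a fresh, uniquely named subdirectory of base_dir,
    so checkpoint files from earlier runs are never overwritten."""
    subdir = os.path.join(base_dir, str(uuid.uuid4()))
    os.makedirs(subdir)  # fails loudly if the unique name already exists
    return subdir
```

Because every call picks a new name, two applications (or two runs of one application) pointed at the same base directory can never clobber each other's checkpoints.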
| | * | | Fixed Python API for sc.setCheckpointDir; also other fixes based on Reynold's comments on PR 289. — Tathagata Das, 2013-12-24 (2 files, -9/+4)
| | | | |
* | | | | Updated docs for SparkConf and handled review comments — Matei Zaharia, 2013-12-30 (2 files, -17/+31)
| | | | |
* | | | | Properly show Spark properties on web UI, and change app name property — Matei Zaharia, 2013-12-29 (2 files, -3/+3)
| | | | |
* | | | | Fix some Python docs and make sure to unset SPARK_TESTING in Python tests so we don't get the test spark.conf on the classpath. — Matei Zaharia, 2013-12-29 (4 files, -20/+35)
| | | | |
* | | | | Merge remote-tracking branch 'origin/master' into conf2 — Matei Zaharia, 2013-12-29 (9 files, -2/+599)
|\| | | |
| | | |     Conflicts:
| | | |       core/src/main/scala/org/apache/spark/SparkContext.scala
| | | |       core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
| | | |       core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
| | | |       core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
| | | |       core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
| | | |       core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
| | | |       core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
| | | |       core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
| | | |       new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
| | | |       streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
| | | |       streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
| | | |       streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
| | | |       streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
| | | |       streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
| | | |       streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
| | | |       streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
| | | |       streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
| * | | | Merge pull request #283 from tmyklebu/master — Matei Zaharia, 2013-12-26 (8 files, -1/+598)
| |\ \ \ \
| | | | |     Python bindings for mllib
| | | | |
| | | | |     This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib. For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py. The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub. The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.
| | | | |
| | | | |     ALSModel is slightly different, since a MatrixFactorizationModel has RDDs inside. The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.
| | | | |
| | | | |     I have tested these bindings on an x86_64 machine running Linux. There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's endianness differs from java.nio.ByteBuffer's idea of the native byte order.
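The "mutually-understood format" the Python and Scala stubs exchange can be illustrated with a simplified sketch: a length-prefixed array of big-endian float64 values. This is only an illustration of the round-trip idea; the real PythonMLLibAPI wire format differs in its details:

```python
import struct


def serialize_double_vector(vec):
    """Pack a list of floats as a 4-byte big-endian count followed by
    that many big-endian float64 values."""
    out = struct.pack(">i", len(vec))
    for x in vec:
        out += struct.pack(">d", x)
    return out


def deserialize_double_vector(data):
    """Inverse of serialize_double_vector."""
    (n,) = struct.unpack_from(">i", data, 0)
    return list(struct.unpack_from(">%dd" % n, data, 4))
```

Fixing the byte order explicitly (here big-endian, matching `java.nio.ByteBuffer`'s default) is exactly what avoids the native-endianness hazard the PR description warns about.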
| | * | | | Remove commented code in __init__.py. — Tor Myklebust, 2013-12-25 (1 file, -8/+0)
| | | | | |
| | * | | | Fix copypasta in __init__.py; don't import anything directly into pyspark.mllib. — Tor Myklebust, 2013-12-25 (1 file, -26/+8)
| | | | | |
| | * | | | Initial weights in Scala are ones; do that too. Also fix some errors. — Tor Myklebust, 2013-12-25 (1 file, -6/+6)
| | | | | |
| | * | | | Split the mllib bindings into a whole bunch of modules and rename some things. — Tor Myklebust, 2013-12-25 (7 files, -183/+409)
| | | | | |
| | * | | | Remove useless line from test stub. — Tor Myklebust, 2013-12-24 (1 file, -1/+0)
| | | | | |
| | * | | | Python change for move of PythonMLLibAPI. — Tor Myklebust, 2013-12-24 (1 file, -1/+1)
| | | | | |
| | * | | | Release JVM reference to the ALSModel when done. — Tor Myklebust, 2013-12-22 (1 file, -2/+2)
| | | | | |
| | * | | | Python stubs for ALSModel. — Tor Myklebust, 2013-12-21 (2 files, -8/+56)
| | | | | |
| | * | | | Un-semicolon mllib.py. — Tor Myklebust, 2013-12-20 (1 file, -11/+11)
| | | | | |
| | * | | | Change some docstrings and add some others. — Tor Myklebust, 2013-12-20 (1 file, -1/+3)
| | | | | |
| | * | | | Licence notice. — Tor Myklebust, 2013-12-20 (1 file, -0/+17)
| | | | | |
| | * | | | Whitespace. — Tor Myklebust, 2013-12-20 (1 file, -1/+1)
| | | | | |
| | * | | | Remove gigantic endian-specific test and exception tests. — Tor Myklebust, 2013-12-20 (1 file, -38/+3)
| | | | | |
| | * | | | Tests for the Python side of the mllib bindings. — Tor Myklebust, 2013-12-20 (1 file, -52/+172)
| | | | | |
| | * | | | Python stubs for classification and clustering. — Tor Myklebust, 2013-12-20 (2 files, -16/+96)
| | | | | |
| | * | | | Python side of python bindings for linear, Lasso, and ridge regression — Tor Myklebust, 2013-12-19 (2 files, -15/+72)
| | | | | |
| | * | | | Incorporate most of Josh's style suggestions. I don't want to deal with the type and length checking errors until we've got at least one working stub that we're all happy with. — Tor Myklebust, 2013-12-19 (2 files, -98/+91)
| | | | | |
| | * | | | The rest of the Python side of those bindings. — Tor Myklebust, 2013-12-19 (3 files, -2/+4)
| | | | | |
| | * | | | First cut at python mllib bindings. Only LinearRegression is supported. — Tor Myklebust, 2013-12-19 (1 file, -0/+114)
| | | | | |
| * | | | | Typo: avaiable -> available — Andrew Ash, 2013-12-24 (1 file, -1/+1)
| | |/ / / | |/| | |
* | | | | Add Python docs about SparkConf — Matei Zaharia, 2013-12-29 (2 files, -1/+44)
| | | | |
* | | | | Fix some other Python tests due to initializing JVM in a different way — Matei Zaharia, 2013-12-29 (2 files, -10/+18)
| | | | |     The test in context.py created two different instances of the SparkContext class by copying "globals", so that some tests can have a global "sc" object and others can try initializing their own contexts. This led to two JVM gateways being created, since SparkConf also looked at pyspark.context.SparkContext to get the JVM.
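The two-gateway problem described in that commit comes down to having one process-wide JVM gateway instead of one per module. A minimal sketch of that pattern, with `launch_gateway` as a hypothetical stand-in for the real (expensive) gateway launcher:

```python
_gateway = None  # one shared gateway for the whole process


def launch_gateway():
    """Hypothetical stand-in for the real JVM gateway launcher."""
    return object()


def get_or_create_gateway():
    """Return a single process-wide gateway, creating it on first use.
    Every caller (SparkContext, SparkConf, ...) then shares one JVM
    instead of each accidentally launching its own."""
    global _gateway
    if _gateway is None:
        _gateway = launch_gateway()
    return _gateway
```

With this shape, copying module globals in tests no longer matters: whichever copy runs first creates the gateway, and every later caller reuses it.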
* | | | | Add SparkConf support in Python — Matei Zaharia, 2013-12-29 (4 files, -13/+146)
| | | | |
* | | | | Fix Python use of getLocalDir — Matei Zaharia, 2013-12-29 (1 file, -1/+1)
|/ / / /
* | | | Merge pull request #276 from shivaram/collectPartition — Reynold Xin, 2013-12-19 (2 files, -4/+6)
|\ \ \ \
| | | |     Add collectPartition to JavaRDD interface. This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py. Thanks @concretevitamin for the original change and tests.
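A `take` built on top of a per-partition collect, as that commit describes, can be sketched like this; `collect_partition` is a hypothetical callable standing in for the `collectPartition` call across the language boundary:

```python
def take(num_partitions, collect_partition, n):
    """Fetch partitions one at a time until n elements are gathered,
    so only as much data as needed is pulled to the driver.

    collect_partition(i) returns the full contents of partition i."""
    items = []
    for p in range(num_partitions):
        if len(items) >= n:
            break
        items.extend(collect_partition(p))
    return items[:n]
```

The point of doing it this way is that a small `take(5)` on a huge RDD only deserializes the first partition or two, rather than collecting the whole dataset.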