| Commit message | Author | Age | Files | Lines |
| |
Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Prashant Sharma <scrapcodes@gmail.com>
Closes #262 from ScrapCodes/SPARK-1336/ReduceVerbosity and squashes the following commits:
87dfa54 [Prashant Sharma] Further reduction in noise and made pyspark tests to fail fast.
811170f [Prashant Sharma] Reducing the output of run-tests script.
| |
Author: Prashant Sharma <prashant.s@imaginea.com>
Closes #235 from ScrapCodes/SPARK-1322/top-rev-sort and squashes the following commits:
f316266 [Prashant Sharma] Minor change in comment.
58e58c6 [Prashant Sharma] SPARK-1322, top in pyspark should sort result in descending order.
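The ordering described above can be illustrated with a minimal plain-Python sketch. It is not PySpark's implementation: `top` here is a hypothetical stand-in that takes a list of partitions, and the use of `heapq` is an assumption made for the illustration.

```python
import heapq

def top(partitions, n):
    """Hypothetical stand-in for RDD.top: the n largest elements, largest first."""
    # Keep only the n largest elements of each partition...
    per_partition = [heapq.nlargest(n, part) for part in partitions]
    # ...then merge the candidates and pick the n largest overall.
    merged = [x for part in per_partition for x in part]
    # heapq.nlargest returns its result sorted in descending order.
    return heapq.nlargest(n, merged)

result = top([[10, 4, 2], [12, 3]], 2)
```

Because the final step sorts largest first, callers get the result in descending order without sorting it again.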
| |
Doctest added for map in rdd.py
Author: Jyotiska NK <jyotiska123@gmail.com>
Closes #177 from jyotiska/pyspark_rdd_map_doctest and squashes the following commits:
a38527f [Jyotiska NK] Added doctest for map function in rdd.py
| |
This adds min and max to statscounter.py, and min and max methods to rdd.py.
Author: Dan McClary <dan.mcclary@gmail.com>
Closes #144 from dwmclary/SPARK-1246-add-min-max-to-stat-counter and squashes the following commits:
fd3fd4b [Dan McClary] fixed error, updated test
82cde0e [Dan McClary] flipped incorrectly assigned inf values in StatCounter
5d96799 [Dan McClary] added max and min to StatCounter repr for pyspark
21dd366 [Dan McClary] added max and min to StatCounter output, updated doc
1a97558 [Dan McClary] added max and min to StatCounter output, updated doc
a5c13b0 [Dan McClary] Added min and max to Scala and Java RDD, added min and max to StatCounter
ed67136 [Dan McClary] broke min/max out into separate transaction, added to rdd.py
1e7056d [Dan McClary] added underscore to getBucket
37a7dea [Dan McClary] cleaned up boundaries for histogram -- uses real min/max when buckets are derived
29981f2 [Dan McClary] fixed indentation on doctest comment
eaf89d9 [Dan McClary] added correct doctest for histogram
4916016 [Dan McClary] added histogram method, added max and min to statscounter
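As a rough illustration of tracking min and max alongside other statistics, here is a minimal sketch. `MinMaxCounter` and its method names are hypothetical and are not the StatCounter API; the point is the starting values, where the minimum starts at positive infinity and the maximum at negative infinity so that the first merged value replaces both.

```python
class MinMaxCounter:
    """Hypothetical, simplified counter; not the real StatCounter class."""

    def __init__(self, values=()):
        self.n = 0
        # The minimum starts at +inf and the maximum at -inf,
        # so any real value merged first overrides them.
        self.min_value = float("inf")
        self.max_value = float("-inf")
        for value in values:
            self.merge(value)

    def merge(self, value):
        """Fold a single value into the running statistics."""
        self.n += 1
        self.min_value = min(self.min_value, value)
        self.max_value = max(self.max_value, value)
        return self

    def merge_stats(self, other):
        """Combine with another counter, e.g. one built on another partition."""
        self.n += other.n
        self.min_value = min(self.min_value, other.min_value)
        self.max_value = max(self.max_value, other.max_value)
        return self

combined = MinMaxCounter([2.0, 5.0]).merge_stats(MinMaxCounter([1.0, 43.0]))
```

Swapping the two infinities would leave min stuck at negative infinity and max at positive infinity, which is the kind of mistake the "flipped incorrectly assigned inf values" commit refers to.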
| |
https://spark-project.atlassian.net/browse/SPARK-1240
It seems that the current implementation does not handle the empty RDD case when running takeSample.
In this patch, before calling sample() inside the takeSample API, I add a check for this case that returns an empty Array when the RDD is empty; in sample(), I also add a check for invalid fraction values.
I also add several lines to the test case to cover this.
Author: CodingCat <zhunansjtu@gmail.com>
Closes #135 from CodingCat/SPARK-1240 and squashes the following commits:
fef57d4 [CodingCat] fix the same problem in PySpark
36db06b [CodingCat] create new test cases for takeSample from an empty red
810948d [CodingCat] further fix
a40e8fb [CodingCat] replace if with require
ad483fd [CodingCat] handle the case with empty RDD when take sample
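The two guards described above can be sketched in plain Python. This is only an illustration over lists, not the Spark code: `take_sample` and `sample` are hypothetical helpers, and treating a negative fraction as the invalid case is an assumption.

```python
import random

def sample(data, fraction, seed=None):
    """Hypothetical sampler: keep each element with probability `fraction`."""
    # Assumption: a negative fraction is what counts as invalid here.
    if fraction < 0.0:
        raise ValueError("Invalid fraction value: %s" % fraction)
    rng = random.Random(seed)
    return [x for x in data if rng.random() < fraction]

def take_sample(data, num, seed=None):
    """Hypothetical takeSample: return up to `num` elements without replacement."""
    if num < 0:
        raise ValueError("Negative number of elements requested")
    # Guard for the empty case before any sampling is attempted.
    if not data or num == 0:
        return []
    rng = random.Random(seed)
    return rng.sample(data, min(num, len(data)))
```

With the guard in place, `take_sample([], 5)` returns an empty list instead of attempting to sample from nothing.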
| |
Author: Prashant Sharma <prashant.s@imaginea.com>
Closes #93 from ScrapCodes/SPARK-1162/pyspark-top-takeOrdered and squashes the following commits:
ece1fa4 [Prashant Sharma] Added top in python.
| |
Author: prabinb <prabin.banka@imaginea.com>
Closes #92 from prabinb/python-api-rdd and squashes the following commits:
51129ca [prabinb] Added missing Python RDD functions. Added __repr__ function to StorageLevel class. Added doctest for RDD.getStorageLevel().
| |
Author: Prashant Sharma <prashant.s@imaginea.com>
Closes #115 from ScrapCodes/SPARK-1168/pyspark-foldByKey and squashes the following commits:
db6f67e [Prashant Sharma] SPARK-1168, Added foldByKey to pyspark.
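For readers unfamiliar with the operation, the semantics of folding by key can be sketched over a plain list of pairs. `fold_by_key` is a hypothetical single-process helper, not the PySpark method, and it ignores partitioning entirely.

```python
from operator import add

def fold_by_key(pairs, zero_value, func):
    """Hypothetical helper: fold the values of each key, starting from zero_value."""
    result = {}
    for key, value in pairs:
        # Start each key from zero_value, then combine values one by one.
        result[key] = func(result.get(key, zero_value), value)
    return result

totals = fold_by_key([("a", 1), ("b", 1), ("a", 1)], 0, add)
```

Here `totals` maps each key to the sum of its values; a neutral zero value (0 for addition) keeps the result independent of how many times it is applied.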
| |
(resubmitted)
Author: jyotiska <jyotiska123@gmail.com>
Closes #34 from jyotiska/pyspark_code and squashes the following commits:
c9439be [jyotiska] replaced dict with namedtuple
a6bf4cd [jyotiska] added callsite info for context.py
| |
This was raised earlier as part of apache/incubator-spark#486
Author: Prabin Banka <prabin.banka@imaginea.com>
Closes #76 from prabinb/python-api-zip and squashes the following commits:
b1a31a0 [Prabin Banka] Added Python RDD.zip function
| |
(Continued from old repo, prior discussion at https://github.com/apache/incubator-spark/pull/615)
This patch cements our deprecation of the SPARK_MEM environment variable by replacing it with three more specialized variables:
SPARK_DAEMON_MEMORY, SPARK_EXECUTOR_MEMORY, and SPARK_DRIVER_MEMORY
The creation of the latter two variables means that we can safely set driver/job memory without accidentally setting the executor memory. Neither is public.
SPARK_EXECUTOR_MEMORY is only used by the Mesos scheduler (and set within SparkContext). The proper way of configuring executor memory is through the "spark.executor.memory" property.
SPARK_DRIVER_MEMORY is the new way of specifying the amount of memory used by jobs launched by spark-class, without possibly affecting executor memory.
Other memory considerations:
- The repl's memory can be set through the "--drivermem" command-line option, which really just sets SPARK_DRIVER_MEMORY.
- run-example doesn't use spark-class, so the only way to modify examples' memory is actually an unusual use of SPARK_JAVA_OPTS (which is normally overridden in all cases by spark-class).
This patch also fixes a lurking bug where spark-shell misused spark-class (the first argument is supposed to be the main class name, not java options), as well as a bug in the Windows spark-class2.cmd. I have not yet tested this patch on either Windows or Mesos, however.
Author: Aaron Davidson <aaron@databricks.com>
Closes #99 from aarondav/sparkmem and squashes the following commits:
9df4c68 [Aaron Davidson] SPARK-929: Fully deprecate usage of SPARK_MEM
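The separation described above can be sketched as a small lookup. This is an illustration only and is not taken from spark-class: the `resolve_memory` helper, the role names, the "512m" default, and the fallback to the deprecated SPARK_MEM are all assumptions.

```python
import os

# Assumption: illustrative mapping from a role to its specialized variable.
ROLE_VARIABLES = {
    "daemon": "SPARK_DAEMON_MEMORY",
    "executor": "SPARK_EXECUTOR_MEMORY",
    "driver": "SPARK_DRIVER_MEMORY",
}
DEFAULT_MEMORY = "512m"  # hypothetical default, chosen for the example

def resolve_memory(role, env=None):
    """Hypothetical helper: pick the memory setting for one role."""
    env = os.environ if env is None else env
    specific = env.get(ROLE_VARIABLES[role])
    if specific:
        return specific
    # Assumed fallback: honor the deprecated SPARK_MEM if it is still set.
    return env.get("SPARK_MEM", DEFAULT_MEMORY)

env = {"SPARK_DRIVER_MEMORY": "2g"}
driver_memory = resolve_memory("driver", env)
executor_memory = resolve_memory("executor", env)
```

With only SPARK_DRIVER_MEMORY set, the driver gets "2g" while the executor keeps the default, which is the "without accidentally setting the executor memory" property the message describes.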
| |
Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Prashant Sharma <scrapcodes@gmail.com>
Closes #80 from ScrapCodes/SPARK-1165/RDD.intersection and squashes the following commits:
9b015e9 [Prashant Sharma] Added a note, shuffle is required for intersection.
1fea813 [Prashant Sharma] correct the lines wrapping
d0c71f3 [Prashant Sharma] SPARK-1165 RDD.intersection in java
d6effee [Prashant Sharma] SPARK-1165 Implemented RDD.intersection in python.
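The note about a shuffle can be motivated with a plain-Python sketch: to know whether an element appears on both sides, all occurrences of that element have to be brought together. The `intersection` helper below is hypothetical and works on local lists, not RDDs.

```python
from collections import defaultdict

def intersection(left, right):
    """Hypothetical helper: distinct elements present in both inputs."""
    # Group by element and record which side(s) it was seen on.
    # In a distributed setting, this grouping step is what needs a shuffle.
    seen = defaultdict(lambda: [False, False])
    for x in left:
        seen[x][0] = True
    for x in right:
        seen[x][1] = True
    return [x for x, (in_left, in_right) in seen.items() if in_left and in_right]

common = sorted(intersection([1, 10, 2, 3, 4, 5], [1, 6, 2, 3, 7, 8]))
```

Duplicates collapse naturally because each element is a single key in the grouping.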
| |
The following Python APIs are added:
RDD.id()
SparkContext.setJobGroup()
SparkContext.setLocalProperty()
SparkContext.getLocalProperty()
SparkContext.sparkUser()
This was raised earlier as part of apache/incubator-spark#486
Author: Prabin Banka <prabin.banka@imaginea.com>
Closes #75 from prabinb/python-api-backup and squashes the following commits:
cc3c6cd [Prabin Banka] Added missing Python APIs
| |
Author: Prashant Sharma <prashant.s@imaginea.com>
Closes #73 from ScrapCodes/SPARK-1109/wrong-API-docs and squashes the following commits:
1a55b58 [Prashant Sharma] SPARK-1109 wrong API docs for pyspark map function
| |
This surrounds the complete worker code in a try/except block so we catch any error that occurs. An example would be depickling failing for some reason.
@JoshRosen
Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
Closes #644 from bouk/catch-depickling-errors and squashes the following commits:
f0f67cc [Bouke van der Bijl] Lol indentation
0e4d504 [Bouke van der Bijl] Surround the complete python worker with the try block
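The idea can be sketched as follows, assuming a simplified worker that reads a pickled function and a pickled list from an input stream. `run_worker` and the `ERROR:` marker are hypothetical and do not reflect the actual worker.py protocol.

```python
import io
import pickle
import traceback

def run_worker(infile, outfile):
    """Hypothetical worker: everything, including unpickling, is inside the try."""
    try:
        func = pickle.load(infile)   # deserialization may fail...
        data = pickle.load(infile)   # ...and so may reading the records
        pickle.dump([func(x) for x in data], outfile)
    except Exception:
        # Report the failure on the output stream instead of dying silently.
        # The "ERROR:" prefix is an assumption made for this sketch.
        outfile.write(b"ERROR:" + traceback.format_exc().encode("utf-8"))
        return False
    return True

bad_output = io.BytesIO()
ok = run_worker(io.BytesIO(b"not a pickle"), bad_output)
```

If only the user function call were wrapped, a failure while unpickling the input would escape the handler; wrapping the whole body means `bad_output` receives the traceback in that case too.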
| |
Updated doctests for mapValues and flatMapValues in rdd.py
Author: jyotiska <jyotiska123@gmail.com>
Closes #621 from jyotiska/python_spark and squashes the following commits:
716f7cd [jyotiska] doctest updated for mapValues, flatMapValues in rdd.py
| |
Fixed minor typo in worker.py
Author: jyotiska <jyotiska123@gmail.com>
Closes #630 from jyotiska/pyspark_code and squashes the following commits:
ee44201 [jyotiska] typo fixed in worker.py
| |
Patch to allow PySpark to use existing JVM and Gateway. Changes to PySpark implementation of SparkConf to take existing SparkConf JVM handle. Change to PySpark SparkContext to allow subclass specific context initialization.
Author: Ahir Reddy <ahirreddy@gmail.com>
Closes #622 from ahirreddy/pyspark-existing-jvm and squashes the following commits:
a86f457 [Ahir Reddy] Patch to allow PySpark to use existing JVM and Gateway. Changes to PySpark implementation of SparkConf to take existing SparkConf JVM handle. Change to PySpark SparkContext to allow subclass specific context initialization.
| |
Added example Python code for sort
I added example Python code for sort. Right now, PySpark has limited examples for new people willing to use the project. This example code sorts integers stored in a file. I was able to sort 5 million, 10 million and 25 million integers with this code.
Author: jyotiska <jyotiska123@gmail.com>
== Merge branch commits ==
commit 8ad8faf6c8e02ae1cd68565d98524edf165f54df
Author: jyotiska <jyotiska123@gmail.com>
Date: Sun Feb 9 11:00:41 2014 +0530
Added comments in code on collect() method
commit 6f98f1e313f4472a7c2207d36c4f0fbcebc95a8c
Author: jyotiska <jyotiska123@gmail.com>
Date: Sat Feb 8 13:12:37 2014 +0530
Updated python example code sort.py
commit 945e39a5d68daa7e5bab0d96cbd35d7c4b04eafb
Author: jyotiska <jyotiska123@gmail.com>
Date: Sat Feb 8 12:59:09 2014 +0530
Added example python code for sort
| |
Version number to 1.0.0-SNAPSHOT
Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore.
@pwendell
Author: Mark Hamstra <markhamstra@gmail.com>
== Merge branch commits ==
commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
Author: Mark Hamstra <markhamstra@gmail.com>
Date: Wed Feb 5 09:30:32 2014 -0800
Version number to 1.0.0-SNAPSHOT
| |
Python api additions
Author: Prashant Sharma <prashant.s@imaginea.com>
== Merge branch commits ==
commit 8b51591f1a7a79a62c13ee66ff8d83040f7eccd8
Author: Prashant Sharma <prashant.s@imaginea.com>
Date: Fri Jan 24 11:50:29 2014 +0530
Josh's and Patrick's review comments.
commit d37f9677838e43bef6c18ef61fbf08055ba6d1ca
Author: Prashant Sharma <prashant.s@imaginea.com>
Date: Thu Jan 23 17:27:17 2014 +0530
fixed doc tests
commit 27cb54bf5c99b1ea38a73858c291d0a1c43d8b7c
Author: Prashant Sharma <prashant.s@imaginea.com>
Date: Thu Jan 23 16:48:43 2014 +0530
Added keys and values methods for PairFunctions in python
commit 4ce76b396fbaefef2386d7a36d611572bdef9b5d
Author: Prashant Sharma <prashant.s@imaginea.com>
Date: Thu Jan 23 13:51:26 2014 +0530
Added foreachPartition
commit 05f05341a187cba829ac0e6c2bdf30be49948c89
Author: Prashant Sharma <prashant.s@imaginea.com>
Date: Thu Jan 23 13:02:59 2014 +0530
Added coalesce function to python API
commit 6568d2c2fa14845dc56322c0f39ba2e13b3b26dd
Author: Prashant Sharma <prashant.s@imaginea.com>
Date: Thu Jan 23 12:52:44 2014 +0530
added repartition function to python API.
| |
This fixes SPARK-1043, a bug introduced in 0.9.0
where PySpark couldn't serialize strings > 64kB.
This fix was written by @tyro89 and @bouk in #512.
This commit squashes and rebases their pull request
in order to fix some merge conflicts.
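A 64 kB ceiling is typical of framing that stores the payload length in two bytes. The sketch below shows one common way around such a limit, a four-byte length prefix. It is an assumption about the general technique and does not describe the actual fix; `write_with_length` and `read_with_length` are hypothetical helpers.

```python
import struct

def write_with_length(data):
    """Hypothetical framing: 4-byte big-endian length, then the raw bytes."""
    return struct.pack("!i", len(data)) + data

def read_with_length(frame):
    """Read back one frame produced by write_with_length."""
    (length,) = struct.unpack("!i", frame[:4])
    return frame[4:4 + length]

# A payload larger than 65535 bytes, which a 2-byte length could not describe.
payload = ("x" * 70000).encode("utf-8")
round_tripped = read_with_length(write_with_length(payload))
```

By contrast, `struct.pack("!H", len(payload))` raises `struct.error` for this payload, since an unsigned two-byte field tops out at 65535.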
| | |
Fix PySpark hang when input files are deleted (SPARK-1025)
This pull request addresses [SPARK-1025](https://spark-project.atlassian.net/browse/SPARK-1025), an issue where PySpark could hang if its input files were deleted.
| |
Also, replace the last reference to it in the docs.
This fixes SPARK-1026.
| | |
Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+)
We disabled these earlier because Jenkins didn't have these versions.
| |
Remove Typesafe Config usage and conf files to fix nested property names
With Typesafe Config we had the subtle problem of no longer allowing
nested property names, which are used for a few of our properties:
http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html
This PR is for branch 0.9 but should be added into master too.
(cherry picked from commit 34e911ce9a9f91f3259189861779032069257852)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
| |
This helps in case the exception happened while serializing a record to
be sent to Java, leaving the stream to Java in an inconsistent state
where PythonRDD won't be able to read the error.
| |
We've used camel case in other Spark methods so it felt reasonable to
keep using it here and make the code match Scala/Java as much as
possible. Note that parameter names matter in Python because the language
allows passing optional parameters by name.
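A short sketch of why the spelling of a parameter name is part of the interface in Python. The `train` function and its parameters are hypothetical and exist only for this illustration.

```python
def train(data, numIterations=100, stepSize=1.0):
    """Hypothetical function with camel-case optional parameters."""
    return {"n": len(data), "numIterations": numIterations, "stepSize": stepSize}

# Callers can skip numIterations and set stepSize by name,
# so renaming the parameter would break this call.
settings = train([1.0, 2.0], stepSize=0.5)
```

Calling `train([1.0], step_size=0.5)` would raise a `TypeError`, which is why the naming convention has to be settled before the API is published.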
| |
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
algorithms better and in particular make it easier to call from Java
(added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
they could only be run individually through each file)
| | | |
Conflicts:
core/src/test/scala/org/apache/spark/DriverSuite.scala
docs/python-programming-guide.md
| | | | |
Spark-915 segregate scripts
| | | | | |
spark-915-segregate-scripts
Conflicts:
bin/spark-shell
core/pom.xml
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala
core/src/test/scala/org/apache/spark/DriverSuite.scala
python/run-tests
sbin/compute-classpath.sh
sbin/spark-class
sbin/stop-slaves.sh
| | | | | | |
instead of SPARK_MEM, user should add application jars to SPARK_CLASSPATH
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
| | | | | | |
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
| | | | | | |
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
| | | | | | |
Closes #316