author | Josh Rosen <joshrosen@apache.org> | 2014-07-26 17:37:05 -0700 |
---|---|---|
committer | Matei Zaharia <matei@databricks.com> | 2014-07-26 17:37:05 -0700 |
commit | ba46bbed5d32aec0f11f0b71c82bba8dbe19f05a (patch) | |
tree | 5826bc60fdb70aebf9b0a9e3887dbce96d526851 /python/pyspark/tests.py | |
parent | 12901643b7e808aa75cf0b19e2d0c3d40b1a978d (diff) | |
[SPARK-2601] [PySpark] Fix Py4J error when transforming pickleFiles
Similar to SPARK-1034, the problem was that Py4J didn’t cope well with the fake ClassTags used in the Java API. It doesn’t look like there’s any reason why PythonRDD needs to take a ClassTag, since it just ignores the type of the previous RDD, so I removed the type parameter and we no longer pass ClassTags from Python.
Author: Josh Rosen <joshrosen@apache.org>
Closes #1605 from JoshRosen/spark-2601 and squashes the following commits:
b68e118 [Josh Rosen] Fix Py4J error when transforming pickleFiles [SPARK-2601]
Diffstat (limited to 'python/pyspark/tests.py')
-rw-r--r-- | python/pyspark/tests.py | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index a92abbf371..8ba51461d1 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -226,6 +226,15 @@ class TestRDDFunctions(PySparkTestCase):
         cart = rdd1.cartesian(rdd2)
         result = cart.map(lambda (x, y): x + y).collect()
 
+    def test_transforming_pickle_file(self):
+        # Regression test for SPARK-2601
+        data = self.sc.parallelize(["Hello", "World!"])
+        tempFile = tempfile.NamedTemporaryFile(delete=True)
+        tempFile.close()
+        data.saveAsPickleFile(tempFile.name)
+        pickled_file = self.sc.pickleFile(tempFile.name)
+        pickled_file.map(lambda x: x).collect()
+
     def test_cartesian_on_textfile(self):
         # Regression test for
         path = os.path.join(SPARK_HOME, "python/test_support/hello.txt")
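The regression test creates a `NamedTemporaryFile` and closes it immediately, which deletes the file and leaves only a unique name, presumably so that `saveAsPickleFile` gets an output path that does not exist yet. The following is a minimal sketch of that pattern without Spark: it substitutes the standard `pickle` module for the RDD calls, and the helper names `reserve_unused_path` and `round_trip` are hypothetical, not part of PySpark or of this commit.

```python
import os
import pickle
import tempfile


def reserve_unused_path():
    """Return a path that was unique when created and no longer exists."""
    tmp = tempfile.NamedTemporaryFile(delete=True)
    tmp.close()  # closing removes the file, leaving only its name
    return tmp.name


def round_trip(records):
    """Pickle records to a fresh path, read them back, and apply an identity map.

    Stand-in for saveAsPickleFile / pickleFile / map(lambda x: x) in the test;
    this uses plain pickle, not Spark's pickle-file format.
    """
    path = reserve_unused_path()
    try:
        with open(path, "wb") as f:
            pickle.dump(records, f)
        with open(path, "rb") as f:
            loaded = pickle.load(f)
    finally:
        # Clean up the file we created at the reserved path.
        if os.path.exists(path):
            os.remove(path)
    return [x for x in loaded]
```

Calling `round_trip(["Hello", "World!"])` should hand back the same two strings. Note that another process could claim the name between `close()` and the later write, so this pattern suits tests rather than security-sensitive code.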