author: Davies Liu <davies@databricks.com> 2015-02-27 20:07:17 -0800
committer: Josh Rosen <joshrosen@databricks.com> 2015-02-27 20:07:17 -0800
commit: e0e64ba4b1b8eb72e856286f756c65fa22ab0a36
tree: ca358052d6b572756ecbbf98133a093db3f4cc83 /python/pyspark/sql/tests.py
parent: 8c468a6600e0deb5464990df60148212e64fdecd
[SPARK-6055] [PySpark] fix incorrect __eq__ of DataType
The __eq__ of DataType is not correct, so the class cache is not used correctly: a class that has already been created cannot be found again by its dataType. As a result, many classes are created and saved in _cached_cls, and they are never released.

Also, all instances of the same DataType have the same hash code, so a dict can end up holding many objects with the same hash code. The effect is similar to a hash collision attack: access to this dict becomes very slow (how slow depends on the implementation of CPython).

This PR also improves the performance of inferSchema by avoiding unnecessary conversion of objects.

cc pwendell JoshRosen

Author: Davies Liu <davies@databricks.com>

Closes #4808 from davies/leak and squashes the following commits:

6a322a4 [Davies Liu] tests refactor
3da44fc [Davies Liu] fix __eq__ of Singleton
534ac90 [Davies Liu] add more checks
46999dc [Davies Liu] fix tests
d9ae973 [Davies Liu] fix memory leak in sql
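The cache problem described in the commit message can be illustrated with a minimal sketch. This is not the PySpark implementation: `BrokenType`, `FixedType`, and `cached_lookup` are hypothetical names invented for this example, and the dict stands in for a cache like `_cached_cls`. It only shows why a key type without value-based `__eq__` and `__hash__` makes a cache grow without bound, while a value-based pair lets equal keys share one entry.

```python
import copy


class BrokenType(object):
    """Hypothetical key type with no __eq__: equality falls back to identity."""

    def __init__(self, name):
        self.name = name


class FixedType(object):
    """Hypothetical key type whose equality and hash depend on its fields."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, self.__class__) and self.__dict__ == other.__dict__

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Equal objects must hash equally; unequal objects should usually differ.
        return hash((self.__class__.__name__, self.name))


def cached_lookup(cache, data_type):
    """Return the cached value for data_type, creating it on a miss."""
    if data_type not in cache:
        cache[data_type] = object()  # stand-in for an expensive generated class
    return cache[data_type]


broken_cache = {}
fixed_cache = {}
for _ in range(100):
    # Each iteration builds a fresh but logically identical key.
    cached_lookup(broken_cache, BrokenType("long"))
    cached_lookup(fixed_cache, FixedType("long"))

print(len(broken_cache))  # 100: every lookup missed and added an entry
print(len(fixed_cache))   # 1: equal keys share a single entry

# A distinct object with the same fields (as a pickle round trip would
# produce) still compares equal under the value-based __eq__.
clone = copy.deepcopy(FixedType("long"))
print(clone == FixedType("long"))  # True
```

The opposite failure mode, where every key returns the same hash, keeps lookups correct but forces the dict to compare against many colliding entries, which is the slowdown the commit message mentions.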
Diffstat (limited to 'python/pyspark/sql/tests.py')
-rw-r--r-- python/pyspark/sql/tests.py | 9
1 files changed, 9 insertions, 0 deletions
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 83899ad4b1..2720439416 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -24,6 +24,7 @@ import sys
 import pydoc
 import shutil
 import tempfile
+import pickle
 import py4j
@@ -88,6 +89,14 @@ class ExamplePoint:
             other.x == self.x and other.y == self.y
+class DataTypeTests(unittest.TestCase):
+    # regression test for SPARK-6055
+    def test_data_type_eq(self):
+        lt = LongType()
+        lt2 = pickle.loads(pickle.dumps(LongType()))
+        self.assertEquals(lt, lt2)
+
+
 class SQLTests(ReusedPySparkTestCase):
     @classmethod