| field | value | date |
|---|---|---|
| author | Davies Liu <davies.liu@gmail.com> | 2014-09-06 16:12:29 -0700 |
| committer | Josh Rosen <joshrosen@apache.org> | 2014-09-06 16:12:29 -0700 |
| commit | 110fb8b24d2454ad7c979c3934dbed87650f17b8 | |
| tree | 0d3e49877f108d58557d2755b7acfbefa75edc0e /python/pyspark/rdd.py | |
| parent | 21a1e1bb893512b2f68598ab0c0ec8c33e8d9909 | |
[SPARK-2334] Fix AttributeError when calling PipelinedRDD.id()
The underlying JavaRDD for a PipelinedRDD is created lazily: its creation is delayed until `_jrdd` is first accessed.
The id of the JavaRDD is cached in `_id`, which saves a Py4J RPC call on every later call to `id()`.
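The lazy-creation and caching scheme described above can be sketched in plain Python. This is a minimal sketch, not PySpark code: `FakeJavaRDD` and `LazyPipeline` are hypothetical stand-ins, and the `rpc_calls` counter only simulates the assumption that each `id()` call on the Java side costs one Py4J round trip.

```python
import itertools


class FakeJavaRDD:
    """Hypothetical stand-in for a Py4J-backed JavaRDD; counts simulated RPCs."""

    _ids = itertools.count(1)

    def __init__(self):
        self.rpc_calls = 0
        self._rdd_id = next(FakeJavaRDD._ids)

    def id(self):
        # In real Py4J code each call here would be a round trip to the JVM.
        self.rpc_calls += 1
        return self._rdd_id


class LazyPipeline:
    """Hypothetical analogue of PipelinedRDD: the JavaRDD is built on first access."""

    def __init__(self):
        self._jrdd_val = None  # underlying JavaRDD, not created yet
        self._id = None        # cached id, filled on the first id() call

    @property
    def _jrdd(self):
        # Build the underlying JavaRDD only when something actually needs it.
        if self._jrdd_val is None:
            self._jrdd_val = FakeJavaRDD()
        return self._jrdd_val

    def id(self):
        # The first call forces creation of the JavaRDD and costs one RPC;
        # later calls return the cached value without touching the JVM.
        if self._id is None:
            self._id = self._jrdd.id()
        return self._id
```

Calling `id()` twice on a fresh `LazyPipeline` returns the same value while `rpc_calls` on the underlying object stays at 1, which is the saving the commit message describes.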
closes #1276
Author: Davies Liu <davies.liu@gmail.com>
Closes #2296 from davies/id and squashes the following commits:
e197958 [Davies Liu] fix style
9721716 [Davies Liu] fix id of PipelineRDD
Diffstat (limited to 'python/pyspark/rdd.py')
-rw-r--r-- | python/pyspark/rdd.py | 6 |
1 file changed, 6 insertions, 0 deletions
```diff
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index aa90297855..266090e3ae 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -2075,6 +2075,7 @@ class PipelinedRDD(RDD):
         self.ctx = prev.ctx
         self.prev = prev
         self._jrdd_val = None
+        self._id = None
         self._jrdd_deserializer = self.ctx.serializer
         self._bypass_serializer = False
         self._partitionFunc = prev._partitionFunc if self.preservesPartitioning else None
@@ -2105,6 +2106,11 @@ class PipelinedRDD(RDD):
         self._jrdd_val = python_rdd.asJavaRDD()
         return self._jrdd_val

+    def id(self):
+        if self._id is None:
+            self._id = self._jrdd.id()
+        return self._id
+
     def _is_pipelinable(self):
         return not (self.is_cached or self.is_checkpointed)
```
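The failure this patch addresses can be reproduced with a small sketch. It assumes, as the AttributeError in the title suggests, that the base class's `id()` simply returns an `_id` attribute set in the base constructor, and that `PipelinedRDD.__init__` does not run that constructor. `StubJavaRDD`, `BaseRDD`, `PipelinedBefore` and `PipelinedAfter` are simplified hypothetical classes, not the real PySpark source.

```python
class StubJavaRDD:
    """Hypothetical stand-in for the Py4J JavaRDD handle."""

    def id(self):
        return 7


class BaseRDD:
    """Simplified base class: captures the id eagerly in the constructor."""

    def __init__(self, jrdd):
        self._jrdd = jrdd
        self._id = jrdd.id()

    def id(self):
        return self._id


class PipelinedBefore(BaseRDD):
    """Pre-fix shape: skips BaseRDD.__init__, so _id is never set."""

    def __init__(self, prev):
        self.prev = prev
        self._jrdd_val = None


class PipelinedAfter(BaseRDD):
    """Post-fix shape: initialise _id and override id() to fill it lazily."""

    def __init__(self, prev):
        self.prev = prev
        self._jrdd_val = None
        self._id = None

    @property
    def _jrdd(self):
        # Create the underlying JavaRDD on first access only.
        if self._jrdd_val is None:
            self._jrdd_val = StubJavaRDD()
        return self._jrdd_val

    def id(self):
        # Same logic as the patched PipelinedRDD.id(): fetch once, then cache.
        if self._id is None:
            self._id = self._jrdd.id()
        return self._id
```

With `base = BaseRDD(StubJavaRDD())`, calling `PipelinedBefore(base).id()` raises `AttributeError` because `_id` was never assigned, while `PipelinedAfter(base).id()` builds the stub JavaRDD on demand and returns its id.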