| author | 0x0FFF <programmerag@gmail.com> | 2015-09-02 13:36:36 -0700 |
|---|---|---|
| committer | Davies Liu <davies.liu@gmail.com> | 2015-09-02 13:36:36 -0700 |
| commit | 6cd98c1878a9c5c6475ed5974643021ab27862a7 (patch) | |
| tree | 662254b085711c660660d1df9e95f07c421870d2 /python/pyspark/sql/column.py | |
| parent | 2da3a9e98e5d129d4507b5db01bba5ee9558d28e (diff) | |
[SPARK-10417] [SQL] Iterating through Column results in infinite loop
The `pyspark.sql.column.Column` object defines a `__getitem__` method, which makes it iterable under Python's legacy sequence-iteration protocol. `__getitem__` exists to handle the case where a column holds a list or dict, so that you can access individual elements of it through the DataFrame API. The ability to iterate over a Column is just a side effect, and it can confuse people getting familiar with Spark DataFrames (who might expect iteration to work the way it does on a pandas DataFrame, for instance).
Issue reproduction:
```python
df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
for i in df["name"]: print i
```
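The infinite loop comes from Python's legacy iteration protocol: when a class defines `__getitem__` but not `__iter__`, a `for` loop calls `obj[0]`, `obj[1]`, `obj[2]`, ... until an `IndexError` is raised. Since `Column.__getitem__` delegates field access and never raises `IndexError`, the loop never terminates. A minimal sketch of the pre-fix behavior (the `FieldAccessor` class is a hypothetical stand-in for `Column`, not Spark code):

```python
class FieldAccessor:
    """Mimics the pre-fix Column: __getitem__ always succeeds,
    so the legacy iteration protocol never sees an IndexError."""
    def __getitem__(self, key):
        return key  # stands in for Column.getField(key)

acc = FieldAccessor()
items = []
for x in acc:            # Python falls back to acc[0], acc[1], acc[2], ...
    items.append(x)
    if len(items) >= 5:  # without this break, the loop runs forever
        break
print(items)  # [0, 1, 2, 3, 4]
```

Note that `FieldAccessor` never signals exhaustion, so the loop is only bounded by the explicit `break`; this is exactly what happens when iterating over a Column.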
Author: 0x0FFF <programmerag@gmail.com>
Closes #8574 from 0x0FFF/SPARK-10417.
Diffstat (limited to 'python/pyspark/sql/column.py')
| -rw-r--r-- | python/pyspark/sql/column.py | 3 |
|---|---|---|
1 file changed, 3 insertions, 0 deletions
```diff
diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py
index 0948f9b27c..56e75e8cae 100644
--- a/python/pyspark/sql/column.py
+++ b/python/pyspark/sql/column.py
@@ -226,6 +226,9 @@ class Column(object):
             raise AttributeError(item)
         return self.getField(item)
 
+    def __iter__(self):
+        raise TypeError("Column is not iterable")
+
     # string methods
     rlike = _bin_op("rlike")
     like = _bin_op("like")
```
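The fix works because defining `__iter__` takes precedence over the legacy `__getitem__`-based protocol: `iter()` now calls `__iter__`, which raises immediately, so a `for` loop fails fast instead of looping forever. A minimal sketch of the patched behavior (again using a hypothetical stand-in class rather than Spark itself):

```python
class FixedFieldAccessor:
    """Mimics the patched Column: __getitem__ still works for
    element access, but iteration raises TypeError up front."""
    def __getitem__(self, key):
        return key  # element access remains available

    def __iter__(self):
        raise TypeError("Column is not iterable")

msg = None
try:
    for x in FixedFieldAccessor():  # iter() invokes __iter__, which raises
        pass
except TypeError as e:
    msg = str(e)
print(msg)  # Column is not iterable
```

Indexing (`acc["name"]`) is unaffected by the change; only the iteration path is blocked.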