path: root/python/pyspark/sql/column.py
author: Burak Yavuz <brkyvz@gmail.com> 2015-08-06 10:29:40 -0700
committer: Xiangrui Meng <meng@databricks.com> 2015-08-06 10:29:40 -0700
commit: 98e69467d4fda2c26a951409b5b7c6f1e9345ce4
tree: 79802e82268885bacdc4b0e4aecaaf4e936e52b5 /python/pyspark/sql/column.py
parent: 076ec056818a65216eaf51aa5b3bd8f697c34748
[SPARK-9615] [SPARK-9616] [SQL] [MLLIB] Bugs related to FrequentItems when merging and with Tungsten
In short:

1- FrequentItems should not use the InternalRow representation, because the keys in the map get corrupted. For example, when the elements are strings, every key in the Map corresponds to the very last element observed in the partition.

2- Merging two partitions had a bug:

**Existing behavior with size 3**
Partition A -> Map(1 -> 3, 2 -> 3, 3 -> 4)
Partition B -> Map(4 -> 25)
Result -> Map()

**Correct behavior:**
Partition A -> Map(1 -> 3, 2 -> 3, 3 -> 4)
Partition B -> Map(4 -> 25)
Result -> Map(3 -> 1, 4 -> 22)

cc mengxr rxin JoshRosen

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #7945 from brkyvz/freq-fix and squashes the following commits:

07fa001 [Burak Yavuz] address 2
1dc61a8 [Burak Yavuz] address 1
506753e [Burak Yavuz] fixed and added reg test
47bfd50 [Burak Yavuz] pushing
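The merge example above can be reproduced with a small bounded counter. The following is a minimal Python sketch, not the code from this commit: the class name `FreqItemCounter`, its methods, and the eviction rule (insert the new key, drop every entry at the minimum count, subtract that minimum from the rest) are assumptions chosen because they yield the "correct behavior" result shown for size 3.

```python
class FreqItemCounter:
    """Hypothetical bounded counter that keeps at most `size` keys (size >= 1)."""

    def __init__(self, size):
        self.size = size
        self.counts = {}

    def add(self, key, count=1):
        if key in self.counts:
            # Known key: accumulate.
            self.counts[key] += count
        elif len(self.counts) < self.size:
            # Room left: insert directly.
            self.counts[key] = count
        else:
            # Map is full: decide who gets evicted.
            min_count = min(self.counts.values())
            if count >= min_count:
                # Insert the new key, then drop entries at the minimum
                # and subtract the minimum from everything that remains.
                self.counts[key] = count
                self.counts = {
                    k: v - min_count
                    for k, v in self.counts.items()
                    if v > min_count
                }
            else:
                # New key is too rare to keep: only decrement existing counts.
                self.counts = {k: v - count for k, v in self.counts.items()}
        return self

    def merge(self, other):
        # Merging replays the other partition's (key, count) pairs through add().
        for key, count in other.counts.items():
            self.add(key, count)
        return self


# Partition A -> {1: 3, 2: 3, 3: 4}, Partition B -> {4: 25}, size 3
partition_a = FreqItemCounter(3).add(1, 3).add(2, 3).add(3, 4)
partition_b = FreqItemCounter(3).add(4, 25)
result = partition_a.merge(partition_b).counts
print(result)  # {3: 1, 4: 22}
```

With size 3, key 4 arrives with count 25 while the map is full and the minimum count is 3. Keys 1 and 2 are dropped, key 3 falls to 1, and key 4 is kept with 22, which matches the expected result in the commit message rather than an empty map.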
Diffstat (limited to 'python/pyspark/sql/column.py')
0 files changed, 0 insertions, 0 deletions