author | pierre-borckmans <pierre.borckmans@realimpactanalytics.com> | 2015-12-22 23:00:42 -0800 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2015-12-22 23:00:42 -0800 |
commit | 43b2a6390087b7ce262a54dc8ab8dd825db62e21 (patch) | |
tree | 958bb0b86a5d040d4064d53786824274193cebd6 /data/mllib | |
parent | 50301c0a28b64c5348b0f2c2d828589c0833c70c (diff) | |
[SPARK-12477][SQL] - Tungsten projection fails for null values in array fields
Accessing a null element in an array field fails with a NullPointerException when Tungsten is enabled.
The same access works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled.
This PR fixes the failure by checking, in the generated code, whether the accessed array element is null.
Example:
```
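import sqlContext.implicits._  // provides .toDF; spark-shell imports this automatically

// Array of String containing a null element
case class AS(as: Seq[String])
val dfAS = sc.parallelize(Seq(AS(Seq("a", null, "b")))).toDF
dfAS.registerTempTable("T_AS")
// Select each element by index; index 1 is the null element
for (i <- 0 to 2) {
  println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))
}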
```
With Tungsten disabled:
```
0 = [a]
1 = [null]
2 = [b]
```
With Tungsten enabled:
```
0 = [a]
15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 15)
java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
    at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
```
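The idea behind the fix can be sketched in plain Scala, without Spark. This is only an illustration of guarding an element access with a null check, not the code that the PR generates; `NullSafeArrayAccess`, `sizeOf`, `unsafeElementSize` and `safeElementSize` are hypothetical names chosen for this sketch.

```scala
// Hypothetical stand-ins for illustration; none of these names come from Spark.
object NullSafeArrayAccess {
  // Mimics a writer that needs the byte size of a string value.
  // It dereferences its argument, so a null argument throws.
  def sizeOf(value: String): Int = value.getBytes("UTF-8").length

  // Unguarded access: throws NullPointerException when the element is null.
  def unsafeElementSize(values: Seq[String], i: Int): Int = sizeOf(values(i))

  // Guarded access: test the element for null before using it,
  // and report the null case as None instead of dereferencing it.
  def safeElementSize(values: Seq[String], i: Int): Option[Int] = {
    val element = values(i)
    if (element == null) None else Some(sizeOf(element))
  }

  def main(args: Array[String]): Unit = {
    val values = Seq("a", null, "b")
    for (i <- values.indices) println(s"$i = ${safeElementSize(values, i)}")
  }
}
```

In this sketch the unguarded variant fails on index 1 in the same way as the trace above, while the guarded variant returns a value for every index.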
Author: pierre-borckmans <pierre.borckmans@realimpactanalytics.com>
Closes #10429 from pierre-borckmans/SPARK-12477_Tungsten-Projection-Null-Element-In-Array.