author Cheng Lian <lian@databricks.com> 2015-03-02 16:18:00 -0800
committer Michael Armbrust <michael@databricks.com> 2015-03-02 16:18:00 -0800
commit 1a49496b4a9df40c74739fc0fb8a21c88a477075
tree 735ac85bc69402ec931557010a6662ec2dbe584d
parent 8223ce6a81e4cc9fdf816892365fcdff4006c35e
[SPARK-6082] [SQL] Provides better error message for malformed rows when caching tables
Constructs like Hive `TRANSFORM` may generate malformed rows (for example, via badly authored external scripts). I'm a bit hesitant to add this check, since it introduces a per-tuple cost when caching tables. However, considering that caching a table is usually a one-time cost, it is probably worth having.

Author: Cheng Lian <lian@databricks.com>

Closes #4842 from liancheng/spark-6082 and squashes the following commits:

b05dbff [Cheng Lian] Provides better error message for malformed rows when caching tables
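To illustrate the trade-off described in the commit message, here is a minimal, Spark-free sketch of the same idea: compare each row's width with the number of column builders before appending, and fail with a message that includes the row content. The names `MalformedRowCheck`, `buildColumns`, and `numColumns` are hypothetical, and plain `Seq[Any]` rows and `ArrayBuffer` columns stand in for Spark's row and column builder types. This is not the code from the patch.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the caching loop: rows are plain Seq[Any],
// and each "column builder" is just a growable buffer.
object MalformedRowCheck {
  /** Appends every row to per-column buffers, failing fast on a width mismatch. */
  def buildColumns(rows: Iterator[Seq[Any]], numColumns: Int): IndexedSeq[ArrayBuffer[Any]] = {
    val columnBuilders = IndexedSeq.fill(numColumns)(ArrayBuffer.empty[Any])
    while (rows.hasNext) {
      val row = rows.next()

      // Per-row check: one integer comparison, paid once for each cached row.
      assert(
        row.size == columnBuilders.size,
        s"""Row column number mismatch, expected $numColumns columns, but got ${row.size}.
           |Row content: $row
         """.stripMargin)

      var i = 0
      while (i < row.length) {
        columnBuilders(i) += row(i)
        i += 1
      }
    }
    columnBuilders
  }
}
```

In this sketch, a row with too many fields would otherwise fail inside the inner loop with an index-out-of-bounds error that says nothing about the row itself, and a row with too few fields would silently leave the columns with unequal lengths. The assertion turns both cases into one descriptive failure.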
-rw-r--r-- sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala 11
1 files changed, 11 insertions, 0 deletions
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala b/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala
index 11d5943fb4..8944a32bc3 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala
@@ -119,6 +119,17 @@ private[sql] case class InMemoryRelation(
var rowCount = 0
while (rowIterator.hasNext && rowCount < batchSize) {
val row = rowIterator.next()
+
+ // Added for SPARK-6082. This assertion can be useful for scenarios when something
+ // like Hive TRANSFORM is used. The external data generation script used in TRANSFORM
+ // may result in malformed rows, causing ArrayIndexOutOfBoundsException, which is somewhat
+ // hard to decipher.
+ assert(
+ row.size == columnBuilders.size,
+ s"""Row column number mismatch, expected ${output.size} columns, but got ${row.size}.
+ |Row content: $row
+ """.stripMargin)
+
var i = 0
while (i < row.length) {
columnBuilders(i).appendFrom(row, i)