[SPARK-6082] [SQL] Provides better error message for malformed rows when caching tables

Constructs like Hive `TRANSFORM` may generate malformed rows (via badly authored external scripts, for example). I'm a bit hesitant to have this feature, since it introduces per-tuple cost when caching tables. However, considering caching tables is usually a one-time cost, this is probably worth having.

Author: Cheng Lian <lian@databricks.com>

Closes #4842 from liancheng/spark-6082 and squashes the following commits:

b05dbff [Cheng Lian] Provides better error message for malformed rows when caching tables
author: Cheng Lian <lian@databricks.com> 2015-03-02 16:18:00 -0800
committer: Michael Armbrust <michael@databricks.com> 2015-03-02 16:18:00 -0800
commit: 1a49496b4a9df40c74739fc0fb8a21c88a477075 (patch)
tree: 735ac85bc69402ec931557010a6662ec2dbe584d /sql
parent: 8223ce6a81e4cc9fdf816892365fcdff4006c35e (diff)
download: spark-1a49496b4a9df40c74739fc0fb8a21c88a477075.tar.gz spark-1a49496b4a9df40c74739fc0fb8a21c88a477075.tar.bz2 spark-1a49496b4a9df40c74739fc0fb8a21c88a477075.zip
1 files changed, 11 insertions, 0 deletions
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala b/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala
index 11d5943fb4..8944a32bc3 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala
@@ -119,6 +119,17 @@ private[sql] case class InMemoryRelation(
           var rowCount = 0
           while (rowIterator.hasNext && rowCount < batchSize) {
             val row = rowIterator.next()
+
+            // Added for SPARK-6082. This assertion can be useful for scenarios when something
+            // like Hive TRANSFORM is used. The external data generation script used in TRANSFORM
+            // may result in malformed rows, causing ArrayIndexOutOfBoundsException, which is somewhat
+            // hard to decipher.
+            assert(
+              row.size == columnBuilders.size,
+              s"""Row column number mismatch, expected ${output.size} columns, but got ${row.size}.
+                 |Row content: $row
+               """.stripMargin)
+
             var i = 0
             while (i < row.length) {
               columnBuilders(i).appendFrom(row, i)
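
The check added in this hunk can be tried in isolation. The following is only a minimal sketch, not Spark's code: `RowWidthCheck`, `appendBatch`, the `Row` alias (a plain `Seq[Any]`), and the `ArrayBuffer`-based column builders are hypothetical stand-ins chosen so that the snippet runs without Spark on the classpath.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the batching loop in InMemoryRelation.
object RowWidthCheck {
  // Assumption: a row is modelled as a plain sequence of values.
  type Row = Seq[Any]

  // Appends each row to per-column buffers, failing fast on a width mismatch.
  def appendBatch(rows: Iterator[Row], numColumns: Int): Vector[Vector[Any]] = {
    val columnBuilders = Array.fill(numColumns)(ArrayBuffer.empty[Any])
    while (rows.hasNext) {
      val row = rows.next()
      // Without this check, a row wider than the schema would index past the
      // end of columnBuilders and fail without showing the offending row.
      assert(
        row.size == columnBuilders.length,
        s"""Row column number mismatch, expected $numColumns columns, but got ${row.size}.
           |Row content: $row""".stripMargin)
      var i = 0
      while (i < row.length) {
        columnBuilders(i) += row(i)
        i += 1
      }
    }
    columnBuilders.map(_.toVector).toVector
  }
}
```

Calling `RowWidthCheck.appendBatch(Iterator(Seq(1, "a", true)), 2)` in this sketch raises an `AssertionError` whose message names both the expected and the actual column count and prints the row. Note that Scala's `assert` can be elided at compile time, so a check like this is a diagnostic aid rather than a guaranteed validation step.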