author | Cheng Lian <lian@databricks.com> | 2015-03-02 16:18:00 -0800 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2015-03-02 16:18:00 -0800 |
commit | 1a49496b4a9df40c74739fc0fb8a21c88a477075 (patch) | |
tree | 735ac85bc69402ec931557010a6662ec2dbe584d /sql | |
parent | 8223ce6a81e4cc9fdf816892365fcdff4006c35e (diff) | |
[SPARK-6082] [SQL] Provides better error message for malformed rows when caching tables
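Constructs like Hive `TRANSFORM` may generate malformed rows, i.e. rows whose column count doesn't match the table schema (for example, because of a badly authored external script). Without a check, such rows surface as an `ArrayIndexOutOfBoundsException` that is hard to decipher. I'm a bit hesitant to add this check, since it introduces a per-tuple cost when caching tables. However, since caching a table is usually a one-time cost, this is probably worth having.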
Author: Cheng Lian <lian@databricks.com>
Closes #4842 from liancheng/spark-6082 and squashes the following commits:
b05dbff [Cheng Lian] Provides better error message for malformed rows when caching tables
Diffstat (limited to 'sql')
-rw-r--r-- | sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala | 11 |
1 files changed, 11 insertions, 0 deletions
```diff
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala b/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala
index 11d5943fb4..8944a32bc3 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala
@@ -119,6 +119,17 @@ private[sql] case class InMemoryRelation(
           var rowCount = 0
           while (rowIterator.hasNext && rowCount < batchSize) {
             val row = rowIterator.next()
+
+            // Added for SPARK-6082. This assertion can be useful for scenarios when something
+            // like Hive TRANSFORM is used. The external data generation script used in TRANSFORM
+            // may result malformed rows, causing ArrayIndexOutOfBoundsException, which is somewhat
+            // hard to decipher.
+            assert(
+              row.size == columnBuilders.size,
+              s"""Row column number mismatch, expected ${output.size} columns, but got ${row.size}.
+                 |Row content: $row
+               """.stripMargin)
+
             var i = 0
             while (i < row.length) {
               columnBuilders(i).appendFrom(row, i)
```
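To illustrate why the check helps, here is a minimal, self-contained sketch of the row-to-column copy loop that the patch guards. It is a simplified model, not Spark code: rows are plain `Seq[Any]`, and `SimpleColumnBuilder`, `ColumnarSketch.buildColumns`, and `failureOf` are hypothetical stand-ins for Spark's internal `ColumnBuilder` and `Row` types. In this model, the loop iterates over `row.length`, so a row with too many fields throws `ArrayIndexOutOfBoundsException`, while a row with too few fields silently leaves the columns with different lengths. An arity check in the style of SPARK-6082 catches both cases up front and reports the offending row.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a per-column builder (not Spark's ColumnBuilder).
final class SimpleColumnBuilder {
  val values = ArrayBuffer.empty[Any]

  // Appends the field at `ordinal` of `row` to this column.
  def appendFrom(row: Seq[Any], ordinal: Int): Unit = values += row(ordinal)
}

object ColumnarSketch {
  // Copies each row into one builder per column. When `checkArity` is true,
  // it mirrors the SPARK-6082 assertion and fails fast with a readable message.
  def buildColumns(
      rows: Iterator[Seq[Any]],
      numColumns: Int,
      checkArity: Boolean): Array[SimpleColumnBuilder] = {
    val columnBuilders = Array.fill(numColumns)(new SimpleColumnBuilder)
    while (rows.hasNext) {
      val row = rows.next()
      if (checkArity) {
        // Predef.assert throws java.lang.AssertionError("assertion failed: " + message).
        assert(
          row.size == columnBuilders.size,
          s"Row column number mismatch, expected $numColumns columns, but got ${row.size}. " +
            s"Row content: $row")
      }
      // Same shape as the patched loop: iterate over the row's own length.
      var i = 0
      while (i < row.length) {
        columnBuilders(i).appendFrom(row, i)
        i += 1
      }
    }
    columnBuilders
  }

  // Evaluates `body` and returns whatever it throws, or None on success.
  def failureOf(body: => Any): Option[Throwable] =
    try { body; None } catch { case t: Throwable => Some(t) }
}
```

For example, with two declared columns, `buildColumns(Iterator(Seq(1, "a"), Seq(2, "b", 3.0)), 2, checkArity = false)` fails with a bare `ArrayIndexOutOfBoundsException` on the third field, whereas the same call with `checkArity = true` fails with an `AssertionError` whose message names the expected and actual column counts and prints the row. The check costs one size comparison per row, which matches the per-tuple overhead the commit message weighs against the one-time cost of caching.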