aboutsummaryrefslogtreecommitdiff
path: root/sql/core
diff options
context:
space:
mode:
authorHerman van Hovell <hvanhovell@databricks.com>2016-09-03 19:02:20 +0200
committerHerman van Hovell <hvanhovell@databricks.com>2016-09-03 19:02:20 +0200
commitc2a1576c230697f56f282b6388c79835377e0f2f (patch)
tree9d870d3a6366c728a9058d67c3bb1bb123859828 /sql/core
parenta8a35b39b92fc9000eaac102c67c66be30b05e54 (diff)
downloadspark-c2a1576c230697f56f282b6388c79835377e0f2f.tar.gz
spark-c2a1576c230697f56f282b6388c79835377e0f2f.tar.bz2
spark-c2a1576c230697f56f282b6388c79835377e0f2f.zip
[SPARK-17335][SQL] Fix ArrayType and MapType CatalogString.
## What changes were proposed in this pull request? the `catalogString` for `ArrayType` and `MapType` currently calls the `simpleString` method on its children. This is a problem when the child is a struct, the `struct.simpleString` implementation truncates the number of fields it shows (25 at max). This breaks the generation of a proper `catalogString`, and has shown to cause errors while writing to Hive. This PR fixes this by providing proper `catalogString` implementations for `ArrayData` or `MapData`. ## How was this patch tested? Added testing for `catalogString` to `DataTypeSuite`. Author: Herman van Hovell <hvanhovell@databricks.com> Closes #14938 from hvanhovell/SPARK-17335.
Diffstat (limited to 'sql/core')
-rw-r--r--sql/core/benchmarks/WideSchemaBenchmark-results.txt174
1 files changed, 99 insertions, 75 deletions
diff --git a/sql/core/benchmarks/WideSchemaBenchmark-results.txt b/sql/core/benchmarks/WideSchemaBenchmark-results.txt
index ea6a6616c2..0b9f791ac8 100644
--- a/sql/core/benchmarks/WideSchemaBenchmark-results.txt
+++ b/sql/core/benchmarks/WideSchemaBenchmark-results.txt
@@ -1,93 +1,117 @@
-OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17 on Linux 4.2.0-36-generic
-Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
+Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+
parsing large select: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 select expressions 3 / 5 0.0 2967064.0 1.0X
-100 select expressions 11 / 12 0.0 11369518.0 0.3X
-2500 select expressions 243 / 250 0.0 242561004.0 0.0X
+1 select expressions 2 / 4 0.0 2050147.0 1.0X
+100 select expressions 6 / 7 0.0 6123412.0 0.3X
+2500 select expressions 135 / 141 0.0 134623148.0 0.0X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
+Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
-OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17 on Linux 4.2.0-36-generic
-Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
many column field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 cols x 100000 rows (read in-mem) 28 / 40 3.6 278.8 1.0X
-1 cols x 100000 rows (exec in-mem) 28 / 42 3.5 284.0 1.0X
-1 cols x 100000 rows (read parquet) 23 / 35 4.4 228.8 1.2X
-1 cols x 100000 rows (write parquet) 163 / 182 0.6 1633.0 0.2X
-100 cols x 1000 rows (read in-mem) 27 / 39 3.7 266.9 1.0X
-100 cols x 1000 rows (exec in-mem) 48 / 79 2.1 481.7 0.6X
-100 cols x 1000 rows (read parquet) 25 / 36 3.9 254.3 1.1X
-100 cols x 1000 rows (write parquet) 182 / 196 0.5 1819.5 0.2X
-2500 cols x 40 rows (read in-mem) 280 / 315 0.4 2797.1 0.1X
-2500 cols x 40 rows (exec in-mem) 606 / 638 0.2 6064.3 0.0X
-2500 cols x 40 rows (read parquet) 836 / 843 0.1 8356.4 0.0X
-2500 cols x 40 rows (write parquet) 490 / 522 0.2 4900.6 0.1X
+1 cols x 100000 rows (read in-mem) 16 / 18 6.3 158.6 1.0X
+1 cols x 100000 rows (exec in-mem) 17 / 19 6.0 166.7 1.0X
+1 cols x 100000 rows (read parquet) 24 / 26 4.3 235.1 0.7X
+1 cols x 100000 rows (write parquet) 81 / 85 1.2 811.3 0.2X
+100 cols x 1000 rows (read in-mem) 17 / 19 6.0 166.2 1.0X
+100 cols x 1000 rows (exec in-mem) 25 / 27 4.0 249.2 0.6X
+100 cols x 1000 rows (read parquet) 23 / 25 4.4 226.0 0.7X
+100 cols x 1000 rows (write parquet) 83 / 87 1.2 831.0 0.2X
+2500 cols x 40 rows (read in-mem) 132 / 137 0.8 1322.9 0.1X
+2500 cols x 40 rows (exec in-mem) 326 / 330 0.3 3260.6 0.0X
+2500 cols x 40 rows (read parquet) 831 / 839 0.1 8305.8 0.0X
+2500 cols x 40 rows (write parquet) 237 / 245 0.4 2372.6 0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
+Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
-OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17 on Linux 4.2.0-36-generic
-Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
wide shallowly nested struct field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 wide x 100000 rows (read in-mem) 22 / 35 4.6 216.0 1.0X
-1 wide x 100000 rows (exec in-mem) 40 / 63 2.5 400.6 0.5X
-1 wide x 100000 rows (read parquet) 93 / 134 1.1 933.9 0.2X
-1 wide x 100000 rows (write parquet) 133 / 174 0.7 1334.3 0.2X
-100 wide x 1000 rows (read in-mem) 22 / 44 4.5 223.3 1.0X
-100 wide x 1000 rows (exec in-mem) 88 / 138 1.1 878.6 0.2X
-100 wide x 1000 rows (read parquet) 117 / 186 0.9 1172.0 0.2X
-100 wide x 1000 rows (write parquet) 144 / 174 0.7 1441.6 0.1X
-2500 wide x 40 rows (read in-mem) 36 / 57 2.8 358.9 0.6X
-2500 wide x 40 rows (exec in-mem) 1466 / 1507 0.1 14656.6 0.0X
-2500 wide x 40 rows (read parquet) 690 / 802 0.1 6898.2 0.0X
-2500 wide x 40 rows (write parquet) 197 / 207 0.5 1970.9 0.1X
+1 wide x 100000 rows (read in-mem) 15 / 17 6.6 151.0 1.0X
+1 wide x 100000 rows (exec in-mem) 20 / 22 5.1 196.6 0.8X
+1 wide x 100000 rows (read parquet) 59 / 63 1.7 592.8 0.3X
+1 wide x 100000 rows (write parquet) 81 / 87 1.2 814.6 0.2X
+100 wide x 1000 rows (read in-mem) 21 / 25 4.8 208.7 0.7X
+100 wide x 1000 rows (exec in-mem) 72 / 81 1.4 718.5 0.2X
+100 wide x 1000 rows (read parquet) 75 / 85 1.3 752.6 0.2X
+100 wide x 1000 rows (write parquet) 88 / 95 1.1 876.7 0.2X
+2500 wide x 40 rows (read in-mem) 28 / 34 3.5 282.2 0.5X
+2500 wide x 40 rows (exec in-mem) 1269 / 1284 0.1 12688.1 0.0X
+2500 wide x 40 rows (read parquet) 549 / 578 0.2 5493.4 0.0X
+2500 wide x 40 rows (write parquet) 96 / 104 1.0 959.1 0.2X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
+Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
-OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17 on Linux 4.2.0-36-generic
-Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
deeply nested struct field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 deep x 100000 rows (read in-mem) 22 / 35 4.5 223.9 1.0X
-1 deep x 100000 rows (exec in-mem) 28 / 52 3.6 280.6 0.8X
-1 deep x 100000 rows (read parquet) 41 / 65 2.4 410.5 0.5X
-1 deep x 100000 rows (write parquet) 163 / 173 0.6 1634.5 0.1X
-100 deep x 1000 rows (read in-mem) 43 / 63 2.3 425.9 0.5X
-100 deep x 1000 rows (exec in-mem) 232 / 280 0.4 2321.7 0.1X
-100 deep x 1000 rows (read parquet) 1989 / 2281 0.1 19886.6 0.0X
-100 deep x 1000 rows (write parquet) 144 / 184 0.7 1442.6 0.2X
-250 deep x 400 rows (read in-mem) 68 / 95 1.5 680.9 0.3X
-250 deep x 400 rows (exec in-mem) 1310 / 1403 0.1 13096.4 0.0X
-250 deep x 400 rows (read parquet) 41477 / 41847 0.0 414766.8 0.0X
-250 deep x 400 rows (write parquet) 243 / 272 0.4 2433.1 0.1X
+1 deep x 100000 rows (read in-mem) 14 / 16 7.0 143.8 1.0X
+1 deep x 100000 rows (exec in-mem) 17 / 19 5.9 169.7 0.8X
+1 deep x 100000 rows (read parquet) 33 / 35 3.1 327.0 0.4X
+1 deep x 100000 rows (write parquet) 79 / 84 1.3 786.9 0.2X
+100 deep x 1000 rows (read in-mem) 21 / 24 4.7 211.3 0.7X
+100 deep x 1000 rows (exec in-mem) 221 / 235 0.5 2214.5 0.1X
+100 deep x 1000 rows (read parquet) 1928 / 1952 0.1 19277.1 0.0X
+100 deep x 1000 rows (write parquet) 91 / 96 1.1 909.5 0.2X
+250 deep x 400 rows (read in-mem) 57 / 61 1.8 567.1 0.3X
+250 deep x 400 rows (exec in-mem) 1329 / 1385 0.1 13291.8 0.0X
+250 deep x 400 rows (read parquet) 36563 / 36750 0.0 365630.2 0.0X
+250 deep x 400 rows (write parquet) 126 / 130 0.8 1262.0 0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
+Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
-OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17 on Linux 4.2.0-36-generic
-Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
bushy struct field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 x 1 deep x 100000 rows (read in-mem) 23 / 36 4.4 229.8 1.0X
-1 x 1 deep x 100000 rows (exec in-mem) 27 / 48 3.7 269.6 0.9X
-1 x 1 deep x 100000 rows (read parquet) 25 / 33 4.0 247.5 0.9X
-1 x 1 deep x 100000 rows (write parquet) 82 / 134 1.2 821.1 0.3X
-128 x 8 deep x 1000 rows (read in-mem) 19 / 29 5.3 189.5 1.2X
-128 x 8 deep x 1000 rows (exec in-mem) 144 / 165 0.7 1440.4 0.2X
-128 x 8 deep x 1000 rows (read parquet) 117 / 159 0.9 1174.4 0.2X
-128 x 8 deep x 1000 rows (write parquet) 135 / 162 0.7 1349.0 0.2X
-1024 x 11 deep x 100 rows (read in-mem) 30 / 49 3.3 304.4 0.8X
-1024 x 11 deep x 100 rows (exec in-mem) 1146 / 1183 0.1 11457.6 0.0X
-1024 x 11 deep x 100 rows (read parquet) 712 / 758 0.1 7119.5 0.0X
-1024 x 11 deep x 100 rows (write parquet) 104 / 143 1.0 1037.3 0.2X
+1 x 1 deep x 100000 rows (read in-mem) 13 / 15 7.8 127.7 1.0X
+1 x 1 deep x 100000 rows (exec in-mem) 15 / 17 6.6 151.5 0.8X
+1 x 1 deep x 100000 rows (read parquet) 20 / 23 5.0 198.3 0.6X
+1 x 1 deep x 100000 rows (write parquet) 77 / 82 1.3 770.4 0.2X
+128 x 8 deep x 1000 rows (read in-mem) 12 / 14 8.2 122.5 1.0X
+128 x 8 deep x 1000 rows (exec in-mem) 124 / 140 0.8 1241.2 0.1X
+128 x 8 deep x 1000 rows (read parquet) 69 / 74 1.4 693.9 0.2X
+128 x 8 deep x 1000 rows (write parquet) 78 / 83 1.3 777.7 0.2X
+1024 x 11 deep x 100 rows (read in-mem) 25 / 29 4.1 246.1 0.5X
+1024 x 11 deep x 100 rows (exec in-mem) 1197 / 1223 0.1 11974.6 0.0X
+1024 x 11 deep x 100 rows (read parquet) 426 / 433 0.2 4263.7 0.0X
+1024 x 11 deep x 100 rows (write parquet) 91 / 98 1.1 913.5 0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
+Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
-OpenJDK 64-Bit Server VM 1.8.0_66-internal-b17 on Linux 4.2.0-36-generic
-Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
wide array field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
-1 wide x 100000 rows (read in-mem) 18 / 31 5.6 179.3 1.0X
-1 wide x 100000 rows (exec in-mem) 31 / 47 3.2 310.2 0.6X
-1 wide x 100000 rows (read parquet) 45 / 73 2.2 445.1 0.4X
-1 wide x 100000 rows (write parquet) 109 / 140 0.9 1085.9 0.2X
-100 wide x 1000 rows (read in-mem) 17 / 25 5.8 172.7 1.0X
-100 wide x 1000 rows (exec in-mem) 18 / 22 5.4 184.6 1.0X
-100 wide x 1000 rows (read parquet) 26 / 42 3.8 261.8 0.7X
-100 wide x 1000 rows (write parquet) 150 / 164 0.7 1499.4 0.1X
-2500 wide x 40 rows (read in-mem) 19 / 31 5.1 194.7 0.9X
-2500 wide x 40 rows (exec in-mem) 19 / 24 5.3 188.5 1.0X
-2500 wide x 40 rows (read parquet) 33 / 47 3.0 334.4 0.5X
-2500 wide x 40 rows (write parquet) 153 / 164 0.7 1528.2 0.1X
+1 wide x 100000 rows (read in-mem) 14 / 16 7.0 143.2 1.0X
+1 wide x 100000 rows (exec in-mem) 17 / 19 5.9 170.9 0.8X
+1 wide x 100000 rows (read parquet) 43 / 46 2.3 434.1 0.3X
+1 wide x 100000 rows (write parquet) 78 / 83 1.3 777.6 0.2X
+100 wide x 1000 rows (read in-mem) 11 / 13 9.0 111.5 1.3X
+100 wide x 1000 rows (exec in-mem) 13 / 15 7.8 128.3 1.1X
+100 wide x 1000 rows (read parquet) 24 / 27 4.1 245.0 0.6X
+100 wide x 1000 rows (write parquet) 74 / 80 1.4 740.5 0.2X
+2500 wide x 40 rows (read in-mem) 11 / 13 9.1 109.5 1.3X
+2500 wide x 40 rows (exec in-mem) 13 / 15 7.7 129.4 1.1X
+2500 wide x 40 rows (read parquet) 24 / 26 4.1 241.3 0.6X
+2500 wide x 40 rows (write parquet) 75 / 81 1.3 751.8 0.2X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
+Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+
+wide map field r/w: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+1 wide x 100000 rows (read in-mem) 16 / 18 6.2 162.6 1.0X
+1 wide x 100000 rows (exec in-mem) 21 / 23 4.8 208.2 0.8X
+1 wide x 100000 rows (read parquet) 54 / 59 1.8 543.6 0.3X
+1 wide x 100000 rows (write parquet) 80 / 86 1.2 804.5 0.2X
+100 wide x 1000 rows (read in-mem) 11 / 13 8.7 114.5 1.4X
+100 wide x 1000 rows (exec in-mem) 14 / 16 7.0 143.5 1.1X
+100 wide x 1000 rows (read parquet) 30 / 32 3.3 300.4 0.5X
+100 wide x 1000 rows (write parquet) 75 / 80 1.3 749.9 0.2X
+2500 wide x 40 rows (read in-mem) 13 / 15 7.8 128.1 1.3X
+2500 wide x 40 rows (exec in-mem) 15 / 18 6.5 153.6 1.1X
+2500 wide x 40 rows (read parquet) 30 / 33 3.3 304.4 0.5X
+2500 wide x 40 rows (write parquet) 77 / 83 1.3 768.5 0.2X