author    Dongjoon Hyun <dongjoon@apache.org>        2016-05-24 10:08:14 -0700
committer Davies Liu <davies.liu@gmail.com>          2016-05-24 10:08:14 -0700
commit    f8763b80ecd9968566018396c8cdc1851e7f8a46 (patch)
tree      60834146260721ba357bca443a550514902b9542 /sql/core
parent    c24b6b679c3efa053f7de19be73eb36dc70d9930 (diff)
[SPARK-13135] [SQL] Don't print expressions recursively in generated code
## What changes were proposed in this pull request?
This PR is an up-to-date and slightly improved version of #11019 by rxin for
- (1) preventing the recursive printing of expressions in generated code.

Since that is the major function of this PR, he should be credited for the work he did. In addition to #11019, this PR improves the following aspects of code generation:
- (2) Improve multiline comment indentation.
- (3) Reduce the number of empty lines (mainly consecutive empty lines).
- (4) Remove all space characters on empty lines.
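The core of change (1) can be sketched as follows. This is a minimal illustration written for this description, not Spark's actual `CodeFormatter` implementation: a comment line is dropped when the previously kept comment already contains it, so only the outermost comment of a nested expression chain survives.

```scala
// Sketch of change (1); `stripOverlappingComments` here is an illustration,
// not Spark's actual CodeFormatter implementation. A comment line is dropped
// when the previously kept line is a comment that already contains it, so
// only the outermost comment of a nested expression chain remains.
def stripOverlappingComments(lines: List[String]): List[String] =
  lines.foldLeft(List.empty[String]) {
    case (kept @ prev :: _, cur)
        if prev.trim.startsWith("//") && cur.trim.startsWith("//") &&
           prev.contains(cur.trim.stripPrefix("//").trim) =>
      kept // skip cur: it repeats a sub-expression of the previous comment
    case (kept, cur) =>
      cur :: kept
  }.reverse
```

On the nested comments from the example above, only the outermost `// (((input[0, bigint, false] + 1) + 2) + 3)` line would be kept.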
**Example**
```scala
spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6)
```
**Before**
```
Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */ * Codegend pipeline for
/* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */ * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */ */
...
/* 075 */ // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */
/* 077 */ // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 078 */
/* 079 */ // initialize Range
...
/* 092 */ // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 093 */
/* 094 */ // CONSUME: WholeStageCodegen
/* 095 */
/* 096 */ // (((input[0, bigint, false] + 1) + 2) + 3)
/* 097 */ // ((input[0, bigint, false] + 1) + 2)
/* 098 */ // (input[0, bigint, false] + 1)
...
/* 107 */ // (((input[0, bigint, false] + 4) + 5) + 6)
/* 108 */ // ((input[0, bigint, false] + 4) + 5)
/* 109 */ // (input[0, bigint, false] + 4)
...
/* 126 */ }
```
**After**
```
Generated code:
/* 001 */ public Object generate(Object[] references) {
...
/* 005 */ /**
/* 006 */ * Codegend pipeline for
/* 007 */ * Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 008 */ * +- Range 1, 1, 8, 999, [id#0L]
/* 009 */ */
...
/* 075 */ // PRODUCE: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 076 */ // PRODUCE: Range 1, 1, 8, 999, [id#0L]
/* 077 */ // initialize Range
...
/* 090 */ // CONSUME: Project [(((id#0L + 1) + 2) + 3) AS (((id + 1) + 2) + 3)#3L,(((id#0L + 4) + 5) + 6) AS (((id + 4) + 5) + 6)#4L]
/* 091 */ // CONSUME: WholeStageCodegen
/* 092 */ // (((input[0, bigint, false] + 1) + 2) + 3)
...
/* 101 */ // (((input[0, bigint, false] + 4) + 5) + 6)
...
/* 118 */ }
```
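Improvements (3) and (4) amount to normalizing blank lines. A hedged sketch of that behavior (an assumption for illustration, not the exact Spark `CodeFormatter.stripExtraNewLines` code):

```scala
// Sketch (assumed behavior, not Spark's actual implementation) of
// improvements (3) and (4): whitespace-only lines are emptied, and runs of
// consecutive empty lines collapse to a single empty line.
def stripExtraNewLines(code: String): String =
  code.split("\n", -1)
    .map(l => if (l.trim.isEmpty) "" else l)  // (4) no spaces on empty lines
    .foldLeft(List.empty[String]) {
      case ("" :: rest, "") => "" :: rest     // (3) collapse consecutive empties
      case (acc, l) => l :: acc
    }
    .reverse
    .mkString("\n")
```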
## How was this patch tested?
Passed the Jenkins tests and manually checked the output of the following command.
```scala
scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen()
```
Author: Dongjoon Hyun <dongjoon@apache.org>
Author: Reynold Xin <rxin@databricks.com>
Closes #13192 from dongjoon-hyun/SPARK-13135.
Diffstat (limited to 'sql/core')
 sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala           | 4 ++--
 sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
index 2a1ce735b7..908e22de73 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
@@ -333,8 +333,8 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
       """.trim
 
     // try to compile, helpful for debug
-    val cleanedSource =
-      new CodeAndComment(CodeFormatter.stripExtraNewLines(source), ctx.getPlaceHolderToComments())
+    val cleanedSource = CodeFormatter.stripOverlappingComments(
+      new CodeAndComment(CodeFormatter.stripExtraNewLines(source), ctx.getPlaceHolderToComments()))
 
     logDebug(s"\n${CodeFormatter.format(cleanedSource)}")
     CodeGenerator.compile(cleanedSource)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala
index e0b48119f6..1041bab9d5 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala
@@ -224,7 +224,8 @@ object GenerateColumnAccessor extends CodeGenerator[Seq[DataType], ColumnarItera
       }
     }"""
 
-    val code = new CodeAndComment(codeBody, ctx.getPlaceHolderToComments())
+    val code = CodeFormatter.stripOverlappingComments(
+      new CodeAndComment(codeBody, ctx.getPlaceHolderToComments()))
 
     logDebug(s"Generated ColumnarIterator:\n${CodeFormatter.format(code)}")
     CodeGenerator.compile(code).generate(Array.empty).asInstanceOf[ColumnarIterator]