path: root/sql
Commit message | Author | Date | Files | Lines changed
...
* [SPARK-10196] [SQL] Correctly saving decimals in internal rows to JSON. (Yin Huai, 2015-08-24, 2 files changed: -1/+28)
  https://issues.apache.org/jira/browse/SPARK-10196 Author: Yin Huai <yhuai@databricks.com> Closes #8408 from yhuai/DecimalJsonSPARK-10196.
* [SPARK-10178] [SQL] HiveComparisionTest should print out dependent tables (Michael Armbrust, 2015-08-24, 1 file changed: -0/+36)
  In `HiveComparisionTest`s it is possible to fail a query of the form `SELECT * FROM dest1`, where `dest1` is the query that is actually computing the incorrect results. To aid debugging, this patch improves the harness to also print these query plans and their results. Author: Michael Armbrust <michael@databricks.com> Closes #8388 from marmbrus/generatedTables.
* [SPARK-10121] [SQL] Thrift server always use the latest class loader provided by the conf of executionHive's state (Yin Huai, 2015-08-25, 2 files changed: -0/+60)
  https://issues.apache.org/jira/browse/SPARK-10121 Looks like the problem is that if we add a jar through another thread, the thread handling the JDBC session will not get the latest classloader. Author: Yin Huai <yhuai@databricks.com> Closes #8368 from yhuai/SPARK-10121.
* [SQL] [MINOR] [DOC] Clarify docs for inferring DataFrame from RDD of Products (Feynman Liang, 2015-08-24, 2 files changed: -2/+2)
  * Makes `SQLImplicits.rddToDataFrameHolder` scaladoc consistent with `SQLContext.createDataFrame[A <: Product](rdd: RDD[A])`, since the former is essentially a wrapper for the latter.
  * Clarifies the `createDataFrame[A <: Product]` scaladoc to apply to any `RDD[Product]`, not just case classes.
  Author: Feynman Liang <fliang@databricks.com> Closes #8406 from feynmanliang/sql-doc-fixes.
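  For reference, a minimal sketch of the two entry points being documented (assuming a SparkContext `sc` and SQLContext `sqlContext` are in scope; the case class and data are illustrative):
  ```
  case class Person(name: String, age: Int)   // any Product works, not only case classes

  val rdd = sc.parallelize(Seq(Person("Alice", 29), Person("Bob", 31)))

  // Explicit form: SQLContext.createDataFrame[A <: Product](rdd: RDD[A])
  val df1 = sqlContext.createDataFrame(rdd)

  // Implicit form via SQLImplicits.rddToDataFrameHolder
  import sqlContext.implicits._
  val df2 = rdd.toDF()
  ```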
* [SPARK-10165] [SQL] Await child resolution in ResolveFunctions (Michael Armbrust, 2015-08-24, 2 files changed: -44/+77)
  Currently, we eagerly attempt to resolve functions, even before their children are resolved. However, this is not valid in cases where we need to know the types of the input arguments (i.e. when resolving Hive UDFs). As a fix, this PR delays function resolution until the function's children are resolved. This change also necessitates a change to the way we resolve aggregate expressions that are not in aggregate operators (e.g., in `HAVING` or `ORDER BY` clauses). Specifically, we can't assume that these misplaced functions will be resolved, allowing us to differentiate aggregate functions from normal functions. To compensate for this change, we now attempt to resolve these unresolved expressions in the context of the aggregate operator, before checking to see if any aggregate expressions are present. Author: Michael Armbrust <michael@databricks.com> Closes #8371 from marmbrus/hiveUDFResolution.
* [SPARK-10190] Fix NPE in CatalystTypeConverters Decimal toScala converter (Josh Rosen, 2015-08-24, 2 files changed: -2/+7)
  This adds a missing null check to the Decimal `toScala` converter in `CatalystTypeConverters`, fixing an NPE. Author: Josh Rosen <joshrosen@databricks.com> Closes #8401 from JoshRosen/SPARK-10190.
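  A minimal sketch of the kind of guard described (hypothetical shape only; the actual converter code in `CatalystTypeConverters` is not reproduced here):
  ```
  import org.apache.spark.sql.types.Decimal

  // A Decimal -> scala.math.BigDecimal conversion must tolerate null catalyst values
  // instead of dereferencing them, which is what caused the NPE.
  def toScala(catalystValue: Decimal): BigDecimal =
    if (catalystValue == null) null else catalystValue.toBigDecimal
  ```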
* [SPARK-9758] [TEST] [SQL] Compilation issue for hive test / wrong package? (Sean Owen, 2015-08-24, 10 files changed: -9/+6)
  Move `test.org.apache.spark.sql.hive` package tests to the apparently intended `org.apache.spark.sql.hive`, as they don't intend to test behavior from outside org.apache.spark.*. Alternate take, per discussion at https://github.com/apache/spark/pull/8051. I think this is what vanzin and I had in mind, but also CC rxin to cross-check, as this does indeed depend on whether these tests were accidentally in this package or not. Testing from a `test.org.apache.spark` package is legitimate but didn't seem to be the intent here. Author: Sean Owen <sowen@cloudera.com> Closes #8307 from srowen/SPARK-9758.
* [SPARK-8580] [SQL] Refactors ParquetHiveCompatibilitySuite and adds more test cases (Cheng Lian, 2015-08-24, 1 file changed: -39/+93)
  This PR refactors `ParquetHiveCompatibilitySuite` so that it's easier to add new test cases. Two bugs, SPARK-10177 and HIVE-11625, were hit while working on this; test cases were added for them and marked as ignored for now. SPARK-10177 will be addressed in a separate PR. Author: Cheng Lian <lian@databricks.com> Closes #8392 from liancheng/spark-8580/parquet-hive-compat-tests.
* [SPARK-7710] [SPARK-7998] [DOCS] Docs for DataFrameStatFunctions (Burak Yavuz, 2015-08-24, 2 files changed: -1/+102)
  This PR contains examples of how to use some of the Stat Functions available for DataFrames under `df.stat`. rxin Author: Burak Yavuz <brkyvz@gmail.com> Closes #8378 from brkyvz/update-sql-docs.
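  An illustrative use of the `df.stat` entry point these docs cover (column names and data are made up):
  ```
  val df = sqlContext.range(0, 100).selectExpr("id", "id % 10 as bucket")

  val corr     = df.stat.corr("id", "bucket")          // Pearson correlation
  val cov      = df.stat.cov("id", "bucket")           // sample covariance
  val freq     = df.stat.freqItems(Seq("bucket"))      // approximate frequent items
  val crosstab = df.stat.crosstab("bucket", "bucket")  // contingency table
  ```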
* [SPARK-9401] [SQL] Fully implement code generation for ConcatWs (Yijie Shen, 2015-08-22, 1 file changed: -3/+39)
  This PR adds full codegen support for ConcatWs and is a substitute for #7782. JIRA: https://issues.apache.org/jira/browse/SPARK-9401 cc davies Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #8353 from yjshen/concatws.
* [SPARK-10143] [SQL] Use parquet's block size (row group size) setting as the min split size if necessary. (Yin Huai, 2015-08-21, 1 file changed: -2/+39)
  https://issues.apache.org/jira/browse/SPARK-10143 With this PR, we will set the min split size to parquet's block size (row group size) set in the conf if the min split size is smaller. This way we can avoid having too many tasks, and even useless tasks, for reading parquet data. I tested it locally. The table I have has 343MB and it is in my local FS. Because I did not set any min/max split size, the default split size was 32MB and the map stage had 11 tasks, but there were only three tasks that actually read data. With my PR, there were only three tasks in the map stage. Here is the difference. Without this PR: ![image](https://cloud.githubusercontent.com/assets/2072857/9399179/8587dba6-4765-11e5-9189-7ebba52a2b6d.png) With this PR: ![image](https://cloud.githubusercontent.com/assets/2072857/9399185/a4735d74-4765-11e5-8848-1f1e361a6b4b.png) Even if the block size setting does not match the actual block size of the parquet file, I think it is still generally good to use parquet's block size setting if the min split size is smaller than this block size. Tested it on a cluster using
  ```
  val count = sqlContext.table("""store_sales""").groupBy().count().queryExecution.executedPlan(3).execute().count
  ```
  Basically, it reads 0 columns of table `store_sales`. My table has 1824 parquet files with sizes from 80MB to 280MB (1 to 3 row group sizes). Without this patch, in a 16 worker cluster, the job had 5023 tasks and spent 102s. With this patch, the job had 2893 tasks and spent 64s. It is still not as good as using one mapper per file (1824 tasks and 42s), but it is much better than our master. Author: Yin Huai <yhuai@databricks.com> Closes #8346 from yhuai/parquetMinSplit.
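  A hedged sketch of the settings involved (property names are the standard Parquet/Hadoop ones; the 256 MB value is only illustrative):
  ```
  // Parquet block (row group) size used when writing, in bytes
  sqlContext.setConf("parquet.block.size", (256L * 1024 * 1024).toString)

  // Minimum input split size used when reading; with this change Spark SQL raises it
  // to at least the configured Parquet block size, so one split never covers less
  // than a full row group.
  sc.hadoopConfiguration.setLong("mapreduce.input.fileinputformat.split.minsize", 256L * 1024 * 1024)
  ```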
* [SPARK-10130] [SQL] type coercion for IF should have children resolved first (Daoyuan Wang, 2015-08-21, 2 files changed: -0/+8)
  Type coercion for IF should have its children resolved first, or we could hit an unresolved exception. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #8331 from adrian-wang/spark10130.
* [SPARK-10040] [SQL] Use batch insert for JDBC writing (Liang-Chi Hsieh, 2015-08-21, 1 file changed: -3/+14)
  JIRA: https://issues.apache.org/jira/browse/SPARK-10040 We should use batch inserts instead of single-row inserts in JDBC writing. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #8273 from viirya/jdbc-insert-batch.
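  A minimal sketch of the batching pattern this change adopts, in plain JDBC (the URL, table, and rows are illustrative):
  ```
  import java.sql.DriverManager

  val conn = DriverManager.getConnection("jdbc:h2:mem:test")
  val stmt = conn.prepareStatement("INSERT INTO people (name, age) VALUES (?, ?)")
  try {
    Seq(("Alice", 29), ("Bob", 31)).foreach { case (name, age) =>
      stmt.setString(1, name)
      stmt.setInt(2, age)
      stmt.addBatch()          // queue the row instead of executing it immediately
    }
    stmt.executeBatch()        // send all queued rows in one round trip
  } finally {
    stmt.close()
    conn.close()
  }
  ```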
* [SPARK-9400] [SQL] codegen for StringLocate (Tarek Auel, 2015-08-20, 1 file changed: -1/+27)
  This is based on #7779, thanks to tarekauel. Fixes the conflict and nullability. Closes #7779 and #8274. Author: Tarek Auel <tarek.auel@googlemail.com> Author: Davies Liu <davies@databricks.com> Closes #8330 from davies/stringLocate.
* [SQL] [MINOR] remove unnecessary class (Wenchen Fan, 2015-08-20, 1 file changed: -64/+0)
  This class is identical to `org.apache.spark.sql.execution.datasources.jdbc.DefaultSource` and is not needed. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8334 from cloud-fan/minor.
* [SPARK-10136] [SQL] Fixes Parquet support for Avro array of primitive array (Cheng Lian, 2015-08-20, 13 files changed: -844/+1718)
  I caught SPARK-10136 while adding more test cases to `ParquetAvroCompatibilitySuite`. Actual bug fix code lies in `CatalystRowConverter.scala`. Author: Cheng Lian <lian@databricks.com> Closes #8341 from liancheng/spark-10136/parquet-avro-nested-primitive-array.
* [SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in aggregation. (Reynold Xin, 2015-08-20, 2 files changed: -10/+22)
  This improves performance by ~20-30% in one of my local tests and should fix the performance regression from 1.4 to 1.5 on ss_max. Author: Reynold Xin <rxin@databricks.com> Closes #8332 from rxin/SPARK-10100.
* [SPARK-10092] [SQL] Multi-DB support follow up. (Yin Huai, 2015-08-20, 16 files changed: -94/+398)
  https://issues.apache.org/jira/browse/SPARK-10092 This PR is a follow-up for Multi-DB support. It has the following changes:
  * `HiveContext.refreshTable` now accepts `dbName.tableName`.
  * `HiveContext.analyze` now accepts `dbName.tableName`.
  * `CreateTableUsing`, `CreateTableUsingAsSelect`, `CreateTempTableUsing`, `CreateTempTableUsingAsSelect`, `CreateMetastoreDataSource`, and `CreateMetastoreDataSourceAsSelect` all take `TableIdentifier` instead of the string representation of the table name.
  * When you call `saveAsTable` with a specified database, the data will be saved to the correct location.
  * Explicitly disallow users from creating a temporary table with a specified database name (users could not do it before either).
  * When we save a table to the metastore, we also check whether the db name and table name are acceptable to Hive (using `MetaStoreUtils.validateName`).
  Author: Yin Huai <yhuai@databricks.com> Closes #8324 from yhuai/saveAsTableDB.
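  A hedged sketch of the calls affected, assuming an existing `HiveContext` named `hiveContext` and a DataFrame `df` (database and table names are illustrative):
  ```
  hiveContext.refreshTable("mydb.events")    // db-qualified names now accepted
  hiveContext.analyze("mydb.events")         // likewise
  df.write.saveAsTable("mydb.events_copy")   // data lands under mydb's location
  ```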
* [SPARK-9242] [SQL] Audit UDAF interface. (Reynold Xin, 2015-08-19, 18 files changed: -349/+386)
  A few minor changes:
  1. Improved documentation.
  2. Renamed apply(distinct...) to distinct.
  3. Changed MutableAggregationBuffer from a trait to an abstract class.
  4. Renamed returnDataType to dataType to be more consistent with other expressions.
  And unrelated to UDAFs:
  1. Renamed file names in expressions to use the suffix "Expressions" to be more consistent.
  2. Moved regexp-related expressions out to their own file.
  3. Renamed StringComparison => StringPredicate.
  Author: Reynold Xin <rxin@databricks.com> Closes #8321 from rxin/SPARK-9242.
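  For orientation, a hedged sketch of a UDAF written against the audited interface (the class and column names are made up; note the `dataType` member and the abstract `MutableAggregationBuffer` mentioned above):
  ```
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
  import org.apache.spark.sql.types._

  // Illustrative UDAF summing a long column
  class LongSum extends UserDefinedAggregateFunction {
    def inputSchema: StructType = StructType(StructField("value", LongType) :: Nil)
    def bufferSchema: StructType = StructType(StructField("sum", LongType) :: Nil)
    def dataType: DataType = LongType        // renamed from returnDataType
    def deterministic: Boolean = true

    def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0L
    def update(buffer: MutableAggregationBuffer, input: Row): Unit =
      if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + input.getLong(0)
    def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
      buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    def evaluate(buffer: Row): Any = buffer.getLong(0)
  }
  ```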
* [SPARK-10035] [SQL] Parquet filters does not process EqualNullSafe filter. (hyukjinkwon, 2015-08-20, 2 files changed: -139/+37)
  As discussed with Lian:
  1. Added EqualNullSafe to ParquetFilters - it uses the same equality comparison filter as EqualTo, since the Parquet filter actually performs a null-safe equality comparison.
  2. Updated the test code (ParquetFilterSuite) - convert catalyst.Expression to sources.Filter; removed Cast since only Literal is picked up as a proper Filter in DataSourceStrategy; added an EqualNullSafe comparison.
  3. Removed the deprecated createFilter for catalyst.Expression.
  Author: hyukjinkwon <gurwls223@gmail.com> Author: 권혁진 <gurwls223@gmail.com> Closes #8275 from HyukjinKwon/master.
* [SPARK-6489] [SQL] add column pruning for Generate (Wenchen Fan, 2015-08-19, 3 files changed: -2/+100)
  This PR takes over https://github.com/apache/spark/pull/5358 Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8268 from cloud-fan/6489.
* [SPARK-10083] [SQL] CaseWhen should support type coercion of DecimalType and FractionalType (Daoyuan Wang, 2015-08-19, 2 files changed: -2/+13)
  create t1 (a decimal(7, 2), b long);
  select case when 1=1 then a else 1.0 end from t1;
  select case when 1=1 then a else b end from t1;
  Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #8270 from adrian-wang/casewhenfractional.
* [SPARK-9899] [SQL] Disables customized output committer when speculation is on (Cheng Lian, 2015-08-19, 2 files changed: -1/+49)
  Speculation hates direct output committer, as there are multiple corner cases that may cause data corruption and/or data loss. Please see this [PR comment] [1] for more details. [1]: https://github.com/apache/spark/pull/8191#issuecomment-131598385 Author: Cheng Lian <lian@databricks.com> Closes #8317 from liancheng/spark-9899/speculation-hates-direct-output-committer.
* [SPARK-10090] [SQL] fix decimal scale of division (Davies Liu, 2015-08-19, 6 files changed: -31/+157)
  We should round the result of multiplication/division of decimals to the expected precision/scale, and also check for overflow. Author: Davies Liu <davies@databricks.com> Closes #8287 from davies/decimal_division.
* [SPARK-9627] [SQL] Stops using Scala runtime reflection in DictionaryEncoding (Cheng Lian, 2015-08-19, 2 files changed: -12/+4)
  `DictionaryEncoding` uses Scala runtime reflection to avoid boxing costs while building the dictionary array. However, this code path may hit [SI-6240] [1] and throw an exception. [1]: https://issues.scala-lang.org/browse/SI-6240 Author: Cheng Lian <lian@databricks.com> Closes #8306 from liancheng/spark-9627/in-memory-cache-scala-reflection.
* [SPARK-10073] [SQL] Python withColumn should replace the old column (Davies Liu, 2015-08-19, 1 file changed: -1/+2)
  DataFrame.withColumn in Python should be consistent with the Scala one (replacing the existing column that has the same name). cc marmbrus Author: Davies Liu <davies@databricks.com> Closes #8300 from davies/with_column.
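  A small sketch of the target behavior, shown with the Scala API that the Python side is being aligned with (column names are illustrative):
  ```
  import org.apache.spark.sql.functions.lit

  val df = sqlContext.range(0, 3).withColumn("flag", lit(0))

  // Re-using an existing column name replaces that column instead of adding a duplicate
  val replaced = df.withColumn("flag", lit(1))
  replaced.printSchema()   // still exactly two columns: id, flag
  ```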
* [SPARK-10107] [SQL] fix NPE in format_number (Davies Liu, 2015-08-19, 2 files changed: -3/+3)
  Author: Davies Liu <davies@databricks.com> Closes #8305 from davies/format_number.
* [SPARK-10093] [SPARK-10096] [SQL] Avoid transformation on executors & fix UDFs on complex types (Reynold Xin, 2015-08-18, 4 files changed: -7/+68)
  This is kind of a weird case, but given a sufficiently complex query plan (in this case a TungstenProject with an Exchange underneath), we could have NPEs on the executors due to when we were calling transformAllExpressions. In general we should ensure that all transformations occur on the driver and not on the executors. Some reasons to avoid executor-side transformations include:
  * (this case) Some operator constructors require state such as access to the Spark/SQL conf, so doing a makeCopy on the executor can fail.
  * (an unrelated reason to avoid executor transformations) ExprIds are calculated using an atomic integer, so you can violate their uniqueness constraint by constructing them anywhere other than the driver.
  This subsumes #8285. Author: Reynold Xin <rxin@databricks.com> Author: Michael Armbrust <michael@databricks.com> Closes #8295 from rxin/SPARK-10096.
* [SPARK-10095] [SQL] use public API of BigInteger (Davies Liu, 2015-08-18, 2 files changed: -27/+11)
  In UnsafeRow, we use a private field of BigInteger for better performance, but it actually didn't contribute much (3% in one benchmark) to end-to-end runtime, and it makes the code non-portable (it may fail on other JVM implementations). So we should use the public API instead. cc rxin Author: Davies Liu <davies@databricks.com> Closes #8286 from davies/portable_decimal.
* [SPARK-9939] [SQL] Resorts to Java process API in CliSuite, HiveSparkSubmitSuite and HiveThriftServer2 test suites (Cheng Lian, 2015-08-19, 5 files changed: -91/+149)
  The Scala process API has a known bug ([SI-8768] [1]), which may be the reason why several test suites which fork sub-processes are flaky. This PR replaces the Scala process API with the Java process API in `CliSuite`, `HiveSparkSubmitSuite`, and `HiveThriftServer2` related test suites to see whether it fixes these flaky tests. [1]: https://issues.scala-lang.org/browse/SI-8768 Author: Cheng Lian <lian@databricks.com> Closes #8168 from liancheng/spark-9939/use-java-process-api.
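  A minimal sketch of the Java process API pattern the suites move to, replacing `scala.sys.process` (the command is only illustrative):
  ```
  import scala.io.Source

  val builder = new java.lang.ProcessBuilder("bash", "-c", "echo hello")
  builder.redirectErrorStream(true)          // merge stderr into stdout
  val process = builder.start()

  val output   = Source.fromInputStream(process.getInputStream).mkString
  val exitCode = process.waitFor()           // block until the child exits
  ```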
* [SPARK-10088] [SQL] Add support for "stored as avro" in HiveQL parser. (Marcelo Vanzin, 2015-08-18, 2 files changed: -10/+13)
  Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8282 from vanzin/SPARK-10088.
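  An illustrative statement using the syntax the parser now accepts (table and column names are made up; `hiveContext` is an assumed HiveContext):
  ```
  hiveContext.sql(
    """CREATE TABLE events_avro (id BIGINT, payload STRING)
      |STORED AS AVRO""".stripMargin)
  ```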
* [SPARK-10089] [SQL] Add missing golden files. (Marcelo Vanzin, 2015-08-18, 2 files changed: -0/+503)
  Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8283 from vanzin/SPARK-10089.
* [SPARK-10080] [SQL] Fix binary incompatibility for $ column interpolation (Michael Armbrust, 2015-08-18, 3 files changed: -11/+22)
  Turns out that inner classes of inner objects are referenced directly, and thus moving it will break binary compatibility. Author: Michael Armbrust <michael@databricks.com> Closes #8281 from marmbrus/binaryCompat.
* [SPARK-8118] [SQL] Redirects Parquet JUL logger via SLF4J (Cheng Lian, 2015-08-18, 4 files changed: -43/+45)
  Parquet hard-codes a JUL logger which always writes to stdout. This PR redirects it via the SLF4J JUL bridge handler, so that we can control Parquet logs via `log4j.properties`. This solution is inspired by https://github.com/Parquet/parquet-mr/issues/390#issuecomment-46064909. Author: Cheng Lian <lian@databricks.com> Closes #8196 from liancheng/spark-8118/redirect-parquet-jul.
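  A hedged sketch of the standard jul-to-slf4j redirection this relies on (`SLF4JBridgeHandler` comes from the jul-to-slf4j artifact):
  ```
  import org.slf4j.bridge.SLF4JBridgeHandler

  // Drop JUL's default handlers (e.g. the stdout ConsoleHandler) from the root logger
  SLF4JBridgeHandler.removeHandlersForRootLogger()
  // Route all JUL records into SLF4J, so log4j.properties governs Parquet's output
  SLF4JBridgeHandler.install()
  ```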
* [SPARK-10038] [SQL] fix bug in generated unsafe projection when there is binary in ArrayData (Davies Liu, 2015-08-17, 2 files changed: -4/+29)
  The type for an array of arrays in Java is slightly different from arrays of other element types. cc cloud-fan Author: Davies Liu <davies@databricks.com> Closes #8250 from davies/array_binary.
* [MINOR] Format the comment of `translate` at `functions.scala` (Yu ISHIKAWA, 2015-08-17, 1 file changed: -8/+9)
  Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8265 from yu-iskw/minor-translate-comment.
* [SPARK-9592] [SQL] Fix Last function implemented based on AggregateExpression1. (Yin Huai, 2015-08-17, 2 files changed: -2/+22)
  https://issues.apache.org/jira/browse/SPARK-9592 #8113 has the fundamental fix. But, if we want to minimize the number of changed lines, we can go with this one. Then, in 1.6, we merge #8113. Author: Yin Huai <yhuai@databricks.com> Closes #8172 from yhuai/lastFix and squashes the following commits:
  b28c42a [Yin Huai] Regression test.
  af87086 [Yin Huai] Fix last.
* [SPARK-9526] [SQL] Utilize randomized tests to reveal potential bugs in sql expressions (Yijie Shen, 2015-08-17, 10 files changed: -6/+410)
  JIRA: https://issues.apache.org/jira/browse/SPARK-9526 This PR is a follow-up of #7830, aiming at utilizing randomized tests to reveal more potential bugs in sql expressions. Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #7855 from yjshen/property_check.
* [SPARK-10036] [SQL] Load JDBC driver in DataFrameReader.jdbc and DataFrameWriter.jdbc (zsxwing, 2015-08-17, 4 files changed: -7/+20)
  This PR uses `JDBCRDD.getConnector` to load the JDBC driver before creating a connection in `DataFrameReader.jdbc` and `DataFrameWriter.jdbc`. Author: zsxwing <zsxwing@gmail.com> Closes #8232 from zsxwing/SPARK-10036 and squashes the following commits:
  adf75de [zsxwing] Add extraOptions to the connection properties
  57f59d4 [zsxwing] Load JDBC driver in DataFrameReader.jdbc and DataFrameWriter.jdbc
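  For context, a sketch of the two entry points touched (URL, table names, and properties are illustrative; naming the driver class through the `driver` connection property is an assumption):
  ```
  import java.util.Properties

  val props = new Properties()
  props.setProperty("user", "test")
  props.setProperty("password", "secret")
  props.setProperty("driver", "org.postgresql.Driver")   // loaded before connecting

  val url    = "jdbc:postgresql://dbhost:5432/mydb"
  val people = sqlContext.read.jdbc(url, "people", props)        // DataFrameReader.jdbc
  people.filter("age > 21").write.jdbc(url, "adults", props)     // DataFrameWriter.jdbc
  ```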
* [SPARK-9950] [SQL] Wrong Analysis Error for grouping/aggregating on struct fields (Wenchen Fan, 2015-08-17, 1 file changed: -0/+5)
  This issue has been fixed by https://github.com/apache/spark/pull/8215; this PR adds a regression test for it. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8222 from cloud-fan/minor and squashes the following commits:
  0bbfb1c [Wenchen Fan] fix style...
  7e2d8d9 [Wenchen Fan] add test
* [SPARK-7837] [SQL] Avoids double closing output writers when commitTask() fails (Cheng Lian, 2015-08-18, 2 files changed: -6/+61)
  When inserting data into a `HadoopFsRelation`, if `commitTask()` of the writer container fails, `abortTask()` will be invoked. However, both `commitTask()` and `abortTask()` try to close the output writer(s). The problem is that closing underlying writers may not be an idempotent operation. E.g., `ParquetRecordWriter.close()` throws an NPE when called twice. Author: Cheng Lian <lian@databricks.com> Closes #8236 from liancheng/spark-7837/double-closing.
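  A hedged sketch of the usual guard for this class of bug (the wrapper name is made up; it simply makes close() safe to call more than once):
  ```
  import java.io.Closeable

  // Second and later close() calls become no-ops, so a commitTask() failure followed
  // by abortTask() cannot double-close the underlying writer.
  class IdempotentCloseable(underlying: Closeable) extends Closeable {
    private var closed = false
    override def close(): Unit = if (!closed) {
      closed = true
      underlying.close()
    }
  }
  ```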
* [SPARK-10005] [SQL] Fixes schema merging for nested structs (Cheng Lian, 2015-08-16, 4 files changed: -22/+112)
  In case of schema merging, we only handled first-level fields when converting Parquet groups to `InternalRow`s; nested struct fields are not properly handled. For example, the schema of a Parquet file to be read can be:
  ```
  message individual {
    required group f1 {
      optional binary f11 (utf8);
    }
  }
  ```
  while the global schema is:
  ```
  message global {
    required group f1 {
      optional binary f11 (utf8);
      optional int32 f12;
    }
  }
  ```
  This PR fixes this issue by padding missing fields when creating actual converters. Author: Cheng Lian <lian@databricks.com> Closes #8228 from liancheng/spark-10005/nested-schema-merging.
* [SPARK-9973] [SQL] Correct in-memory columnar buffer size (Kun Xu, 2015-08-16, 1 file changed: -2/+1)
  The `initialSize` argument of `ColumnBuilder.initialize()` should be the number of rows rather than bytes. However `InMemoryColumnarTableScan` passes in a byte size, which makes Spark SQL allocate more memory than necessary when building in-memory columnar buffers. Author: Kun Xu <viper_kun@163.com> Closes #8189 from viper-kun/errorSize.
* [SPARK-9955] [SQL] correct error message for aggregate (Wenchen Fan, 2015-08-15, 3 files changed: -7/+12)
  We should skip unresolved `LogicalPlan`s for `PullOutNondeterministic`, as calling `output` on an unresolved `LogicalPlan` will produce a confusing error message. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8203 from cloud-fan/error-msg and squashes the following commits:
  1c67ca7 [Wenchen Fan] move test
  7593080 [Wenchen Fan] correct error message for aggregate
* [SPARK-9984] [SQL] Create local physical operator interface. (Reynold Xin, 2015-08-14, 4 files changed: -0/+224)
  This pull request creates a new operator interface that is more similar to traditional database query iterators (with open/close/next/get). These local operators are not currently used anywhere, but will become the basis for SPARK-9983 (local physical operators for query execution). cc zsxwing Author: Reynold Xin <rxin@databricks.com> Closes #8212 from rxin/SPARK-9984.
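  An illustrative sketch of a database-style iterator interface of the kind described; the trait and method names below are hypothetical, not the actual classes added by this PR:
  ```
  import org.apache.spark.sql.catalyst.InternalRow

  // Hypothetical shape of a local, single-threaded physical operator
  trait LocalOperator {
    def open(): Unit          // acquire resources, open children
    def next(): Boolean       // advance to the next row; false when exhausted
    def fetch(): InternalRow  // current row after a successful next()
    def close(): Unit         // release resources, close children
  }

  // Typical driver loop: open, iterate with next()/fetch(), then close
  def drain(op: LocalOperator): Seq[InternalRow] = {
    op.open()
    try {
      val rows = scala.collection.mutable.ArrayBuffer.empty[InternalRow]
      while (op.next()) rows += op.fetch().copy()
      rows
    } finally op.close()
  }
  ```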
* [SPARK-8887] [SQL] Explicit define which data types can be used as dynamic partition columns (Yijie Shen, 2015-08-14, 5 files changed: -4/+41)
  This PR enforces dynamic partition column data type requirements by adding analysis rules. JIRA: https://issues.apache.org/jira/browse/SPARK-8887 Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #8201 from yjshen/dynamic_partition_columns.
* [SPARK-9634] [SPARK-9323] [SQL] cleanup unnecessary Aliases in LogicalPlan at the end of analysis (Wenchen Fan, 2015-08-14, 9 files changed: -24/+120)
  Also alias the ExtractValue instead of wrapping it with UnresolvedAlias when resolving an attribute in LogicalPlan, as this alias will be trimmed if it's unnecessary. Based on #7957, without the changes to mllib, but instead maintaining the earlier behavior when using `withColumn` on expressions that already have metadata. Author: Wenchen Fan <cloud0fan@outlook.com> Author: Michael Armbrust <michael@databricks.com> Closes #8215 from marmbrus/pr/7957.
* [HOTFIX] fix duplicated braces (Davies Liu, 2015-08-14, 3 files changed: -3/+3)
  Author: Davies Liu <davies@databricks.com> Closes #8219 from davies/fix_typo.
* [SPARK-9949] [SQL] Fix TakeOrderedAndProject's output. (Yin Huai, 2015-08-14, 2 files changed: -4/+28)
  https://issues.apache.org/jira/browse/SPARK-9949 Author: Yin Huai <yhuai@databricks.com> Closes #8179 from yhuai/SPARK-9949.
* [SPARK-8670] [SQL] Nested columns can't be referenced in pyspark (Wenchen Fan, 2015-08-14, 1 file changed: -0/+2)
  This bug is caused by a wrong column-existence check in `__getitem__` of the pyspark DataFrame. `DataFrame.apply` accepts not only top-level column names, but also nested column names like `a.b`, so we should remove that check from `__getitem__`. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8202 from cloud-fan/nested.