Commit message | Author | Age | Files | Lines
* Disable flaky script transformation test | Reynold Xin | 2016-04-24 | 1 | -2/+2
* [SPARK-14548][SQL] Support not greater than and not less than operator in Spark SQL | jliwork | 2016-04-24 | 5 | -3/+15
  `!<` means "not less than", which is equivalent to `>=`. `!>` means "not greater than", which is equivalent to `<=`. I'd like to create a PR to support these two operators.
  I've added new test cases in: DataFrameSuite, ExpressionParserSuite, JDBCSuite, PlanParserSuite, SQLQuerySuite
  dilipbiswal viirya gatorsmile
  Author: jliwork <jiali@us.ibm.com>
  Closes #12316 from jliwork/SPARK-14548.
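The equivalences stated above (`!<` is `>=`, `!>` is `<=`) can be illustrated with a small operator-normalization table. This is a hypothetical Python model for illustration only. Spark's real support lives in its ANTLR-based SQL parser, and `normalize_op` and `compare` are invented names.

```python
import operator

# Hypothetical mapping from the two negated operators to their canonical forms.
NEGATED_COMPARISONS = {
    "!<": ">=",  # "not less than" is the same as "greater than or equal"
    "!>": "<=",  # "not greater than" is the same as "less than or equal"
}

CANONICAL = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}

def normalize_op(op: str) -> str:
    """Rewrite !< and !> to their canonical equivalents; leave other operators unchanged."""
    return NEGATED_COMPARISONS.get(op, op)

def compare(left, op: str, right) -> bool:
    """Evaluate `left op right` after normalizing the operator."""
    return CANONICAL[normalize_op(op)](left, right)
```

With the rewrite done before evaluation, `compare(3, "!<", 3)` behaves exactly like `3 >= 3`.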
* [SPARK-14691][SQL] Simplify and Unify Error Generation for Unsupported Alter Table DDL | gatorsmile | 2016-04-24 | 5 | -138/+40
  #### What changes were proposed in this pull request?
  So far, we are capturing each unsupported Alter Table statement in a separate visit function. These should be unified to issue the same ParseException instead. This PR refactors the existing implementation and makes the error message consistent for Alter Table DDL.
  #### How was this patch tested?
  Updated the existing test cases and also added new test cases to ensure all the unsupported statements are covered.
  Author: gatorsmile <gatorsmile@gmail.com>
  Author: xiaoli <lixiao1983@gmail.com>
  Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
  Closes #12459 from gatorsmile/cleanAlterTable.
* [DOCS][MINOR] Screenshot + minor fixes to improve reading for accumulators | Jacek Laskowski | 2016-04-24 | 2 | -6/+12
  ## What changes were proposed in this pull request?
  Added screenshot + minor fixes to improve reading
  ## How was this patch tested?
  Manual
  Author: Jacek Laskowski <jacek@japila.pl>
  Closes #12569 from jaceklaskowski/docs-accumulators.
* [SPARK-13267][WEB UI] document the ?param arguments of the REST API; lift the… | Steve Loughran | 2016-04-24 | 1 | -16/+51
  Adds details on the `?` arguments to the REST API documentation, with examples from the test suite.
  I've used the existing table, adding all the fields to the second table. See [in the pr](https://github.com/steveloughran/spark/blob/history/SPARK-13267-doc-params/docs/monitoring.md).
  There's a slightly more sophisticated option: make the table three columns wide, and for all existing entries, have the initial `td` span two columns. The new entries would then have an empty first column, the param in the second and the text in the third, with any examples after a `br` entry.
  Author: Steve Loughran <stevel@hortonworks.com>
  Closes #11152 from steveloughran/history/SPARK-13267-doc-params.
* Support single argument version of sqlContext.getConf | mathieu longtin | 2016-04-23 | 1 | -3/+17
  ## What changes were proposed in this pull request?
  In Python, sqlContext.getConf didn't allow getting the system default (getConf with one parameter).
  Now the following are supported:
  ```python
  sqlContext.getConf(confName)  # System default if not locally set, this is new
  sqlContext.getConf(confName, myDefault)  # myDefault if not locally set, old behavior
  ```
  I also added doctests to this function. The original behavior does not change.
  ## How was this patch tested?
  Manually, but doctests were added.
  Author: mathieu longtin <mathieu.longtin@nuance.com>
  Closes #12488 from mathieulongtin/pyfixgetconf3.
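The one-argument versus two-argument lookup described above can be sketched with a sentinel object. The sentinel lets a caller pass an explicit `None` default and still have it distinguished from "no default given". `ConfSketch` and its dictionaries are hypothetical stand-ins and do not reflect PySpark's `SQLContext` internals.

```python
_NO_DEFAULT = object()  # sentinel: distinguishes "no default given" from default=None

class ConfSketch:
    """Hypothetical model of a conf with locally set values and system defaults."""

    def __init__(self, system_defaults):
        self._system_defaults = dict(system_defaults)
        self._local = {}

    def set_conf(self, key, value):
        self._local[key] = value

    def get_conf(self, key, default_value=_NO_DEFAULT):
        if key in self._local:
            # A locally set value always wins.
            return self._local[key]
        if default_value is _NO_DEFAULT:
            # One-argument form: fall back to the system default.
            return self._system_defaults[key]
        # Two-argument form: the caller's default is used instead.
        return default_value
```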
* [SPARK-14879][SQL] Move CreateMetastoreDataSource and CreateMetastoreDataSourceAsSelect to sql/core | Yin Huai | 2016-04-23 | 9 | -451/+497
  ## What changes were proposed in this pull request?
  CreateMetastoreDataSource and CreateMetastoreDataSourceAsSelect are not Hive-specific, so this PR moves them from sql/hive to sql/core. I am also adding a `Command` suffix to these two classes.
  ## How was this patch tested?
  Existing tests.
  Author: Yin Huai <yhuai@databricks.com>
  Closes #12645 from yhuai/moveCreateDataSource.
* [SPARK-14833][SQL][STREAMING][TEST] Refactor StreamTests to test for source fault-tolerance correctly. | Tathagata Das | 2016-04-23 | 2 | -197/+233
  ## What changes were proposed in this pull request?
  The current StreamTest allows testing a streaming Dataset that explicitly wraps a source. This differs from the actual production code path, where the source object is dynamically created through a DataSource object every time a query is started. So all the fault-tolerance testing in FileSourceSuite and FileSourceStressSuite does not really test the actual code path, because those suites just reuse the FileStreamSource object.
  This PR fixes StreamTest and the FileSource***Suite to test this correctly. Instead of maintaining a mapping of source --> expected offset in StreamTest (which requires reuse of the source object), it now maintains a mapping of source index --> offset, so that it is independent of the source object.
  Summary of changes
  - StreamTest refactored to keep track of offsets by source index instead of by source
  - AddData, AddTextData and AddParquetData updated to find the FileStreamSource object from an active query, so that they work with sources generated when the query is started.
  - Refactored unit tests in FileSource***Suite to test using DataFrames/Datasets generated with the public API, rather than reusing the same FileStreamSource. This correctly tests fault tolerance.
  The refactoring changed a lot of indents in FileSourceSuite, so it's recommended to hide whitespace changes with this: https://github.com/apache/spark/pull/12592/files?w=1
  ## How was this patch tested?
  Refactored unit tests.
  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #12592 from tdas/SPARK-14833.
* [SPARK-14838] [SQL] Set default size for ObjecType to avoid failure when estimating sizeInBytes in ObjectProducer | Liang-Chi Hsieh | 2016-04-23 | 2 | -2/+24
  ## What changes were proposed in this pull request?
  We have logical plans that produce domain objects, which are of type `ObjectType`. Because we can't estimate the size of `ObjectType`, we throw an `UnsupportedOperationException` when trying to do so. We should set a default size for `ObjectType` to avoid this failure.
  ## How was this patch tested?
  `DatasetSuite`.
  Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
  Closes #12599 from viirya/skip-broadcast-objectproducer.
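The fix replaces a hard failure with a fallback estimate. The following hypothetical Python sketch shows that pattern: a size estimator sums per-column sizes and uses a fixed default for opaque object columns instead of raising. The type names and byte sizes, including the 4096-byte default, are illustrative assumptions and are not quoted from Spark's `defaultSize` implementations.

```python
# Illustrative per-type sizes in bytes (assumed values for this sketch).
KNOWN_SIZES = {"int": 4, "long": 8, "double": 8}

# Assumed fallback for types whose size cannot be estimated, e.g. opaque domain objects.
DEFAULT_OBJECT_SIZE = 4096

def estimate_row_size(column_types):
    """Sum column sizes, using DEFAULT_OBJECT_SIZE instead of raising for unknown types."""
    return sum(KNOWN_SIZES.get(t, DEFAULT_OBJECT_SIZE) for t in column_types)

def estimate_size_in_bytes(column_types, row_count):
    """Rough plan-size estimate: per-row size times the number of rows."""
    return estimate_row_size(column_types) * row_count
```

A conservative (large) default also helps keep object-producing plans from being picked as cheap broadcast candidates based on an underestimate.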
* [SPARK-12148][SPARKR] fix doc after renaming DataFrame to SparkDataFrame | felixcheung | 2016-04-23 | 2 | -10/+11
  ## What changes were proposed in this pull request?
  Fixed inadvertent roxygen2 doc changes and added the class name change to the programming guide.
  Follow-up of #12621
  ## How was this patch tested?
  Manually checked.
  Author: felixcheung <felixcheung_m@hotmail.com>
  Closes #12647 from felixcheung/rdataframe.
* [SPARK-14856] Correct message in assertion for 'returning batch for wide table' | tedyu | 2016-04-23 | 1 | -1/+1
  ## What changes were proposed in this pull request?
  There was a typo in the message for the second assertion in the "returning batch for wide table" test.
  ## How was this patch tested?
  Existing tests.
  Author: tedyu <yuzhihong@gmail.com>
  Closes #12639 from tedyu/master.
* [MINOR] [SQL] Fix error message string in nullSafeEvel of TernaryExpression | Dongjoon Hyun | 2016-04-23 | 1 | -1/+2
  ## What changes were proposed in this pull request?
  TernaryExpression should throw a proper error message that names itself.
  ```diff
     protected def nullSafeEval(input1: Any, input2: Any, input3: Any): Any =
  -    sys.error(s"BinaryExpressions must override either eval or nullSafeEval")
  +    sys.error(s"TernaryExpressions must override either eval or nullSafeEval")
  ```
  ## How was this patch tested?
  Manual.
  Author: Dongjoon Hyun <dongjoon@apache.org>
  Closes #12642 from dongjoon-hyun/minor_fix_error_msg_in_ternaryexpression.
* [SPARK-14877][SQL] Remove HiveMetastoreTypes class | Reynold Xin | 2016-04-23 | 8 | -60/+23
  ## What changes were proposed in this pull request?
  It is unnecessary, as DataType.catalogString largely replaces the need for this class.
  ## How was this patch tested?
  This mostly removes dead code and should be covered by existing tests.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #12644 from rxin/SPARK-14877.
* [SPARK-14865][SQL] Better error handling for view creation. | Reynold Xin | 2016-04-23 | 3 | -71/+100
  ## What changes were proposed in this pull request?
  This patch improves error handling in view creation. CreateViewCommand itself will analyze the view SQL query first, and if it cannot successfully analyze it, throw an AnalysisException.
  In addition, I also added the following two conservative guards for easier identification of Spark bugs:
  1. If there is a bug and the generated view SQL cannot be analyzed, throw an exception at runtime. Note that this is not an AnalysisException, because it is not caused by the user and more likely indicates a bug in Spark.
  2. When SQLBuilder gets an unresolved plan, it also shows the plan in the error message.
  I also took the chance to simplify the internal implementation of CreateViewCommand, and *removed* a fallback path that would've masked an exception from before.
  ## How was this patch tested?
  1. Added a unit test for the user-facing error handling.
  2. Manually introduced some bugs in Spark to test the internal defensive error handling.
  3. Also added a test case to test nested views (not super relevant).
  Author: Reynold Xin <rxin@databricks.com>
  Closes #12633 from rxin/SPARK-14865.
* [SPARK-14869][SQL] Don't mask exceptions in ResolveRelations | Reynold Xin | 2016-04-23 | 10 | -18/+26
  ## What changes were proposed in this pull request?
  In order to support running SQL directly on files, we added some code in ResolveRelations to catch the exception thrown by catalog.lookupRelation and ignore it. This unfortunately masks all the exceptions. This patch changes the logic to simply test the table's existence.
  ## How was this patch tested?
  I manually hacked some bugs into Spark and made sure the exceptions were being propagated up.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #12634 from rxin/SPARK-14869.
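The difference between catching every lookup exception and testing for existence first can be sketched in Python. `Catalog`, `resolve_masking`, and `resolve_checked` are hypothetical names. The point is that in the checked version, a genuine bug inside the lookup still propagates instead of silently turning into a "treat it as a file path" fallback.

```python
class Catalog:
    """Hypothetical catalog whose lookup may fail for reasons other than 'not found'."""

    def __init__(self, tables, broken=frozenset()):
        self.tables = dict(tables)
        self.broken = set(broken)  # tables whose lookup hits an internal bug

    def table_exists(self, name):
        return name in self.tables

    def lookup_relation(self, name):
        if name in self.broken:
            raise RuntimeError(f"internal error while loading {name}")
        if name not in self.tables:
            raise KeyError(name)
        return self.tables[name]

def resolve_masking(catalog, name):
    # Old pattern: any exception is swallowed and the name is treated as a file path.
    try:
        return catalog.lookup_relation(name)
    except Exception:
        return ("file", name)

def resolve_checked(catalog, name):
    # New pattern: fall back only when the table is absent; other errors propagate.
    if catalog.table_exists(name):
        return catalog.lookup_relation(name)
    return ("file", name)
```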
* [SPARK-14872][SQL] Restructure command package | Reynold Xin | 2016-04-23 | 7 | -257/+317
  ## What changes were proposed in this pull request?
  This patch restructures the sql.execution.command package to break the commands into multiple files, in some logical organization: databases, tables, views, functions.
  I also renamed basicOperators.scala to basicLogicalOperators.scala and basicPhysicalOperators.scala.
  ## How was this patch tested?
  N/A; all I did was move code around.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #12636 from rxin/SPARK-14872.
* [SPARK-14871][SQL] Disable StatsReportListener to declutter output | Reynold Xin | 2016-04-23 | 1 | -2/+0
  ## What changes were proposed in this pull request?
  Spark SQL inherited the use of StatsReportListener from Shark. Unfortunately, this clutters the spark-sql CLI output and makes it very difficult to read the actual query results.
  ## How was this patch tested?
  Built and tested in the spark-sql CLI.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #12635 from rxin/SPARK-14871.
* [HOTFIX] disable generated aggregate map | Davies Liu | 2016-04-23 | 1 | -1/+1
* Turn script transformation back on. | Reynold Xin | 2016-04-23 | 1 | -2/+2
  Author: Reynold Xin <rxin@databricks.com>
  Closes #12565 from rxin/test-flaky.
* [SPARK-14594][SPARKR] check execution return status code | felixcheung | 2016-04-23 | 1 | -0/+3
  ## What changes were proposed in this pull request?
  When the JVM backend fails without going through proper error handling (e.g. the process crashed), the R error message could be ambiguous:
  ```
  Error in if (returnStatus != 0) { : argument is of length zero
  ```
  This change attempts to make the message clearer (however, one would still need to investigate why the JVM failed).
  ## How was this patch tested?
  Manually.
  Author: felixcheung <felixcheung_m@hotmail.com>
  Closes #12622 from felixcheung/rreturnstatus.
* Closes some open PRs that have been requested to close. | Reynold Xin | 2016-04-23 | 0 | -0/+0
  Closes #7647
  Closes #8195
  Closes #8741
  Closes #8972
  Closes #9490
  Closes #10419
  Closes #10761
  Closes #11003
  Closes #11201
  Closes #11803
  Closes #12111
  Closes #12442
* [SPARK-14873][CORE] Java sampleByKey methods take ju.Map but with Scala Double values; results in type Object | Sean Owen | 2016-04-23 | 3 | -21/+23
  ## What changes were proposed in this pull request?
  Java `sampleByKey` methods should accept a `Map` with `java.lang.Double` values.
  ## How was this patch tested?
  Existing (updated) Jenkins tests
  Author: Sean Owen <sowen@cloudera.com>
  Closes #12637 from srowen/SPARK-14873.
* [SPARK-12148][SPARKR] SparkR: rename DataFrame to SparkDataFrame | felixcheung | 2016-04-23 | 14 | -468/+473
  ## What changes were proposed in this pull request?
  Changed the class name defined in R from "DataFrame" to "SparkDataFrame". A popular package, S4Vectors, already defines "DataFrame"; this change avoids the conflict.
  Aside from the class name and API/roxygen2 references, SparkR APIs like `createDataFrame` and `as.DataFrame` are not changed (S4Vectors does not define an "as.DataFrame").
  Since in R one would rarely reference a type/class directly, this change should have minimal to no impact on SparkR users in terms of backward compatibility.
  ## How was this patch tested?
  SparkR tests; manually loading S4Vectors, then the SparkR package.
  Author: felixcheung <felixcheung_m@hotmail.com>
  Closes #12621 from felixcheung/rdataframe.
* [MINOR][ML][MLLIB] Remove unused imports | Zheng RuiFeng | 2016-04-22 | 17 | -21/+13
  ## What changes were proposed in this pull request?
  Delete unused imports in ML/MLLIB.
  ## How was this patch tested?
  Unit tests.
  Author: Zheng RuiFeng <ruifengz@foxmail.com>
  Closes #12497 from zhengruifeng/del_unused_imports.
* [SPARK-14551][SQL] Reduce number of NameNode calls in OrcRelation | Rajesh Balamohan | 2016-04-22 | 2 | -11/+108
  ## What changes were proposed in this pull request?
  When FileSourceStrategy is used, a record reader is created, which incurs a NameNode (NN) call internally. Later, in OrcRelation.unwrapOrcStructs, it ends up reading the file information to get the ObjectInspector. This incurs an additional NN call. It would be good to avoid this additional NN call (specifically for partitioned datasets).
  Added an OrcRecordReader, which is very similar to OrcNewInputFormat.OrcRecordReader, with an option of exposing the ObjectInspector. This eliminates the need to look up the file later to generate the object inspector. This would be especially useful for partitioned tables/datasets.
  ## How was this patch tested?
  Ran TPC-DS queries manually and also verified by running org.apache.spark.sql.hive.orc.OrcSuite, org.apache.spark.sql.hive.orc.OrcQuerySuite, org.apache.spark.sql.hive.orc.OrcPartitionDiscoverySuite, OrcPartitionDiscoverySuite.OrcHadoopFsRelationSuite, org.apache.spark.sql.hive.execution.HiveCompatibilitySuite …SourceStrategy mode
  Author: Rajesh Balamohan <rbalamohan@apache.org>
  Closes #12319 from rajeshbalamohan/SPARK-14551.
* [SPARK-14866][SQL] Break SQLQuerySuite out into smaller test suites | Reynold Xin | 2016-04-22 | 4 | -512/+572
  ## What changes were proposed in this pull request?
  This patch breaks SQLQuerySuite out into smaller test suites. It was a little bit too large for debugging.
  ## How was this patch tested?
  This is a test-only change.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #12630 from rxin/SPARK-14866.
* [SPARK-14863][SQL] Cache TreeNode's hashCode by default | Josh Rosen | 2016-04-23 | 1 | -0/+5
  Caching TreeNode's `hashCode` can lead to orders-of-magnitude performance improvement in certain optimizer rules when operating on huge/complex schemas.
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #12626 from JoshRosen/cache-treenode-hashcode.
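Caching is safe here because tree nodes are immutable: once computed, a node's structural hash can never change. A minimal Python analogue computes the hash once per node and reuses it afterwards. Spark does this in Scala, so this is only an illustration. The `Node` class and its call counter are hypothetical.

```python
class Node:
    """Immutable tree node whose structural hash is computed at most once."""

    hash_computations = 0  # counts real hash computations, for illustration

    def __init__(self, name, *children):
        self._name = name
        self._children = tuple(children)
        self._cached_hash = None

    def __eq__(self, other):
        return (isinstance(other, Node)
                and self._name == other._name
                and self._children == other._children)

    def __hash__(self):
        if self._cached_hash is None:
            Node.hash_computations += 1
            # Hashing the children tuple calls each child's __hash__,
            # which is itself cached after its first computation.
            self._cached_hash = hash((self._name, self._children))
        return self._cached_hash
```

Without the cache, every `hash()` call on a deep tree re-walks the whole subtree. That is the repeated cost the commit avoids for large schemas.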
* [SPARK-14856] [SQL] returning batch correctly | Davies Liu | 2016-04-22 | 3 | -10/+35
  ## What changes were proposed in this pull request?
  Currently, the Parquet reader decides whether to return batches based on either the required schema or the full schema, which is inconsistent. This PR fixes that.
  ## How was this patch tested?
  Added regression tests.
  Author: Davies Liu <davies@databricks.com>
  Closes #12619 from davies/fix_return_batch.
* [SPARK-14842][SQL] Implement view creation in sql/core | Reynold Xin | 2016-04-22 | 9 | -182/+140
  ## What changes were proposed in this pull request?
  This patch re-implements the view creation command in sql/core, based on the pre-existing view creation command in the Hive module. This consolidates the view creation logical command and physical command into a single one, called CreateViewCommand.
  ## How was this patch tested?
  All the code should've been tested by existing tests.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #12615 from rxin/SPARK-14842-2.
* [SPARK-14807] Create a compatibility module | Yin Huai | 2016-04-22 | 5 | -5/+68
  ## What changes were proposed in this pull request?
  This PR creates a compatibility module in sql (called `hive-1-x-compatibility`), which will host HiveContext in Spark 2.0 (moving HiveContext here will be done separately). This module is not included in the assembly, because only users who still want to access HiveContext need it.
  ## How was this patch tested?
  I manually tested `sbt/sbt -Phive package` and `mvn -Phive package -DskipTests`.
  Author: Yin Huai <yhuai@databricks.com>
  Closes #12580 from yhuai/compatibility.
* [SPARK-14855][SQL] Add "Exec" suffix to physical operators | Reynold Xin | 2016-04-22 | 77 | -436/+473
  ## What changes were proposed in this pull request?
  This patch adds an "Exec" suffix to all physical operators. Before this patch, Spark's physical operators and logical operators were named the same (e.g. Project could be logical.Project or execution.Project), which caused small issues in code review and bigger issues in code refactoring.
  ## How was this patch tested?
  N/A
  Author: Reynold Xin <rxin@databricks.com>
  Closes #12617 from rxin/exec-node.
* [SPARK-14832][SQL][STREAMING] Refactor DataSource to ensure schema is inferred only once when creating a file stream | Tathagata Das | 2016-04-22 | 4 | -49/+116
  ## What changes were proposed in this pull request?
  When creating a file stream using sqlContext.read.stream(), existing files are scanned twice to find the schema:
  - Once, when creating a DataSource + StreamingRelation in DataFrameReader.stream()
  - Again, when creating the streaming Source from the DataSource, in DataSource.createSource()
  Instead, the schema should be generated only once, at the time of creating the DataFrame, and when the streaming source is created, it should just reuse that schema.
  The solution proposed in this PR is to add a lazy field in DataSource that caches the schema. The streaming Source created by the DataSource can then just reuse the schema.
  ## How was this patch tested?
  Refactored unit tests.
  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #12591 from tdas/SPARK-14832.
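The "lazy field that caches the schema" idea can be sketched in Python with `functools.cached_property`, which computes a value on first access and stores it on the instance. Spark's version is a Scala `lazy val`. `DataSourceSketch`, `infer_schema`, and `create_source` are hypothetical names, and the returned schema is a placeholder.

```python
from functools import cached_property

class DataSourceSketch:
    """Hypothetical data source that infers its schema at most once."""

    def __init__(self, files):
        self.files = list(files)
        self.inference_count = 0

    def infer_schema(self):
        # Stand-in for the expensive scan over existing files.
        self.inference_count += 1
        return {"value": "string"}

    @cached_property
    def source_schema(self):
        # Computed on first access, then stored on the instance.
        return self.infer_schema()

    def create_source(self):
        # Reuses the cached schema instead of rescanning the files.
        return ("FileStreamSource", self.source_schema)
```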
* [SPARK-14582][SQL] increase parallelism for small tables | Davies Liu | 2016-04-22 | 2 | -1/+9
  ## What changes were proposed in this pull request?
  This PR tries to increase the parallelism for small tables (a few big files) to reduce query time, by decreasing maxSplitBytes. The goal is to have at least one task per CPU in the cluster, if the total size of all files is bigger than openCostInBytes * 2 * nCPU.
  For example, a small/medium table could be used as a dimension table in a huge query. In that case, this will help reduce the time spent waiting for the broadcast.
  ## How was this patch tested?
  Existing tests.
  Author: Davies Liu <davies@databricks.com>
  Closes #12344 from davies/more_partition.
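The split-size rule can be written as a small Python function. It follows the formula as we understand it from the description: cap each split at the configured maximum, but shrink it toward total bytes divided by the number of cores, and never go below the per-file open cost. Treat the exact expression, the 128 MB and 4 MB defaults, and the core count as assumptions for this sketch, not a transcription of Spark's code.

```python
MB = 1024 * 1024

def max_split_bytes(file_sizes, default_max_split_bytes=128 * MB,
                    open_cost_in_bytes=4 * MB, default_parallelism=8):
    """Sketch of the split size: smaller splits for small tables, so each core gets work."""
    # Each file is charged its length plus a fixed "open cost".
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    # Never exceed the configured maximum, never go below the open cost.
    return min(default_max_split_bytes, max(open_cost_in_bytes, bytes_per_core))
```

Under these assumptions, two 200 MB files on 8 cores get 51 MB splits (four per file, eight tasks in total) instead of two 128 MB splits per file (four tasks). A very large table still uses the 128 MB cap.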
* [SPARK-14701][STREAMING] First stop the event loop, then stop the checkpoint writer in JobGenerator | Liwei Lin | 2016-04-22 | 1 | -2/+2
  Currently, if we call `streamingContext.stop` (e.g. in a `StreamingListener.onBatchCompleted` callback) when a batch is about to complete, a `rejectedException` may get thrown from `checkPointWriter.executor`, since the `eventLoop` will try to process `DoCheckpoint` events even after `checkPointWriter.executor` was stopped.
  Please see [SPARK-14701](https://issues.apache.org/jira/browse/SPARK-14701) for details and stack traces.
  ## What changes were proposed in this pull request?
  Reversed the stopping order of `event loop` and `checkpoint writer`.
  ## How was this patch tested?
  Existing test suites. (No dedicated test suites were added, because the change is simple to reason about.)
  Author: Liwei Lin <lwlin7@gmail.com>
  Closes #12489 from lw-lin/spark-14701.
* [SPARK-14796][SQL] Add spark.sql.optimizer.inSetConversionThreshold config option. | Dongjoon Hyun | 2016-04-22 | 5 | -7/+40
  ## What changes were proposed in this pull request?
  Currently, the `OptimizeIn` optimizer replaces an `In` expression with an `InSet` expression if the size of the set is greater than a constant, 10. This issue aims to make that threshold a configuration, `spark.sql.optimizer.inSetConversionThreshold`.
  After this PR, `OptimizeIn` is configurable.
  ```scala
  scala> sql("select a in (1,2,3) from (select explode(array(1,2)) a) T").explain()
  == Physical Plan ==
  WholeStageCodegen
  :  +- Project [a#7 IN (1,2,3) AS (a IN (1, 2, 3))#8]
  :     +- INPUT
  +- Generate explode([1,2]), false, false, [a#7]
     +- Scan OneRowRelation[]

  scala> sqlContext.setConf("spark.sql.optimizer.inSetConversionThreshold", "2")

  scala> sql("select a in (1,2,3) from (select explode(array(1,2)) a) T").explain()
  == Physical Plan ==
  WholeStageCodegen
  :  +- Project [a#16 INSET (1,2,3) AS (a IN (1, 2, 3))#17]
  :     +- INPUT
  +- Generate explode([1,2]), false, false, [a#16]
     +- Scan OneRowRelation[]
  ```
  ## How was this patch tested?
  Pass the Jenkins tests (with a new test case).
  Author: Dongjoon Hyun <dongjoon@apache.org>
  Closes #12562 from dongjoon-hyun/SPARK-14796.
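The rule being made configurable is simple: an `In` over a literal list becomes an `InSet` once the list is longer than the threshold. The following hypothetical Python model uses tuples as stand-ins for Catalyst expressions. It reproduces the behaviour of the two plans in the `explain()` output of this commit: three values stay an `In` at the default threshold of 10 and become an `InSet` at threshold 2.

```python
def optimize_in(column, values, threshold=10):
    """Model of the OptimizeIn rewrite: lists longer than `threshold` become a hash set."""
    if len(values) > threshold:
        # Set membership is a hash lookup per row instead of a linear scan of the list.
        return ("InSet", column, frozenset(values))
    return ("In", column, list(values))
```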
* [SPARK-14669] [SQL] Fix some SQL metrics in codegen and added more | Davies Liu | 2016-04-22 | 10 | -32/+110
  ## What changes were proposed in this pull request?
  1. Fix the "spill size" of TungstenAggregate and Sort
  2. Rename "data size" to "peak memory" to match the actual meaning (also consistent with task metrics)
  3. Added "data size" for ShuffleExchange and BroadcastExchange
  4. Added some timing for Sort, Aggregate and BroadcastExchange (this requires another patch to work)
  ## How was this patch tested?
  Existing tests.
  ![metrics](https://cloud.githubusercontent.com/assets/40902/14573908/21ad2f00-030d-11e6-9e2c-c544f30039ea.png)
  Author: Davies Liu <davies@databricks.com>
  Closes #12425 from davies/fix_metrics.
* [SPARK-14791] [SQL] fix risk condition between broadcast and subquery | Davies Liu | 2016-04-22 | 4 | -15/+34
  ## What changes were proposed in this pull request?
  SparkPlan.prepare() could be called from different threads (BroadcastExchange calls it in a thread pool). It only makes sure that doPrepare() is called once, so a second call to prepare() may return before all the children have finished prepare(). Some operator may then call doProduce() before prepareSubqueries(), and `null` will be used as the result of the subquery, which is wrong. This causes TPCDS Q23B to sometimes return a wrong answer.
  This PR adds synchronization to prepare(), making sure that all the children have finished prepare() before it returns. It also calls prepare() in produce() (similar to execute()).
  Added a check in ScalarSubquery to make sure that the subquery has finished before its result is used.
  ## How was this patch tested?
  Manually tested with Q23B; no wrong answers anymore.
  Author: Davies Liu <davies@databricks.com>
  Closes #12600 from davies/fix_risk.
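The fix amounts to making `prepare()` block until the whole subtree, including subqueries, is ready, rather than returning early while another thread is still preparing. A minimal Python sketch uses a per-node lock and a "prepared" flag that is set only after all the work is done. `PlanNode` and its methods are illustrative names loosely borrowed from the description, not Spark's code.

```python
import threading

class PlanNode:
    """Hypothetical plan node whose prepare() is safe to call from many threads."""

    def __init__(self, name, children=(), subquery_result=None):
        self.name = name
        self.children = list(children)
        self._lock = threading.Lock()
        self._prepared = False
        self._subquery_input = subquery_result
        self.subquery_value = None  # stays None until the subquery has run
        self.prepare_calls = 0

    def prepare_subqueries(self):
        # Stand-in for running a scalar subquery and storing its result.
        self.subquery_value = self._subquery_input

    def prepare(self):
        # Every caller waits on the lock, and the flag is set only at the end,
        # so no thread can return before the subtree is fully prepared.
        with self._lock:
            if self._prepared:
                return
            self.prepare_calls += 1
            self.prepare_subqueries()
            for child in self.children:
                child.prepare()
            self._prepared = True

    def produce(self):
        self.prepare()  # as in the fix, produce() also calls prepare()
        return [self.subquery_value] + [v for c in self.children for v in c.produce()]
```

Locks are only taken from parent to child, so a tree of nodes cannot deadlock.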
* [SPARK-14763][SQL] fix subquery resolution | Davies Liu | 2016-04-22 | 7 | -49/+173
  ## What changes were proposed in this pull request?
  Currently, a column could be resolved wrongly if columns from both the outer table and the subquery have the same name. We should only resolve, against the outer query, the attributes that can't be resolved within the subquery. These may have the same exprId as other attributes in the subquery, so we should create aliases for them.
  Also, the column in an IN subquery could have the same exprId; we should create aliases for those too.
  ## How was this patch tested?
  Added regression tests. Manually tested TPCDS Q70 and Q95; they work well after this patch.
  Author: Davies Liu <davies@databricks.com>
  Closes #12539 from davies/fix_subquery.
* [SPARK-14762] [SQL] TPCDS Q90 fails to parse | Herman van Hovell | 2016-04-22 | 2 | -6/+51
  ### What changes were proposed in this pull request?
  TPCDS Q90 fails to parse because it uses `AT` as an identifier (an alias for one of the subqueries), and the parser treats `AT` as a reserved keyword. `AT` is not a reserved keyword and should have been registered in the `nonReserved` rule.
  In order to prevent this from happening again, I have added tests for all keywords that are non-reserved in Hive. See the `nonReserved`, `sql11ReservedKeywordsUsedAsCastFunctionName` & `sql11ReservedKeywordsUsedAsIdentifier` rules in https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g.
  ### How was this patch tested?
  Added tests for all Hive non-reserved keywords to `TableIdentifierParserSuite`.
  cc davies
  Author: Herman van Hovell <hvanhovell@questtec.nl>
  Closes #12537 from hvanhovell/SPARK-14762.
* [SPARK-13178] RRDD faces with concurrency issue in case of rdd.zip(rdd).count(). | Sun Rui | 2016-04-22 | 1 | -2/+0
  ## What changes were proposed in this pull request?
  The concurrency issue reported in SPARK-13178 was fixed by the PR https://github.com/apache/spark/pull/10947 for SPARK-12792.
  This PR just removes a workaround that is no longer needed.
  ## How was this patch tested?
  SparkR unit tests.
  Author: Sun Rui <rui.sun@intel.com>
  Closes #12606 from sun-rui/SPARK-13178.
* [SPARK-14841][SQL] Move SQLBuilder into sql/core | Reynold Xin | 2016-04-22 | 8 | -19/+19
  ## What changes were proposed in this pull request?
  This patch moves SQLBuilder into sql/core so we can in the future move view generation also into sql/core.
  ## How was this patch tested?
  Also moved unit tests.
  Author: Reynold Xin <rxin@databricks.com>
  Author: Wenchen Fan <wenchen@databricks.com>
  Closes #12602 from rxin/SPARK-14841.
* [SPARK-14843][ML] Fix encoding error in LibSVMRelation | Liang-Chi Hsieh | 2016-04-23 | 2 | -5/+13
  ## What changes were proposed in this pull request?
  We use `RowEncoder` in the libsvm data source to serialize the label and features read from libsvm files. However, the schema passed to this encoder is not correct. As a result, we can't correctly select the `features` column from the DataFrame.
  We should use the full data schema instead of `requiredSchema` to serialize the data read in, and then do a projection to select the required columns later.
  ## How was this patch tested?
  `LibSVMRelationSuite`.
  Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
  Closes #12611 from viirya/fix-libsvm.
* [SPARK-10001] Consolidate Signaling and SignalLogger. | Reynold Xin | 2016-04-22 | 5 | -77/+58
  ## What changes were proposed in this pull request?
  This is a follow-up to #12557, with the following changes:
  1. Fixes some of the style issues.
  2. Merges Signaling and SignalLogger into a new class called SignalUtils. It was pretty confusing to have Signaling and Signal in one file, and it was also confusing to have two classes named Signaling and one called the other.
  3. Made logging registration idempotent.
  ## How was this patch tested?
  N/A.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #12605 from rxin/SPARK-10001.
* [SPARK-13266] [SQL] None read/writer options were not transalated to "null" | Liang-Chi Hsieh | 2016-04-22 | 3 | -4/+14
  ## What changes were proposed in this pull request?
  In Python, the `option` and `options` methods of `DataFrameReader` and `DataFrameWriter` were sending the string "None" instead of `null` when passed `None`, which made it impossible to send an actual `null`. This fixes that problem.
  This is based on #11305 from mathieulongtin.
  ## How was this patch tested?
  Added a test to readwriter.py.
  Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
  Author: mathieu longtin <mathieu.longtin@nuance.com>
  Closes #12494 from viirya/py-df-none-option.
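The bug here comes from calling `str()` on every option value, which turns Python's `None` into the four-character string `"None"`. Below is a sketch of a conversion helper that passes `None` through unchanged, so that it can reach the JVM as `null`. `to_option_value` and `build_options` are hypothetical names and are not the helpers used in the actual patch.

```python
def to_option_value(value):
    """Convert a reader/writer option value to a string, preserving None as null."""
    if value is None:
        return None  # forwarded as null rather than the string "None"
    return str(value)

def build_options(**options):
    """Hypothetical helper that normalizes a batch of options."""
    return {key: to_option_value(value) for key, value in options.items()}
```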
* [SPARK-14848][SQL] Compare as Set in DatasetSuite - Java encoder | Pete Robbins | 2016-04-22 | 1 | -2/+2
  ## What changes were proposed in this pull request?
  Change the test to compare sets rather than sequences.
  ## How was this patch tested?
  Full test runs on little-endian and big-endian platforms.
  Author: Pete Robbins <robbinspg@gmail.com>
  Closes #12610 from robbinspg/DatasetSuiteFix.
* [MINOR][DOC] Fix doc style in ml.ann.Layer and MultilayerPerceptronClassifier | Zheng RuiFeng | 2016-04-22 | 2 | -40/+40
  ## What changes were proposed in this pull request?
  1. Fix the indentation
  2. Add a missing param description
  ## How was this patch tested?
  Unit tests.
  Author: Zheng RuiFeng <ruifengz@foxmail.com>
  Closes #12499 from zhengruifeng/fix_doc.
* [SPARK-6429] Implement hashCode and equals together | Joan | 2016-04-22 | 32 | -40/+136
  ## What changes were proposed in this pull request?
  Implement some `hashCode` and `equals` methods together in order to enable the corresponding scalastyle check. This is a first batch; I will continue to implement them, but I wanted to know your thoughts.
  Author: Joan <joan@goyeau.com>
  Closes #12157 from joan38/SPARK-6429-HashCode-Equals.
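The contract behind this change is that objects which compare equal must produce the same hash code. Overriding `equals` without `hashCode` breaks hash-based collections such as sets and map keys. Python has the same contract through `__eq__` and `__hash__`, so here is an illustration using a hypothetical `Point` class:

```python
class Point:
    """Value class that overrides equality and hashing together."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields used by __eq__, so equal points hash equally.
        return hash((self.x, self.y))
```

One difference is that Python makes a class unhashable if it defines `__eq__` without `__hash__`. On the JVM, the inherited identity-based `hashCode` stays in place without any warning, which is why a style check that enforces both overrides is useful there.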
* [SPARK-14609][SQL] Native support for LOAD DATA DDL command | Liang-Chi Hsieh | 2016-04-22 | 11 | -8/+427
  ## What changes were proposed in this pull request?
  Add native support for the LOAD DATA DDL command, which loads data into a Hive table/partition.
  ## How was this patch tested?
  `HiveDDLCommandSuite` and `HiveQuerySuite`. In addition, a few Hive tests (`WindowQuerySuite`, `HiveTableScanSuite` and `HiveSerDeSuite`) also use the `LOAD DATA` command.
  Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
  Closes #12412 from viirya/ddl-load-data.
* [SPARK-14826][SQL] Remove HiveQueryExecution | Reynold Xin | 2016-04-22 | 20 | -436/+420
  ## What changes were proposed in this pull request?
  This patch removes HiveQueryExecution. As part of this, I consolidated all the describe commands into DescribeTableCommand.
  ## How was this patch tested?
  Should be covered by existing tests.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #12588 from rxin/SPARK-14826.
* [SPARK-10001] [CORE] Interrupt tasks in repl with Ctrl+C | Jakob Odersky | 2016-04-21 | 5 | -28/+147
  ## What changes were proposed in this pull request?
  Improve signal handling to allow interrupting running tasks from the REPL (with Ctrl+C). If no tasks are running or Ctrl+C is pressed twice, the signal is forwarded to the default handler, resulting in the usual termination of the application.
  This PR is a rewrite of #8216 (and therefore closes it), as per piaozhexiu's request.
  ## How was this patch tested?
  Signal handling is not easily testable, so no unit tests were added. Nevertheless, the new functionality is implemented as a best-effort approach, soft-failing in case signals aren't available on a specific OS.
  Author: Jakob Odersky <jakob@odersky.com>
  Closes #12557 from jodersky/SPARK-10001-sigint.
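The handler logic described above can be sketched with Python's `signal` module. The first Ctrl+C cancels running work. If nothing is running, or on a second press, the handler defers to the default behaviour, which raises `KeyboardInterrupt`. `make_sigint_handler`, `has_active_jobs`, and `cancel_all_jobs` are hypothetical hooks standing in for the REPL's job tracking. This is an illustration of the idea, not Spark's Scala implementation.

```python
import signal

def make_sigint_handler(has_active_jobs, cancel_all_jobs,
                        previous_handler=signal.default_int_handler):
    """Build a SIGINT handler: first Ctrl+C cancels jobs, otherwise use the old handler."""
    state = {"cancel_requested": False}

    def handler(signum, frame):
        if has_active_jobs() and not state["cancel_requested"]:
            state["cancel_requested"] = True
            cancel_all_jobs()  # interrupt running work but keep the REPL alive
        else:
            # No jobs, or a second Ctrl+C: fall back to the default behaviour.
            previous_handler(signum, frame)

    return handler

# Installing it (best effort; signal.signal only works in the main thread):
# signal.signal(signal.SIGINT, make_sigint_handler(has_jobs, cancel_jobs))
```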