spark - Mirror of Apache Spark

Commit message	Author	Age	Files	Lines
...
*	[SPARK-10710] Remove ability to disable spilling in core and SQL	Josh Rosen	2015-09-19	18	-234/+81
\| \| \| \| \| \| \| \| \| \|	It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`. This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling. Author: Josh Rosen <joshrosen@databricks.com> Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.
*	[SPARK-10155] [SQL] Change SqlParser to object to avoid memory leak	zsxwing	2015-09-19	9	-19/+19
\| \| \| \| \| \| \| \| \| \|	Because `scala.util.parsing.combinator.Parsers` has been thread-safe since Scala 2.10 (see [SI-4929](https://issues.scala-lang.org/browse/SI-4929)), we can change SqlParser to an object to avoid a memory leak. I didn't change the other subclasses of `scala.util.parsing.combinator.Parsers` because there is only one instance of each per SQLContext, which should not be an issue. Author: zsxwing <zsxwing@gmail.com> Closes #8357 from zsxwing/sql-memory-leak.
*	Fixed links to the API	Alexis Seigneurin	2015-09-19	1	-4/+4
\| \| \| \| \| \| \| \|	Submitting this change on the master branch as requested in https://github.com/apache/spark/pull/8819#issuecomment-141505941 Author: Alexis Seigneurin <alexis.seigneurin@gmail.com> Closes #8838 from aseigneurin/patch-2.
*	[SPARK-10584] [SQL] [DOC] Documentation about the compatible Hive version is ↵	Kousuke Saruta	2015-09-19	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \|	wrong. In Spark 1.5.0, Spark SQL is compatible with Hive 0.12.0 through 1.2.1 but the documentation is wrong. /CC yhuai Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #8776 from sarutak/SPARK-10584-2.
*	[SPARK-10474] [SQL] Aggregation fails to allocate memory for pointer array	Andrew Or	2015-09-18	3	-5/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When `TungstenAggregation` hits memory pressure, it switches from hash-based to sort-based aggregation in-place. However, in the process we try to allocate the pointer array for writing to the new `UnsafeExternalSorter` before actually freeing the memory from the hash map. This led to the following exception: ``` java.io.IOException: Could not acquire 65536 bytes of memory at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220) at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:126) at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257) at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435) ``` Author: Andrew Or <andrew@databricks.com> Closes #8827 from andrewor14/allocate-pointer-array.
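Why the allocation order matters can be illustrated with a toy memory pool. This is a minimal Python sketch under stated assumptions: `MemoryPool`, `switch_to_sort` and the byte counts are hypothetical and do not model Spark's actual memory-manager API, only the "allocate before free" ordering problem described above.

```python
# Hypothetical toy model of a task's memory budget; not Spark's memory-manager API.
class MemoryPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def acquire(self, nbytes):
        # Refuse the request if it would exceed the budget, like the IOException above.
        if self.used + nbytes > self.capacity:
            raise IOError(f"Could not acquire {nbytes} bytes of memory")
        self.used += nbytes

    def release(self, nbytes):
        self.used -= nbytes


def switch_to_sort(pool, hash_map_bytes, pointer_array_bytes, free_first):
    """Hand memory from the hash map to a sorter's pointer array."""
    if free_first:
        pool.release(hash_map_bytes)       # free the hash map's memory first...
        pool.acquire(pointer_array_bytes)  # ...then allocate the pointer array
    else:
        pool.acquire(pointer_array_bytes)  # buggy order: allocate while the map still holds memory
        pool.release(hash_map_bytes)


def try_switch(free_first):
    pool = MemoryPool(capacity=1 << 20)
    pool.acquire(1 << 20)  # the hash map has consumed the whole budget
    try:
        switch_to_sort(pool, 1 << 20, 65536, free_first)
        return "ok"
    except IOError as err:
        return str(err)


print(try_switch(free_first=False))  # Could not acquire 65536 bytes of memory
print(try_switch(free_first=True))   # ok
```

Under memory pressure the pool is already full, so only the free-then-allocate order can succeed.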
*	[SPARK-10623] [SQL] Fixes ORC predicate push-down	Cheng Lian	2015-09-18	2	-34/+52
\| \| \| \| \| \| \| \| \| \|	When pushing down a leaf predicate, the ORC `SearchArgument` builder requires an extra "parent" predicate (any one among `AND`/`OR`/`NOT`) to wrap the leaf predicate. E.g., to push down `a < 1`, we must build `AND(a < 1)` instead. Fortunately, when actually constructing the `SearchArgument`, the builder will eliminate all those unnecessary wrappers. This PR is based on #8783 authored by zhzhan. I also took the chance to simplify `OrcFilters` a little bit to improve readability. Author: Cheng Lian <lian@databricks.com> Closes #8799 from liancheng/spark-10623/fix-orc-ppd.
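The wrap-then-eliminate behaviour can be sketched with a tiny predicate tree. This is a hypothetical Python model: the tuple representation, `wrap_leaf` and `simplify` are illustrative only and are not the ORC `SearchArgument` builder API. Single-child `AND`/`OR` nodes are dropped, while `NOT` is kept because it changes meaning.

```python
# Hypothetical predicate tree: leaves are strings, inner nodes are
# ("AND", [children]), ("OR", [children]) or ("NOT", child).
def wrap_leaf(leaf):
    """Wrap a leaf predicate in a single-child AND, as the builder requires."""
    return ("AND", [leaf])


def simplify(node):
    """Recursively drop AND/OR wrappers that have exactly one child."""
    if isinstance(node, tuple) and node[0] in ("AND", "OR"):
        children = [simplify(child) for child in node[1]]
        if len(children) == 1:
            return children[0]
        return (node[0], children)
    if isinstance(node, tuple) and node[0] == "NOT":
        return ("NOT", simplify(node[1]))
    return node  # a leaf predicate


wrapped = wrap_leaf("a < 1")
print(wrapped)            # ('AND', ['a < 1'])
print(simplify(wrapped))  # a < 1
```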
*	[MINOR] [ML] override toString of AttributeGroup	Eric Liang	2015-09-18	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	This makes equality test failures much more readable. mengxr Author: Eric Liang <ekl@databricks.com> Author: Eric Liang <ekhliang@gmail.com> Closes #8826 from ericl/attrgroupstr.
*	[SPARK-10611] Clone Configuration for each task for NewHadoopRDD	Mingyu Kim	2015-09-18	2	-8/+34
\| \| \| \| \| \| \| \|	This patch attempts to fix the Hadoop Configuration thread safety issue for NewHadoopRDD in the same way SPARK-2546 fixed the issue for HadoopRDD. Author: Mingyu Kim <mkim@palantir.com> Closes #8763 from mingyukim/mkim/SPARK-10611.
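The idea of giving every task its own copy of a shared, mutable configuration can be sketched in Python with `copy.deepcopy`. This is an analogy under stated assumptions: the dict-based `shared_conf`, its keys and `run_task` are hypothetical stand-ins for Hadoop's `Configuration` and an RDD task, not the code from the patch.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared configuration (a plain dict standing in for a Hadoop Configuration).
shared_conf = {"example.input.path": "/data", "example.task.id": None}


def run_task(task_id):
    # Clone per task, so one task's writes can never be observed by another.
    conf = copy.deepcopy(shared_conf)
    conf["example.task.id"] = task_id
    return conf["example.task.id"]


with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(run_task, range(8)))

print(results)                         # [0, 1, 2, 3, 4, 5, 6, 7]
print(shared_conf["example.task.id"])  # None: the shared object was never mutated
```

Cloning costs a copy per task, but it removes any need to reason about which operations on the shared object are thread-safe.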
*	[SPARK-9808] Remove hash shuffle file consolidation.	Reynold Xin	2015-09-18	7	-301/+17
\| \| \| \| \| \|	Author: Reynold Xin <rxin@databricks.com> Closes #8812 from rxin/SPARK-9808-1.
*	[SPARK-10449] [SQL] Don't merge decimal types with incompatible precision or ↵	Holden Karau	2015-09-18	1	-4/+13
\| \| \| \| \| \| \| \| \| \|	scales From JIRA: Schema merging should only handle struct fields. But currently we also reconcile decimal precision and scale information. Author: Holden Karau <holden@pigscanfly.ca> Closes #8634 from holdenk/SPARK-10449-dont-merge-different-precision.
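The rule can be sketched as a merge function over `(precision, scale)` pairs. This is a hypothetical Python illustration of the behaviour described above (only identical decimal types merge, anything else is a conflict), not Spark's schema-merging code.

```python
def merge_decimal(left, right):
    """Merge two decimal types given as (precision, scale) tuples.

    Hypothetical helper: only identical types merge; differing precision or
    scale is reported as a conflict instead of being silently reconciled.
    """
    if left == right:
        return left
    raise ValueError(
        f"Cannot merge decimal({left[0]},{left[1]}) with decimal({right[0]},{right[1]})"
    )


print(merge_decimal((10, 2), (10, 2)))  # (10, 2)
```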
*	[SPARK-10539] [SQL] Project should not be pushed down through Intersect or ↵	Yijie Shen	2015-09-18	3	-30/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Except #8742 Intersect and Except are both set operators, and they use all the columns to compare rows for equality. If their Project parent were pushed down, the relations they are based on would change, so the rewrite is not an equivalent transformation. JIRA: https://issues.apache.org/jira/browse/SPARK-10539 I added some comments based on the fix of https://github.com/apache/spark/pull/8742. Author: Yijie Shen <henry.yijieshen@gmail.com> Author: Yin Huai <yhuai@databricks.com> Closes #8823 from yhuai/fix_set_optimization.
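A small worked example with Python sets shows why pushing the Project below the set operator is not equivalent. The rows and the projection onto the first column are made up for illustration; Python sets model the distinct-row semantics of `INTERSECT`/`EXCEPT`.

```python
left = {(1, "a")}
right = {(1, "b")}


def project_first(rows):
    """Keep only the first column (a Project over column 0)."""
    return {row[0] for row in rows}


# Project applied after the set operator (the original plan).
print(project_first(left & right))  # set()
print(project_first(left - right))  # {1}

# Project pushed below the set operator (the invalid rewrite).
print(project_first(left) & project_first(right))  # {1}
print(project_first(left) - project_first(right))  # set()
```

Dropping the second column makes `(1, "a")` and `(1, "b")` look equal, which flips the results of both operators.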
*	[SPARK-10540] Fixes flaky all-data-type test	Cheng Lian	2015-09-18	1	-66/+43
\| \| \| \| \| \| \| \| \| \| \| \|	This PR breaks the original test case into multiple ones (one test case for each data type). This makes test failure output much more readable. Within each test case, we build a table with two columns: one for the data type under test and an "index" column, which is used to sort the DataFrame and work around [SPARK-10591] [1] [1]: https://issues.apache.org/jira/browse/SPARK-10591 Author: Cheng Lian <lian@databricks.com> Closes #8768 from liancheng/spark-10540/test-all-data-types.
*	[SPARK-10615] [PYSPARK] change assertEquals to assertEqual	Yanbo Liang	2015-09-18	4	-99/+99
\| \| \| \| \| \| \| \|	As ```assertEquals``` is deprecated, we need to change ```assertEquals``` to ```assertEqual``` in the existing Python unit tests. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8814 from yanboliang/spark-10615.
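For reference, a minimal `unittest` case using the supported spelling. `assertEquals` was only a deprecated alias of `assertEqual` and was removed in Python 3.12, so new tests should use `assertEqual`. The test class below is a made-up example, run through an explicit loader and runner rather than `unittest.main()`.

```python
import unittest


class ExampleTest(unittest.TestCase):
    def test_sum(self):
        # assertEqual is the supported name; assertEquals was a deprecated alias.
        self.assertEqual(sum([1, 2, 3]), 6)


suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```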
*	[SPARK-10451] [SQL] Prevent unnecessary serializations in ↵	Yash Datta	2015-09-18	1	-14/+21
\| \| \| \| \| \| \| \| \| \| \| \|	InMemoryColumnarTableScan Many of the fields in InMemoryColumnarTableScan and InMemoryRelation can be made transient. This reduces my 1000 ms job to about 700 ms. The task size drops from 2.8 MB to ~1300 KB. Author: Yash Datta <Yash.Datta@guavus.com> Closes #8604 from saucam/serde.
*	[SPARK-10684] [SQL] StructType.interpretedOrdering need not be serialized	navis.ryu	2015-09-18	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Kryo fails with buffer overflow even with max value (2G). {noformat} org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 1 Serialization trace: containsChild (org.apache.spark.sql.catalyst.expressions.BoundReference) child (org.apache.spark.sql.catalyst.expressions.SortOrder) array (scala.collection.mutable.ArraySeq) ordering (org.apache.spark.sql.catalyst.expressions.InterpretedOrdering) interpretedOrdering (org.apache.spark.sql.types.StructType) schema (org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema). To avoid this, increase spark.kryoserializer.buffer.max value. at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:263) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:240) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} Author: navis.ryu <navis@apache.org> Closes #8808 from navis/SPARK-10684.
*	Added <code> tag to documentation.	Reynold Xin	2015-09-17	1	-1/+1
\|
*	docs/running-on-mesos.md: state default values in default column	Felix Bechstein	2015-09-17	1	-4/+4
\| \| \| \| \| \| \| \|	This PR simply uses the default value column for defaults. Author: Felix Bechstein <felix.bechstein@otto.de> Closes #8810 from felixb/fix_mesos_doc.
*	[SPARK-9522] [SQL] SparkSubmit process cannot exit if the application is killed while ↵	linweizhong	2015-09-17	2	-1/+7
\| \| \| \| \| \| \| \| \| \|	HiveThriftServer was starting When we start HiveThriftServer, we start SparkContext first and then HiveServer2. If we kill the application while HiveServer2 is starting, SparkContext stops successfully, but the SparkSubmit process cannot exit. Author: linweizhong <linweizhong@huawei.com> Closes #7853 from Sephiroth-Lin/SPARK-9522.
*	[SPARK-10682] [GRAPHX] Remove Bagel test suites.	Reynold Xin	2015-09-17	2	-140/+0
\| \| \| \| \| \| \| \| \| \|	Bagel has been deprecated and we haven't made any changes to it. There is no need to run those tests. This should speed up tests by 1 min. Author: Reynold Xin <rxin@databricks.com> Closes #8807 from rxin/SPARK-10682.
*	[SPARK-8518] [ML] Log-linear models for survival analysis	Yanbo Liang	2015-09-17	2	-0/+760
\| \| \| \| \| \| \| \| \|	The [Accelerated Failure Time (AFT) model](https://en.wikipedia.org/wiki/Accelerated_failure_time_model) is the most commonly used, and easily parallelized, method of survival analysis for censored survival data. It is a log-linear model based on the Weibull distribution of the survival time. Users can refer to the R function [```survreg```](https://stat.ethz.ch/R-manual/R-devel/library/survival/html/survreg.html) to compare the model and [```predict```](https://stat.ethz.ch/R-manual/R-devel/library/survival/html/predict.survreg.html) to compare the prediction. There are several kinds of model prediction; I have only implemented the ```response``` type, which is R's default. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8611 from yanboliang/spark-8518.
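For orientation, the Weibull AFT model is usually written as a log-linear model. This is the standard textbook formulation, given as background rather than taken from the patch. The symbols $T_i$ (survival time), $\mathbf{x}_i$ (covariates, with a leading 1 for the intercept), $\boldsymbol\beta$ and $\sigma$ are introduced here for illustration.

```latex
\log T_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \sigma \varepsilon_i,
\qquad f(\varepsilon) = \exp\!\left(\varepsilon - e^{\varepsilon}\right)

% Equivalently, T_i is Weibull with shape k = 1/\sigma and scale \lambda_i:
P(T_i \le t) = 1 - \exp\!\left[-\left(t / \lambda_i\right)^{1/\sigma}\right],
\qquad \lambda_i = \exp\!\left(\mathbf{x}_i^\top \boldsymbol{\beta}\right)
```

Under this parameterization, a `response`-type prediction on the time scale is $\exp(\mathbf{x}^\top\hat{\boldsymbol\beta})$. As far as we understand, this corresponds to R's `predict` with `type = "response"` for a Weibull `survreg` fit.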
*	[SPARK-10674] [TESTS] Increase timeouts in SaslIntegrationSuite.	Marcelo Vanzin	2015-09-17	1	-5/+10
\| \| \| \| \| \| \| \| \| \| \| \|	A 1s timeout seems to expire too often on the Jenkins build boxes, so increase the timeout and cross fingers. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8802 from vanzin/SPARK-10674 and squashes the following commits: 3c93117 [Marcelo Vanzin] Use java 7 syntax. d667d1b [Marcelo Vanzin] [SPARK-10674] [tests] Increase timeouts in SaslIntegrationSuite.
*	[SPARK-9698] [ML] Add RInteraction transformer for supporting R-style ↵	Eric Liang	2015-09-17	2	-0/+443
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	feature interactions This is a pre-req for supporting the ":" operator in the RFormula feature transformer. Design doc from umbrella task: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit mengxr Author: Eric Liang <ekl@databricks.com> Closes #7987 from ericl/interaction.
*	[SPARK-10657] Remove SCP-based Jenkins log archiving	Josh Rosen	2015-09-17	1	-35/+0
\| \| \| \| \| \| \| \| \| \|	As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to use our custom SCP-based mechanism for archiving Jenkins logs on the master machine; this has been superseded by the use of a Jenkins plugin which archives the logs and provides public links to view them. Per shaneknapp, we should remove this log syncing mechanism if it is no longer necessary; removing the need to SCP from the Jenkins workers to the masters is a desired step as part of some larger Jenkins infra refactoring. Author: Josh Rosen <joshrosen@databricks.com> Closes #8793 from JoshRosen/remove-jenkins-ssh-to-master.
*	[SPARK-10394] [ML] Make GBTParams use shared stepSize	Yanbo Liang	2015-09-17	1	-15/+13
\| \| \| \| \| \| \| \| \|	```GBTParams``` currently has its own ```stepSize``` param as the learning rate. ML has a shared param class, ```HasStepSize```, which ```GBTParams``` can extend rather than duplicating the implementation. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8552 from yanboliang/spark-10394.
*	[SPARK-10639] [SQL] Need to convert UDAF's result from scala to sql type	Yin Huai	2015-09-17	6	-12/+188
\| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-10639 Author: Yin Huai <yhuai@databricks.com> Closes #8788 from yhuai/udafConversion.
*	[SPARK-10650] Clean before building docs	Michael Armbrust	2015-09-17	1	-2/+5
\| \| \| \| \| \| \| \|	The [published docs for 1.5.0](http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/) have a bunch of test classes in them. The only way I can reproduce this is to `test:compile` before running `unidoc`. To prevent this from happening again, I've added a clean before doc generation. Author: Michael Armbrust <michael@databricks.com> Closes #8787 from marmbrus/testsInDocs.
*	[SPARK-10531] [CORE] AppId is set as AppName in status rest api	Jeff Zhang	2015-09-17	5	-12/+13
\| \| \| \| \| \| \| \|	Verified manually. Author: Jeff Zhang <zjffdu@apache.org> Closes #8688 from zjffdu/SPARK-10531.
*	[SPARK-10172] [CORE] disable sort in HistoryServer webUI	Josiah Samuel	2015-09-17	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \|	This pull request addresses JIRA SPARK-10172 (History Server web UI gets messed up when sorting on any column). The table content gets messed up during sorting because of the rowspan attribute on table data (cells). The table sort library currently used in SparkUI (sorttable.js) doesn't handle cells (td) with rowspans. The fix disables table sorting in the web UI when jobs with multiple attempts are listed. Author: Josiah Samuel <josiah_sams@in.ibm.com> Closes #8506 from josiahsams/SPARK-10172.
*	[SPARK-10642] [PYSPARK] Fix crash when calling rdd.lookup() on tuple keys	Liang-Chi Hsieh	2015-09-17	1	-1/+4
\| \| \| \| \| \| \| \| \| \|	JIRA: https://issues.apache.org/jira/browse/SPARK-10642 When calling `rdd.lookup()` on an RDD with tuple keys, `portable_hash` will return a long. That causes `DAGScheduler.submitJob` to throw `java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer`. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #8796 from viirya/fix-pyrdd-lookup.
*	[SPARK-10660] Documentation error in the "Running Spark on YARN" page	yangping.wu	2015-09-17	1	-2/+2
\| \| \| \| \| \| \| \|	In the Configuration section, the default values of spark.yarn.driver.memoryOverhead and spark.yarn.am.memoryOverhead should be "driverMemory * 0.10, with minimum of 384" and "AM memory * 0.10, with minimum of 384" respectively, because since Spark 1.4.0 MEMORY_OVERHEAD_FACTOR has been 0.10, not 0.07. Author: yangping.wu <wyphao.2007@163.com> Closes #8797 from 397090770/SparkOnYarnDocError.
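The corrected default is easy to check with a little arithmetic. This Python helper is a hypothetical illustration of the formula stated above (10% of the memory in MB, with a 384 MB floor), not Spark's own code. Truncating the 10% term to an integer is an assumption.

```python
MEMORY_OVERHEAD_FACTOR = 0.10
MEMORY_OVERHEAD_MIN_MB = 384


def default_memory_overhead(memory_mb):
    """Hypothetical helper: default memoryOverhead in MB, max(memory * 0.10, 384)."""
    return max(int(memory_mb * MEMORY_OVERHEAD_FACTOR), MEMORY_OVERHEAD_MIN_MB)


print(default_memory_overhead(1024))  # 384 (10% would only be 102)
print(default_memory_overhead(8192))  # 819
```

For small containers the 384 MB floor dominates. The 10% factor only matters above roughly 3.8 GB.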
*	[SPARK-10459] [SQL] Do not need to have ConvertToSafe for PythonUDF	Liang-Chi Hsieh	2015-09-17	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	JIRA: https://issues.apache.org/jira/browse/SPARK-10459 As mentioned in the JIRA, `PythonUDF` can actually process `UnsafeRow`. Specifically, the rows in `childResults` in `BatchPythonEvaluation` will be projected to a `MutableRow`. So I think we can enable `canProcessUnsafeRows` for `BatchPythonEvaluation` and get rid of the redundant `ConvertToSafe`. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #8616 from viirya/pyudf-unsafe.
*	[SPARK-10077] [DOCS] [ML] Add package info for java of ml/feature	Holden Karau	2015-09-17	1	-0/+108
\| \| \| \| \| \| \| \| \|	Should be the same as SPARK-7808 but use Java for the code example. It would be great to add package doc for `spark.ml.feature`. Author: Holden Karau <holden@pigscanfly.ca> Closes #8740 from holdenk/SPARK-10077-JAVA-PACKAGE-DOC-FOR-SPARK.ML.FEATURE.
*	[SPARK-10282] [ML] [PYSPARK] [DOCS] Add @since annotation to ↵	Yu ISHIKAWA	2015-09-17	1	-0/+28
\| \| \| \| \| \| \| \|	pyspark.ml.recommendation Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8692 from yu-iskw/SPARK-10282.
*	[SPARK-10274] [MLLIB] Add @since annotation to pyspark.mllib.fpm	Yu ISHIKAWA	2015-09-17	1	-1/+9
\| \| \| \| \| \|	Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8665 from yu-iskw/SPARK-10274.
*	[SPARK-10279] [MLLIB] [PYSPARK] [DOCS] Add @since annotation to ↵	Yu ISHIKAWA	2015-09-17	1	-2/+26
\| \| \| \| \| \| \| \|	pyspark.mllib.util Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8689 from yu-iskw/SPARK-10279.
*	[SPARK-10278] [MLLIB] [PYSPARK] Add @since annotation to pyspark.mllib.tree	Yu ISHIKAWA	2015-09-17	1	-1/+35
\| \| \| \| \| \|	Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8685 from yu-iskw/SPARK-10278.
*	[SPARK-10281] [ML] [PYSPARK] [DOCS] Add @since annotation to ↵	Yu ISHIKAWA	2015-09-17	1	-0/+13
\| \| \| \| \| \| \| \|	pyspark.ml.clustering Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8691 from yu-iskw/SPARK-10281.
*	[SPARK-10283] [ML] [PYSPARK] [DOCS] Add @since annotation to ↵	Yu ISHIKAWA	2015-09-17	1	-0/+65
\| \| \| \| \| \| \| \|	pyspark.ml.regression Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8693 from yu-iskw/SPARK-10283.
*	[SPARK-10284] [ML] [PYSPARK] [DOCS] Add @since annotation to pyspark.ml.tuning	Yu ISHIKAWA	2015-09-17	1	-0/+28
\| \| \| \| \| \|	Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8694 from yu-iskw/SPARK-10284.
*	[MINOR] [CORE] Fixes minor variable name typo	Cheng Lian	2015-09-17	1	-2/+2
\| \| \| \| \| \|	Author: Cheng Lian <lian@databricks.com> Closes #8784 from liancheng/typo-fix.
*	Tiny style fix for d39f15ea2b8bed5342d2f8e3c1936f915c470783.	Reynold Xin	2015-09-16	1	-1/+1
\|
*	[SPARK-9794] [SQL] Fix datetime parsing in SparkSQL.	Kevin Cox	2015-09-16	2	-17/+42
\| \| \| \| \| \| \| \| \| \|	This fixes https://issues.apache.org/jira/browse/SPARK-9794 by using a real ISO 8601 parser (courtesy of the XML component of the standard Java library). cc: angelini Author: Kevin Cox <kevincox@kevincox.ca> Closes #8396 from kevincox/kevincox-sql-time-parsing.
*	[SPARK-10050] [SPARKR] Support collecting data of MapType in DataFrame.	Sun Rui	2015-09-16	6	-23/+123
\| \| \| \| \| \| \| \| \|	1. Support collecting data of MapType from DataFrame. 2. Support data of MapType in createDataFrame. Author: Sun Rui <rui.sun@intel.com> Closes #8711 from sun-rui/SPARK-10050.
*	[SPARK-10589] [WEBUI] Add defense against external site framing	Sean Owen	2015-09-16	5	-11/+24
\| \| \| \| \| \| \| \|	Set `X-Frame-Options: SAMEORIGIN` to protect against frame-related vulnerability Author: Sean Owen <sowen@cloudera.com> Closes #8745 from srowen/SPARK-10589.
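The same defence can be applied to any HTTP server by adding the header to every response. This is a hypothetical Python WSGI middleware (PEP 3333) sketch, not Spark's Jetty-based implementation. `hello_app` and `fake_start_response` exist only to exercise it.

```python
def frame_options_middleware(app, value="SAMEORIGIN"):
    """Wrap a WSGI app so every response carries an X-Frame-Options header."""

    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            headers = list(headers) + [("X-Frame-Options", value)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    # Stand-in for the server's start_response; records what the app sent.
    captured["status"] = status
    captured["headers"] = headers


body = frame_options_middleware(hello_app)({}, fake_start_response)
print(captured["headers"])  # [('Content-Type', 'text/plain'), ('X-Frame-Options', 'SAMEORIGIN')]
```

`SAMEORIGIN` still lets the UI frame its own pages while stopping other sites from embedding them.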
*	[SPARK-10276] [MLLIB] [PYSPARK] Add @since annotation to ↵	Yu ISHIKAWA	2015-09-16	1	-1/+35
\| \| \| \| \| \| \| \|	pyspark.mllib.recommendation Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8677 from yu-iskw/SPARK-10276.
*	[SPARK-10511] [BUILD] Reset git repository before packaging source distro	Luciano Resende	2015-09-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Calculating the Spark version downloads Scala and Zinc into the build directory, which inflates the size of the source distribution. Resetting the repo before packaging the source distribution fixes this issue. Author: Luciano Resende <lresende@apache.org> Closes #8774 from lresende/spark-10511.
*	[SPARK-10516] [MLLIB] Added values property in DenseVector	Vinod K C	2015-09-15	1	-0/+4
\| \| \| \| \| \|	Author: Vinod K C <vinod.kc@huawei.com> Closes #8682 from vinodkc/fix_SPARK-10516.
*	[SPARK-10595] [ML] [MLLIB] [DOCS] Various ML guide cleanups	Joseph K. Bradley	2015-09-15	5	-35/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Various ML guide cleanups. * ml-guide.md: Make it easier to access the algorithm-specific guides. * LDA user guide: EM often begins with useless topics, but running longer generally improves them dramatically. E.g., 10 iterations on a Wikipedia dataset produces useless topics, but 50 iterations produces very meaningful topics. * mllib-feature-extraction.html#elementwiseproduct: “w” parameter should be “scalingVec” * Clean up Binarizer user guide a little. * Document in Pipeline that users should not put an instance into the Pipeline in more than 1 place. * spark.ml Word2Vec user guide: clean up grammar/writing * Chi Sq Feature Selector docs: Improve text in doc. CC: mengxr feynmanliang Author: Joseph K. Bradley <joseph@databricks.com> Closes #8752 from jkbradley/mlguide-fixes-1.5.
*	[SPARK-9078] [SQL] Allow jdbc dialects to override the query used to check ↵	sureshthalamati	2015-09-15	4	-4/+41
\| \| \| \| \| \| \| \| \| \| \| \|	the table. The current implementation uses a query with a LIMIT clause to check whether a table already exists. This syntax works only in some database systems. This patch changes the default query to one that is likely to work on most databases, and adds a new method to the JdbcDialect abstract class that allows dialects to override the default query. I looked at using the JDBC metadata calls, but it turns out there is no common way to find the current schema, catalog, etc. There is a new method, Connection.getSchema(), but it is only available starting with JDK 1.7, and existing JDBC drivers may not implement it. Another option was to use the JDBC escape syntax clause for LIMIT, but I am not sure how well that is supported across databases. After looking at all the JDBC metadata options, my conclusion was that the most common approach is a simple select query with 'where 1=0', allowing dialects to customize it as needed. Author: sureshthalamati <suresh.thalamati@gmail.com> Closes #8676 from sureshthalamati/table_exists_spark-9078.
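The approach can be sketched with Python's built-in `sqlite3` module. A query with `WHERE 1=0` returns no rows but still fails if the table is missing, and a per-dialect hook lets a database substitute its own probe. The `Dialect` classes and the `table_exists_query` method name are hypothetical and only loosely mirror the new `JdbcDialect` method described above.

```python
import sqlite3


class Dialect:
    """Hypothetical base dialect; the method name is illustrative, not Spark's API."""

    def table_exists_query(self, table):
        # Portable default: returns no rows, but fails if the table is missing.
        return f"SELECT * FROM {table} WHERE 1=0"


class LimitDialect(Dialect):
    """Hypothetical dialect that prefers a LIMIT-based probe."""

    def table_exists_query(self, table):
        return f"SELECT 1 FROM {table} LIMIT 1"


def table_exists(conn, table, dialect=None):
    # Table names are interpolated as-is, so only pass trusted identifiers.
    dialect = dialect or Dialect()
    try:
        conn.execute(dialect.table_exists_query(table)).fetchall()
        return True
    except sqlite3.OperationalError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
print(table_exists(conn, "people"))                  # True
print(table_exists(conn, "missing"))                 # False
print(table_exists(conn, "people", LimitDialect()))  # True
```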
*	[SPARK-10613] [SPARK-10624] [SQL] Reduce LocalNode tests dependency on ↵	Andrew Or	2015-09-15	17	-636/+468
\| \| \| \| \| \| \| \| \| \| \| \|	SQLContext Instead of relying on `DataFrames` to verify our answers, we can just use simple arrays. This significantly simplifies the test logic for `LocalNode`s and reduces a lot of code duplicated from `SparkPlanTest`. This also fixes an additional issue [SPARK-10624](https://issues.apache.org/jira/browse/SPARK-10624) where the output of `TakeOrderedAndProjectNode` is not actually ordered. Author: Andrew Or <andrew@databricks.com> Closes #8764 from andrewor14/sql-local-tests-cleanup.