spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SPARK-15417][SQL][PYTHON] PySpark shell always uses in-memory catalog	Andrew Or	2016-05-19	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? There is no way to use the Hive catalog in `pyspark-shell`. This is because we used to create a `SparkContext` before calling `SparkSession.enableHiveSupport().getOrCreate()`, which just gets the existing `SparkContext` instead of creating a new one. As a result, `spark.sql.catalogImplementation` was never propagated. ## How was this patch tested? Manual. Author: Andrew Or <andrew@databricks.com> Closes #13203 from andrewor14/fix-pyspark-shell.
*	[SPARK-15075][SPARK-15345][SQL] Clean up SparkSession builder and propagate ↵	Reynold Xin	2016-05-19	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	config options to existing sessions if specified ## What changes were proposed in this pull request? Currently SparkSession.Builder use SQLContext.getOrCreate. It should probably the the other way around, i.e. all the core logic goes in SparkSession, and SQLContext just calls that. This patch does that. This patch also makes sure config options specified in the builder are propagated to the existing (and of course the new) SparkSession. ## How was this patch tested? Updated tests to reflect the change, and also introduced a new SparkSessionBuilderSuite that should cover all the branches. Author: Reynold Xin <rxin@databricks.com> Closes #13200 from rxin/SPARK-15075.
*	[SPARK-15171][SQL] Remove the references to deprecated method ↵	Sean Zhong	2016-05-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dataset.registerTempTable ## What changes were proposed in this pull request? Update the unit test code, examples, and documents to remove calls to deprecated method `dataset.registerTempTable`. ## How was this patch tested? This PR only changes the unit test code, examples, and comments. It should be safe. This is a follow up of PR https://github.com/apache/spark/pull/12945 which was merged. Author: Sean Zhong <seanzhong@databricks.com> Closes #13098 from clockfly/spark-15171-remove-deprecation.
*	[SPARK-15244] [PYTHON] Type of column name created with createDataFrame is ↵	Dongjoon Hyun	2016-05-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	not consistent. ## What changes were proposed in this pull request? createDataFrame returns inconsistent types for column names. ```python >>> from pyspark.sql.types import StructType, StructField, StringType >>> schema = StructType([StructField(u"col", StringType())]) >>> df1 = spark.createDataFrame([("a",)], schema) >>> df1.columns # "col" is str ['col'] >>> df2 = spark.createDataFrame([("a",)], [u"col"]) >>> df2.columns # "col" is unicode [u'col'] ``` The reason is only StructField has the following code. ``` if not isinstance(name, str): name = name.encode('utf-8') ``` This PR adds the same logic into createDataFrame for consistency. ``` if isinstance(schema, list): schema = [x.encode('utf-8') if not isinstance(x, str) else x for x in schema] ``` ## How was this patch tested? Pass the Jenkins test (with new python doctest) Author: Dongjoon Hyun <dongjoon@apache.org> Closes #13097 from dongjoon-hyun/SPARK-15244.
*	[SPARK-15171][SQL] Deprecate registerTempTable and add dataset.createTempView	Sean Zhong	2016-05-12	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Deprecates registerTempTable and add dataset.createTempView, dataset.createOrReplaceTempView. ## How was this patch tested? Unit tests. Author: Sean Zhong <seanzhong@databricks.com> Closes #12945 from clockfly/spark-15171.
*	[SPARK-15072][SQL][PYSPARK] FollowUp: Remove SparkSession.withHiveSupport in ↵	Sandeep Singh	2016-05-11	1	-10/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PySpark ## What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/12851 Remove `SparkSession.withHiveSupport` in PySpark and instead use `SparkSession.builder. enableHiveSupport` ## How was this patch tested? Existing tests. Author: Sandeep Singh <sandeep@techaddict.me> Closes #13063 from techaddict/SPARK-15072-followup.
*	[SPARK-15126][SQL] RuntimeConfig.set should return Unit	Reynold Xin	2016-05-04	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Currently we return RuntimeConfig itself to facilitate chaining. However, it makes the output in interactive environments (e.g. notebooks, scala repl) weird because it'd show the response of calling set as a RuntimeConfig itself. ## How was this patch tested? Updated unit tests. Author: Reynold Xin <rxin@databricks.com> Closes #12902 from rxin/SPARK-15126.
*	[SPARK-15084][PYTHON][SQL] Use builder pattern to create SparkSession in ↵	Dongjoon Hyun	2016-05-03	1	-1/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PySpark. ## What changes were proposed in this pull request? This is a python port of corresponding Scala builder pattern code. `sql.py` is modified as a target example case. ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12860 from dongjoon-hyun/SPARK-15084.
*	[SPARK-15012][SQL] Simplify configuration API further	Andrew Or	2016-04-29	1	-29/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? 1. Remove all the `spark.setConf` etc. Just expose `spark.conf` 2. Make `spark.conf` take in things set in the core `SparkConf` as well, otherwise users may get confused This was done for both the Python and Scala APIs. ## How was this patch tested? `SQLConfSuite`, python tests. This one fixes the failed tests in #12787 Closes #12787 Author: Andrew Or <andrew@databricks.com> Author: Yin Huai <yhuai@databricks.com> Closes #12798 from yhuai/conf-api.
*	[SPARK-14988][PYTHON] SparkSession API follow-ups	Andrew Or	2016-04-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Addresses comments in #12765. ## How was this patch tested? Python tests. Author: Andrew Or <andrew@databricks.com> Closes #12784 from andrewor14/python-followup.
*	[SPARK-14988][PYTHON] SparkSession catalog and conf API	Andrew Or	2016-04-29	1	-80/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? The `catalog` and `conf` APIs were exposed in `SparkSession` in #12713 and #12669. This patch adds those to the python API. ## How was this patch tested? Python tests. Author: Andrew Or <andrew@databricks.com> Closes #12765 from andrewor14/python-spark-session-more.
*	[SPARK-14945][PYTHON] SparkSession Python API	Andrew Or	2016-04-28	1	-0/+525
	## What changes were proposed in this pull request? ``` Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ / .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ Using Python version 2.7.5 (default, Mar 9 2014 22:15:05) SparkSession available as 'spark'. >>> spark <pyspark.sql.session.SparkSession object at 0x101f3bfd0> >>> spark.sql("SHOW TABLES").show() ... +---------+-----------+ \|tableName\|isTemporary\| +---------+-----------+ \| src\| false\| +---------+-----------+ >>> spark.range(1, 10, 2).show() +---+ \| id\| +---+ \| 1\| \| 3\| \| 5\| \| 7\| \| 9\| +---+ ``` Note: This API is NOT complete in its current state. In particular, for now I left out the `conf` and `catalog` APIs, which were added later in Scala. These will be added later before 2.0. ## How was this patch tested? Python tests. Author: Andrew Or <andrew@databricks.com> Closes #12746 from andrewor14/python-spark-session.