spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[MINOR][PYSPARK][DOC] Fix wrongly formatted examples in PySpark documentation	hyukjinkwon	2016-07-06	1	-0/+2
	## What changes were proposed in this pull request?
	This PR fixes wrongly formatted examples in PySpark documentation as below:
	- `SparkSession`
	  - Before ![2016-07-06 11 34 41](https://cloud.githubusercontent.com/assets/6477701/16605847/ae939526-436d-11e6-8ab8-6ad578362425.png)
	  - After ![2016-07-06 11 33 56](https://cloud.githubusercontent.com/assets/6477701/16605845/ace9ee78-436d-11e6-8923-b76d4fc3e7c3.png)
	- `Builder`
	  - Before ![2016-07-06 11 34 44](https://cloud.githubusercontent.com/assets/6477701/16605844/aba60dbc-436d-11e6-990a-c87bc0281c6b.png)
	  - After ![2016-07-06 1 26 37](https://cloud.githubusercontent.com/assets/6477701/16607562/586704c0-437d-11e6-9483-e0af93d8f74e.png)
	This PR also fixes several similar instances across the documentation in `sql` PySpark module.
	## How was this patch tested?
	N/A
	Author: hyukjinkwon <gurwls223@gmail.com>
	Closes #14063 from HyukjinKwon/minor-pyspark-builder.
*	[SPARK-15973][PYSPARK] Fix GroupedData Documentation	Josh Howes	2016-06-17	1	-11/+11
	This contribution is my original work and that I license the work to the project under the project's open source license.
	## What changes were proposed in this pull request?
	Documentation updates to PySpark's GroupedData
	## How was this patch tested?
	Manual Tests
	Author: Josh Howes <josh.howes@gmail.com>
	Author: Josh Howes <josh.howes@maxpoint.com>
	Closes #13724 from josh-howes/bugfix/SPARK-15973.
*	[SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext ↵	WeichenXu	2016-05-23	1	-5/+7
	with SparkSession using builder pattern in python test code
	## What changes were proposed in this pull request?
	Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code.
	## How was this patch tested?
	Existing test.
	Author: WeichenXu <WeichenXu123@outlook.com>
	Closes #13242 from WeichenXu123/python_doctest_update_sparksession.
*	[SPARK-12756][SQL] use hash expression in Exchange	Wenchen Fan	2016-01-13	1	-3/+3
	This PR makes bucketing and exchange share one common hash algorithm, so that we can guarantee the data distribution is the same between shuffle and bucketed data source, which enables us to shuffle only one side when joining a bucketed table and a normal one. This PR also fixes the tests that are broken by the new hash behaviour in shuffle.
	Author: Wenchen Fan <wenchen@databricks.com>
	Closes #10703 from cloud-fan/use-hash-expr-in-shuffle.
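The co-partitioning idea in this message can be sketched in plain Python. This is only an illustration under stated assumptions: `zlib.crc32` stands in for the shared hash (it is not Spark's hash expression), and `partition_id` and `distribute` are hypothetical helper names, not Spark APIs.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed to equal the bucket count of the bucketed table


def partition_id(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with one deterministic hash (stand-in: CRC32)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def distribute(rows, key_index=0):
    """Group rows by partition id; the same routine serves bucketing and shuffling."""
    parts = defaultdict(list)
    for row in rows:
        parts[partition_id(row[key_index])].append(row)
    return parts


# One side is already stored bucketed by key; only the other side is shuffled.
bucketed_table = distribute([("a", 1), ("b", 2), ("c", 3)])
shuffled_side = distribute([("a", "x"), ("c", "y"), ("d", "z")])

# Both sides used the same hash, so matching keys share a partition id
# and the join can proceed one partition at a time.
joined = []
for pid, left_rows in bucketed_table.items():
    for left_key, left_value in left_rows:
        for right_key, right_value in shuffled_side.get(pid, []):
            if left_key == right_key:
                joined.append((left_key, left_value, right_value))

joined.sort()
```

If the two sides used different hash functions, rows with equal keys could land in different partitions, and both sides would have to be redistributed before joining.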
*	[SPARK-12184][PYTHON] Make python api doc for pivot consistant with scala doc	Andrew Ray	2015-12-07	1	-5/+9
	In SPARK-11946 the API for pivot was changed a bit and got updated docs, but the doc changes were not made for the python api. This PR updates the python doc to be consistent.
	Author: Andrew Ray <ray.andrew@gmail.com>
	Closes #10176 from aray/sql-pivot-python-doc.
*	[SPARK-11984][SQL][PYTHON] Fix typos in doc for pivot for scala and python	felixcheung	2015-11-25	1	-3/+3
	Author: felixcheung <felixcheung_m@hotmail.com>
	Closes #9967 from felixcheung/pypivotdoc.
*	[SPARK-11946][SQL] Audit pivot API for 1.6.	Reynold Xin	2015-11-24	1	-4/+8
	Currently pivot's signature looks like
	```scala
	@scala.annotation.varargs
	def pivot(pivotColumn: Column, values: Column*): GroupedData
	@scala.annotation.varargs
	def pivot(pivotColumn: String, values: Any*): GroupedData
	```
	I think we can remove the one that takes "Column" types, since callers should always be passing in literals. It'd also be more clear if the values are not varargs, but rather Seq or java.util.List. I also made similar changes for Python.
	Author: Reynold Xin <rxin@databricks.com>
	Closes #9929 from rxin/SPARK-11946.
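What a pivot with an explicit list of values computes can be sketched in plain Python. This is not the Spark implementation: `pivot_sum` and its parameters are hypothetical names, and ignoring rows whose pivot value is not listed is how this sketch behaves, stated here as an assumption.

```python
from collections import defaultdict


def pivot_sum(rows, group_col, pivot_col, value_col, values):
    """Sum value_col per group, with one output column per entry in `values`.

    Rows whose pivot value is not listed in `values` are ignored, and a
    group with no rows for a listed value keeps None in that column.
    """
    totals = defaultdict(lambda: {v: None for v in values})
    for row in rows:
        cells = totals[row[group_col]]  # ensures every group appears in the result
        pivot_value = row[pivot_col]
        if pivot_value in cells:
            current = cells[pivot_value]
            cells[pivot_value] = row[value_col] + (current or 0)
    return dict(totals)


rows = [
    {"year": 2012, "course": "dotNET", "earnings": 10000},
    {"year": 2012, "course": "Java", "earnings": 20000},
    {"year": 2012, "course": "dotNET", "earnings": 5000},
    {"year": 2013, "course": "Java", "earnings": 30000},
    {"year": 2013, "course": "Python", "earnings": 1},
]

# The values are passed as a plain list of literals rather than as varargs.
result = pivot_sum(rows, "year", "course", "earnings", ["dotNET", "Java"])
```

Taking the values as one list argument keeps the call shape the same whether there are two pivot values or twenty, which is the readability point the message makes about Seq or java.util.List.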
*	[SPARK-11690][PYSPARK] Add pivot to python api	Andrew Ray	2015-11-13	1	-1/+23
	This PR adds pivot to the python api of GroupedData with the same syntax as Scala/Java.
	Author: Andrew Ray <ray.andrew@gmail.com>
	Closes #9653 from aray/sql-pivot-python.
*	[SPARK-11489][SQL] Only include common first order statistics in GroupedData	Reynold Xin	2015-11-03	1	-88/+0
	We added a bunch of higher order statistics such as skewness and kurtosis to GroupedData. I don't think they are common enough to justify being listed, since users can always use the normal statistics aggregate functions. That is to say, after this change, we won't support
	```scala
	df.groupBy("key").kurtosis("colA", "colB")
	```
	However, we will still support
	```scala
	df.groupBy("key").agg(kurtosis(col("colA")), kurtosis(col("colB")))
	```
	Author: Reynold Xin <rxin@databricks.com>
	Closes #9446 from rxin/SPARK-11489.
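The design choice here, one generic `agg` entry point that accepts aggregate functions instead of a dedicated method per statistic, can be sketched in plain Python. Everything below is illustrative: `group_agg` is a hypothetical helper, and `kurtosis` uses the population excess kurtosis definition, which is an assumption about the convention rather than a statement about Spark's function.

```python
from collections import defaultdict


def kurtosis(xs):
    """Population excess kurtosis: m4 / m2**2 - 3 (an assumed convention).

    Raises ZeroDivisionError when all values are equal.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3


def group_agg(rows, key, aggregates):
    """Apply each (column, function) pair in `aggregates` to every group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        k: {f"{fn.__name__}({col})": fn([r[col] for r in members])
            for col, fn in aggregates}
        for k, members in groups.items()
    }


rows = [{"key": "a", "colA": x} for x in [1, 2, 3, 4, 5]]

# Any aggregate function works without adding a method to the grouping helper.
result = group_agg(rows, "key", [("colA", kurtosis), ("colA", max)])
```

Because `group_agg` takes functions as arguments, adding a new statistic means writing one more function, not widening the grouped-data interface.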
*	[SPARK-11467][SQL] add Python API for stddev/variance	Davies Liu	2015-11-03	1	-0/+88
	Add Python API for stddev/stddev_pop/stddev_samp/variance/var_pop/var_samp/skewness/kurtosis
	Author: Davies Liu <davies@databricks.com>
	Closes #9424 from davies/py_var.
*	[SPARK-10373] [PYSPARK] move @since into pyspark from sql	Davies Liu	2015-09-08	1	-1/+1
	cc mengxr
	Author: Davies Liu <davies@databricks.com>
	Closes #8657 from davies/move_since.
*	[SPARK-8770][SQL] Create BinaryOperator abstract class.	Reynold Xin	2015-07-01	1	-12/+12
	Our current BinaryExpression abstract class is not for generic binary expressions, i.e. it requires left/right children to have the same type. However, due to its name, contributors build new binary expressions that don't have that assumption (e.g. Sha) and still extend BinaryExpression. This patch creates a new BinaryOperator abstract class, and updates the analyzer to only apply the type casting rule there. This patch also adds the notion of "prettyName" to expressions, which defines the user-facing name for the expression.
	Author: Reynold Xin <rxin@databricks.com>
	Closes #7174 from rxin/binary-opterator and squashes the following commits:
	f31900d [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.
	fceb216 [Reynold Xin] Merge branch 'master' of github.com:apache/spark into binary-opterator
	d8518cf [Reynold Xin] Updated Python tests.
*	[SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related ↵	Davies Liu	2015-05-23	1	-0/+2
	updates
	1. ntile should take an integer as parameter.
	2. Added Python API (based on #6364)
	3. Update documentation of various DataFrame Python functions.
	Author: Davies Liu <davies@databricks.com>
	Author: Reynold Xin <rxin@databricks.com>
	Closes #6374 from rxin/window-final and squashes the following commits:
	69004c7 [Reynold Xin] Style fix.
	288cea9 [Reynold Xin] Update documentaiton.
	7cb8985 [Reynold Xin] Merge pull request #6364 from davies/window
	66092b4 [Davies Liu] update docs
	ed73cb4 [Reynold Xin] [SPARK-7322][SQL] Improve DataFrame window function documentation.
	ef55132 [Davies Liu] Merge branch 'master' of github.com:apache/spark into window4
	8936ade [Davies Liu] fix maxint in python 3
	2649358 [Davies Liu] update docs
	778e2c0 [Davies Liu] SPARK-7836 and SPARK-7822: Python API of window functions
*	[SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs	Davies Liu	2015-05-20	1	-0/+10
	Add version info for public Python SQL API. cc rxin
	Author: Davies Liu <davies@databricks.com>
	Closes #6295 from davies/versions and squashes the following commits:
	cfd91e6 [Davies Liu] add more version for DataFrame API
	600834d [Davies Liu] add version to SQL API docs
*	[SPARK-7543] [SQL] [PySpark] split dataframe.py into multiple files	Davies Liu	2015-05-15	1	-0/+183
	dataframe.py is split into column.py, group.py and dataframe.py:
	```
	 360 column.py
	1223 dataframe.py
	 183 group.py
	```
	Author: Davies Liu <davies@databricks.com>
	Closes #6201 from davies/split_df and squashes the following commits:
	fc8f5ab [Davies Liu] split dataframe.py into multiple files