| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
They should use the existing SQLContext.
Author: Davies Liu <davies@databricks.com>
Closes #9914 from davies/create_udf.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR addresses (SPARK-9014)[https://issues.apache.org/jira/browse/SPARK-9014]
Added functionality: `Column` object in Python now supports exponential operator `**`
Example:
```
from pyspark.sql import *
df = sqlContext.createDataFrame([Row(a=2)])
df.select(3**df.a,df.a**3,df.a**df.a).collect()
```
Outputs:
```
[Row(POWER(3.0, a)=9.0, POWER(a, 3.0)=8.0, POWER(a, a)=4.0)]
```
Author: 0x0FFF <programmerag@gmail.com>
Closes #8658 from 0x0FFF/SPARK-9014.
|
|
|
|
|
|
|
|
| |
cc mengxr
Author: Davies Liu <davies@databricks.com>
Closes #8657 from davies/move_since.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`pyspark.sql.column.Column` object has `__getitem__` method, which makes it iterable for Python. In fact it has `__getitem__` to address the case when the column might be a list or dict, for you to be able to access certain element of it in DF API. The ability to iterate over it is just a side effect that might cause confusion for the people getting familiar with Spark DF (as you might iterate this way on Pandas DF for instance)
Issue reproduction:
```
df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
for i in df["name"]: print i
```
Author: 0x0FFF <programmerag@gmail.com>
Closes #8574 from 0x0FFF/SPARK-10417.
|
|
|
|
|
|
|
|
|
|
|
|
| |
to JavaConverters
Replace `JavaConversions` implicits with `JavaConverters`
Most occurrences I've seen so far are necessary conversions; a few have been avoidable. None are in critical code as far as I see, yet.
Author: Sean Owen <sowen@cloudera.com>
Closes #8033 from srowen/SPARK-9613.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Inspiration drawn from this blog post: https://lab.getbase.com/pandarize-spark-dataframes/
Author: Reynold Xin <rxin@databricks.com>
Closes #7977 from rxin/isin and squashes the following commits:
9b1d3d6 [Reynold Xin] Added return.
2197d37 [Reynold Xin] Fixed test case.
7c1b6cf [Reynold Xin] Import warnings.
4f4a35d [Reynold Xin] [SPARK-9659][SQL] Rename inSet to isin to match Pandas function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in booelan expression
It's a common mistake that user will put Column in a boolean expression (together with `and` , `or`), which does not work as expected, we should raise a exception in that case, and suggest user to use `&`, `|` instead.
Author: Davies Liu <davies@databricks.com>
Closes #6961 from davies/column_bool and squashes the following commits:
9f19beb [Davies Liu] update message
af74bd6 [Davies Liu] fix tests
07dff84 [Davies Liu] address comments, fix tests
f70c08e [Davies Liu] raise Exception if column is used in booelan expression
|
|
|
|
|
|
|
|
|
|
|
|
| |
Thanks ogirardot, closes #6580
cc rxin JoshRosen
Author: Davies Liu <davies@databricks.com>
Closes #6590 from davies/when and squashes the following commits:
c0f2069 [Davies Liu] fix Column.when() and otherwise()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
updates
1. ntile should take an integer as parameter.
2. Added Python API (based on #6364)
3. Update documentation of various DataFrame Python functions.
Author: Davies Liu <davies@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes #6374 from rxin/window-final and squashes the following commits:
69004c7 [Reynold Xin] Style fix.
288cea9 [Reynold Xin] Update documentaiton.
7cb8985 [Reynold Xin] Merge pull request #6364 from davies/window
66092b4 [Davies Liu] update docs
ed73cb4 [Reynold Xin] [SPARK-7322][SQL] Improve DataFrame window function documentation.
ef55132 [Davies Liu] Merge branch 'master' of github.com:apache/spark into window4
8936ade [Davies Liu] fix maxint in python 3
2649358 [Davies Liu] update docs
778e2c0 [Davies Liu] SPARK-7836 and SPARK-7822: Python API of window functions
|
|
|
|
|
|
|
|
|
|
| |
Author: kaka1992 <kaka_1992@163.com>
Closes #6313 from kaka1992/astype and squashes the following commits:
73dfd0b [kaka1992] [SPARK-7394] Add Pandas style cast (astype)
ad8feb2 [kaka1992] [SPARK-7394] Add Pandas style cast (astype)
4f328b7 [kaka1992] [SPARK-7394] Add Pandas style cast (astype)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add version info for public Python SQL API.
cc rxin
Author: Davies Liu <davies@databricks.com>
Closes #6295 from davies/versions and squashes the following commits:
cfd91e6 [Davies Liu] add more version for DataFrame API
600834d [Davies Liu] add version to SQL API docs
|
|
dataframe.py is splited into column.py, group.py and dataframe.py:
```
360 column.py
1223 dataframe.py
183 group.py
```
Author: Davies Liu <davies@databricks.com>
Closes #6201 from davies/split_df and squashes the following commits:
fc8f5ab [Davies Liu] split dataframe.py into multiple files
|