aboutsummaryrefslogtreecommitdiff
path: root/python/pyspark/sql/column.py
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-11836][SQL] udf/cast should not create new SQLContextDavies Liu2015-11-231-3/+4
| | | | | | | | They should use the existing SQLContext. Author: Davies Liu <davies@databricks.com> Closes #9914 from davies/create_udf.
* [SPARK-9014] [SQL] Allow Python spark API to use built-in exponential operator0x0FFF2015-09-111-0/+13
| | | | | | | | | | | | | | | | | | | This PR addresses (SPARK-9014)[https://issues.apache.org/jira/browse/SPARK-9014] Added functionality: `Column` object in Python now supports exponential operator `**` Example: ``` from pyspark.sql import * df = sqlContext.createDataFrame([Row(a=2)]) df.select(3**df.a,df.a**3,df.a**df.a).collect() ``` Outputs: ``` [Row(POWER(3.0, a)=9.0, POWER(a, 3.0)=8.0, POWER(a, a)=4.0)] ``` Author: 0x0FFF <programmerag@gmail.com> Closes #8658 from 0x0FFF/SPARK-9014.
* [SPARK-10373] [PYSPARK] move @since into pyspark from sqlDavies Liu2015-09-081-1/+1
| | | | | | | | cc mengxr Author: Davies Liu <davies@databricks.com> Closes #8657 from davies/move_since.
* [SPARK-10417] [SQL] Iterating through Column results in infinite loop0x0FFF2015-09-021-0/+3
| | | | | | | | | | | | | | `pyspark.sql.column.Column` object has `__getitem__` method, which makes it iterable for Python. In fact it has `__getitem__` to address the case when the column might be a list or dict, for you to be able to access certain element of it in DF API. The ability to iterate over it is just a side effect that might cause confusion for the people getting familiar with Spark DF (as you might iterate this way on Pandas DF for instance) Issue reproduction: ``` df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}'])) for i in df["name"]: print i ``` Author: 0x0FFF <programmerag@gmail.com> Closes #8574 from 0x0FFF/SPARK-10417.
* [SPARK-9613] [CORE] Ban use of JavaConversions and migrate all existing uses ↵Sean Owen2015-08-251-0/+12
| | | | | | | | | | | | to JavaConverters Replace `JavaConversions` implicits with `JavaConverters` Most occurrences I've seen so far are necessary conversions; a few have been avoidable. None are in critical code as far as I see, yet. Author: Sean Owen <sowen@cloudera.com> Closes #8033 from srowen/SPARK-9613.
* [SPARK-9659][SQL] Rename inSet to isin to match Pandas function.Reynold Xin2015-08-061-1/+19
| | | | | | | | | | | | | Inspiration drawn from this blog post: https://lab.getbase.com/pandarize-spark-dataframes/ Author: Reynold Xin <rxin@databricks.com> Closes #7977 from rxin/isin and squashes the following commits: 9b1d3d6 [Reynold Xin] Added return. 2197d37 [Reynold Xin] Fixed test case. 7c1b6cf [Reynold Xin] Import warnings. 4f4a35d [Reynold Xin] [SPARK-9659][SQL] Rename inSet to isin to match Pandas function.
* [SPARK-8573] [SPARK-8568] [SQL] [PYSPARK] raise Exception if column is used ↵Davies Liu2015-06-231-0/+5
| | | | | | | | | | | | | | | in booelan expression It's a common mistake that user will put Column in a boolean expression (together with `and` , `or`), which does not work as expected, we should raise a exception in that case, and suggest user to use `&`, `|` instead. Author: Davies Liu <davies@databricks.com> Closes #6961 from davies/column_bool and squashes the following commits: 9f19beb [Davies Liu] update message af74bd6 [Davies Liu] fix tests 07dff84 [Davies Liu] address comments, fix tests f70c08e [Davies Liu] raise Exception if column is used in booelan expression
* [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()Davies Liu2015-06-021-3/+28
| | | | | | | | | | | | Thanks ogirardot, closes #6580 cc rxin JoshRosen Author: Davies Liu <davies@databricks.com> Closes #6590 from davies/when and squashes the following commits: c0f2069 [Davies Liu] fix Column.when() and otherwise()
* [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related ↵Davies Liu2015-05-231-14/+40
| | | | | | | | | | | | | | | | | | | | | | | updates 1. ntile should take an integer as parameter. 2. Added Python API (based on #6364) 3. Update documentation of various DataFrame Python functions. Author: Davies Liu <davies@databricks.com> Author: Reynold Xin <rxin@databricks.com> Closes #6374 from rxin/window-final and squashes the following commits: 69004c7 [Reynold Xin] Style fix. 288cea9 [Reynold Xin] Update documentaiton. 7cb8985 [Reynold Xin] Merge pull request #6364 from davies/window 66092b4 [Davies Liu] update docs ed73cb4 [Reynold Xin] [SPARK-7322][SQL] Improve DataFrame window function documentation. ef55132 [Davies Liu] Merge branch 'master' of github.com:apache/spark into window4 8936ade [Davies Liu] fix maxint in python 3 2649358 [Davies Liu] update docs 778e2c0 [Davies Liu] SPARK-7836 and SPARK-7822: Python API of window functions
* [SPARK-7394][SQL] Add Pandas style cast (astype)kaka19922015-05-211-0/+2
| | | | | | | | | | Author: kaka1992 <kaka_1992@163.com> Closes #6313 from kaka1992/astype and squashes the following commits: 73dfd0b [kaka1992] [SPARK-7394] Add Pandas style cast (astype) ad8feb2 [kaka1992] [SPARK-7394] Add Pandas style cast (astype) 4f328b7 [kaka1992] [SPARK-7394] Add Pandas style cast (astype)
* [SPARK-7606] [SQL] [PySpark] add version to Python SQL API docsDavies Liu2015-05-201-0/+12
| | | | | | | | | | | | | Add version info for public Python SQL API. cc rxin Author: Davies Liu <davies@databricks.com> Closes #6295 from davies/versions and squashes the following commits: cfd91e6 [Davies Liu] add more version for DataFrame API 600834d [Davies Liu] add version to SQL API docs
* [SPARK-7543] [SQL] [PySpark] split dataframe.py into multiple filesDavies Liu2015-05-151-0/+360
dataframe.py is splited into column.py, group.py and dataframe.py: ``` 360 column.py 1223 dataframe.py 183 group.py ``` Author: Davies Liu <davies@databricks.com> Closes #6201 from davies/split_df and squashes the following commits: fc8f5ab [Davies Liu] split dataframe.py into multiple files