path: root/python/pyspark/sql/readwriter.py
Commit message | Author | Age | Files | Lines
* [SPARK-12749][SQL] add json option to parse floating-point types as DecimalType (Brandon Bradley, 2016-01-28, 1 file, -0/+2)
  I tried to add this via the `USE_BIG_DECIMAL_FOR_FLOATS` option from Jackson with no success. Added a test for non-complex types. Should I add a test for complex types?
  Author: Brandon Bradley <bradleytastic@gmail.com>
  Closes #10936 from blbradley/spark-12749.
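  A rough PySpark sketch of how the new behavior might be exercised. The option name `floatAsBigDecimal`, the `sqlContext` shell variable, and the file path are assumptions, not stated in the log:

  ```python
  # Assumes the PySpark shell, where `sqlContext` is predefined.
  # With the (assumed) option enabled, floating-point JSON numbers are
  # loaded as decimals instead of doubles.
  df = (sqlContext.read
        .option("floatAsBigDecimal", "true")  # assumed option name
        .json("people.json"))
  df.printSchema()  # floating-point fields should show up as decimal(...)
  ```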
* [SPARK-12600][SQL] Remove deprecated methods in Spark SQL (Reynold Xin, 2016-01-04, 1 file, -6/+14)
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10559 from rxin/remove-deprecated-sql.
* [SPARK-12537][SQL] Add option to accept quoting of all character backslash quoting mechanism (Cazen, 2016-01-03, 1 file, -0/+2)
  This adds an option that controls whether the JSON parser accepts backslash quoting of any character.
  Author: Cazen <Cazen@korea.com>
  Author: Cazen Lee <cazen.lee@samsung.com>
  Author: Cazen Lee <Cazen@korea.com>
  Author: cazen.lee <cazen.lee@samsung.com>
  Closes #10497 from Cazen/master.
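  A sketch of how this could look from PySpark, assuming the option is exposed as `allowBackslashEscapingAnyCharacter` (the log does not give the option name) and the PySpark shell's `sqlContext`:

  ```python
  # Accept backslash escaping of any character in the input JSON
  # (assumed option name; the file path is illustrative).
  df = (sqlContext.read
        .option("allowBackslashEscapingAnyCharacter", "true")
        .json("escaped.json"))
  ```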
* [SQL] Update SQLContext.read.text doc (Yanbo Liang, 2015-12-17, 1 file, -1/+1)
  Since the column produced by `SQLContext.read.text` was renamed from `text` to `value`, the doc needs to be updated.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #10349 from yanboliang/text-value.
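  A minimal sketch of the renamed column, assuming the PySpark shell and an arbitrary text file:

  ```python
  # Each line of the file becomes one row in a single string column,
  # now named `value` (previously `text`).
  lines = sqlContext.read.text("README.md")
  lines.select("value").show(5)
  ```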
* [SPARK-11967][SQL] Consistent use of varargs for multiple paths in DataFrameReader (Reynold Xin, 2015-11-24, 1 file, -7/+12)
  This patch makes it consistent to use varargs in all DataFrameReader methods, including Parquet, JSON, text, and the generic load function. Also added a few more API tests for the Java API.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #9945 from rxin/SPARK-11967.
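  A sketch of reading from several paths at once on the Python side; the exact accepted forms (varargs for Parquet, a list for JSON) are assumptions, and the paths are made up:

  ```python
  # Multiple Parquet directories as separate arguments,
  # multiple JSON files as a list.
  df1 = sqlContext.read.parquet("data/part1", "data/part2")
  df2 = sqlContext.read.json(["logs/2015-11-23.json", "logs/2015-11-24.json"])
  ```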
* [SPARK-11804] [PYSPARK] Exception raise when using Jdbc predicates option in PySpark (Jeff Zhang, 2015-11-18, 1 file, -5/+5)
  Author: Jeff Zhang <zjffdu@apache.org>
  Closes #9791 from zjffdu/SPARK-11804.
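  For context, a sketch of the `predicates` argument to the JDBC reader whose handling this fixes; the connection details, table, and predicates are made up:

  ```python
  # One partition is created per predicate (WHERE clause fragment).
  df = sqlContext.read.jdbc(
      url="jdbc:postgresql://db-host:5432/shop",   # hypothetical URL
      table="orders",
      predicates=["amount <= 100", "amount > 100"],
      properties={"user": "reader", "password": "secret"})
  ```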
* [SPARK-11745][SQL] Enable more JSON parsing options (Reynold Xin, 2015-11-16, 1 file, -0/+10)
  This patch adds the following options to the JSON data source, for dealing with non-standard JSON files:
  * `allowComments` (default `false`): ignores Java/C++ style comments in JSON records
  * `allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names
  * `allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes
  * `allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers (e.g. 00012)
  To avoid passing a lot of options throughout the json package, I introduced a new JSONOptions case class to define all JSON config options. Also updated documentation to explain these options.
  Scala screenshot: https://cloud.githubusercontent.com/assets/323388/11172965/e3ace6ec-8bc4-11e5-805e-2d78f80d0ed6.png
  Python screenshot: https://cloud.githubusercontent.com/assets/323388/11172964/e23ed6ee-8bc4-11e5-8216-312f5983acd5.png
  Author: Reynold Xin <rxin@databricks.com>
  Closes #9724 from rxin/SPARK-11745.
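  A sketch of setting these options from PySpark through the generic `option()` setter; the file name is made up:

  ```python
  # Parse non-standard JSON: comments, unquoted field names, single
  # quotes, and leading zeros. Option names are taken from the list above.
  df = (sqlContext.read
        .option("allowComments", "true")
        .option("allowUnquotedFieldNames", "true")
        .option("allowSingleQuotes", "true")
        .option("allowNumericLeadingZeros", "true")
        .json("nonstandard.json"))
  ```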
* [HOTFIX] Fix python tests after #9527 (Michael Armbrust, 2015-11-06, 1 file, -1/+1)
  #9527 missed updating the python tests.
  Author: Michael Armbrust <michael@databricks.com>
  Closes #9533 from marmbrus/hotfixTextValue.
* [SPARK-11292] [SQL] Python API for text data source (Reynold Xin, 2015-10-28, 1 file, -2/+25)
  Adds DataFrameReader.text and DataFrameWriter.text.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #9259 from rxin/SPARK-11292.
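  A small round-trip sketch with the two new methods, assuming the PySpark shell; paths are illustrative:

  ```python
  # Read a plain-text file (one row per line, a single string column),
  # then write the rows back out as text files.
  lines = sqlContext.read.text("input.txt")
  lines.write.text("output_dir")
  ```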
* [SPARK-10185] [SQL] Feat sql comma separated paths (Koert Kuipers, 2015-10-17, 1 file, -1/+13)
  Make sure comma-separated paths get processed correctly in ResolvedDataSource for a HadoopFsRelationProvider.
  Author: Koert Kuipers <koert@tresata.com>
  Closes #8416 from koertkuipers/feat-sql-comma-separated-paths.
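  On the Python side this shows up as the reader accepting more than one path; a sketch, with the exact accepted form (a list of paths) being an assumption:

  ```python
  # Load two directories of Parquet data in one call (paths are made up).
  df = sqlContext.read.load(["data/2015/10", "data/2015/11"], format="parquet")
  ```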
* [SPARK-10373] [PYSPARK] move @since into pyspark from sql (Davies Liu, 2015-09-08, 1 file, -2/+1)
  cc mengxr
  Author: Davies Liu <davies@databricks.com>
  Closes #8657 from davies/move_since.
* [SPARK-9964] [PYSPARK] [SQL] PySpark DataFrameReader accept RDD of String for JSON (Yanbo Liang, 2015-08-26, 1 file, -6/+22)
  PySpark DataFrameReader should accept an RDD of Strings (like the Scala version does) for JSON, rather than only taking a path. If this PR is merged, it should be duplicated to cover the other input types (not just JSON).
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #8444 from yanboliang/spark-9964.
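  A sketch of the new input type, assuming the PySpark shell (`sc` and `sqlContext` predefined):

  ```python
  # Build an RDD of JSON strings and hand it straight to the reader,
  # mirroring the Scala API.
  rdd = sc.parallelize(['{"name": "alice", "age": 1}',
                        '{"name": "bob", "age": 2}'])
  df = sqlContext.read.json(rdd)
  df.show()
  ```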
* [SPARK-9828] [PYSPARK] Mutable values should not be default arguments (MechCoder, 2015-08-14, 1 file, -2/+6)
  Author: MechCoder <manojkumarsivaraj334@gmail.com>
  Closes #8110 from MechCoder/spark-9828.
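  The general Python pitfall behind this change, in a minimal sketch (the function names are illustrative, not from readwriter.py): a mutable default is created once and shared across calls, so the idiomatic fix is to default to `None` and build the object inside the function.

  ```python
  # Anti-pattern: the same list object is reused on every call.
  def bad(options=[]):
      options.append("x")
      return options

  # Fix: default to None and create a fresh object per call.
  def good(options=None):
      if options is None:
          options = []
      options.append("x")
      return options

  bad(); print(bad())    # ['x', 'x'] because state leaks between calls
  good(); print(good())  # ['x']
  ```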
* [SPARK-6591] [SQL] Python data source load options should auto convert common types into strings (Yijie Shen, 2015-08-05, 1 file, -3/+14)
  JIRA: https://issues.apache.org/jira/browse/SPARK-6591
  Author: Yijie Shen <henry.yijieshen@gmail.com>
  Closes #7926 from yjshen/py_dsload_opt and squashes the following commits:
    b207832 [Yijie Shen] fix style
    efdf834 [Yijie Shen] resolve comment
    7a8f6a2 [Yijie Shen] lowercase
    822e769 [Yijie Shen] convert load opts to string
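  A sketch of what the conversion means for callers: Python booleans and numbers passed as options should be turned into the strings the JVM side expects. The specific option names below are assumptions used for illustration:

  ```python
  # Bool/number option values are stringified ("true", "0.5") before
  # being passed to the JVM data source.
  df = (sqlContext.read
        .format("json")
        .option("primitivesAsString", True)  # becomes "true"
        .option("samplingRatio", 0.5)        # becomes "0.5"
        .load("people.json"))
  ```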
* [SPARK-9100] [SQL] Adds DataFrame reader/writer shortcut methods for ORC (Cheng Lian, 2015-07-21, 1 file, -3/+41)
  This PR adds DataFrame reader/writer shortcut methods for ORC in both Scala and Python.
  Author: Cheng Lian <lian@databricks.com>
  Closes #7444 from liancheng/spark-9100 and squashes the following commits:
    284d043 [Cheng Lian] Fixes PySpark test cases and addresses PR comments
    e0b09fb [Cheng Lian] Adds DataFrame reader/writer shortcut methods for ORC
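  A sketch of the shortcuts; the paths are made up, and ORC support at this point likely requires a Hive-enabled build (an assumption not stated in the log):

  ```python
  # Read and write ORC data with the new shortcut methods.
  df = sqlContext.read.orc("warehouse/events_orc")
  df.write.orc("warehouse/events_orc_copy")
  ```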
* [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple. (Reynold Xin, 2015-06-29, 1 file, -8/+13)
  Author: Reynold Xin <rxin@databricks.com>
  Closes #7079 from rxin/SPARK-8698 and squashes the following commits:
    8513e1c [Reynold Xin] [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple.
* [SPARK-8355] [SQL] Python DataFrameReader/Writer should mirror Scala (Cheolsoo Park, 2015-06-29, 1 file, -0/+14)
  I compared PySpark DataFrameReader/Writer against Scala ones. The `option` function is missing in both reader and writer, but the rest all match. I added `option` to reader and writer and updated the `pyspark-sql` test.
  Author: Cheolsoo Park <cheolsoop@netflix.com>
  Closes #7078 from piaozhexiu/SPARK-8355 and squashes the following commits:
    c63d419 [Cheolsoo Park] Fix version
    524e0aa [Cheolsoo Park] Add option function to df reader and writer
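  A sketch of the `option(key, value)` method this adds to the Python reader (the writer gains the same method); the option name and path are illustrative:

  ```python
  # Set a single data source option before loading.
  df = (sqlContext.read
        .format("json")
        .option("samplingRatio", "0.1")  # assumed JSON option name
        .load("events.json"))
  ```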
* [SPARK-8532] [SQL] In Python's DataFrameWriter, save/saveAsTable/json/parquet/jdbc always override mode (Yin Huai, 2015-06-22, 1 file, -11/+19)
  https://issues.apache.org/jira/browse/SPARK-8532
  This PR has two changes. First, it fixes the bug that save actions (i.e. `save/saveAsTable/json/parquet/jdbc`) always override mode. Second, it adds input argument `partitionBy` to `save/saveAsTable/parquet`.
  Author: Yin Huai <yhuai@databricks.com>
  Closes #6937 from yhuai/SPARK-8532 and squashes the following commits:
    f972d5d [Yin Huai] davies's comment.
    d37abd2 [Yin Huai] style.
    d21290a [Yin Huai] Python doc.
    889eb25 [Yin Huai] Minor refactoring and add partitionBy to save, saveAsTable, and parquet.
    7fbc24b [Yin Huai] Use None instead of "error" as the default value of mode since JVM-side already uses "error" as the default value.
    d696dff [Yin Huai] Python style.
    88eb6c4 [Yin Huai] If mode is "error", do not call mode method.
    c40c461 [Yin Huai] Regression test.
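  A sketch of the behavior after the fix, assuming an existing DataFrame `df` with a `year` column (both made up):

  ```python
  # Explicit mode plus partition columns via the new keyword argument.
  df.write.parquet("events_by_year", mode="append", partitionBy=["year"])

  # With no mode set, the writer no longer forces one; the JVM-side
  # default ("error") applies, so writing to an existing path raises
  # instead of silently replacing the data.
  df.write.parquet("events_by_year")
  ```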
* [SPARK-8060] Improve DataFrame Python test coverage and documentation. (Reynold Xin, 2015-06-03, 1 file, -119/+98)
  Author: Reynold Xin <rxin@databricks.com>
  Closes #6601 from rxin/python-read-write-test-and-doc and squashes the following commits:
    baa8ad5 [Reynold Xin] Code review feedback.
    f081d47 [Reynold Xin] More documentation updates.
    c9902fa [Reynold Xin] [SPARK-8060] Improve DataFrame Python reader/writer interface doc and testing.
* [SPARK-8021] [SQL] [PYSPARK] make Python read/write API consistent with Scala (Davies Liu, 2015-06-02, 1 file, -27/+94)
  add schema()/format()/options() for reader, add mode()/format()/options()/partitionBy() for writer
  cc rxin yhuai pwendell
  Author: Davies Liu <davies@databricks.com>
  Closes #6578 from davies/readwrite and squashes the following commits:
    720d293 [Davies Liu] address comments
    b65dfa2 [Davies Liu] Update readwriter.py
    1299ab6 [Davies Liu] make Python API consistent with Scala
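  A sketch chaining the builder methods named above; the schema, paths, and option value are illustrative:

  ```python
  from pyspark.sql.types import StructType, StructField, StringType, IntegerType

  schema = StructType([StructField("name", StringType()),
                       StructField("age", IntegerType())])

  # Reader-side builders: format(), schema(), options().
  df = (sqlContext.read
        .format("json")
        .schema(schema)
        .options(samplingRatio="1.0")
        .load("people.json"))

  # Writer-side builders: format(), mode(), partitionBy().
  (df.write
     .format("parquet")
     .mode("overwrite")
     .partitionBy("age")
     .save("people_by_age"))
  ```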
* [SPARK-7840] add insertInto() to Writer (Davies Liu, 2015-05-23, 1 file, -7/+15)
  Add tests later.
  Author: Davies Liu <davies@databricks.com>
  Closes #6375 from davies/insertInto and squashes the following commits:
    826423e [Davies Liu] add insertInto() to Writer
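  A one-line sketch, assuming an existing DataFrame `df` and a table already registered in the catalog (the table name is made up):

  ```python
  # Insert the DataFrame's rows into an existing table.
  df.write.insertInto("events_archive")
  ```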
* [SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs (Davies Liu, 2015-05-20, 1 file, -0/+15)
  Add version info for public Python SQL API.
  cc rxin
  Author: Davies Liu <davies@databricks.com>
  Closes #6295 from davies/versions and squashes the following commits:
    cfd91e6 [Davies Liu] add more version for DataFrame API
    600834d [Davies Liu] add version to SQL API docs
* [SPARK-7738] [SQL] [PySpark] add reader and writer API in Python (Davies Liu, 2015-05-19, 1 file, -0/+338)
  cc rxin, please take a quick look, I'm working on tests.
  Author: Davies Liu <davies@databricks.com>
  Closes #6238 from davies/readwrite and squashes the following commits:
    c7200eb [Davies Liu] update tests
    9cbf01b [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
    f0c5a04 [Davies Liu] use sqlContext.read.load
    5f68bc8 [Davies Liu] update tests
    6437e9a [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
    bcc6668 [Davies Liu] add reader amd writer API in Python
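  To close, an end-to-end sketch of the entry points this commit introduces (`sqlContext.read` returning a DataFrameReader, `DataFrame.write` returning a DataFrameWriter), written as a standalone script rather than the shell; the app name and paths are made up:

  ```python
  from pyspark import SparkContext
  from pyspark.sql import SQLContext

  sc = SparkContext(appName="readwriter-example")
  sqlContext = SQLContext(sc)

  # DataFrameReader entry point: sqlContext.read
  df = sqlContext.read.load("examples/src/main/resources/people.json",
                            format="json")

  # DataFrameWriter entry point: df.write
  df.write.save("people_out", format="parquet", mode="overwrite")

  sc.stop()
  ```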