diff options
author | Tathagata Das <tathagata.das1565@gmail.com> | 2016-06-14 17:58:45 -0700 |
---|---|---|
committer | Tathagata Das <tathagata.das1565@gmail.com> | 2016-06-14 17:58:45 -0700 |
commit | 214adb14b8d1f1c4dce0c97dd6dc09efedbaa643 (patch) | |
tree | 4933de7ffd5ff7f099957fceaf581b4519a0b2fa /python/pyspark/sql/dataframe.py | |
parent | 5d50d4f0f9db3e6cc7c51e35cdb2d12daa4fd108 (diff) | |
download | spark-214adb14b8d1f1c4dce0c97dd6dc09efedbaa643.tar.gz spark-214adb14b8d1f1c4dce0c97dd6dc09efedbaa643.tar.bz2 spark-214adb14b8d1f1c4dce0c97dd6dc09efedbaa643.zip |
[SPARK-15933][SQL][STREAMING] Refactored DF reader-writer to use readStream and writeStream for streaming DFs
## What changes were proposed in this pull request?
Currently, the DataFrameReader/Writer has method that are needed for streaming and non-streaming DFs. This is quite awkward because each method in them through runtime exception for one case or the other. So rather having half the methods throw runtime exceptions, its just better to have a different reader/writer API for streams.
- [x] Python API!!
## How was this patch tested?
Existing unit tests + two sets of unit tests for DataFrameReader/Writer and DataStreamReader/Writer.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #13653 from tdas/SPARK-15933.
Diffstat (limited to 'python/pyspark/sql/dataframe.py')
-rw-r--r-- | python/pyspark/sql/dataframe.py | 18 |
1 files changed, 16 insertions, 2 deletions
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py index 4fa799ac55..0126faf574 100644 --- a/python/pyspark/sql/dataframe.py +++ b/python/pyspark/sql/dataframe.py @@ -33,7 +33,7 @@ from pyspark.storagelevel import StorageLevel from pyspark.traceback_utils import SCCallSiteSync from pyspark.sql.types import _parse_datatype_json_string from pyspark.sql.column import Column, _to_seq, _to_list, _to_java_column -from pyspark.sql.readwriter import DataFrameWriter +from pyspark.sql.readwriter import DataFrameWriter, DataStreamWriter from pyspark.sql.types import * __all__ = ["DataFrame", "DataFrameNaFunctions", "DataFrameStatFunctions"] @@ -172,13 +172,27 @@ class DataFrame(object): @since(1.4) def write(self): """ - Interface for saving the content of the :class:`DataFrame` out into external storage. + Interface for saving the content of the non-streaming :class:`DataFrame` out into external + storage. :return: :class:`DataFrameWriter` """ return DataFrameWriter(self) @property + @since(2.0) + def writeStream(self): + """ + Interface for saving the content of the streaming :class:`DataFrame` out into external + storage. + + .. note:: Experimental. + + :return: :class:`DataStreamWriter` + """ + return DataStreamWriter(self) + + @property @since(1.3) def schema(self): """Returns the schema of this :class:`DataFrame` as a :class:`types.StructType`. |