| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
language binding
This PR is based on PR #14098 authored by wangmiao1981.
## What changes were proposed in this pull request?
This PR replaces the original Python Spark SQL example file with the following three files:
- `sql/basic.py`
Demonstrates basic Spark SQL features.
- `sql/datasource.py`
Demonstrates various Spark SQL data sources.
- `sql/hive.py`
Demonstrates Spark SQL Hive interaction.
This PR also removes hard-coded Python example snippets in the SQL programming guide by extracting snippets from the above files using the `include_example` Liquid template tag.
## How was this patch tested?
Manually tested.
Author: wm624@hotmail.com <wm624@hotmail.com>
Author: Cheng Lian <lian@databricks.com>
Closes #14317 from liancheng/py-examples-update.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
## What changes were proposed in this pull request?
This PR adds labelling support for the `include_example` Jekyll plugin, so that we may split a single source file into multiple line blocks with different labels, and include them in multiple code snippets in the generated HTML page.
## How was this patch tested?
Manually tested.
<img width="923" alt="screenshot at jun 29 19-53-21" src="https://cloud.githubusercontent.com/assets/230655/16451099/66a76db2-3e33-11e6-84fb-63104c2f0688.png">
Author: Cheng Lian <lian@databricks.com>
Closes #13972 from liancheng/include-example-with-labels.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dataset.registerTempTable
## What changes were proposed in this pull request?
Update the unit test code, examples, and documents to remove calls to deprecated method `dataset.registerTempTable`.
## How was this patch tested?
This PR only changes the unit test code, examples, and comments. It should be safe.
This is a follow up of PR https://github.com/apache/spark/pull/12945 which was merged.
Author: Sean Zhong <seanzhong@databricks.com>
Closes #13098 from clockfly/spark-15171-remove-deprecation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
## What changes were proposed in this pull request?
Deprecates registerTempTable and add dataset.createTempView, dataset.createOrReplaceTempView.
## How was this patch tested?
Unit tests.
Author: Sean Zhong <seanzhong@databricks.com>
Closes #12945 from clockfly/spark-15171.
|
|
|
|
|
|
|
|
|
|
| |
This PR removes `sqlContext` in examples. Actual usage was all replaced in https://github.com/apache/spark/pull/12809 but there are some in comments.
Manual style checking.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #13006 from HyukjinKwon/minor-docs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
binary_classification_metrics_example.py
## What changes were proposed in this pull request?
This issue addresses the comments in SPARK-15031 and also fix java-linter errors.
- Use multiline format in SparkSession builder patterns.
- Update `binary_classification_metrics_example.py` to use `SparkSession`.
- Fix Java Linter errors (in SPARK-13745, SPARK-15031, and so far)
## How was this patch tested?
After passing the Jenkins tests and run `dev/lint-java` manually.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12911 from dongjoon-hyun/SPARK-15134.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
## What changes were proposed in this pull request?
This PR aims to update Scala/Python/Java examples by replacing `SQLContext` with newly added `SparkSession`.
- Use **SparkSession Builder Pattern** in 154(Scala 55, Java 52, Python 47) files.
- Add `getConf` in Python SparkContext class: `python/pyspark/context.py`
- Replace **SQLContext Singleton Pattern** with **SparkSession Singleton Pattern**:
- `SqlNetworkWordCount.scala`
- `JavaSqlNetworkWordCount.java`
- `sql_network_wordcount.py`
Now, `SQLContexts` are used only in R examples and the following two Python examples. The python examples are untouched in this PR since it already fails some unknown issue.
- `simple_params_example.py`
- `aft_survival_regression.py`
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12809 from dongjoon-hyun/SPARK-15031.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PySpark.
## What changes were proposed in this pull request?
This is a python port of corresponding Scala builder pattern code. `sql.py` is modified as a target example case.
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12860 from dongjoon-hyun/SPARK-15084.
|
|
|
|
|
|
|
|
|
|
| |
To run Spark, default will read file from HDFS if we don't set the schema.
Author: linweizhong <linweizhong@huawei.com>
Closes #5684 from Sephiroth-Lin/pyspark_example_minor and squashes the following commits:
19fe145 [linweizhong] Update example sql.py, so that can read file correctly
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR update PySpark to support Python 3 (tested with 3.4).
Known issue: unpickle array from Pyrolite is broken in Python 3, those tests are skipped.
TODO: ec2/spark-ec2.py is not fully tested with python3.
Author: Davies Liu <davies@databricks.com>
Author: twneale <twneale@gmail.com>
Author: Josh Rosen <joshrosen@databricks.com>
Closes #5173 from davies/python3 and squashes the following commits:
d7d6323 [Davies Liu] fix tests
6c52a98 [Davies Liu] fix mllib test
99e334f [Davies Liu] update timeout
b716610 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
cafd5ec [Davies Liu] adddress comments from @mengxr
bf225d7 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
179fc8d [Davies Liu] tuning flaky tests
8c8b957 [Davies Liu] fix ResourceWarning in Python 3
5c57c95 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
4006829 [Davies Liu] fix test
2fc0066 [Davies Liu] add python3 path
71535e9 [Davies Liu] fix xrange and divide
5a55ab4 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
125f12c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
ed498c8 [Davies Liu] fix compatibility with python 3
820e649 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
e8ce8c9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
ad7c374 [Davies Liu] fix mllib test and warning
ef1fc2f [Davies Liu] fix tests
4eee14a [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
20112ff [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
59bb492 [Davies Liu] fix tests
1da268c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
ca0fdd3 [Davies Liu] fix code style
9563a15 [Davies Liu] add imap back for python 2
0b1ec04 [Davies Liu] make python examples work with Python 3
d2fd566 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
a716d34 [Davies Liu] test with python 3.4
f1700e8 [Davies Liu] fix test in python3
671b1db [Davies Liu] fix test in python3
692ff47 [Davies Liu] fix flaky test
7b9699f [Davies Liu] invalidate import cache for Python 3.3+
9c58497 [Davies Liu] fix kill worker
309bfbf [Davies Liu] keep compatibility
5707476 [Davies Liu] cleanup, fix hash of string in 3.3+
8662d5b [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
f53e1f0 [Davies Liu] fix tests
70b6b73 [Davies Liu] compile ec2/spark_ec2.py in python 3
a39167e [Davies Liu] support customize class in __main__
814c77b [Davies Liu] run unittests with python 3
7f4476e [Davies Liu] mllib tests passed
d737924 [Davies Liu] pass ml tests
375ea17 [Davies Liu] SQL tests pass
6cc42a9 [Davies Liu] rename
431a8de [Davies Liu] streaming tests pass
78901a7 [Davies Liu] fix hash of serializer in Python 3
24b2f2e [Davies Liu] pass all RDD tests
35f48fe [Davies Liu] run future again
1eebac2 [Davies Liu] fix conflict in ec2/spark_ec2.py
6e3c21d [Davies Liu] make cloudpickle work with Python3
2fb2db3 [Josh Rosen] Guard more changes behind sys.version; still doesn't run
1aa5e8f [twneale] Turned out `pickle.DictionaryType is dict` == True, so swapped it out
7354371 [twneale] buffer --> memoryview I'm not super sure if this a valid change, but the 2.7 docs recommend using memoryview over buffer where possible, so hoping it'll work.
b69ccdf [twneale] Uses the pure python pickle._Pickler instead of c-extension _pickle.Pickler. It appears pyspark 2.7 uses the pure python pickler as well, so this shouldn't degrade pickling performance (?).
f40d925 [twneale] xrange --> range
e104215 [twneale] Replaces 2.7 types.InstsanceType with 3.4 `object`....could be horribly wrong depending on how types.InstanceType is used elsewhere in the package--see http://bugs.python.org/issue8206
79de9d0 [twneale] Replaces python2.7 `file` with 3.4 _io.TextIOWrapper
2adb42d [Josh Rosen] Fix up some import differences between Python 2 and 3
854be27 [Josh Rosen] Run `futurize` on Python code:
7c5b4ce [Josh Rosen] Remove Python 3 check in shell.py.
|
|
|
|
|
|
|
|
|
|
| |
Fix the import in sql example.
Author: Davies Liu <davies@databricks.com>
Closes #5230 from davies/fix_sql_example and squashes the following commits:
7ecc5f4 [Davies Liu] fix sql example
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Deprecate inferSchema() and applySchema(), use createDataFrame() instead, which could take an optional `schema` to create an DataFrame from an RDD. The `schema` could be StructType or list of names of columns.
Author: Davies Liu <davies@databricks.com>
Closes #4498 from davies/create and squashes the following commits:
08469c1 [Davies Liu] remove Scala/Java API for now
c80a7a9 [Davies Liu] fix hive test
d1bd8f2 [Davies Liu] cleanup applySchema
9526e97 [Davies Liu] createDataFrame from RDD with columns
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pull request redesigns the existing Spark SQL dsl, which already provides data frame like functionalities.
TODOs:
With the exception of Python support, other tasks can be done in separate, follow-up PRs.
- [ ] Audit of the API
- [ ] Documentation
- [ ] More test cases to cover the new API
- [x] Python support
- [ ] Type alias SchemaRDD
Author: Reynold Xin <rxin@databricks.com>
Author: Davies Liu <davies@databricks.com>
Closes #4173 from rxin/df1 and squashes the following commits:
0a1a73b [Reynold Xin] Merge branch 'df1' of github.com:rxin/spark into df1
23b4427 [Reynold Xin] Mima.
828f70d [Reynold Xin] Merge pull request #7 from davies/df
257b9e6 [Davies Liu] add repartition
6bf2b73 [Davies Liu] fix collect with UDT and tests
e971078 [Reynold Xin] Missing quotes.
b9306b4 [Reynold Xin] Remove removeColumn/updateColumn for now.
a728bf2 [Reynold Xin] Example rename.
e8aa3d3 [Reynold Xin] groupby -> groupBy.
9662c9e [Davies Liu] improve DataFrame Python API
4ae51ea [Davies Liu] python API for dataframe
1e5e454 [Reynold Xin] Fixed a bug with symbol conversion.
2ca74db [Reynold Xin] Couple minor fixes.
ea98ea1 [Reynold Xin] Documentation & literal expressions.
2b22684 [Reynold Xin] Got rid of IntelliJ problems.
02bbfbc [Reynold Xin] Tightening imports.
ffbce66 [Reynold Xin] Fixed compilation error.
59b6d8b [Reynold Xin] Style violation.
b85edfb [Reynold Xin] ALS.
8c37f0a [Reynold Xin] Made MLlib and examples compile
6d53134 [Reynold Xin] Hive module.
d35efd5 [Reynold Xin] Fixed compilation error.
ce4a5d2 [Reynold Xin] Fixed test cases in SQL except ParquetIOSuite.
66d5ef1 [Reynold Xin] SQLContext minor patch.
c9bcdc0 [Reynold Xin] Checkpoint: SQL module compiles!
|
|
|
|
|
|
|
|
| |
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #2834 from adrian-wang/sqlpypath and squashes the following commits:
da7aa95 [Daoyuan Wang] fix file path using path.join
|
|
SQL example code for Python, as shown on [SQL Programming Guide](https://spark.apache.org/docs/1.0.2/sql-programming-guide.html)
Author: jyotiska <jyotiska123@gmail.com>
Closes #2521 from jyotiska/sql_example and squashes the following commits:
1471dcb [jyotiska] added imports for sql
b25e436 [jyotiska] pep 8 compliance
43fd10a [jyotiska] lines broken to maintain 80 char limit
b4fdf4e [jyotiska] removed blank lines
83d5ab7 [jyotiska] added inferschema and applyschema to the demo
306667e [jyotiska] replaced blank line with end line
c90502a [jyotiska] fixed new line
4939a70 [jyotiska] added new line at end for python style
0b46148 [jyotiska] fixed appname for python sql example
8f67b5b [jyotiska] added python sql example
|