author    Dongjoon Hyun <dongjoon@apache.org>  2016-05-04 14:31:36 -0700
committer Andrew Or <andrew@databricks.com>    2016-05-04 14:31:36 -0700
commit    cdce4e62a5674e2034e5d395578b1a60e3d8c435 (patch)
tree      c715f2555dad353683f82820962576f89b2db452 /examples/src/main/python/ml/count_vectorizer_example.py
parent    cf2e9da612397233ae7bca0e9ce57309f16226b5 (diff)
[SPARK-15031][EXAMPLE] Use SparkSession in Scala/Python/Java example.
## What changes were proposed in this pull request?
This PR aims to update Scala/Python/Java examples by replacing `SQLContext` with newly added `SparkSession`.
- Use the **SparkSession Builder Pattern** in 154 files (Scala 55, Java 52, Python 47).
- Add `getConf` to the Python `SparkContext` class: `python/pyspark/context.py`
- Replace the **SQLContext Singleton Pattern** with the **SparkSession Singleton Pattern** in:
- `SqlNetworkWordCount.scala`
- `JavaSqlNetworkWordCount.java`
- `sql_network_wordcount.py`
Now, `SQLContext` is used only in the R examples and in the following two Python examples. These Python examples are left untouched in this PR because they already fail due to an unrelated, as-yet-undiagnosed issue:
- `simple_params_example.py`
- `aft_survival_regression.py`
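The "SparkSession Singleton Pattern" mentioned above can be sketched as follows. This is an illustrative stand-in, not the actual code from `sql_network_wordcount.py`: the `_StubSession` class and `get_session_instance` helper are hypothetical names, and `_StubSession` replaces `pyspark.sql.SparkSession` so the pattern can be shown without a Spark installation. The real example instead calls `SparkSession.builder.config(conf=...).getOrCreate()` inside the cached branch.

```python
class _StubSession(object):
    """Stand-in for pyspark.sql.SparkSession (illustration only)."""
    def __init__(self, conf):
        self.conf = conf


def get_session_instance(spark_conf):
    # Lazily create one shared session and cache it in module globals,
    # mirroring the globals()-based cache used by the streaming examples:
    # every micro-batch calls this helper, but only the first call builds
    # a session; later calls return the cached instance.
    if "_session_singleton" not in globals():
        globals()["_session_singleton"] = _StubSession(spark_conf)
    return globals()["_session_singleton"]
```

This matters in the streaming examples because the per-batch function runs many times: recreating a session (or, previously, a `SQLContext`) on every batch would be wasteful, so the helper guarantees a single shared instance.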
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12809 from dongjoon-hyun/SPARK-15031.
Diffstat (limited to 'examples/src/main/python/ml/count_vectorizer_example.py')
-rw-r--r--  examples/src/main/python/ml/count_vectorizer_example.py | 10
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/examples/src/main/python/ml/count_vectorizer_example.py b/examples/src/main/python/ml/count_vectorizer_example.py
index e839f645f7..9dbf9959d1 100644
--- a/examples/src/main/python/ml/count_vectorizer_example.py
+++ b/examples/src/main/python/ml/count_vectorizer_example.py
@@ -17,19 +17,17 @@
 from __future__ import print_function

-from pyspark import SparkContext
-from pyspark.sql import SQLContext
+from pyspark.sql import SparkSession
 # $example on$
 from pyspark.ml.feature import CountVectorizer
 # $example off$

 if __name__ == "__main__":
-    sc = SparkContext(appName="CountVectorizerExample")
-    sqlContext = SQLContext(sc)
+    spark = SparkSession.builder.appName("CountVectorizerExample").getOrCreate()

     # $example on$
     # Input data: Each row is a bag of words with a ID.
-    df = sqlContext.createDataFrame([
+    df = spark.createDataFrame([
         (0, "a b c".split(" ")),
         (1, "a b b c a".split(" "))
     ], ["id", "words"])
@@ -41,4 +39,4 @@ if __name__ == "__main__":
     result.show()
     # $example off$

-    sc.stop()
+    spark.stop()