author | Dongjoon Hyun <dongjoon@apache.org> | 2016-05-05 14:37:50 -0700 |
---|---|---|
committer | Andrew Or <andrew@databricks.com> | 2016-05-05 14:37:50 -0700 |
commit | 2c170dd3d731bd848d62265431795e1c141d75d7 | |
tree | d81ec5e4a6adfda683d7882680d50d2261b06818 /examples/src/main/python/mllib | |
parent | bb9991dec5dd631b22a05e2e1b83b9082a845e8f | |
[SPARK-15134][EXAMPLE] Indent SparkSession builder patterns and update binary_classification_metrics_example.py
## What changes were proposed in this pull request?
This issue addresses the review comments in SPARK-15031 and also fixes Java linter errors.
- Use a multiline format for `SparkSession` builder patterns.
- Update `binary_classification_metrics_example.py` to use `SparkSession`.
- Fix Java linter errors (in SPARK-13745, SPARK-15031, and elsewhere so far).
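
The multiline builder format works because each builder call returns the builder object, so the chain can be broken across lines with trailing backslashes (the style this patch adopts) or, equivalently, by wrapping it in parentheses. The sketch below uses hypothetical `ToySession` and `ToyBuilder` classes instead of PySpark so that it runs without a Spark installation; it only imitates the chaining shape and a simple get-or-create cache, not the real `SparkSession.builder` behavior.

```python
# Hypothetical classes for illustration only; these are NOT PySpark's
# SparkSession / SparkSession.Builder.


class ToySession:
    """Stand-in for a session object; it only records its configuration."""

    def __init__(self, conf):
        self.conf = dict(conf)  # copy, so later builder calls don't leak in


class ToyBuilder:
    """Fluent builder: every setter returns self so calls can be chained."""

    def __init__(self):
        self._conf = {}
        self._session = None

    def appName(self, name):
        self._conf["app.name"] = name
        return self  # returning the builder is what makes chaining possible

    def config(self, key, value):
        self._conf[key] = value
        return self

    def getOrCreate(self):
        # Create the session on first use, then keep returning the same one.
        if self._session is None:
            self._session = ToySession(self._conf)
        return self._session


# Expose a shared builder as a class attribute, mirroring `SparkSession.builder`.
ToySession.builder = ToyBuilder()

# Style adopted by the patch: one call per line, joined by backslashes.
spark = ToySession\
    .builder\
    .appName("BinaryClassificationMetricsExample")\
    .getOrCreate()

# Equivalent style without backslashes: wrap the whole chain in parentheses.
again = (
    ToySession
    .builder
    .config("some.key", "value")
    .getOrCreate()
)

print(spark.conf["app.name"])
print(again is spark)
```

Both layouts produce the same call chain; the parenthesized form also avoids the backslash pitfall where stray whitespace after a trailing `\` is a syntax error.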
## How was this patch tested?
Passed the Jenkins tests and ran `dev/lint-java` manually.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12911 from dongjoon-hyun/SPARK-15134.
Diffstat (limited to 'examples/src/main/python/mllib')
-rw-r--r-- | examples/src/main/python/mllib/binary_classification_metrics_example.py | 15 |
1 files changed, 10 insertions, 5 deletions
```diff
diff --git a/examples/src/main/python/mllib/binary_classification_metrics_example.py b/examples/src/main/python/mllib/binary_classification_metrics_example.py
index 8f0fc9d45d..daf000e38d 100644
--- a/examples/src/main/python/mllib/binary_classification_metrics_example.py
+++ b/examples/src/main/python/mllib/binary_classification_metrics_example.py
@@ -18,20 +18,25 @@
 Binary Classification Metrics Example.
 """
 from __future__ import print_function
-from pyspark import SparkContext
+from pyspark.sql import SparkSession
 # $example on$
 from pyspark.mllib.classification import LogisticRegressionWithLBFGS
 from pyspark.mllib.evaluation import BinaryClassificationMetrics
-from pyspark.mllib.util import MLUtils
+from pyspark.mllib.regression import LabeledPoint
 # $example off$
 
 if __name__ == "__main__":
-    sc = SparkContext(appName="BinaryClassificationMetricsExample")
+    spark = SparkSession\
+        .builder\
+        .appName("BinaryClassificationMetricsExample")\
+        .getOrCreate()
 
     # $example on$
     # Several of the methods available in scala are currently missing from pyspark
     # Load training data in LIBSVM format
-    data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_binary_classification_data.txt")
+    data = spark\
+        .read.format("libsvm").load("data/mllib/sample_binary_classification_data.txt")\
+        .rdd.map(lambda row: LabeledPoint(row[0], row[1]))
 
     # Split data into training (60%) and test (40%)
     training, test = data.randomSplit([0.6, 0.4], seed=11L)
@@ -53,4 +58,4 @@ if __name__ == "__main__":
     print("Area under ROC = %s" % metrics.areaUnderROC)
     # $example off$
 
-    sc.stop()
+    spark.stop()
```
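
The patch swaps `MLUtils.loadLibSVMFile` for the DataFrame `libsvm` source and then maps each row to a `LabeledPoint` using `row[0]` (the label) and `row[1]` (the feature vector). As a rough, Spark-free illustration of what those two columns hold, the hypothetical `parse_libsvm_line` helper below parses the `label index:value ...` text format. It is a sketch under simplifying assumptions: features are kept as a plain dict rather than a Spark vector, and the input is not validated.

```python
def parse_libsvm_line(text):
    """Parse one LIBSVM line ("label index:value ...") into (label, features).

    Hypothetical helper for illustration only: features come back as a dict
    mapping zero-based index -> value rather than as a Spark vector, and the
    input is not validated.
    """
    tokens = text.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        # Indices in LIBSVM files are one-based; shift them to zero-based.
        features[int(index) - 1] = float(value)
    return label, features


sample = [
    "1 1:0.5 3:2.0",
    "0 2:1.5",
]
rows = [parse_libsvm_line(text) for text in sample]

# When no size is given, the vector length can be inferred as largest index + 1.
num_features = max(max(features) for _, features in rows if features) + 1

for label, features in rows:
    # label plays the role of row[0], features the role of row[1] in the patch.
    print(label, features, num_features)
```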