author Yanbo Liang <ybliang8@gmail.com> 2015-11-13 08:43:05 -0800
committer Xiangrui Meng <meng@databricks.com> 2015-11-13 08:43:05 -0800
commit 99693fef0a30432d94556154b81872356d921c64
tree 09d76cc0ef6cae153718982a9a1ecc827ee12d5f /examples/src/main/python/ml/decision_tree_regression_example.py
parent 61a28486ccbcdd37461419df958aea222c8b9f09
[SPARK-11723][ML][DOC] Use LibSVM data source rather than MLUtils.loadLibSVMFile to load DataFrame
Use the LibSVM data source rather than MLUtils.loadLibSVMFile to load a DataFrame, including:
* Use the libSVM data source for all example code under examples/ml, and remove unused imports.
* Use the libSVM data source in the ml-*** user guides that were missed by #8697.
* Fix bug: on the Java side we should use ```sqlContext.read().format("libsvm").load(path)```, but the API doc and user guides misuse it as ```sqlContext.read.format("libsvm").load(path)```.
* Code cleanup.

mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9690 from yanboliang/spark-11723.
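The `libsvm` data source reads each text line as `label index:value index:value ...`, where LIBSVM indices are one-based and ascending, and returns a DataFrame with `label` and `features` columns using zero-based vector indices. The pure-Python sketch below only illustrates that line-to-row mapping; `parse_libsvm_line` and `infer_num_features` are hypothetical helpers, not Spark code, and the input lines are made up for illustration. It assumes that, when no `numFeatures` option is given, the vector size is the largest zero-based index plus one.

```python
# Hypothetical, simplified sketch of how a LIBSVM-format line maps to a
# (label, sparse features) row. This is NOT Spark's implementation.
from typing import List, Tuple

Row = Tuple[float, List[int], List[float]]


def parse_libsvm_line(line: str) -> Row:
    """Parse 'label idx:val idx:val ...' into a label plus zero-based sparse entries."""
    tokens = line.strip().split()
    label = float(tokens[0])
    indices: List[int] = []
    values: List[float] = []
    for token in tokens[1:]:
        idx, val = token.split(":")
        # LIBSVM indices are one-based; shift them to zero-based vector indices.
        indices.append(int(idx) - 1)
        values.append(float(val))
    return label, indices, values


def infer_num_features(rows: List[Row]) -> int:
    """Assumed vector size: largest zero-based index + 1 (0 if there are no entries)."""
    return max((max(idx) + 1 for _, idx, _ in rows if idx), default=0)


if __name__ == "__main__":
    # Made-up sample lines, not taken from data/mllib/sample_libsvm_data.txt.
    lines = ["1 1:0.5 3:2.0", "0 2:1.5"]
    rows = [parse_libsvm_line(l) for l in lines]
    for label, idx, vals in rows:
        print(label, idx, vals)
    print("numFeatures:", infer_num_features(rows))
```

In Spark itself the vector size can also be fixed explicitly with the data source's `numFeatures` option, which avoids the extra pass over the data that inference needs.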
Diffstat (limited to 'examples/src/main/python/ml/decision_tree_regression_example.py')
-rw-r--r-- examples/src/main/python/ml/decision_tree_regression_example.py 5
1 files changed, 2 insertions, 3 deletions
diff --git a/examples/src/main/python/ml/decision_tree_regression_example.py b/examples/src/main/python/ml/decision_tree_regression_example.py
index 3857aed538..439e398947 100644
--- a/examples/src/main/python/ml/decision_tree_regression_example.py
+++ b/examples/src/main/python/ml/decision_tree_regression_example.py
@@ -28,7 +28,6 @@ from pyspark.ml import Pipeline
from pyspark.ml.regression import DecisionTreeRegressor
from pyspark.ml.feature import VectorIndexer
from pyspark.ml.evaluation import RegressionEvaluator
-from pyspark.mllib.util import MLUtils
# $example off$
if __name__ == "__main__":
@@ -36,8 +35,8 @@ if __name__ == "__main__":
sqlContext = SQLContext(sc)
# $example on$
- # Load and parse the data file, converting it to a DataFrame.
- data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt").toDF()
+ # Load the data stored in LIBSVM format as a DataFrame.
+ data = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
# Automatically identify categorical features, and index them.
# We specify maxCategories so features with > 4 distinct values are treated as continuous.
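The last context line refers to the `maxCategories` rule of `VectorIndexer`: a feature with at most `maxCategories` distinct values is treated as categorical and its values are re-indexed, while a feature with more distinct values is treated as continuous. The following is a simplified pure-Python sketch of that rule, assuming dense in-memory rows; `category_maps` and `index_row` are hypothetical helpers, and assigning indices in sorted value order is an assumption that may not match Spark's exact ordering.

```python
# Hypothetical, simplified sketch of the maxCategories rule; not Spark's code.
from typing import Dict, List, Sequence

CategoryMaps = Dict[int, Dict[float, int]]


def category_maps(rows: Sequence[Sequence[float]], max_categories: int) -> CategoryMaps:
    """Return {feature_index: {raw_value: category_index}} for categorical features only."""
    if not rows:
        return {}
    maps: CategoryMaps = {}
    for j in range(len(rows[0])):
        distinct = sorted({row[j] for row in rows})
        # At most max_categories distinct values -> categorical; otherwise continuous.
        if len(distinct) <= max_categories:
            maps[j] = {v: i for i, v in enumerate(distinct)}
    return maps


def index_row(row: Sequence[float], maps: CategoryMaps) -> List[float]:
    """Replace categorical values by their category index; keep continuous values as-is."""
    return [float(maps[j][v]) if j in maps else v for j, v in enumerate(row)]


if __name__ == "__main__":
    data = [
        [0.0, 1.5, 10.0],
        [1.0, 2.5, 20.0],
        [0.0, 3.5, 10.0],
        [1.0, 4.5, 30.0],
        [0.0, 5.5, 20.0],
    ]
    maps = category_maps(data, max_categories=4)
    print(maps)
    print([index_row(r, maps) for r in data])
```

With `max_categories=4`, the middle column (five distinct values) stays continuous, while the first and last columns are re-indexed as categorical features.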