[SPARK-11920][ML][DOC] ML LinearRegression should use correct dataset in examples and user guide doc - spark


author	Yanbo Liang <ybliang8@gmail.com>	2015-11-23 11:51:29 -0800
committer	Joseph K. Bradley <joseph@databricks.com>	2015-11-23 11:51:29 -0800
commit	98d7ec7df4bb115dbd84cb9acd744b6c8abfebd5 (patch)
tree	1c9f2845324f06519e5f42a0559873d2041fae0e /ec2/spark_ec2.py
parent	5231cd5acaae69d735ba3209531705cc222f3cfb (diff)
download	spark-98d7ec7df4bb115dbd84cb9acd744b6c8abfebd5.tar.gz spark-98d7ec7df4bb115dbd84cb9acd744b6c8abfebd5.tar.bz2 spark-98d7ec7df4bb115dbd84cb9acd744b6c8abfebd5.zip

[SPARK-11920][ML][DOC] ML LinearRegression should use correct dataset in examples and user guide doc

ML ```LinearRegression``` uses ```data/mllib/sample_libsvm_data.txt``` as the dataset in the examples and the user guide doc, but that file is actually a classification dataset rather than a regression dataset. We should use ```data/mllib/sample_linear_regression_data.txt``` instead.

The underlying cause is that ```LinearRegression``` with the "normal" solver cannot solve this dataset correctly, possibly because the problem is ill-conditioned and the labels are unreasonable. This issue has been reported at [SPARK-11918](https://issues.apache.org/jira/browse/SPARK-11918). Users will be confused if they run the example code and get an exception, so we should make this change, which clearly illustrates the usage of the ```LinearRegression``` algorithm.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9905 from yanboliang/spark-11920.
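To illustrate the distinction between the two kinds of dataset, here is a minimal sketch that inspects the labels of LIBSVM-format lines. It assumes the usual `label index:value ...` text layout; the helper names (`parse_libsvm_line`, `looks_like_classification`), the `max_classes` threshold, and the sample lines are all hypothetical and are not taken from Spark or from the Spark data files.

```python
def parse_libsvm_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def looks_like_classification(lines, max_classes=10):
    """Heuristic (an assumption, not a Spark rule): a small set of
    integer-valued labels suggests class labels rather than targets."""
    labels = {parse_libsvm_line(line)[0] for line in lines if line.strip()}
    return len(labels) <= max_classes and all(l.is_integer() for l in labels)


# Made-up sample lines, not copied from the Spark data files.
classification_lines = ["0 1:0.5 3:1.2", "1 2:0.7 3:0.1", "1 1:0.9", "0 2:0.3"]
regression_lines = ["-9.49 1:0.45 2:0.36", "0.25 1:-0.83 2:0.41", "17.3 1:0.2 2:-0.9"]

print(looks_like_classification(classification_lines))
print(looks_like_classification(regression_lines))
```

A check like this is only a rough heuristic: a regression dataset with a few integer-valued targets would also pass it, so it is a quick sanity test rather than a guarantee.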

Diffstat (limited to 'ec2/spark_ec2.py')

0 files changed, 0 insertions, 0 deletions

