diff options
| author | José Antonio <joseanmunoz@gmail.com> | 2016-06-25 09:11:25 +0100 |
|---|---|---|
| committer | Sean Owen <sowen@cloudera.com> | 2016-06-25 09:11:25 +0100 |
| commit | a3c7b4187bad00dad87df7e3b5929a44d29568ed (patch) | |
| tree | 9bceb88d152d184f0a8eaee86d736a3f44e65766 | |
| parent | a7d29499dca5b86e776abc225ece84391f09353a (diff) | |
| download | spark-a3c7b4187bad00dad87df7e3b5929a44d29568ed.tar.gz, spark-a3c7b4187bad00dad87df7e3b5929a44d29568ed.tar.bz2, spark-a3c7b4187bad00dad87df7e3b5929a44d29568ed.zip | |
[MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates an ArrayIndexOutOfBoundsException. I found the bug and tested the solution.
## What changes were proposed in this pull request?
Adjust the size of the array created on line 58 so that it no longer causes an ArrayIndexOutOfBoundsException on line 66.
## How was this patch tested?
Manual tests. I recompiled the entire project with the fix; it built successfully, and running the code produced correct results.
Line 66 (`val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + rnd.nextGaussian() * 0.1`)
crashes because `trueWeights` has length `nfeatures + 1` while `x` has length `nfeatures`; `blas.ddot` reads `trueWeights.length` entries from both arrays, so they must have the same length.
The fix is simply to make `trueWeights` the same length as `x`.
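The length requirement can be illustrated with a minimal sketch (a hypothetical stand-alone object, not the actual Spark code): `blas.ddot(n, x, 1, y, 1)` reads `n` entries from each array, so if `n = trueWeights.length` exceeds `x.length`, it indexes past the end of `x`. A plain dot product over equally sized arrays shows the fixed logic:

```scala
import scala.util.Random

// Hypothetical sketch of the generator's label computation after the fix:
// trueWeights and x have the same length, so a dot product over them is safe.
object SVMLabelSketch {
  def label(nfeatures: Int, idx: Int): Double = {
    val globalRnd = new Random(94720)
    // After the fix: exactly nfeatures weights, matching x below.
    val trueWeights = Array.fill[Double](nfeatures)(globalRnd.nextGaussian())

    val rnd = new Random(42 + idx)
    val x = Array.fill[Double](nfeatures)(rnd.nextDouble() * 2.0 - 1.0)

    // Plain dot product in place of blas.ddot; zip only pairs up to the
    // shorter length, whereas blas.ddot with n = trueWeights.length would
    // read past the end of a shorter x.
    val yD = x.zip(trueWeights).map { case (a, b) => a * b }.sum +
      rnd.nextGaussian() * 0.1
    if (yD < 0) 0.0 else 1.0
  }

  def main(args: Array[String]): Unit = {
    println(s"label = ${label(5, 0)}")
  }
}
```

Because each example's `Random` is seeded with `42 + idx`, the generated labels are deterministic per index, which is what makes the manual verification below reproducible.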
I recompiled the project with the change, and it works now:
[spark-1.6.1]$ spark-submit --master local[*] --class org.apache.spark.mllib.util.SVMDataGenerator mllib/target/spark-mllib_2.11-1.6.1.jar local /home/user/test
It now generates the data successfully in the specified folder.
Author: José Antonio <joseanmunoz@gmail.com>
Closes #13895 from j4munoz/patch-2.
 mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
```diff
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
index cde5979396..c946860654 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
@@ -55,7 +55,7 @@ object SVMDataGenerator {
     val sc = new SparkContext(sparkMaster, "SVMGenerator")
 
     val globalRnd = new Random(94720)
-    val trueWeights = Array.fill[Double](nfeatures + 1)(globalRnd.nextGaussian())
+    val trueWeights = Array.fill[Double](nfeatures)(globalRnd.nextGaussian())
 
     val data: RDD[LabeledPoint] = sc.parallelize(0 until nexamples, parts).map { idx =>
       val rnd = new Random(42 + idx)
```