author Sean Owen <sowen@cloudera.com> 2016-09-04 12:40:51 +0100
committer Sean Owen <sowen@cloudera.com> 2016-09-04 12:40:51 +0100
commit cdeb97a8cd26e3282cc2a4f126242ed2199f3898
tree 22bb93ee40ae08cb0f1928c7c2fdd535739ecd23 /python
parent e75c162e9e510d74b07f28ccf6c7948ac317a7c6
[SPARK-17311][MLLIB] Standardize Python-Java MLlib API to accept optional long seeds in all cases
## What changes were proposed in this pull request?

Related to https://github.com/apache/spark/pull/14524: this is just the 'fix', rather than a behavior change.

- PythonMLlibAPI methods that take a seed now consistently take a `java.lang.Long`, which allows the Python API to specify "no seed".
- .mllib's Word2VecModel seemed to be the odd one out in .mllib, in that it picked its own random seed. It now defaults to None instead, which lets the Scala implementation pick the seed.
- BisectingKMeansModel arguably should not hard-code a seed either, for consistency with the rest of .mllib. However, I left it as is.

## How was this patch tested?

Existing tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #14826 from srowen/SPARK-16832.2.
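The "no seed" convention described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the actual PythonMLlibAPI code: `hypothetical_train` is a made-up stand-in for a JVM-side trainer that receives a nullable `java.lang.Long`, with Python's `None` playing the role of `null`. When no seed is given, the backend picks its own; an explicit seed makes results reproducible.

```python
import random
from typing import List, Optional


def hypothetical_train(data: List[str], seed: Optional[int] = None) -> List[float]:
    """Hypothetical stand-in for a JVM trainer that takes a nullable long seed.

    None plays the role of a null java.lang.Long: "no seed given".
    """
    if seed is None:
        # No seed supplied, so the backend picks one itself.
        seed = random.randrange(2**63)
    rng = random.Random(seed)
    # Toy "model": one pseudo-random weight per input token.
    return [rng.random() for _ in data]


words = ["spark", "mllib", "seed"]
# The same explicit seed yields identical results across calls.
print(hypothetical_train(words, seed=42) == hypothetical_train(words, seed=42))  # prints True
# Omitting the seed is allowed; the backend chooses one.
print(len(hypothetical_train(words)))  # prints 3
```

Keeping the choice of a default seed on one side (the backend) avoids the Python wrapper and the Scala implementation each inventing their own random default.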
Diffstat (limited to 'python')
-rw-r--r-- python/pyspark/mllib/feature.py 4
1 files changed, 2 insertions, 2 deletions
diff --git a/python/pyspark/mllib/feature.py b/python/pyspark/mllib/feature.py
index 324ba9758e..b32d0c70ec 100644
--- a/python/pyspark/mllib/feature.py
+++ b/python/pyspark/mllib/feature.py
@@ -600,7 +600,7 @@ class Word2Vec(object):
         self.learningRate = 0.025
         self.numPartitions = 1
         self.numIterations = 1
-        self.seed = random.randint(0, sys.maxsize)
+        self.seed = None
         self.minCount = 5
         self.windowSize = 5
@@ -675,7 +675,7 @@ class Word2Vec(object):
             raise TypeError("data should be an RDD of list of string")
         jmodel = callMLlibFunc("trainWord2VecModel", data, int(self.vectorSize),
                                float(self.learningRate), int(self.numPartitions),
-                               int(self.numIterations), int(self.seed),
+                               int(self.numIterations), self.seed,
                                int(self.minCount), int(self.windowSize))
         return Word2VecModel(jmodel)
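The second hunk drops the `int()` cast around `self.seed` because the seed can now be `None`, and `int(None)` raises `TypeError`. A minimal sketch of the same pattern follows; `Word2VecSettings` and `train_args` are hypothetical names that only mirror the defaults and argument order visible in the diff (the `vectorSize` default of 100 is an assumption, since that line falls outside the diff context).

```python
from typing import Optional, Tuple


class Word2VecSettings:
    """Hypothetical mirror of the Python Word2Vec defaults after this patch."""

    def __init__(self) -> None:
        self.vectorSize = 100  # assumed default; not shown in the diff context
        self.learningRate = 0.025
        self.numPartitions = 1
        self.numIterations = 1
        self.seed: Optional[int] = None  # None = let the JVM side pick a seed
        self.minCount = 5
        self.windowSize = 5

    def setSeed(self, seed: int) -> "Word2VecSettings":
        self.seed = seed
        return self

    def train_args(self) -> Tuple:
        # Coerce every numeric setting except the seed, which must stay
        # None-able; int(None) would raise TypeError.
        return (int(self.vectorSize), float(self.learningRate),
                int(self.numPartitions), int(self.numIterations),
                self.seed, int(self.minCount), int(self.windowSize))


print(Word2VecSettings().train_args())  # prints (100, 0.025, 1, 1, None, 5, 5)
print(Word2VecSettings().setSeed(42).train_args())  # prints (100, 0.025, 1, 1, 42, 5, 5)
```

Every setting except the seed is still coerced, so a user-supplied seed passes through unchanged while the default `None` is forwarded as "no seed".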