author: Xusen Yin <yinxusen@gmail.com> 2015-04-29 14:55:32 -0700
committer: Xiangrui Meng <meng@databricks.com> 2015-04-29 14:55:32 -0700
commit: c9d530e2e5123dbd4fd13fc487c890d6076b24bf
tree: ed767d35090ace95b7ea77df5d58519bff653ec7 /core
parent: 15995c883aa248235fdebf0cbeeaa3ef12c97e9c
[SPARK-6529] [ML] Add Word2Vec transformer
See JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-6529). Some notes:

1. I added `learningRate` to sharedParams, since it is a common parameter for ML algorithms.
2. Transforming a `Vector` to find synonyms is not supported yet; it will be addressed in a future JIRA issue.
3. Word2Vec differs from other ML models in that its training set and its transformed set are different. The training set is an `RDD[Iterable[String]]` representing documents, whereas the transformed set is an `RDD[String]` representing unique words. You therefore have to switch `inputCol` between these two stages.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #5596 from yinxusen/SPARK-6529 and squashes the following commits:

ee2b37a [Xusen Yin] merge with former HEAD
4945462 [Xusen Yin] merge with #5626
3bc2cbd [Xusen Yin] change foldLeft to for loop and use blas
5dd4ee7 [Xusen Yin] fix scala style
743e0d5 [Xusen Yin] fix comments and code style
04c48e9 [Xusen Yin] ensure the functionality
a190f2c [Xusen Yin] fix code style and refine the transform function of word2vec
02848fa [Xusen Yin] refine comments
34a55c0 [Xusen Yin] fix errors
109d124 [Xusen Yin] add test suite and pass it
04dde06 [Xusen Yin] add shared params
c594095 [Xusen Yin] add word2vec transformer
23d77fa [Xusen Yin] merge with #5626
e8cfaf7 [Xusen Yin] fix conflict with master
66e7bd3 [Xusen Yin] change foldLeft to for loop and use blas
566ec20 [Xusen Yin] fix scala style
b54399f [Xusen Yin] fix comments and code style
1211e86 [Xusen Yin] ensure the functionality
6b97ec8 [Xusen Yin] fix code style and refine the transform function of word2vec
7cde18f [Xusen Yin] rm sharedParams
618abd0 [Xusen Yin] refine comments
e29680a [Xusen Yin] fix errors
fe3afe9 [Xusen Yin] add test suite and pass it
02767fb [Xusen Yin] add shared params
6a514f1 [Xusen Yin] add word2vec transformer
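Note 3 in the commit message says the fit stage and the transform stage take differently shaped input. The Spark-free sketch below illustrates only that difference in shapes, using plain Scala collections in place of RDDs. `Word2VecShapes`, `ToyWord2VecModel`, and `fit` are hypothetical names, not the Spark ML API. The "training" step simply assigns each vocabulary word a deterministic placeholder vector and does not perform the skip-gram training that Word2Vec actually uses.

```scala
// Hypothetical, Spark-free illustration of the input shapes in note 3.
// NOT the Spark ML Word2Vec API and NOT real Word2Vec training.
object Word2VecShapes {

  // Hypothetical fitted model: a lookup table from unique words to vectors.
  final case class ToyWord2VecModel(vectors: Map[String, Array[Double]]) {
    // Transform stage: input is a collection of unique words
    // (the commit message's RDD[String]); unknown words are dropped.
    def transform(words: Seq[String]): Seq[(String, Array[Double])] =
      words.distinct.flatMap(w => vectors.get(w).map(v => (w, v)))
  }

  // Fit stage: input is a collection of documents
  // (the commit message's RDD[Iterable[String]]).
  def fit(documents: Seq[Iterable[String]], vectorSize: Int): ToyWord2VecModel = {
    val vocab = documents.flatten.distinct.sorted
    val vectors = vocab.zipWithIndex.map { case (word, i) =>
      // Placeholder vector: a single 1.0 at position (index mod vectorSize).
      word -> Array.tabulate(vectorSize)(j => if (j == i % vectorSize) 1.0 else 0.0)
    }.toMap
    ToyWord2VecModel(vectors)
  }

  def main(args: Array[String]): Unit = {
    // Training input: documents, each a sequence of words.
    val docs: Seq[Iterable[String]] = Seq(
      Seq("spark", "ml", "word2vec"),
      Seq("spark", "transformer")
    )
    val model = fit(docs, vectorSize = 3)
    // Transform input: unique words, not documents.
    val out = model.transform(Seq("spark", "word2vec", "unknown"))
    // Prints "spark -> [0.0, 1.0, 0.0]" then "word2vec -> [1.0, 0.0, 0.0]".
    out.foreach { case (w, v) => println(s"$w -> ${v.mkString("[", ", ", "]")}") }
  }
}
```

The point of the sketch is that `fit` and `transform` accept differently typed collections, which is why a pipeline built around this transformer has to point `inputCol` at a documents column for fitting and at a words column for transforming.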
Diffstat (limited to 'core')
0 files changed, 0 insertions, 0 deletions