author | Xusen Yin <yinxusen@gmail.com> | 2015-04-29 14:55:32 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-04-29 14:55:32 -0700 |
commit | c9d530e2e5123dbd4fd13fc487c890d6076b24bf (patch) | |
tree | ed767d35090ace95b7ea77df5d58519bff653ec7 /bin/sparkR.cmd | |
parent | 15995c883aa248235fdebf0cbeeaa3ef12c97e9c (diff) | |
[SPARK-6529] [ML] Add Word2Vec transformer
See JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-6529).
Some notes:
1. I added `learningRate` to sharedParams, since it is a common parameter for ML algorithms.
2. This change does not support a transform that finds synonyms from a `Vector`; that will be supported in a future JIRA issue.
3. Word2Vec differs from other ML models in that its training set and its transformed set are different. The training set is an `RDD[Iterable[String]]` that represents documents, but the set we want to transform is an `RDD[String]` that represents unique words. So you have to switch your `inputCol` between these two stages.
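To make the shape difference in note 3 concrete, here is a minimal sketch that uses plain Scala collections as stand-ins for the RDDs. This is an assumption made so the example is self-contained; it does not use Spark or the transformer added in this commit. The object name `Word2VecInputShapes` and the sample words are hypothetical.

```scala
// Hypothetical illustration only: plain Scala collections stand in for
// Spark RDDs, and nothing here calls the Word2Vec transformer itself.
object Word2VecInputShapes {
  // Training-side shape: one Iterable[String] per document
  // (analogous to RDD[Iterable[String]]).
  val documents: Seq[Iterable[String]] = Seq(
    Seq("spark", "ml", "pipeline"),
    Seq("spark", "word2vec", "transformer")
  )

  // Transform-side shape: the unique words across all documents
  // (analogous to RDD[String]). distinct keeps first-occurrence order.
  val uniqueWords: Seq[String] = documents.flatten.distinct
}
```

The two values have different element types (`Iterable[String]` versus `String`), which is why a single `inputCol` cannot serve both stages as described above.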
Author: Xusen Yin <yinxusen@gmail.com>
Closes #5596 from yinxusen/SPARK-6529 and squashes the following commits:
ee2b37a [Xusen Yin] merge with former HEAD
4945462 [Xusen Yin] merge with #5626
3bc2cbd [Xusen Yin] change foldLeft to for loop and use blas
5dd4ee7 [Xusen Yin] fix scala style
743e0d5 [Xusen Yin] fix comments and code style
04c48e9 [Xusen Yin] ensure the functionality
a190f2c [Xusen Yin] fix code style and refine the transform function of word2vec
02848fa [Xusen Yin] refine comments
34a55c0 [Xusen Yin] fix errors
109d124 [Xusen Yin] add test suite and pass it
04dde06 [Xusen Yin] add shared params
c594095 [Xusen Yin] add word2vec transformer
23d77fa [Xusen Yin] merge with #5626
e8cfaf7 [Xusen Yin] fix conflict with master
66e7bd3 [Xusen Yin] change foldLeft to for loop and use blas
566ec20 [Xusen Yin] fix scala style
b54399f [Xusen Yin] fix comments and code style
1211e86 [Xusen Yin] ensure the functionality
6b97ec8 [Xusen Yin] fix code style and refine the transform function of word2vec
7cde18f [Xusen Yin] rm sharedParams
618abd0 [Xusen Yin] refine comments
e29680a [Xusen Yin] fix errors
fe3afe9 [Xusen Yin] add test suite and pass it
02767fb [Xusen Yin] add shared params
6a514f1 [Xusen Yin] add word2vec transformer