author | Yuhao Yang <hhbyyh@gmail.com> | 2015-11-09 16:55:23 -0800 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2015-11-09 16:55:23 -0800 |
commit | 61f9c8711c79f35d67b0456155866da316b131d9 (patch) | |
tree | f8a120c315999ba1a459d8b2965b6a7646865df1 /pom.xml | |
parent | 7dc9d8dba6c4bc655896b137062d896dec4ef64a (diff) | |
[SPARK-11069][ML] Add RegexTokenizer option to convert to lowercase
jira: https://issues.apache.org/jira/browse/SPARK-11069
Quoted from the JIRA:
Tokenizer converts strings to lowercase automatically, but RegexTokenizer does not. It would be nice to add an option to RegexTokenizer to convert to lowercase. Proposal:
- call the Boolean Param "toLowercase"
- set the default to false (so existing behavior does not change)
Note that sklearn also converts to lowercase before tokenizing.
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes #9092 from hhbyyh/tokenLower.
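
To make the proposed behavior concrete, here is a minimal sketch in plain Python of a regex tokenizer with an optional lowercasing step. It is not Spark's implementation: the function `regex_tokenize` and its parameters `pattern`, `gaps`, and `to_lowercase` are hypothetical names chosen for illustration. The default of `False` follows the proposal quoted above, and lowercasing is applied before tokenizing, as the note about sklearn describes.

```python
import re


def regex_tokenize(text, pattern=r"\s+", gaps=True, to_lowercase=False):
    """Split `text` into tokens using a regular expression.

    Hypothetical helper for illustration only, not Spark's RegexTokenizer.
    Assumes `pattern` contains no capture groups.
    """
    # Lowercase first, so the pattern is matched against the lowercased text.
    source = text.lower() if to_lowercase else text

    if gaps:
        # The pattern describes the separators between tokens.
        tokens = re.split(pattern, source)
    else:
        # The pattern describes the tokens themselves.
        tokens = re.findall(pattern, source)

    # Drop empty strings produced by leading or trailing separators.
    return [t for t in tokens if t]


if __name__ == "__main__":
    print(regex_tokenize("Hello  World"))
    print(regex_tokenize("Hello  World", to_lowercase=True))
```

With the flag left at its default, the case of the input is preserved, so callers that rely on the old behavior see no change; passing `to_lowercase=True` opts in to the lowercased output.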