aboutsummaryrefslogtreecommitdiff
path: root/docs/ml-features.md
Commit message (Collapse)AuthorAgeFilesLines
* [DOC][MINOR] Fixed minor errors in feature.ml user guide docBryan Cutler2016-05-071-3/+5
| | | | | | | | | | | | ## What changes were proposed in this pull request? Fixed some minor errors found when reviewing feature.ml user guide ## How was this patch tested? built docs locally Author: Bryan Cutler <cutlerb@gmail.com> Closes #12940 from BryanCutler/feature.ml-doc_fixes-DOCS-MINOR.
* [SPARK-14512] [DOC] Add python example for QuantileDiscretizerZheng RuiFeng2016-05-061-0/+9
| | | | | | | | | | | | ## What changes were proposed in this pull request? Add the missing python example for QuantileDiscretizer ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12281 from zhengruifeng/discret_pe.
* [SPARK-14514][DOC] Add python example for VectorSlicerZheng RuiFeng2016-04-261-0/+8
| | | | | | | | | | | | ## What changes were proposed in this pull request? Add the missing python example for VectorSlicer ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12282 from zhengruifeng/vecslicer_pe.
* [SPARK-14635][ML] Documentation and Examples for TF-IDF only refer to HashingTFYuhao Yang2016-04-201-3/+12
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Currently, the docs for TF-IDF only refer to using HashingTF with IDF. However, CountVectorizer can also be used. We should probably amend the user guide and examples to show this. ## How was this patch tested? unit tests and doc generation Author: Yuhao Yang <hhbyyh@gmail.com> Closes #12454 from hhbyyh/tfdoc.
* [SPARK-14515][DOC] Add python example for ChiSqSelectorZheng RuiFeng2016-04-181-0/+8
| | | | | | | | | | | | ## What changes were proposed in this pull request? Add the missing python example for ChiSqSelector ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12283 from zhengruifeng/chi2_pe.
* [SPARK-14509][DOC] Add python CountVectorizerExampleZheng RuiFeng2016-04-131-0/+9
| | | | | | | | | | | | ## What changes were proposed in this pull request? Add python CountVectorizerExample ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11917 from zhengruifeng/cv_pe.
* [SPARK-14339][DOC] Add python examples for DCT,MinMaxScaler,MaxAbsScalerZheng RuiFeng2016-04-091-0/+24
| | | | | | | | | | | | ## What changes were proposed in this pull request? add three python examples ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12063 from zhengruifeng/dct_pe.
* [SPARK-13512][ML] add example and doc for MaxAbsScalerYuhao Yang2016-03-111-0/+32
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-13512 Add example and doc for ml.feature.MaxAbsScaler. ## How was this patch tested? unit tests Author: Yuhao Yang <hhbyyh@gmail.com> Closes #11392 from hhbyyh/maxabsdoc.
* [MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns ↵Dongjoon Hyun2016-02-221-3/+3
| | | | | | | | | | | | | | | | | in other comments ## What changes were proposed in this pull request? This PR tries to fix all typos in all markdown files under `docs` module, and fixes similar typos in other comments, too. ## How was the this patch tested? manual tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11300 from dongjoon-hyun/minor_fix_typos.
* [SPARK-11965][ML][DOC] Update user guide for RFormula feature interactionsYanbo Liang2016-01-251-1/+19
| | | | | | | | Update user guide for RFormula feature interactions. Meanwhile we also update other new features such as supporting string label in Spark 1.6. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10222 from yanboliang/spark-11965.
* [MINOR][DOC] Fix broken word2vec linkBenFradet2015-12-141-1/+1
| | | | | | | | Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193 where a broken link has been left as is. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10282 from BenFradet/SPARK-12199.
* [SPARK-12199][DOC] Follow-up: Refine example code in ml-features.mdXusen Yin2015-12-121-11/+11
| | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-12199 Follow-up PR of SPARK-11551. Fix some errors in ml-features.md mengxr Author: Xusen Yin <yinxusen@gmail.com> Closes #10193 from yinxusen/SPARK-12199.
* [SPARK-12217][ML] Document invalid handling for StringIndexerBenFradet2015-12-111-0/+36
| | | | | | | | | | Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation. I wonder if I should also add a snippet to the code example, input welcome. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10257 from BenFradet/SPARK-12217.
* [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, ↵Timothy Hunter2015-12-101-2/+2
| | | | | | | | | | | | spark.mllib and mllib in the documentation. Replaces a number of occurences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in spark). It also removes some files that I forgot to delete with #10207 Author: Timothy Hunter <timhunter@databricks.com> Closes #10234 from thunterdb/12212.
* [SPARK-11551][DOC] Replace example code in ml-features.md using include_exampleXusen Yin2015-12-091-1061/+51
| | | | | | | | | PR on behalf of somideshmukh, thanks! Author: Xusen Yin <yinxusen@gmail.com> Author: somideshmukh <somilde@us.ibm.com> Closes #10219 from yinxusen/SPARK-11551.
* [SPARK-8517][ML][DOC] Reorganizes the spark.ml user guideTimothy Hunter2015-12-081-2/+2
| | | | | | | | | | This PR moves pieces of the spark.ml user guide to reflect suggestions in SPARK-8517. It does not introduce new content, as requested. <img width="192" alt="screen shot 2015-12-08 at 11 36 00 am" src="https://cloud.githubusercontent.com/assets/7594753/11666166/e82b84f2-9d9f-11e5-8904-e215424d8444.png"> Author: Timothy Hunter <timhunter@databricks.com> Closes #10207 from thunterdb/spark-8517.
* [SPARK-12159][ML] Add user guide section for IndexToString transformerBenFradet2015-12-081-16/+88
| | | | | | | | Documentation regarding the `IndexToString` label transformer with code snippets in Scala/Java/Python. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10166 from BenFradet/SPARK-12159.
* [SPARK-11551][DOC][EXAMPLE] Revert PR #10002Cheng Lian2015-12-081-51/+1058
| | | | | | | | | | This reverts PR #10002, commit 78209b0ccaf3f22b5e2345dfb2b98edfdb746819. The original PR wasn't tested on Jenkins before being merged. Author: Cheng Lian <lian@databricks.com> Closes #10200 from liancheng/revert-pr-10002.
* [SPARK-11958][SPARK-11957][ML][DOC] SQLTransformer user guide and example codeYanbo Liang2015-12-071-0/+59
| | | | | | | | Add ```SQLTransformer``` user guide, example code and make Scala API doc more clear. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10006 from yanboliang/spark-11958.
* [SPARK-11551][DOC][EXAMPLE] Replace example code in ml-features.md using ↵somideshmukh2015-12-071-1058/+51
| | | | | | | | | | | | | include_example Made new patch contaning only markdown examples moved to exmaple/folder. Ony three java code were not shfted since they were contaning compliation error ,these classes are 1)StandardScale 2)NormalizerExample 3)VectorIndexer Author: Xusen Yin <yinxusen@gmail.com> Author: somideshmukh <somilde@us.ibm.com> Closes #10002 from somideshmukh/SomilBranch1.33.
* [SPARK-11963][DOC] Add docs for QuantileDiscretizerXusen Yin2015-12-071-0/+65
| | | | | | | | https://issues.apache.org/jira/browse/SPARK-11963 Author: Xusen Yin <yinxusen@gmail.com> Closes #9962 from yinxusen/SPARK-11963.
* [DOCUMENTATION][MLLIB] typo in mllib docJeff Zhang2015-12-031-1/+1
| | | | | | | | \cc mengxr Author: Jeff Zhang <zjffdu@apache.org> Closes #10093 from zjffdu/mllib_typo.
* [SPARK-11961][DOC] Add docs of ChiSqSelectorXusen Yin2015-12-011-0/+50
| | | | | | | | https://issues.apache.org/jira/browse/SPARK-11961 Author: Xusen Yin <yinxusen@gmail.com> Closes #9965 from yinxusen/SPARK-11961.
* [SPARK-11723][ML][DOC] Use LibSVM data source rather than ↵Yanbo Liang2015-11-131-4/+4
| | | | | | | | | | | | | | | | MLUtils.loadLibSVMFile to load DataFrame Use LibSVM data source rather than MLUtils.loadLibSVMFile to load DataFrame, include: * Use libSVM data source for all example codes under examples/ml, and remove unused import. * Use libSVM data source for user guides under ml-*** which were omitted by #8697. * Fix bug: We should use ```sqlContext.read().format("libsvm").load(path)``` at Java side, but the API doc and user guides misuse as ```sqlContext.read.format("libsvm").load(path)```. * Code cleanup. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9690 from yanboliang/spark-11723.
* [SPARK-11289][DOC] Substitute code examples in ML features extractors with ↵Xusen Yin2015-10-261-209/+8
| | | | | | | | | | | | include_example mengxr https://issues.apache.org/jira/browse/SPARK-11289 I make some changes in ML feature extractors. I.e. TF-IDF, Word2Vec, and CountVectorizer. I add new example code in spark/examples, hope it is the right place to add those examples. Author: Xusen Yin <yinxusen@gmail.com> Closes #9266 from yinxusen/SPARK-11289.
* [SPARK-10670] [ML] [Doc] add api reference for ml docYuhao Yang2015-09-281-64/+195
| | | | | | | | | | jira: https://issues.apache.org/jira/browse/SPARK-10670 In the Markdown docs for the spark.ml Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "Word2Vec" section in https://github.com/apache/spark/blob/64743870f23bffb8d96dcc8a0181c1452782a151/docs/ml-features.md This JIRA is just for spark.ml, not spark.mllib Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8901 from hhbyyh/docAPI.
* [SPARK-10595] [ML] [MLLIB] [DOCS] Various ML guide cleanupsJoseph K. Bradley2015-09-151-4/+30
| | | | | | | | | | | | | | | | | | Various ML guide cleanups. * ml-guide.md: Make it easier to access the algorithm-specific guides. * LDA user guide: EM often begins with useless topics, but running longer generally improves them dramatically. E.g., 10 iterations on a Wikipedia dataset produces useless topics, but 50 iterations produces very meaningful topics. * mllib-feature-extraction.html#elementwiseproduct: “w” parameter should be “scalingVec” * Clean up Binarizer user guide a little. * Document in Pipeline that users should not put an instance into the Pipeline in more than 1 place. * spark.ml Word2Vec user guide: clean up grammar/writing * Chi Sq Feature Selector docs: Improve text in doc. CC: mengxr feynmanliang Author: Joseph K. Bradley <joseph@databricks.com> Closes #8752 from jkbradley/mlguide-fixes-1.5.
* [SPARK-10518] [DOCS] Update code examples in spark.ml user guide to use ↵y-shimizu2015-09-111-42/+22
| | | | | | | | | | LIBSVM data source instead of MLUtils I fixed to use LIBSVM data source in the example code in spark.ml instead of MLUtils Author: y-shimizu <y.shimizu0429@gmail.com> Closes #8697 from y-shimizu/SPARK-10518.
* [SPARK-10249] [ML] [DOC] Add Python Code Example to StopWordsRemover User GuideYuhao Yang2015-09-081-0/+19
| | | | | | | | | | jira: https://issues.apache.org/jira/browse/SPARK-10249 update user guide since python support added. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8620 from hhbyyh/swPyDocExample.
* [SPARK-9890] [DOC] [ML] User guide for CountVectorizerYuhao Yang2015-08-281-0/+109
| | | | | | | | | | jira: https://issues.apache.org/jira/browse/SPARK-9890 document with Scala and java examples Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8487 from hhbyyh/cvDoc.
* [SPARK-9680] [MLLIB] [DOC] StopWordsRemovers user guide and Java ↵Feynman Liang2015-08-271-3/+99
| | | | | | | | | | | | | compatibility test * Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine * Cleans up scaladocs for public methods * Adds test for Java compatibility * Follow up Python user guide code example is tracked by SPARK-10249 Author: Feynman Liang <fliang@databricks.com> Closes #8436 from feynmanliang/SPARK-10230.
* [SPARK-8531] [ML] Update ML user guide for MinMaxScalerYuhao Yang2015-08-251-0/+71
| | | | | | | | | | | jira: https://issues.apache.org/jira/browse/SPARK-8531 Update ML user guide for MinMaxScaler Author: Yuhao Yang <hhbyyh@gmail.com> Author: unknown <yuhaoyan@yuhaoyan-MOBL1.ccr.corp.intel.com> Closes #7211 from hhbyyh/minmaxdoc.
* [SPARK-9893] User guide with Java test suite for VectorSlicerXusen Yin2015-08-211-0/+133
| | | | | | | | | | Add user guide for `VectorSlicer`, with Java test suite and Python version VectorSlicer. Note that Python version does not support selecting by names now. Author: Xusen Yin <yinxusen@gmail.com> Closes #8267 from yinxusen/SPARK-9893.
* [SPARK-9895] User Guide for RFormula Feature TransformerEric Liang2015-08-191-0/+108
| | | | | | | | mengxr Author: Eric Liang <ekl@databricks.com> Closes #8293 from ericl/docs-2.
* [SPARK-10060] [ML] [DOC] spark.ml DecisionTree user guideJoseph K. Bradley2015-08-191-2/+0
| | | | | | | | | | | | New user guide section ml-decision-tree.md, including code examples. I have run all examples, including the Java ones. CC: manishamde yanboliang mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #8244 from jkbradley/ml-dt-docs.
* [SPARK-9977] [DOCS] Update documentation for StringIndexerlewuathe2015-08-191-1/+5
| | | | | | | | | By using `StringIndexer`, we can obtain indexed label on new column. So a following estimator should use this new column through pipeline if it wants to use string indexed label. I think it is better to make it explicit on documentation. Author: lewuathe <lewuathe@me.com> Closes #8205 from Lewuathe/SPARK-9977.
* [SPARK-10070] [DOCS] Remove Guava dependencies in user guidesSean Owen2015-08-191-26/+26
| | | | | | | | | | | | `Lists.newArrayList` -> `Arrays.asList` CC jkbradley feynmanliang Anybody into replacing usages of `Lists.newArrayList` in the examples / source code too? this method isn't useful in Java 7 and beyond. Author: Sean Owen <sowen@cloudera.com> Closes #8272 from srowen/SPARK-10070.
* [SPARK-8473] [SPARK-9889] [ML] User guide and example code for DCTFeynman Liang2015-08-181-0/+71
| | | | | | | | mengxr jkbradley Author: Feynman Liang <fliang@databricks.com> Closes #8184 from feynmanliang/SPARK-9889-DCT-docs.
* [SPARK-9768] [PYSPARK] [ML] Add Python API and user guide for ↵Yanbo Liang2015-08-171-4/+19
| | | | | | | | | | ml.feature.ElementwiseProduct Add Python API, user guide and example for ml.feature.ElementwiseProduct. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8061 from yanboliang/SPARK-9768.
* [SPARK-7583] [MLLIB] User guide update for RegexTokenizerYuhao Yang2015-08-121-11/+30
| | | | | | | | | | jira: https://issues.apache.org/jira/browse/SPARK-7583 User guide update for RegexTokenizer Author: Yuhao Yang <hhbyyh@gmail.com> Closes #7828 from hhbyyh/regexTokenizerDoc.
* [SPARK-9191] [ML] [Doc] Add ml.PCA user guide and code examplesYanbo Liang2015-08-031-0/+86
| | | | | | | | | | | Add ml.PCA user guide document and code examples for Scala/Java/Python. Author: Yanbo Liang <ybliang8@gmail.com> Closes #7522 from yanboliang/ml-pca-md and squashes the following commits: 60dec05 [Yanbo Liang] address comments f992abe [Yanbo Liang] Add ml.PCA doc and examples
* [SPARK-8457] [ML] NGram DocumentationFeynman Liang2015-07-081-0/+88
| | | | | | | | | | | | Add documentation for NGram feature transformer. Author: Feynman Liang <fliang@databricks.com> Closes #7244 from feynmanliang/SPARK-8457 and squashes the following commits: 5aface9 [Feynman Liang] Pretty print Scala output and add API doc to each codetab 60d5ac0 [Feynman Liang] Inline API doc and fix indentation 736ccbc [Feynman Liang] NGram feature transformer documentation
* [SPARK-7582] [MLLIB] user guide for StringIndexerXiangrui Meng2015-06-011-0/+116
| | | | | | | | | | | | | This PR adds a Java unit test and user guide for `StringIndexer`. I put it before `OneHotEncoder` because they are closely related. jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #6561 from mengxr/SPARK-7582 and squashes the following commits: 4bba4f1 [Xiangrui Meng] fix example ba1cd1b [Xiangrui Meng] fix style 7fa18d1 [Xiangrui Meng] add user guide for StringIndexer 136cb93 [Xiangrui Meng] add a Java unit test for StringIndexer
* [SPARK-7584] [MLLIB] User guide for VectorAssemblerXiangrui Meng2015-06-011-0/+114
| | | | | | | | | | | | | | | | This PR adds a section in the user guide for `VectorAssembler` with code examples in Python/Java/Scala. It also adds a unit test in Java. jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #6556 from mengxr/SPARK-7584 and squashes the following commits: 11313f6 [Xiangrui Meng] simplify Java example 0cd47f3 [Xiangrui Meng] update user guide fd36292 [Xiangrui Meng] update Java unit test ce61ca0 [Xiangrui Meng] add Java unit test for VectorAssembler e399942 [Xiangrui Meng] scala/python example code
* [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProductOctavian Geagla2015-05-291-0/+88
| | | | | | | | | Author: Octavian Geagla <ogeagla@gmail.com> Closes #6501 from ogeagla/ml-guide-elemwiseprod and squashes the following commits: 4ad93d5 [Octavian Geagla] [SPARK-7576] [MLLIB] Incorporate code review feedback. f7be7ad [Octavian Geagla] [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct.
* [SPARK-7577] [ML] [DOC] add bucketizer docXusen Yin2015-05-281-0/+86
| | | | | | | | | | | | | CC jkbradley Author: Xusen Yin <yinxusen@gmail.com> Closes #6451 from yinxusen/SPARK-7577 and squashes the following commits: e2dc32e [Xusen Yin] rename colums e350e49 [Xusen Yin] add all demos 006ddf1 [Xusen Yin] add java test 3238481 [Xusen Yin] add bucketizer
* [SPARK-7578] [ML] [DOC] User guide for spark.ml Normalizer, IDF, StandardScalerJoseph K. Bradley2015-05-211-26/+198
| | | | | | | | | | | | | | | | Added user guide sections with code examples. Also added small Java unit tests to test Java example in guide. CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #6127 from jkbradley/feature-guide-2 and squashes the following commits: cd47f4b [Joseph K. Bradley] Updated based on code review f16bcec [Joseph K. Bradley] Fixed merge issues and update Python examples print calls for Python 3 0a862f9 [Joseph K. Bradley] Added Normalizer, StandardScaler to ml-features doc, plus small Java unit tests a21c2d6 [Joseph K. Bradley] Updated ml-features.md with IDF
* [SPARK-7585] [ML] [DOC] VectorIndexer user guide sectionJoseph K. Bradley2015-05-211-0/+83
| | | | | | | | | | | | | Added VectorIndexer section to ML user guide. Also added javaCategoryMaps() method and Java unit test for it. CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #6255 from jkbradley/vector-indexer-guide and squashes the following commits: dbb8c4c [Joseph K. Bradley] simplified VectorIndexerModel.javaCategoryMaps f692084 [Joseph K. Bradley] Added VectorIndexer section to ML user guide. Also added javaCategoryMaps() method and Java unit test for it.
* [SPARK-7579] [ML] [DOC] User guide update for OneHotEncoderSandy Ryza2015-05-201-0/+95
| | | | | | | | Author: Sandy Ryza <sandy@cloudera.com> Closes #6126 from sryza/sandy-spark-7579 and squashes the following commits: 5af803d [Sandy Ryza] SPARK-7579 [MLLIB] User guide update for OneHotEncoder
* [SPARK-7586] [ML] [DOC] Add docs of Word2Vec in ml packageXusen Yin2015-05-191-0/+89
| | | | | | | | | | | | | | | | | CC jkbradley. JIRA [issue](https://issues.apache.org/jira/browse/SPARK-7586). Author: Xusen Yin <yinxusen@gmail.com> Closes #6181 from yinxusen/SPARK-7586 and squashes the following commits: 77014c5 [Xusen Yin] comment fix 57a4c07 [Xusen Yin] small fix for docs 1178c8f [Xusen Yin] remove the correctness check in java suite 1c3f389 [Xusen Yin] delete sbt commit 1af152b [Xusen Yin] check python example code 1b5369e [Xusen Yin] add docs of word2vec