aboutsummaryrefslogtreecommitdiff
path: root/examples/src/main/python
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-14635][ML] Documentation and Examples for TF-IDF only refer to HashingTFYuhao Yang2016-04-201-0/+2
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Currently, the docs for TF-IDF only refer to using HashingTF with IDF. However, CountVectorizer can also be used. We should probably amend the user guide and examples to show this. ## How was this patch tested? unit tests and doc generation Author: Yuhao Yang <hhbyyh@gmail.com> Closes #12454 from hhbyyh/tfdoc.
* [SPARK-14515][DOC] Add python example for ChiSqSelectorZheng RuiFeng2016-04-181-0/+44
| | | | | | | | | | | | ## What changes were proposed in this pull request? Add the missing python example for ChiSqSelector ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12283 from zhengruifeng/chi2_pe.
* [SPARK-13089][ML] [Doc] spark.ml Naive Bayes user guide and examplesYuhao Yang2016-04-131-0/+53
| | | | | | | | | | jira: https://issues.apache.org/jira/browse/SPARK-13089 Add section in ml-classification.md for NaiveBayes DataFrame-based API, plus example code (using include_example to clip code from examples/ folder files). Author: Yuhao Yang <hhbyyh@gmail.com> Closes #11015 from hhbyyh/naiveBayesDoc.
* [SPARK-14509][DOC] Add python CountVectorizerExampleZheng RuiFeng2016-04-131-0/+44
| | | | | | | | | | | | ## What changes were proposed in this pull request? Add python CountVectorizerExample ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11917 from zhengruifeng/cv_pe.
* [SPARK-14339][DOC] Add python examples for DCT,MinMaxScaler,MaxAbsScalerZheng RuiFeng2016-04-093-0/+131
| | | | | | | | | | | | ## What changes were proposed in this pull request? add three python examples ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12063 from zhengruifeng/dct_pe.
* [SPARK-13874][DOC] Remove docs of streaming-akka, streaming-zeromq, ↵Shixiong Zhu2016-03-261-59/+0
| | | | | | | | | | | | | | | | | | streaming-mqtt and streaming-twitter ## What changes were proposed in this pull request? This PR removes all docs about the old streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter projects since I have already copied them to https://github.com/spark-packages Also remove mqtt_wordcount.py that I forgot to remove previously. ## How was this patch tested? Jenkins PR Build. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11824 from zsxwing/remove-doc.
* [SPARK-13017][DOCS] Replace example code in mllib-feature-extraction.md ↵Xin Ren2016-03-245-0/+255
| | | | | | | | | | | | | | | | | | | | | using include_example Replace example code in mllib-feature-extraction.md using include_example https://issues.apache.org/jira/browse/SPARK-13017 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/TFIDFExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/TFIDFExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11142 from keypointt/SPARK-13017.
* [SPARK-13019][DOCS] fix for scala-2.10 build: Replace example code in ↵Xin Ren2016-03-246-0/+277
| | | | | | | | | | | | | | | | | | | | | | mllib-statistics.md using include_example ## What changes were proposed in this pull request? This PR for ticket SPARK-13019 is based on previous PR(https://github.com/apache/spark/pull/11108). Since PR(https://github.com/apache/spark/pull/11108) is breaking scala-2.10 build, more work is needed to fix build errors. What I did new in this PR is adding keyword argument for 'fractions': ` val approxSample = data.sampleByKey(withReplacement = false, fractions = fractions)` ` val exactSample = data.sampleByKeyExact(withReplacement = false, fractions = fractions)` I reopened ticket on JIRA but sorry I don't know how to reopen a GitHub pull request, so I just submitting a new pull request. ## How was this patch tested? Manual build testing on local machine, build based on scala-2.10. Author: Xin Ren <iamshrek@126.com> Closes #11901 from keypointt/SPARK-13019.
* Revert "[SPARK-13019][DOCS] Replace example code in mllib-statistics.md ↵Xiangrui Meng2016-03-216-277/+0
| | | | | | using include_example" This reverts commit 1af8de200c4d3357bcb09e7bbc6deece00e885f2.
* [SPARK-13019][DOCS] Replace example code in mllib-statistics.md using ↵Xin Ren2016-03-216-0/+277
| | | | | | | | | | | | | | | | | | | | include_example https://issues.apache.org/jira/browse/SPARK-13019 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11108 from keypointt/SPARK-13019.
* [SPARK-13814] [PYSPARK] Delete unnecessary imports in python examples filesZheng RuiFeng2016-03-1115-27/+0
| | | | | | | | | | | | | | | | JIRA: https://issues.apache.org/jira/browse/SPARK-13814 ## What changes were proposed in this pull request? delete unnecessary imports in python examples files ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11651 from zhengruifeng/del_import_pe.
* [SPARK-13672][ML] Add python examples of BisectingKMeans in ML and MLLIBZheng RuiFeng2016-03-112-0/+107
| | | | | | | | | | | | | | | | JIRA: https://issues.apache.org/jira/browse/SPARK-13672 ## What changes were proposed in this pull request? add two python examples of BisectingKMeans for ml and mllib ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11515 from zhengruifeng/mllib_bkm_pe.
* [SPARK-13706][ML] Add Python Example for Train Validation SplitJeremyNixon2016-03-101-0/+68
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This pull request adds a python example for train validation split. ## How was this patch tested? This was style tested through lint-python, generally tested with ./dev/run-tests, and run in notebook and shell environments. It was viewed in docs locally with jekyll serve. This contribution is my original work and I license it to Spark under its open source license. Author: JeremyNixon <jnixon2@gmail.com> Closes #11547 from JeremyNixon/tvs_example.
* [MINOR] Fix typos in comments and testcase name of codeDongjoon Hyun2016-03-033-3/+3
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR fixes typos in comments and testcase name of code. ## How was this patch tested? manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11481 from dongjoon-hyun/minor_fix_typos_in_code.
* [SPARK-13013][DOCS] Replace example code in mllib-clustering.md using ↵Xin Ren2016-03-035-0/+270
| | | | | | | | | | | | | | | | | | | | | include_example Replace example code in mllib-clustering.md using include_example https://issues.apache.org/jira/browse/SPARK-13013 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/KMeansExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/KMeansExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11116 from keypointt/SPARK-13013.
* [SPARK-11381][DOCS] Replace example code in mllib-linear-methods.md using ↵Dongjoon Hyun2016-02-264-0/+217
| | | | | | | | | | | | | | | | | | | | | include_example ## What changes were proposed in this pull request? This PR replaces example codes in `mllib-linear-methods.md` using `include_example` by doing the followings: * Extracts the example codes(Scala,Java,Python) as files in `example` module. * Merges some dialog-style examples into a single file. * Hide redundant codes in HTML for the consistency with other docs. ## How was the this patch tested? manual test. This PR can be tested by document generations, `SKIP_API=1 jekyll build`. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11320 from dongjoon-hyun/SPARK-11381.
* [SPARK-10759][ML] update cross validator with include_exampleJeremyNixon2016-02-231-1/+4
| | | | | | | | This pull request uses {%include_example%} to add an example for the python cross validator to ml-guide. Author: JeremyNixon <jnixon2@gmail.com> Closes #11240 from JeremyNixon/pipeline_include_example.
* [SPARK-13257][IMPROVEMENT] Refine naive Bayes example by checking model ↵movelikeriver2016-02-221-2/+15
| | | | | | | | | | after loading it Refine naive Bayes example by checking model after loading it Author: movelikeriver <mars.lenjoy@gmail.com> Closes #11125 from movelikeriver/naive_bayes.
* [SPARK-13012][DOCUMENTATION] Replace example code in ml-guide.md using ↵Devaraj K2016-02-222-0/+151
| | | | | | | | | | include_example Replaced example code in ml-guide.md using include_example Author: Devaraj K <devaraj@apache.org> Closes #11053 from devaraj-kavali/SPARK-13012.
* [SPARK-12247][ML][DOC] Documentation for spark.ml's ALS and collaborative ↵BenFradet2016-02-161-0/+57
| | | | | | | | | | filtering in general This documents the implementation of ALS in `spark.ml` with example code in scala, java and python. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10411 from BenFradet/SPARK-12247.
* [SPARK-12960] [PYTHON] Some examples are missing support for python2Mark Grover2016-01-211-0/+1
| | | | | | | | Without importing the print_function, the lines later on like ```print("Usage: direct_kafka_wordcount.py <broker_list> <topic>", file=sys.stderr)``` fail when using python2.*. Import fixes that problem and doesn't break anything on python3 either. Author: Mark Grover <mark@apache.org> Closes #10872 from markgrover/python2_compat.
* [SPARK-12603][MLLIB] PySpark MLlib GaussianMixtureModel should support ↵Yanbo Liang2016-01-111-0/+4
| | | | | | | | | | single instance predict/predictSoft PySpark MLlib ```GaussianMixtureModel``` should support single instance ```predict/predictSoft``` just like Scala do. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10552 from yanboliang/spark-12603.
* removed lambda from sortByKey()Udo Klein2016-01-111-1/+1
| | | | | | | | According to the documentation the sortByKey method does not take a lambda as an argument, thus the example is flawed. Removed the argument completely as this will default to ascending sort. Author: Udo Klein <git@blinkenlight.net> Closes #10640 from udoklein/patch-1.
* fixed numVertices in transitive closure exampleUdo Klein2016-01-081-2/+2
| | | | | | Author: Udo Klein <git@blinkenlight.net> Closes #10642 from udoklein/patch-2.
* [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for ↵Shixiong Zhu2015-12-221-1/+29
| | | | | | | | | | Streaming This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpointing. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10385 from zsxwing/accumulator-broadcast-example.
* [SPARK-9057][STREAMING] Twitter example joining to static RDD of word ↵Jeff L2015-12-181-0/+77
| | | | | | | | | | sentiment values Example of joining a static RDD of word sentiments to a streaming RDD of Tweets in order to demo the usage of the transform() method. Author: Jeff L <sha0lin@alumni.carnegiemellon.edu> Closes #8431 from Agent007/SPARK-9057.
* [SPARK-12199][DOC] Follow-up: Refine example code in ml-features.mdXusen Yin2015-12-121-3/+3
| | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-12199 Follow-up PR of SPARK-11551. Fix some errors in ml-features.md mengxr Author: Xusen Yin <yinxusen@gmail.com> Closes #10193 from yinxusen/SPARK-12199.
* [SPARK-11978][ML] Move dataset_example.py to examples/ml and rename to ↵Yanbo Liang2015-12-111-22/+34
| | | | | | | | | | | | | | dataframe_example.py Since ```Dataset``` has a new meaning in Spark 1.6, we should rename it to avoid confusion. #9873 finished the work of Scala example, here we focus on the Python one. Move dataset_example.py to ```examples/ml``` and rename to ```dataframe_example.py```. BTW, fix minor missing issues of #9873. cc mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9957 from yanboliang/SPARK-11978.
* [SPARK-11713] [PYSPARK] [STREAMING] Initial RDD updateStateByKey for PySparkBryan Cutler2015-12-101-1/+4
| | | | | | | | Adding ability to define an initial state RDD for use with updateStateByKey PySpark. Added unit test and changed stateful_network_wordcount example to use initial RDD. Author: Bryan Cutler <bjcutler@us.ibm.com> Closes #10082 from BryanCutler/initial-rdd-updateStateByKey-SPARK-11713.
* [SPARK-11551][DOC] Replace example code in ml-features.md using include_exampleXusen Yin2015-12-0915-0/+635
| | | | | | | | | PR on behalf of somideshmukh, thanks! Author: Xusen Yin <yinxusen@gmail.com> Author: somideshmukh <somilde@us.ibm.com> Closes #10219 from yinxusen/SPARK-11551.
* [SPARK-12159][ML] Add user guide section for IndexToString transformerBenFradet2015-12-081-0/+45
| | | | | | | | Documentation regarding the `IndexToString` label transformer with code snippets in Scala/Java/Python. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10166 from BenFradet/SPARK-12159.
* [SPARK-11551][DOC][EXAMPLE] Revert PR #10002Cheng Lian2015-12-0815-629/+0
| | | | | | | | | | This reverts PR #10002, commit 78209b0ccaf3f22b5e2345dfb2b98edfdb746819. The original PR wasn't tested on Jenkins before being merged. Author: Cheng Lian <lian@databricks.com> Closes #10200 from liancheng/revert-pr-10002.
* [SPARK-11958][SPARK-11957][ML][DOC] SQLTransformer user guide and example codeYanbo Liang2015-12-071-0/+40
| | | | | | | | Add ```SQLTransformer``` user guide, example code and make Scala API doc more clear. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10006 from yanboliang/spark-11958.
* [SPARK-11551][DOC][EXAMPLE] Replace example code in ml-features.md using ↵somideshmukh2015-12-0715-0/+629
| | | | | | | | | | | | | include_example Made new patch contaning only markdown examples moved to exmaple/folder. Ony three java code were not shfted since they were contaning compliation error ,these classes are 1)StandardScale 2)NormalizerExample 3)VectorIndexer Author: Xusen Yin <yinxusen@gmail.com> Author: somideshmukh <somilde@us.ibm.com> Closes #10002 from somideshmukh/SomilBranch1.33.
* [SPARK-11975][ML] Remove duplicate mllib example (DT/RF/GBT in Java/Python)Yanbo Liang2015-11-303-311/+0
| | | | | | | | | | Remove duplicate mllib example (DT/RF/GBT in Java/Python). Since we have tutorial code for DT/RF/GBT classification/regression in Scala/Java/Python and example applications for DT/RF/GBT in Scala, so we mark these as duplicated and remove them. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9954 from yanboliang/SPARK-11975.
* [SPARK-11952][ML] Remove duplicate ml examplesYanbo Liang2015-11-243-235/+0
| | | | | | | | Remove duplicate ml examples (only for ml). mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9933 from yanboliang/SPARK-11685.
* [SPARK-11920][ML][DOC] ML LinearRegression should use correct dataset in ↵Yanbo Liang2015-11-231-1/+2
| | | | | | | | | | | | examples and user guide doc ML ```LinearRegression``` use ```data/mllib/sample_libsvm_data.txt``` as dataset in examples and user guide doc, but it's actually classification dataset rather than regression dataset. We should use ```data/mllib/sample_linear_regression_data.txt``` instead. The deeper causes is that ```LinearRegression``` with "normal" solver can not solve this dataset correctly, may be due to the ill condition and unreasonable label. This issue has been reported at [SPARK-11918](https://issues.apache.org/jira/browse/SPARK-11918). It will confuse users if they run the example code but get exception, so we should make this change which can clearly illustrate the usage of ```LinearRegression``` algorithm. Author: Yanbo Liang <ybliang8@gmail.com> Closes #9905 from yanboliang/spark-11920.
* [SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md ↵Vikas Nelamangala2015-11-205-0/+299
| | | | | | | | using include_example Author: Vikas Nelamangala <vikasnelamangala@Vikass-MacBook-Pro.local> Closes #9689 from vikasnp/master.
* rmse was wrongly calculatedViveka Kulharia2015-11-181-1/+1
| | | | | | | | It was multiplying with U instaed of dividing by U Author: Viveka Kulharia <vivkul@iitk.ac.in> Closes #9771 from vivkul/patch-1.
* [SPARK-11728] Replace example code in ml-ensembles.md using include_exampleXusen Yin2015-11-174-0/+302
| | | | | | | | | | JIRA issue https://issues.apache.org/jira/browse/SPARK-11728. The ml-ensembles.md file contains `OneVsRestExample`. Instead of writing new code files of two `OneVsRestExample`s, I use two existing files in the examples directory, they are `OneVsRestExample.scala` and `JavaOneVsRestExample.scala`. Author: Xusen Yin <yinxusen@gmail.com> Closes #9716 from yinxusen/SPARK-11728.
* [SPARK-11729] Replace example code in ml-linear-methods.md using include_exampleXusen Yin2015-11-172-0/+88
| | | | | | | | JIRA link: https://issues.apache.org/jira/browse/SPARK-11729 Author: Xusen Yin <yinxusen@gmail.com> Closes #9713 from yinxusen/SPARK-11729.
* [SPARK-11723][ML][DOC] Use LibSVM data source rather than ↵Yanbo Liang2015-11-136-17/+12
| | | | | | | | | | | | | | | | MLUtils.loadLibSVMFile to load DataFrame Use LibSVM data source rather than MLUtils.loadLibSVMFile to load DataFrame, include: * Use libSVM data source for all example codes under examples/ml, and remove unused import. * Use libSVM data source for user guides under ml-*** which were omitted by #8697. * Fix bug: We should use ```sqlContext.read().format("libsvm").load(path)``` at Java side, but the API doc and user guides misuse as ```sqlContext.read.format("libsvm").load(path)```. * Code cleanup. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9690 from yanboliang/spark-11723.
* [SPARK-11445][DOCS] Replaced example code in mllib-ensembles.md using ↵Rishabh Bhardwaj2015-11-134-0/+231
| | | | | | | | | | | include_example I have made the required changes and tested. Kindly review the changes. Author: Rishabh Bhardwaj <rbnext29@gmail.com> Closes #9407 from rishabhbhardwaj/SPARK-11445.
* [SPARK-11629][ML][PYSPARK][DOC] Python example code for Multilayer ↵Yanbo Liang2015-11-121-0/+56
| | | | | | | | | | Perceptron Classification Add Python example code for Multilayer Perceptron Classification, and make example code in user guide document testable. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9594 from yanboliang/spark-11629.
* [SPARK-11382] Replace example code in mllib-decision-tree.md using ↵Xusen Yin2015-11-103-0/+112
| | | | | | | | | | | | include_example https://issues.apache.org/jira/browse/SPARK-11382 B.T.W. I fix an error in naive_bayes_example.py. Author: Xusen Yin <yinxusen@gmail.com> Closes #9596 from yinxusen/SPARK-11382.
* [SPARK-11548][DOCS] Replaced example code in ↵Rishabh Bhardwaj2015-11-091-0/+54
| | | | | | | | | | mllib-collaborative-filtering.md using include_example Kindly review the changes. Author: Rishabh Bhardwaj <rbnext29@gmail.com> Closes #9519 from rishabhbhardwaj/SPARK-11337.
* [SPARK-11552][DOCS][Replaced example code in ml-decision-tree.md using ↵sachin aggarwal2015-11-092-0/+151
| | | | | | | | | | include_example] I have tested it on my local, it is working fine, please review Author: sachin aggarwal <different.sachin@gmail.com> Closes #9539 from agsachin/SPARK-11552-real.
* [SPARK-10689][ML][DOC] User guide and example code for AFTSurvivalRegressionYanbo Liang2015-11-091-0/+51
| | | | | | | | Add user guide and example code for ```AFTSurvivalRegression```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #9491 from yanboliang/spark-10689.
* [SPARK-11380][DOCS] Replace example code in mllib-frequent-pattern-mining.md ↵Pravin Gadakh2015-11-041-0/+33
| | | | | | | | | using include_example Author: Pravin Gadakh <pravingadakh177@gmail.com> Author: Pravin Gadakh <prgadakh@in.ibm.com> Closes #9340 from pravingadakh/SPARK-11380.
* [SPARK-11383][DOCS] Replaced example code in ↵Rishabh Bhardwaj2015-11-022-0/+112
| | | | | | | | | | | mllib-naive-bayes.md/mllib-isotonic-regression.md using include_example I have made the required changes in mllib-naive-bayes.md/mllib-isotonic-regression.md and also verified them. Kindle Review it. Author: Rishabh Bhardwaj <rbnext29@gmail.com> Closes #9353 from rishabhbhardwaj/SPARK-11383.