aboutsummaryrefslogtreecommitdiff
path: root/examples
Commit message (Collapse)AuthorAgeFilesLines
...
* [SPARK-17338][SQL] add global temp viewWenchen Fan2016-10-103-2/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Global temporary view is a cross-session temporary view, which means it's shared among all sessions. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database `global_temp`(configurable via SparkConf), and we must use the qualified name to refer a global temp view, e.g. SELECT * FROM global_temp.view1. changes for `SessionCatalog`: 1. add a new field `gloabalTempViews: GlobalTempViewManager`, to access the shared global temp views, and the global temp db name. 2. `createDatabase` will fail if users wanna create `global_temp`, which is system preserved. 3. `setCurrentDatabase` will fail if users wanna set `global_temp`, which is system preserved. 4. add `createGlobalTempView`, which is used in `CreateViewCommand` to create global temp views. 5. add `dropGlobalTempView`, which is used in `CatalogImpl` to drop global temp view. 6. add `alterTempViewDefinition`, which is used in `AlterViewAsCommand` to update the view definition for local/global temp views. 7. `renameTable`/`dropTable`/`isTemporaryTable`/`lookupRelation`/`getTempViewOrPermanentTableMetadata`/`refreshTable` will handle global temp views. changes for SQL commands: 1. `CreateViewCommand`/`AlterViewAsCommand` is updated to support global temp views 2. `ShowTablesCommand` outputs a new column `database`, which is used to distinguish global and local temp views. 3. other commands can also handle global temp views if they call `SessionCatalog` APIs which accepts global temp views, e.g. `DropTableCommand`, `AlterTableRenameCommand`, `ShowColumnsCommand`, etc. changes for other public API 1. add a new method `dropGlobalTempView` in `Catalog` 2. `Catalog.findTable` can find global temp view 3. add a new method `createGlobalTempView` in `Dataset` ## How was this patch tested? new tests in `SQLViewSuite` Author: Wenchen Fan <wenchen@databricks.com> Closes #14897 from cloud-fan/global-temp-view.
* [SPARK-17239][ML][DOC] Update user guide for multiclass logistic regressionsethah2016-10-056-0/+197
| | | | | | | | | | | | ## What changes were proposed in this pull request? Updates user guide to reflect that LogisticRegression now supports multiclass. Also adds new examples to show multiclass training. ## How was this patch tested? Ran locally using spark-submit, run-example, and copy/paste from user guide into shells. Generated docs and verified correct output. Author: sethah <seth.hendrickson16@gmail.com> Closes #15349 from sethah/SPARK-17239.
* [SPARK-14525][SQL] Make DataFrameWrite.save work for jdbcJustin Pihony2016-09-264-0/+66
| | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This change modifies the implementation of DataFrameWriter.save such that it works with jdbc, and the call to jdbc merely delegates to save. ## How was this patch tested? This was tested via unit tests in the JDBCWriteSuite, of which I added one new test to cover this scenario. ## Additional details rxin This seems to have been most recently touched by you and was also commented on in the JIRA. This contribution is my original work and I license the work to the project under the project's open source license. Author: Justin Pihony <justin.pihony@gmail.com> Author: Justin Pihony <justin.pihony@typesafe.com> Closes #12601 from JustinPihony/jdbc_reconciliation.
* [SPARK-16992][PYSPARK] use map comprehension in docGaetan Semet2016-09-123-4/+4
| | | | | | | | Code is equivalent, but map comprehency is most of the time faster than a map. Author: Gaetan Semet <gaetan@xeberon.net> Closes #14863 from Stibbons/map_comprehension.
* [SPARK-17347][SQL][EXAMPLES] Encoder in Dataset example has incorrect typeCodingCat2016-09-031-1/+1
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? We propose to fix the Encoder type in the Dataset example ## How was this patch tested? The PR will be tested with the current unit test cases Author: CodingCat <zhunansjtu@gmail.com> Closes #14901 from CodingCat/SPARK-17347.
* [SPARK-17001][ML] Enable standardScaler to standardize sparse vectors when ↵Sean Owen2016-08-272-4/+0
| | | | | | | | | | | | | | | | withMean=True ## What changes were proposed in this pull request? Allow centering / mean scaling of sparse vectors in StandardScaler, if requested. This is for compatibility with `VectorAssembler` in common usages. ## How was this patch tested? Jenkins tests, including new caes to reflect the new behavior. Author: Sean Owen <sowen@cloudera.com> Closes #14663 from srowen/SPARK-17001.
* [MINOR][BUILD] Fix Java CheckStyle ErrorWeiqing Yang2016-08-241-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? As Spark 2.0.1 will be released soon (mentioned in the spark dev mailing list), besides the critical bugs, it's better to fix the code style errors before the release. Before: ``` ./dev/lint-java Checkstyle checks failed at following occurrences: [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[525] (sizes) LineLength: Line is longer than 100 characters (found 119). [ERROR] src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103). ``` After: ``` ./dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` ## How was this patch tested? Manual. Author: Weiqing Yang <yangweiqing001@gmail.com> Closes #14768 from Sherry302/fixjavastyle.
* [SPARKR][EXAMPLE] change example APP namewm624@hotmail.com2016-08-201-1/+1
| | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) For R SQL example, appname is "MyApp". While examples in scala, Java and python, the appName is "x Spark SQL basic example". I made the R example consistent with other examples. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Manual test (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: wm624@hotmail.com <wm624@hotmail.com> Closes #14703 from wangmiao1981/example.
* [SPARK-16886][EXAMPLES][DOC] Fix some examples to be consistent and ↵hyukjinkwon2016-08-114-21/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | indentation in documentation ## What changes were proposed in this pull request? Originally this PR was based on #14491 but I realised that fixing examples are more sensible rather than comments. This PR fixes three things below: - Fix two wrong examples in `structured-streaming-programming-guide.md`. Loading via `read.load(..)` without `as` will be `Dataset<Row>` not `Dataset<String>` in Java. - Fix indentation across `structured-streaming-programming-guide.md`. Python has 4 spaces and Scala and Java have double spaces. These are inconsistent across the examples. - Fix `StructuredNetworkWordCountWindowed` and `StructuredNetworkWordCount` in Java and Scala to initially load `DataFrame` and `Dataset<Row>` to be consistent with the comments and some examples in `structured-streaming-programming-guide.md` and to match Scala and Java to Python one (Python one loads it as `DataFrame` initially). ## How was this patch tested? N/A Closes https://github.com/apache/spark/pull/14491 Author: hyukjinkwon <gurwls223@gmail.com> Author: Ganesh Chand <ganeshchand@Ganeshs-MacBook-Pro-2.local> Closes #14564 from HyukjinKwon/SPARK-16886.
* [SPARK-16945] Fix Java Lint errorsWeiqing Yang2016-08-081-1/+2
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR is to fix the minor Java linter errors as following: [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10] (modifier) RedundantModifier: Redundant 'final' modifier. [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10] (modifier) RedundantModifier: Redundant 'final' modifier. ## How was this patch tested? Manual test. dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. Author: Weiqing Yang <yangweiqing001@gmail.com> Closes #14532 from Sherry302/master.
* [SPARK-16421][EXAMPLES][ML] Improve ML Example OutputsBryan Cutler2016-08-0582-188/+427
| | | | | | | | | | | | ## What changes were proposed in this pull request? Improve example outputs to better reflect the functionality that is being presented. This mostly consisted of modifying what was printed at the end of the example, such as calling show() with truncate=False, but sometimes required minor tweaks in the example data to get relevant output. Explicitly set parameters when they are used as part of the example. Fixed Java examples that failed to run because of using old-style MLlib Vectors or problem with schema. Synced examples between different APIs. ## How was this patch tested? Ran each example for Scala, Python, and Java and made sure output was legible on a terminal of width 100. Author: Bryan Cutler <cutlerb@gmail.com> Closes #14308 from BryanCutler/ml-examples-improve-output-SPARK-16260.
* [SPARK-16816] Modify java example which is also reflect in documentation exmaplesandy2016-08-021-0/+16
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Modify java example which is also reflect in document. ## How was this patch tested? run test cases. Author: sandy <phalodi@gmail.com> Closes #14436 from phalodi/SPARK-16816.
* [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector ↵Xusen Yin2016-08-021-2/+3
| | | | | | | | | | | | | | | | instead of MLlib Vector ## What changes were proposed in this pull request? mllib.LDAExample uses ML pipeline and MLlib LDA algorithm. The former transforms original data into MLVector format, while the latter uses MLlibVector format. ## How was this patch tested? Test manually. Author: Xusen Yin <yinxusen@gmail.com> Closes #14212 from yinxusen/SPARK-16558.
* [SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindingsCheng Lian2016-08-028-75/+123
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR makes various minor updates to examples of all language bindings to make sure they are consistent with each other. Some typos and missing parts (JDBC example in Scala/Java/Python) are also fixed. ## How was this patch tested? Manually tested. Author: Cheng Lian <lian@databricks.com> Closes #14368 from liancheng/revise-examples.
* [SPARK-16800][EXAMPLES][ML] Fix Java examples that fail to run due to exceptionBryan Cutler2016-07-3012-38/+49
| | | | | | | | | | | | ## What changes were proposed in this pull request? Some Java examples are using mllib.linalg.Vectors instead of ml.linalg.Vectors and causes an exception when run. Also there are some Java examples that incorrectly specify data types in the schema, also causing an exception. ## How was this patch tested? Ran corrected examples locally Author: Bryan Cutler <cutlerb@gmail.com> Closes #14405 from BryanCutler/java-examples-ml.Vectors-fix-SPARK-16800.
* [SPARK-16694][CORE] Use for/foreach rather than map for Unit expressions ↵Sean Owen2016-07-3020-101/+80
| | | | | | | | | | | | | | | | whose side effects are required ## What changes were proposed in this pull request? Use foreach/for instead of map where operation requires execution of body, not actually defining a transformation ## How was this patch tested? Jenkins Author: Sean Owen <sowen@cloudera.com> Closes #14332 from srowen/SPARK-16694.
* [SPARK-16380][EXAMPLES] Update SQL examples and programming guide for Python ↵Cheng Lian2016-07-236-86/+447
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | language binding This PR is based on PR #14098 authored by wangmiao1981. ## What changes were proposed in this pull request? This PR replaces the original Python Spark SQL example file with the following three files: - `sql/basic.py` Demonstrates basic Spark SQL features. - `sql/datasource.py` Demonstrates various Spark SQL data sources. - `sql/hive.py` Demonstrates Spark SQL Hive interaction. This PR also removes hard-coded Python example snippets in the SQL programming guide by extracting snippets from the above files using the `include_example` Liquid template tag. ## How was this patch tested? Manually tested. Author: wm624@hotmail.com <wm624@hotmail.com> Author: Cheng Lian <lian@databricks.com> Closes #14317 from liancheng/py-examples-update.
* [SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant ↵Xin Ren2016-07-191-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | definition and inherited from the parent https://issues.apache.org/jira/browse/SPARK-16535 ## What changes were proposed in this pull request? When I scan through the pom.xml of sub projects, I found this warning as below and attached screenshot ``` Definition of groupId is redundant, because it's inherited from the parent ``` ![screen shot 2016-07-13 at 3 13 11 pm](https://cloud.githubusercontent.com/assets/3925641/16823121/744f893e-4916-11e6-8a52-042f83b9db4e.png) I've tried to remove some of the lines with groupId definition, and the build on my local machine is still ok. ``` <groupId>org.apache.spark</groupId> ``` As I just find now `<maven.version>3.3.9</maven.version>` is being used in Spark 2.x, and Maven-3 supports versionless parent elements: Maven 3 will remove the need to specify the parent version in sub modules. THIS is great (in Maven 3.1). ref: http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762 ## How was this patch tested? I've tested by re-building the project, and build succeeded. Author: Xin Ren <iamshrek@126.com> Closes #14189 from keypointt/SPARK-16535.
* [MINOR][BUILD] Fix Java Linter `LineLength` errorsDongjoon Hyun2016-07-191-2/+4
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR fixes four java linter `LineLength` errors. Those are all `LineLength` errors, but we had better remove all java linter errors before release. ## How was this patch tested? After pass the Jenkins, `./dev/lint-java`. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #14255 from dongjoon-hyun/minor_java_linter.
* [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example updateCheng Lian2016-07-184-7/+7
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR moves one and the last hard-coded Scala example snippet from the SQL programming guide into `SparkSqlExample.scala`. It also renames all Scala/Java example files so that all "Sql" in the file names are updated to "SQL". ## How was this patch tested? Manually verified the generated HTML page. Author: Cheng Lian <lian@databricks.com> Closes #14245 from liancheng/minor-scala-example-update.
* [MINOR] Remove unused arg in als.pyZheng RuiFeng2016-07-181-3/+3
| | | | | | | | | | | | ## What changes were proposed in this pull request? The second arg in method `update()` is never used. So I delete it. ## How was this patch tested? local run with `./bin/spark-submit examples/src/main/python/als.py` Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #14247 from zhengruifeng/als_refine.
* [SPARKR][DOCS] minor code sample update in R programming guideFelix Cheung2016-07-181-1/+1
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Fix code style from ad hoc review of RC4 doc ## How was this patch tested? manual shivaram Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #14250 from felixcheung/rdocs2rc4.
* [SPARK-16403][EXAMPLES] Cleanup to remove unused imports, consistent style, ↵Bryan Cutler2016-07-1456-646/+142
| | | | | | | | | | | | | | | | | | | | | | | | | minor fixes ## What changes were proposed in this pull request? Cleanup of examples, mostly from PySpark-ML to fix minor issues: unused imports, style consistency, pipeline_example is a duplicate, use future print funciton, and a spelling error. * The "Pipeline Example" is duplicated by "Simple Text Classification Pipeline" in Scala, Python, and Java. * "Estimator Transformer Param Example" is duplicated by "Simple Params Example" in Scala, Python and Java * Synced random_forest_classifier_example.py with Scala by adding IndexToString label converted * Synced train_validation_split.py (in Scala ModelSelectionViaTrainValidationExample) by adjusting data split, adding grid for intercept. * RegexTokenizer was doing nothing in tokenizer_example.py and JavaTokenizerExample.java, synced with Scala version ## How was this patch tested? local tests and run modified examples Author: Bryan Cutler <cutlerb@gmail.com> Closes #14081 from BryanCutler/examples-cleanup-SPARK-16403.
* [SPARKR][MINOR] R examples and test updatesFelix Cheung2016-07-132-1/+4
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Minor example updates ## How was this patch tested? manual shivaram Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #14171 from felixcheung/rexample.
* [SPARK-16303][DOCS][EXAMPLES] Updated SQL programming guide and examplesaokolnychyi2016-07-138-269/+1193
| | | | | | | | | | | | | | - Hard-coded Spark SQL sample snippets were moved into source files under examples sub-project. - Removed the inconsistency between Scala and Java Spark SQL examples - Scala and Java Spark SQL examples were updated The work is still in progress. All involved examples were tested manually. An additional round of testing will be done after the code review. ![image](https://cloud.githubusercontent.com/assets/6235869/16710314/51851606-462a-11e6-9fbe-0818daef65e4.png) Author: aokolnychyi <okolnychyyanton@gmail.com> Closes #14119 from aokolnychyi/spark_16303.
* [SPARK-16114][SQL] structured streaming event time window exampleJames Thomas2016-07-116-4/+326
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? A structured streaming example with event time windowing. ## How was this patch tested? Run locally Author: James Thomas <jamesjoethomas@gmail.com> Closes #13957 from jjthomas/current.
* [SPARKR][DOC] SparkR ML user guides update for 2.0Yanbo Liang2016-07-111-11/+11
| | | | | | | | | | | | | ## What changes were proposed in this pull request? * Update SparkR ML section to make them consistent with SparkR API docs. * Since #13972 adds labelling support for the ```include_example``` Jekyll plugin, so that we can split the single ```ml.R``` example file into multiple line blocks with different labels, and include them in different algorithms/models in the generated HTML page. ## How was this patch tested? Only docs update, manually check the generated docs. Author: Yanbo Liang <ybliang8@gmail.com> Closes #14011 from yanboliang/r-user-guide-update.
* [SPARK-16477] Bump master version to 2.1.0-SNAPSHOTReynold Xin2016-07-111-1/+1
| | | | | | | | | | | | ## What changes were proposed in this pull request? After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #14130 from rxin/SPARK-16477.
* [SPARK-16381][SQL][SPARKR] Update SQL examples and programming guide for R ↵Xin Ren2016-07-113-2/+199
| | | | | | | | | | | | | | | | | | | | | | | language binding https://issues.apache.org/jira/browse/SPARK-16381 ## What changes were proposed in this pull request? Update SQL examples and programming guide for R language binding. Here I just follow example https://github.com/apache/spark/compare/master...liancheng:example-snippet-extraction, created a separate R file to store all the example code. ## How was this patch tested? Manual test on my local machine. Screenshot as below: ![screen shot 2016-07-06 at 4 52 25 pm](https://cloud.githubusercontent.com/assets/3925641/16638180/13925a58-439a-11e6-8d57-8451a63dcae9.png) Author: Xin Ren <iamshrek@126.com> Closes #14082 from keypointt/SPARK-16381.
* [SPARK-16260][ML][EXAMPLE] PySpark ML Example Improvements and Cleanupwm624@hotmail.com2016-07-038-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? 1). Remove unused import in Scala example; 2). Move spark session import outside example off; 3). Change parameter setting the same as Scala; 4). Change comment to be consistent; 5). Make sure that Scala and python using the same data set; I did one pass and fixed the above issues. There are missing examples in python, which might be added later. TODO: For some examples, there are comments on how to run examples; But there are many missing. We can add them later. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Manually test them Author: wm624@hotmail.com <wm624@hotmail.com> Closes #14021 from wangmiao1981/ann.
* [SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming ↵WeichenXu2016-07-026-0/+420
| | | | | | | | | | | | | | | | | | | | | | | | | | | | guide example snippets from source files instead of hard code them ## What changes were proposed in this pull request? I extract 6 example programs from GraphX programming guide and replace them with `include_example` label. The 6 example programs are: - AggregateMessagesExample.scala - SSSPExample.scala - TriangleCountingExample.scala - ConnectedComponentsExample.scala - ComprehensiveExample.scala - PageRankExample.scala All the example code can run using `bin/run-example graphx.EXAMPLE_NAME` ## How was this patch tested? Manual. Author: WeichenXu <WeichenXu123@outlook.com> Closes #14015 from WeichenXu123/graphx_example_plugin.
* [SPARK-16294][SQL] Labelling support for the include_example Jekyll pluginCheng Lian2016-06-293-2/+18
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR adds labelling support for the `include_example` Jekyll plugin, so that we may split a single source file into multiple line blocks with different labels, and include them in multiple code snippets in the generated HTML page. ## How was this patch tested? Manually tested. <img width="923" alt="screenshot at jun 29 19-53-21" src="https://cloud.githubusercontent.com/assets/230655/16451099/66a76db2-3e33-11e6-84fb-63104c2f0688.png"> Author: Cheng Lian <lian@databricks.com> Closes #13972 from liancheng/include-example-with-labels.
* [SPARK-16261][EXAMPLES][ML] Fixed incorrect appNames in ML ExamplesBryan Cutler2016-06-294-4/+4
| | | | | | | | | | | | | ## What changes were proposed in this pull request? Some appNames in ML examples are incorrect, mostly in PySpark but one in Scala. This corrects the names. ## How was this patch tested? Style, local tests Author: Bryan Cutler <cutlerb@gmail.com> Closes #13949 from BryanCutler/pyspark-example-appNames-fix-SPARK-16261.
* [SPARK-16114][SQL] structured streaming network word count examplesJames Thomas2016-06-283-0/+234
| | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Network word count example for structured streaming ## How was this patch tested? Run locally Author: James Thomas <jamesjoethomas@gmail.com> Author: James Thomas <jamesthomas@Jamess-MacBook-Pro.local> Closes #13816 from jjthomas/master.
* [SPARK-16231][PYSPARK][ML][EXAMPLES] dataframe_example.py fails to convert ↵Bryan Cutler2016-06-271-1/+3
| | | | | | | | | | | | | | ML style vectors ## What changes were proposed in this pull request? Need to convert ML Vectors to the old MLlib style before doing Statistics.colStats operations on the DataFrame ## How was this patch tested? Ran example, local tests Author: Bryan Cutler <cutlerb@gmail.com> Closes #13928 from BryanCutler/pyspark-ml-example-vector-conv-SPARK-16231.
* [SPARK-16214][EXAMPLES] fix the denominator of SparkPi杨浩2016-06-271-1/+1
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? reduce the denominator of SparkPi by 1 ## How was this patch tested? integration tests Author: 杨浩 <yanghaogn@163.com> Closes #13910 from yanghaogn/patch-1.
* [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer ↵GayathriMurali2016-06-243-6/+21
| | | | | | | | | | | | and CountVectorizer ## What changes were proposed in this pull request? Made changes to HashingTF,QuantileVectorizer and CountVectorizer Author: GayathriMurali <gayathri.m@intel.com> Closes #13745 from GayathriMurali/SPARK-15997.
* [SPARK-15159][SPARKR] SparkSession roxygen2 doc, programming guide, example ↵Felix Cheung2016-06-203-27/+22
| | | | | | | | | | | | | | | | | updates ## What changes were proposed in this pull request? roxygen2 doc, programming guide, example updates ## How was this patch tested? manual checks shivaram Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #13751 from felixcheung/rsparksessiondoc.
* [SPARK-15129][R][DOC] R API changes in MLGayathriMurali2016-06-171-2/+2
| | | | | | | | | | ## What changes were proposed in this pull request? Make user guide changes to SparkR documentation for all changes that happened in 2.0 to Machine Learning APIs Author: GayathriMurali <gayathri.m@intel.com> Closes #13285 from GayathriMurali/SPARK-15129.
* [SPARK-15608][ML][EXAMPLES][DOC] add examples and documents of ml.isotonic ↵WeichenXu2016-06-166-14/+203
| | | | | | | | | | | | | | | | | | | | | | | | | regression ## What changes were proposed in this pull request? add ml doc for ml isotonic regression add scala example for ml isotonic regression add java example for ml isotonic regression add python example for ml isotonic regression modify scala example for mllib isotonic regression modify java example for mllib isotonic regression modify python example for mllib isotonic regression add data/mllib/sample_isotonic_regression_libsvm_data.txt delete data/mllib/sample_isotonic_regression_data.txt ## How was this patch tested? N/A Author: WeichenXu <WeichenXu123@outlook.com> Closes #13381 from WeichenXu123/add_isotonic_regression_doc.
* [SPARK-15996][R] Fix R examples by removing deprecated functionsDongjoon Hyun2016-06-162-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Currently, R examples(`dataframe.R` and `data-manipulation.R`) fail like the following. We had better update them before releasing 2.0 RC. This PR updates them to use up-to-date APIs. ```bash $ bin/spark-submit examples/src/main/r/dataframe.R ... Warning message: 'createDataFrame(sqlContext...)' is deprecated. Use 'createDataFrame(data, schema = NULL, samplingRatio = 1.0)' instead. See help("Deprecated") ... Warning message: 'read.json(sqlContext...)' is deprecated. Use 'read.json(path)' instead. See help("Deprecated") ... Error: could not find function "registerTempTable" Execution halted ``` ## How was this patch tested? Manual. ``` curl -LO http://s3-us-west-2.amazonaws.com/sparkr-data/flights.csv bin/spark-submit examples/src/main/r/dataframe.R bin/spark-submit examples/src/main/r/data-manipulation.R flights.csv ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #13714 from dongjoon-hyun/SPARK-15996.
* [SPARK-15898][SQL] DataFrameReader.text should return DataFrameWenchen Fan2016-06-1210-10/+10
| | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? We want to maintain API compatibility for DataFrameReader.text, and will introduce a new API called DataFrameReader.textFile which returns Dataset[String]. affected PRs: https://github.com/apache/spark/pull/11731 https://github.com/apache/spark/pull/13104 https://github.com/apache/spark/pull/13184 ## How was this patch tested? N/A Author: Wenchen Fan <wenchen@databricks.com> Closes #13604 from cloud-fan/revert.
* [SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator APISean Owen2016-06-121-5/+5
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? - Deprecate old Java accumulator API; should use Scala now - Update Java tests and examples - Don't bother testing old accumulator API in Java 8 (too) - (fix a misspelling too) ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #13606 from srowen/SPARK-15086.
* [SPARK-14615][ML][FOLLOWUP] Fix Python examples to use the new ML Vector and ↵hyukjinkwon2016-06-1010-19/+18
| | | | | | | | | | | | | | | | | | | | | | | | | Matrix APIs in the ML pipeline based algorithms ## What changes were proposed in this pull request? This PR fixes Python examples to use the new ML Vector and Matrix APIs in the ML pipeline based algorithms. I firstly executed this shell command, `grep -r "from pyspark.mllib" .` and then executed them all. Some of tests in `ml` produced the error messages as below: ``` pyspark.sql.utils.IllegalArgumentException: u'requirement failed: Input type must be VectorUDT but got org.apache.spark.mllib.linalg.VectorUDTf71b0bce.' ``` So, I fixed them to use new ones just identically with some Python tests fixed in https://github.com/apache/spark/pull/12627 ## How was this patch tested? Manually tested for all the examples listed by `grep -r "from pyspark.mllib" .`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #13393 from HyukjinKwon/SPARK-14615.
* [SPARK-15773][CORE][EXAMPLE] Avoid creating local variable `sc` in examples ↵Dongjoon Hyun2016-06-1012-52/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | if possible ## What changes were proposed in this pull request? Instead of using local variable `sc` like the following example, this PR uses `spark.sparkContext`. This makes examples more concise, and also fixes some misleading, i.e., creating SparkContext from SparkSession. ``` - println("Creating SparkContext") - val sc = spark.sparkContext - println("Writing local file to DFS") val dfsFilename = dfsDirPath + "/dfs_read_write_test" - val fileRDD = sc.parallelize(fileContents) + val fileRDD = spark.sparkContext.parallelize(fileContents) ``` This will change 12 files (+30 lines, -52 lines). ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #13520 from dongjoon-hyun/SPARK-15773.
* [SPARK-15721][ML] Make DefaultParamsReadable, DefaultParamsWritable publicJoseph K. Bradley2016-06-061-0/+122
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Made DefaultParamsReadable, DefaultParamsWritable public. Also added relevant doc and annotations. Added UnaryTransformerExample to demonstrate use of UnaryTransformer and DefaultParamsReadable,Writable. ## How was this patch tested? Wrote example making use of the now-public APIs. Compiled and ran locally Author: Joseph K. Bradley <joseph@databricks.com> Closes #13461 from jkbradley/defaultparamswritable.
* [SPARK-15771][ML][EXAMPLES] Use 'accuracy' rather than 'precision' in many ↵Yanbo Liang2016-06-0618-36/+36
| | | | | | | | | | | | | | | | | | ML examples ## What changes were proposed in this pull request? Since [SPARK-15617](https://issues.apache.org/jira/browse/SPARK-15617) deprecated ```precision``` in ```MulticlassClassificationEvaluator```, many ML examples broken. ```python pyspark.sql.utils.IllegalArgumentException: u'MulticlassClassificationEvaluator_4c3bb1d73d8cc0cedae6 parameter metricName given invalid value precision.' ``` We should use ```accuracy``` to replace ```precision``` in these examples. ## How was this patch tested? Offline tests. Author: Yanbo Liang <ybliang8@gmail.com> Closes #13519 from yanboliang/spark-15771.
* [SPARK-15605][ML][EXAMPLES] Fix broken ML JavaDeveloperApiExample.Yanbo Liang2016-06-021-240/+0
| | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? See [SPARK-15605](https://issues.apache.org/jira/browse/SPARK-15605) for the detail of this bug. This PR fix 2 major bugs in this example: * The java example class use Param ```maxIter```, it will fail when calling ```Param.shouldOwn```. We need add a public method which return the ```maxIter``` Object. Because ```Params.params``` use java reflection to list all public method whose return type is ```Param```, and invoke them to get all defined param objects in the instance. * The ```uid``` member defined in Java class will be initialized after Scala traits such as ```HasFeaturesCol```. So when ```HasFeaturesCol``` being constructed, they get ```uid``` with null, which will cause ```Param.shouldOwn``` check fail. so, here is my changes: * Add public method: ```public IntParam getMaxIterParam() {return maxIter;}``` * Use Java anonymous class overriding ```uid()``` to defined the ```uid```, and it solve the second problem described above. * To make the ```getMaxIterParam ``` can be invoked using java reflection, we must make the two class (MyJavaLogisticRegression and MyJavaLogisticRegressionModel) public. so I make them become inner public static class. ## How was this patch tested? Offline tests. Author: Yanbo Liang <ybliang8@gmail.com> Closes #13353 from yanboliang/spark-15605.
* [SPARK-15208][WIP][CORE][STREAMING][DOCS] Update Spark examples with ↵Liwei Lin2016-06-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | AccumulatorV2 ## What changes were proposed in this pull request? The patch updates the codes & docs in the example module as well as the related doc module: - [ ] [docs] `streaming-programming-guide.md` - [x] scala code part - [ ] java code part - [ ] python code part - [x] [examples] `RecoverableNetworkWordCount.scala` - [ ] [examples] `JavaRecoverableNetworkWordCount.java` - [ ] [examples] `recoverable_network_wordcount.py` ## How was this patch tested? Ran the examples and verified results manually. Author: Liwei Lin <lwlin7@gmail.com> Closes #12981 from lw-lin/accumulatorV2-examples.
* [SPARK-15618][SQL][MLLIB] Use SparkSession.builder.sparkContext if applicable.Dongjoon Hyun2016-05-313-9/+4
| | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR changes function `SparkSession.builder.sparkContext(..)` from **private[sql]** into **private[spark]**, and uses it if applicable like the followings. ``` - val spark = SparkSession.builder().config(sc.getConf).getOrCreate() + val spark = SparkSession.builder().sparkContext(sc).getOrCreate() ``` ## How was this patch tested? Pass the existing Jenkins tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #13365 from dongjoon-hyun/SPARK-15618.