| Commit message (Collapse) | Author | Age | Files | Lines |
|
minor fixes
## What changes were proposed in this pull request?
Cleanup of examples, mostly from PySpark-ML, to fix minor issues: unused imports, style consistency, a duplicated pipeline_example, use of the future print function, and a spelling error.
* The "Pipeline Example" is duplicated by "Simple Text Classification Pipeline" in Scala, Python, and Java.
* "Estimator Transformer Param Example" is duplicated by "Simple Params Example" in Scala, Python and Java
* Synced random_forest_classifier_example.py with Scala by adding the IndexToString label converter
* Synced train_validation_split.py (in Scala ModelSelectionViaTrainValidationExample) by adjusting data split, adding grid for intercept.
* RegexTokenizer was doing nothing in tokenizer_example.py and JavaTokenizerExample.java, synced with Scala version
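To illustrate why a regex tokenizer that "does nothing" is easy to miss, here is a minimal plain-Python sketch. It does not use the `pyspark.ml.feature` API; the helper names, the sample sentence, and the `\W` pattern are assumptions chosen only for illustration. It also shows the `print_function` import mentioned above.

```python
from __future__ import print_function

import re


def whitespace_tokenize(sentence):
    """Lowercase the sentence and split on runs of whitespace."""
    return sentence.lower().split()


def regex_tokenize(sentence, pattern=r"\W"):
    """Lowercase the sentence, split on `pattern`, and drop empty tokens."""
    return [tok for tok in re.split(pattern, sentence.lower()) if tok]


# Hypothetical sample input: comma-separated words with no spaces.
sentence = "Logistic,regression,models,are,neat"

# Whitespace splitting leaves the sentence as a single token.
print(whitespace_tokenize(sentence))

# Splitting on non-word characters separates the individual words.
print(regex_tokenize(sentence))
```

On input like this, the two approaches only differ when the pattern is actually applied, which is the kind of discrepancy an example should make visible.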
## How was this patch tested?
Ran local tests and the modified examples.
Author: Bryan Cutler <cutlerb@gmail.com>
Closes #14081 from BryanCutler/examples-cleanup-SPARK-16403.
|
Matrix APIs in the ML pipeline based algorithms
## What changes were proposed in this pull request?
This PR fixes Python examples to use the new ML Vector and Matrix APIs in the ML pipeline based algorithms.
I first ran the shell command `grep -r "from pyspark.mllib" .` and then executed every example it listed.
Some of the tests in `ml` produced error messages like the following:
```
pyspark.sql.utils.IllegalArgumentException: u'requirement failed: Input type must be VectorUDT but got org.apache.spark.mllib.linalg.VectorUDTf71b0bce.'
```
So I fixed them to use the new APIs, in the same way as the Python tests fixed in https://github.com/apache/spark/pull/12627
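The kind of change described above can be sketched as a small text rewrite. This is a hypothetical helper, not part of the patch: it assumes the fix is to point `pyspark.mllib.linalg` imports at `pyspark.ml.linalg`, and it leaves every other `pyspark.mllib` import alone, since not all of them necessarily need to change.

```python
import re

# Matches lines such as "from pyspark.mllib.linalg import Vectors".
OLD_IMPORT = re.compile(r"^(\s*from\s+)pyspark\.mllib\.linalg(\s+import\s+.*)$")


def migrate_linalg_imports(source):
    """Return `source` with pyspark.mllib.linalg imports pointed at pyspark.ml.linalg."""
    migrated = []
    for line in source.splitlines():
        # Lines that do not match the pattern are passed through unchanged.
        migrated.append(OLD_IMPORT.sub(r"\1pyspark.ml.linalg\2", line))
    return "\n".join(migrated)


# Hypothetical example source with one old-style import.
example = "from pyspark.mllib.linalg import Vectors\nfrom pyspark.ml.feature import PCA"
print(migrate_linalg_imports(example))
```

Any real migration would still need each example to be run afterwards, as the testing note below describes.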
## How was this patch tested?
Manually tested for all the examples listed by `grep -r "from pyspark.mllib" .`.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #13393 from HyukjinKwon/SPARK-14615.
|
binary_classification_metrics_example.py
## What changes were proposed in this pull request?
This issue addresses the comments in SPARK-15031 and also fixes Java linter errors.
- Use multiline format in SparkSession builder patterns.
- Update `binary_classification_metrics_example.py` to use `SparkSession`.
- Fix Java linter errors (from SPARK-13745, SPARK-15031, and others so far)
## How was this patch tested?
Passed the Jenkins tests and ran `dev/lint-java` manually.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12911 from dongjoon-hyun/SPARK-15134.
|
## What changes were proposed in this pull request?
This PR aims to update Scala/Python/Java examples by replacing `SQLContext` with newly added `SparkSession`.
- Use **SparkSession Builder Pattern** in 154 (Scala 55, Java 52, Python 47) files.
- Add `getConf` in Python SparkContext class: `python/pyspark/context.py`
- Replace **SQLContext Singleton Pattern** with **SparkSession Singleton Pattern**:
  - `SqlNetworkWordCount.scala`
  - `JavaSqlNetworkWordCount.java`
  - `sql_network_wordcount.py`
Now, `SQLContext` is used only in R examples and the following two Python examples. These Python examples are left untouched in this PR since they already fail because of an unknown issue.
- `simple_params_example.py`
- `aft_survival_regression.py`
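For readers unfamiliar with the builder pattern referred to above, a minimal Python sketch follows. The function name `build_session` and the application name are hypothetical; the import is done lazily inside the function so the snippet can be loaded without PySpark installed, which is a choice made for this sketch rather than something the examples do.

```python
def build_session(app_name):
    """Create (or reuse) a SparkSession, with the builder chain spread over several lines."""
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession\
        .builder\
        .appName(app_name)\
        .getOrCreate()
    return spark


# Usage, assuming a local PySpark 2.0+ installation:
#   spark = build_session("ExampleApp")
#   spark.range(3).show()
#   spark.stop()
```

Because `getOrCreate()` returns an existing session when one is already active, the same call can stand in for the singleton-style access previously done through `SQLContext`.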
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12809 from dongjoon-hyun/SPARK-15031.
|
PR on behalf of somideshmukh, thanks!
Author: Xusen Yin <yinxusen@gmail.com>
Author: somideshmukh <somilde@us.ibm.com>
Closes #10219 from yinxusen/SPARK-11551.
|
This reverts PR #10002, commit 78209b0ccaf3f22b5e2345dfb2b98edfdb746819.
The original PR wasn't tested on Jenkins before being merged.
Author: Cheng Lian <lian@databricks.com>
Closes #10200 from liancheng/revert-pr-10002.
|
include_example
Made a new patch containing only the markdown examples moved to the example folder.
Only three Java files were not moved because they contained compilation errors. These classes are:
1) StandardScale 2) NormalizerExample 3) VectorIndexer
Author: Xusen Yin <yinxusen@gmail.com>
Author: somideshmukh <somilde@us.ibm.com>
Closes #10002 from somideshmukh/SomilBranch1.33.
|