| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
| |
compatibility test
* Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine
* Cleans up scaladocs for public methods
* Adds test for Java compatibility
* Follow up Python user guide code example is tracked by SPARK-10249
Author: Feynman Liang <fliang@databricks.com>
Closes #8436 from feynmanliang/SPARK-10230.
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Replaces instances of `Lists.newArrayList` with `Arrays.asList`
* Replaces `commons.lang.StringUtils` over `com.google.collections.Strings`
* Replaces `List` interface over `ArrayList` implementations
This PR along with #8445 #8446 #8447 completely removes all `com.google.collections.Lists` dependencies within mllib's Java tests.
Author: Feynman Liang <fliang@databricks.com>
Closes #8451 from feynmanliang/SPARK-10257.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix for [JavaConverters.asJavaListConverter](http://www.scala-lang.org/api/2.10.5/index.html#scala.collection.JavaConverters$) being removed in 2.11.7 and hence the build fails with the 2.11 profile enabled. Tested with the default 2.10 and 2.11 profiles. BUILD SUCCESS in both cases.
Build for 2.10:
./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -DskipTests clean install
and 2.11:
./dev/change-scala-version.sh 2.11
./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Dscala-2.11 -DskipTests clean install
Author: Jacek Laskowski <jacek@japila.pl>
Closes #8479 from jaceklaskowski/SPARK-9613-hotfix.
|
|
|
|
|
|
|
|
| |
JavaTests
Author: Feynman Liang <fliang@databricks.com>
Closes #8447 from feynmanliang/SPARK-10256.
|
|
|
|
|
|
| |
Author: Feynman Liang <fliang@databricks.com>
Closes #8446 from feynmanliang/SPARK-10255.
|
|
|
|
|
|
|
|
|
| |
* Replaces `com.google.common` dependencies with `java.util.Arrays`
* Small clean up in `JavaNormalizerSuite`
Author: Feynman Liang <fliang@databricks.com>
Closes #8445 from feynmanliang/SPARK-10254.
|
|
|
|
|
|
|
|
|
|
|
|
| |
to JavaConverters
Replace `JavaConversions` implicits with `JavaConverters`
Most occurrences I've seen so far are necessary conversions; a few have been avoidable. None are in critical code as far as I see, yet.
Author: Sean Owen <sowen@cloudera.com>
Closes #8033 from srowen/SPARK-9613.
|
|
|
|
|
|
|
|
|
|
| |
Add user guide for `VectorSlicer`, with Java test suite and Python version VectorSlicer.
Note that Python version does not support selecting by names now.
Author: Xusen Yin <yinxusen@gmail.com>
Closes #8267 from yinxusen/SPARK-9893.
|
|
|
|
|
|
|
|
|
|
| |
For each (document, term) pair, return top topic. Note that instances of (doc, term) pairs within a document (a.k.a. "tokens") are exchangeable, so we should provide an estimate per document-term, rather than per token.
CC: rotationsymmetry mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #8329 from jkbradley/lda-topic-assignments.
|
|
|
|
|
|
|
|
|
|
| |
Java test suite
Otherwise, setters do not return self type. jkbradley avulanov
Author: Xiangrui Meng <meng@databricks.com>
Closes #8342 from mengxr/SPARK-10138.
|
|
|
|
|
|
|
|
| |
Updates FPM user guide to include Association Rules.
Author: Feynman Liang <fliang@databricks.com>
Closes #8207 from feynmanliang/SPARK-9900-arules.
|
|
|
|
|
|
|
|
| |
Some minor clean-ups after SPARK-9661. See my inline comments. MechCoder jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #8190 from mengxr/SPARK-9661-fix.
|
|
|
|
|
|
|
|
|
|
|
|
| |
I skimmed through the docs for various instance of Object and replaced them with Java compaible versions of the same.
1. Some methods in LDAModel.
2. runMiniBatchSGD
3. kolmogorovSmirnovTest
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #8126 from MechCoder/java_incop.
|
|
|
|
|
|
|
|
|
|
| |
Reinstated LogisticRegression.threshold Param for binary compatibility. Param thresholds overrides threshold, if set.
CC: mengxr dbtsai feynmanliang
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #8079 from jkbradley/logreg-reinstate-threshold.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I have added support for stats in LogisticRegression. The API is similar to that of LinearRegression with LogisticRegressionTrainingSummary and LogisticRegressionSummary
I have some queries and asked them inline.
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #7538 from MechCoder/log_reg_stats and squashes the following commits:
2e9f7c7 [MechCoder] Change defs into lazy vals
d775371 [MechCoder] Clean up class inheritance
9586125 [MechCoder] Add abstraction to handle Multiclass Metrics
40ad8ef [MechCoder] minor
640376a [MechCoder] remove unnecessary dataframe stuff and add docs
80d9954 [MechCoder] Added tests
fbed861 [MechCoder] DataFrame support for metrics
70a0fc4 [MechCoder] [SPARK-9112] [ML] Implement Stats for LogisticRegression
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR replaces the old "threshold" with a generalized "thresholds" Param. We keep getThreshold,setThreshold for backwards compatibility for binary classification.
Note that the primary author of this PR is holdenk
Author: Holden Karau <holden@pigscanfly.ca>
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #7909 from jkbradley/holdenk-SPARK-8069-add-cutoff-aka-threshold-to-random-forest and squashes the following commits:
3952977 [Joseph K. Bradley] fixed pyspark doc test
85febc8 [Joseph K. Bradley] made python unit tests a little more robust
7eb1d86 [Joseph K. Bradley] small cleanups
6cc2ed8 [Joseph K. Bradley] Fixed remaining merge issues.
0255e44 [Joseph K. Bradley] Many cleanups for thresholds, some more tests
7565a60 [Holden Karau] fix pep8 style checks, add a getThreshold method similar to our LogisticRegression.scala one for API compat
be87f26 [Holden Karau] Convert threshold to thresholds in the python code, add specialized support for Array[Double] to shared parems codegen, etc.
6747dad [Holden Karau] Override raw2prediction for ProbabilisticClassifier, fix some tests
25df168 [Holden Karau] Fix handling of thresholds in LogisticRegression
c02d6c0 [Holden Karau] No default for thresholds
5e43628 [Holden Karau] CR feedback and fixed the renamed test
f3fbbd1 [Holden Karau] revert the changes to random forest :(
51f581c [Holden Karau] Add explicit types to public methods, fix long line
f7032eb [Holden Karau] Fix a java test bug, remove some unecessary changes
adf15b4 [Holden Karau] rename the classifier suite test to ProbabilisticClassifierSuite now that we only have it in Probabilistic
398078a [Holden Karau] move the thresholding around a bunch based on the design doc
4893bdc [Holden Karau] Use numtrees of 3 since previous result was tied (one tree for each) and the switch from different max methods picked a different element (since they were equal I think this is ok)
638854c [Holden Karau] Add a scala RandomForestClassifierSuite test based on corresponding python test
e09919c [Holden Karau] Fix return type, I need more coffee....
8d92cac [Holden Karau] Use ClassifierParams as the head
3456ed3 [Holden Karau] Add explicit return types even though just test
a0f3b0c [Holden Karau] scala style fixes
6f14314 [Holden Karau] Since hasthreshold/hasthresholds is in root classifier now
ffc8dab [Holden Karau] Update the sharedParams
0420290 [Holden Karau] Allow us to override the get methods selectively
978e77a [Holden Karau] Move HasThreshold into classifier params and start defining the overloaded getThreshold/getThresholds functions
1433e52 [Holden Karau] Revert "try and hide threshold but chainges the API so no dice there"
1f09a2e [Holden Karau] try and hide threshold but chainges the API so no dice there
efb9084 [Holden Karau] move setThresholds only to where its used
6b34809 [Holden Karau] Add a test with thresholding for the RFCS
74f54c3 [Holden Karau] Fix creation of vote array
1986fa8 [Holden Karau] Setting the thresholds only makes sense if the underlying class hasn't overridden predict, so lets push it down.
2f44b18 [Holden Karau] Add a global default of null for thresholds param
f338cfc [Holden Karau] Wait that wasn't a good idea, Revert "Some progress towards unifying threshold and thresholds"
634b06f [Holden Karau] Some progress towards unifying threshold and thresholds
85c9e01 [Holden Karau] Test passes again... little fnur
099c0f3 [Holden Karau] Move thresholds around some more (set on model not trainer)
0f46836 [Holden Karau] Start adding a classifiersuite
f70eb5e [Holden Karau] Fix test compile issues
a7d59c8 [Holden Karau] Move thresholding into Classifier trait
5d999d2 [Holden Karau] Some more progress, start adding a test (maybe try and see if we can find a better thing to use for the base of the test)
1fed644 [Holden Karau] Use thresholds to scale scores in random forest classifcation
31d6bf2 [Holden Karau] Start threading the threshold info through
0ef228c [Holden Karau] Add hasthresholds
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
warnings, 1.5.0 edition
Enable most javac lint warnings; fix a lot of build warnings. In a few cases, touch up surrounding code in the process.
I'll explain several of the changes inline in comments.
Author: Sean Owen <sowen@cloudera.com>
Closes #7862 from srowen/SPARK-9534 and squashes the following commits:
ea51618 [Sean Owen] Enable most javac lint warnings; fix a lot of build warnings. In a few cases, touch up surrounding code in the process.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Regressor
Added featureImportance to RandomForestClassifier and Regressor.
This follows the scikit-learn implementation here: [https://github.com/scikit-learn/scikit-learn/blob/a95203b249c1cf392f86d001ad999e29b2392739/sklearn/tree/_tree.pyx#L3341]
CC: yanboliang Would you mind taking a look? Thanks!
Author: Joseph K. Bradley <joseph@databricks.com>
Author: Feynman Liang <fliang@databricks.com>
Closes #7838 from jkbradley/dt-feature-importance and squashes the following commits:
72a167a [Joseph K. Bradley] fixed unit test
86cea5f [Joseph K. Bradley] Modified RF featuresImportances to return Vector instead of Map
5aa74f0 [Joseph K. Bradley] finally fixed unit test for real
33df5db [Joseph K. Bradley] fix unit test
42a2d3b [Joseph K. Bradley] fix unit test
fe94e72 [Joseph K. Bradley] modified feature importance unit tests
cc693ee [Feynman Liang] Add classifier tests
79a6f87 [Feynman Liang] Compare dense vectors in test
21d01fc [Feynman Liang] Added failing SKLearn test
ac0b254 [Joseph K. Bradley] Added featureImportance to RandomForestClassifier/Regressor. Need to add unit tests
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Use `PrefixSpanModel` to wrap the frequent sequences.
2. Define `FreqSequence` to wrap each frequent sequence, which contains a Java-friendly method `javaSequence`
3. Overload `run` for Java users.
4. Added a unit test in Java to check Java compatibility.
zhangjiajin feynmanliang
Author: Xiangrui Meng <meng@databricks.com>
Closes #7869 from mengxr/SPARK-9527 and squashes the following commits:
4345594 [Xiangrui Meng] add PrefixSpanModel and make PrefixSpan Java friendly
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
support ml.NaiveBayes for Python
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #7568 from yanboliang/spark-9214 and squashes the following commits:
5ee3fd6 [Yanbo Liang] fix typos
3ecd046 [Yanbo Liang] fix typos
f9c94d1 [Yanbo Liang] change lambda_ to smoothing and fix other issues
180452a [Yanbo Liang] fix typos
7dda1f4 [Yanbo Liang] support ml.NaiveBayes for Python
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implements `logPerplexity` in `OnlineLDAOptimizer`. Also refactors inference code into companion object to enable future reuse (e.g. `predict` method).
Author: Feynman Liang <fliang@databricks.com>
Closes #7705 from feynmanliang/SPARK-6793-perplexity and squashes the following commits:
6da2c99 [Feynman Liang] Remove get* from LDAModel public API
8381da6 [Feynman Liang] Code review comments
17f7000 [Feynman Liang] Documentation typo fixes
2f452a4 [Feynman Liang] Remove auxillary DistributedLDAModel constructor
a275914 [Feynman Liang] Prevent empty counts calls to variationalInference
06d02d9 [Feynman Liang] Remove deprecated LocalLDAModel constructor
afecb46 [Feynman Liang] Fix regression bug in sstats accumulator
5a327a0 [Feynman Liang] Code review quick fixes
998c03e [Feynman Liang] Fix style
1cbb67d [Feynman Liang] Fix access modifier bug
4362daa [Feynman Liang] Organize imports
4f171f7 [Feynman Liang] Fix indendation
2f049ce [Feynman Liang] Fix failing save/load tests
7415e96 [Feynman Liang] Pick changes from big PR
11e7c33 [Feynman Liang] Merge remote-tracking branch 'apache/master' into SPARK-6793-perplexity
f8adc48 [Feynman Liang] Add logPerplexity, refactor variationalBound into a method
cd521d6 [Feynman Liang] Refactor methods into companion class
7f62a55 [Feynman Liang] --amend
c62cb1e [Feynman Liang] Outer product for stats, revert Range slicing
aead650 [Feynman Liang] Range slice, in-place update, reduce transposes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
multiple params
Removed varargs annotation from Params.setDefault taking multiple params.
Though varargs is technically correct, it often requires that developers do clean assembly, rather than (not clean) assembly, which is a nuisance during development.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #7604 from jkbradley/params-setdefault-varargs and squashes the following commits:
6016dc6 [Joseph K. Bradley] removed varargs annotation from Params.setDefault taking multiple params
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I Implemented the KMeans API for spark.ml Pipelines. But it doesn't include clustering abstractions for spark.ml (SPARK-7610). It would fit for another issues. And I'll try it later, since we are trying to add the hierarchical clustering algorithms in another issue. Thanks.
[SPARK-7879] KMeans API for spark.ml Pipelines - ASF JIRA https://issues.apache.org/jira/browse/SPARK-7879
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #6756 from yu-iskw/SPARK-7879 and squashes the following commits:
be752de [Yu ISHIKAWA] Add assertions
a14939b [Yu ISHIKAWA] Fix the dashed line's length in pyspark.ml.rst
4c61693 [Yu ISHIKAWA] Remove the test about whether "features" and "prediction" columns exist or not in Python
fb2417c [Yu ISHIKAWA] Use getInt, instead of get
f397be4 [Yu ISHIKAWA] Switch the comparisons.
ca78b7d [Yu ISHIKAWA] Add the Scala docs about the constraints of each parameter.
effc650 [Yu ISHIKAWA] Using expertSetParam and expertGetParam
c8dc6e6 [Yu ISHIKAWA] Remove an unnecessary test
19a9d63 [Yu ISHIKAWA] Include spark.ml.clustering to python tests
1abb19c [Yu ISHIKAWA] Add the statements about spark.ml.clustering into pyspark.ml.rst
f8338bc [Yu ISHIKAWA] Add the placeholders in Python
4a03003 [Yu ISHIKAWA] Test for contains in Python
6566c8b [Yu ISHIKAWA] Use `get`, instead of `apply`
288e8d5 [Yu ISHIKAWA] Using `contains` to check the column names
5a7d574 [Yu ISHIKAWA] Renamce `validateInitializationMode` to `validateInitMode` and remove throwing exception
97cfae3 [Yu ISHIKAWA] Fix the type of return value of `KMeans.copy`
e933723 [Yu ISHIKAWA] Remove the default value of seed from the Model class
978ee2c [Yu ISHIKAWA] Modify the docs of KMeans, according to mllib's KMeans
2ec80bc [Yu ISHIKAWA] Fit on 1 line
e186be1 [Yu ISHIKAWA] Make a few variables, setters and getters be expert ones
b2c205c [Yu ISHIKAWA] Rename the method `getInitializationSteps` to `getInitSteps` and `setInitializationSteps` to `setInitSteps` in Scala and Python
f43f5b4 [Yu ISHIKAWA] Rename the method `getInitializationMode` to `getInitMode` and `setInitializationMode` to `setInitMode` in Scala and Python
3cb5ba4 [Yu ISHIKAWA] Modify the description about epsilon and the validation
4fa409b [Yu ISHIKAWA] Add a comment about the default value of epsilon
2f392e1 [Yu ISHIKAWA] Make some variables `final` and Use `IntParam` and `DoubleParam`
19326f8 [Yu ISHIKAWA] Use `udf`, instead of callUDF
4d2ad1e [Yu ISHIKAWA] Modify the indentations
0ae422f [Yu ISHIKAWA] Add a test for `setParams`
4ff7913 [Yu ISHIKAWA] Add "ml.clustering" to `javacOptions` in SparkBuild.scala
11ffdf1 [Yu ISHIKAWA] Use `===` and the variable
220a176 [Yu ISHIKAWA] Set a random seed in the unit testing
92c3efc [Yu ISHIKAWA] Make the points for a test be fewer
c758692 [Yu ISHIKAWA] Modify the parameters of KMeans in Python
6aca147 [Yu ISHIKAWA] Add some unit testings to validate the setter methods
687cacc [Yu ISHIKAWA] Alias mllib.KMeans as MLlibKMeans in KMeansSuite.scala
a4dfbef [Yu ISHIKAWA] Modify the last brace and indentations
5bedc51 [Yu ISHIKAWA] Remve an extra new line
444c289 [Yu ISHIKAWA] Add the validation for `runs`
e41989c [Yu ISHIKAWA] Modify how to validate `initStep`
7ea133a [Yu ISHIKAWA] Change how to validate `initMode`
7991e15 [Yu ISHIKAWA] Add a validation for `k`
c2df35d [Yu ISHIKAWA] Make `predict` private
93aa2ff [Yu ISHIKAWA] Use `withColumn` in `transform`
d3a79f7 [Yu ISHIKAWA] Remove the inhefited docs
e9532e1 [Yu ISHIKAWA] make `parentModel` of KMeansModel private
8559772 [Yu ISHIKAWA] Remove the `paramMap` parameter of KMeans
6684850 [Yu ISHIKAWA] Rename `initializationSteps` to `initSteps`
99b1b96 [Yu ISHIKAWA] Rename `initializationMode` to `initMode`
79ea82b [Yu ISHIKAWA] Modify the parameters of KMeans docs
6569bcd [Yu ISHIKAWA] Change how to set the default values with `setDefault`
20a795a [Yu ISHIKAWA] Change how to set the default values with `setDefault`
11c2a12 [Yu ISHIKAWA] Limit the imports
badb481 [Yu ISHIKAWA] Alias spark.mllib.{KMeans, KMeansModel}
f80319a [Yu ISHIKAWA] Rebase mater branch and add copy methods
85d92b1 [Yu ISHIKAWA] Add `KMeans.setPredictionCol`
aa9469d [Yu ISHIKAWA] Fix a python test suite error caused by python 3.x
c2d6bcb [Yu ISHIKAWA] ADD Java test suites of the KMeans API for spark.ml Pipeline
598ed2e [Yu ISHIKAWA] Implement the KMeans API for spark.ml Pipelines in Python
63ad785 [Yu ISHIKAWA] Implement the KMeans API for spark.ml Pipelines in Scala
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Naive Bayes API for spark.ml Pipelines
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #7284 from yanboliang/spark-8600 and squashes the following commits:
bc890f7 [Yanbo Liang] remove labels valid check
c3de687 [Yanbo Liang] remove labels from ml.NaiveBayesModel
a2b3088 [Yanbo Liang] address comments
3220b82 [Yanbo Liang] trigger jenkins
3018a41 [Yanbo Liang] address comments
208e166 [Yanbo Liang] Naive Bayes API for spark.ml Pipelines
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR copies the RandomForest implementation from spark.mllib to spark.ml. Note that this includes the DecisionTree implementation, but not the GradientBoostedTrees one (which will come later).
I essentially copied a minimal amount of code to spark.ml, removed the use of bins (and only used splits), and modified code only as much as necessary to get it to compile. The spark.ml implementation still uses some spark.mllib classes (privately), which can be moved in future PRs.
This refactoring will be helpful in extending the node representation to include more information, such as class probabilities.
Specifically:
* Copied code from spark.mllib to spark.ml:
* mllib.tree.DecisionTree, mllib.tree.RandomForest copied to ml.tree.impl.RandomForest (main implementation)
* NodeIdCache (needed to use splits instead of bins)
* TreePoint (use splits instead of bins)
* Added ml.tree.LearningNode used in RandomForest training (needed vars)
* Removed bins from implementation, and only used splits
* Small fix in JavaDecisionTreeRegressorSuite
CC: mengxr manishamde codedeft chouqin
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #7294 from jkbradley/dt-move-impl and squashes the following commits:
48749be [Joseph K. Bradley] cleanups based on code review, mostly style
bea9703 [Joseph K. Bradley] scala style fixes. added some scala doc
4e6d2a4 [Joseph K. Bradley] removed unnecessary use of copyValues, setParent for trees
9a4d721 [Joseph K. Bradley] cleanups. removed InfoGainStats from ml, using old one for now.
836e7d4 [Joseph K. Bradley] Fixed test suite failures
bd5e063 [Joseph K. Bradley] fixed bucketizing issue
0df3759 [Joseph K. Bradley] Need to remove use of Bucketizer
d5224a9 [Joseph K. Bradley] modified tree and forest to use moved impl
cc01823 [Joseph K. Bradley] still editing RF to get it to work
19143fb [Joseph K. Bradley] More progress, but not done yet. Rebased with master after 1.4 release.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Distributed generation of single-consequent association rules from a RDD of frequent itemsets. Tests referenced against `R`'s implementation of A Priori in [arules](http://cran.r-project.org/web/packages/arules/index.html).
Author: Feynman Liang <fliang@databricks.com>
Closes #7005 from feynmanliang/fp-association-rules-distributed and squashes the following commits:
466ced0 [Feynman Liang] Refactor AR generation impl
73c1cff [Feynman Liang] Make rule attributes public, remove numTransactions from FreqItemset
80f63ff [Feynman Liang] Change default confidence and optimize imports
04cf5b5 [Feynman Liang] Code review with @mengxr, add R to tests
0cc1a6a [Feynman Liang] Java compatibility test
f3c14b5 [Feynman Liang] Fix MiMa test
764375e [Feynman Liang] Fix tests
1187307 [Feynman Liang] Almost working tests
b20779b [Feynman Liang] Working implementation
5395c4e [Feynman Liang] Fix imports
2d34405 [Feynman Liang] Partial implementation of distributed ar
83ace4b [Feynman Liang] Local rule generation without pruning complete
69c2c87 [Feynman Liang] Working local implementation, now to parallelize../..
4e1ec9a [Feynman Liang] Pull FreqItemsets out, refactor type param, tests
69ccedc [Feynman Liang] First implementation of association rule generation
|
|
|
|
|
|
|
|
|
|
|
| |
Add Java unit test for PCA transformer
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #7184 from yanboliang/spark-8788 and squashes the following commits:
9d1a2af [Yanbo Liang] address comments
b34451f [Yanbo Liang] Add Java unit test for PCA transformer
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See the jira https://issues.apache.org/jira/browse/SPARK-5562
Author: Alok Singh <singhal@Aloks-MacBook-Pro.local>
Author: Alok Singh <singhal@aloks-mbp.usca.ibm.com>
Author: Alok Singh <“singhal@us.ibm.com”>
Closes #7064 from aloknsingh/aloknsingh_SPARK-5562 and squashes the following commits:
259a0a7 [Alok Singh] change as per the comments by @jkbradley
be48491 [Alok Singh] [SPARK-5562][MLlib] re-order import in alphabhetical order
c01311b [Alok Singh] [SPARK-5562][MLlib] fix the newline typo
b271c8a [Alok Singh] [SPARK-5562][Mllib] As per github discussion with jkbradley. We would like to simply things.
7c06251 [Alok Singh] [SPARK-5562][MLlib] modified the JavaLDASuite for test passing
c710cb6 [Alok Singh] fix the scala code style to have space after :
2572a08 [Alok Singh] [SPARK-5562][MLlib] change the import xyz._ to the import xyz.{c1, c2} ..
ab55fbf [Alok Singh] [SPARK-5562][MLlib] Change as per Sean Owen's comments https://github.com/apache/spark/pull/7064/files#diff-9236d23975e6f5a5608ffc81dfd79146
9f4f9ea [Alok Singh] [SPARK-5562][MLlib] LDA should handle empty document.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rename DiscreteCosineTransformer and related classes to DCT.
Author: Feynman Liang <fliang@databricks.com>
Closes #7138 from feynmanliang/dct-features and squashes the following commits:
e547b3e [Feynman Liang] Fix renaming bug
9d5c9e4 [Feynman Liang] Lowercase JavaDCTSuite variable
f9a8958 [Feynman Liang] Remove old files
f8fe794 [Feynman Liang] Merge branch 'master' into dct-features
894d0b2 [Feynman Liang] Rename DiscreteCosineTransformer to DCT
433dbc7 [Feynman Liang] Test refactoring
91e9636 [Feynman Liang] Style guide and test helper refactor
b5ac19c [Feynman Liang] Use Vector types, add Java test
530983a [Feynman Liang] Tests for other numeric datatypes
195d7aa [Feynman Liang] Implement support for arbitrary numeric types
95d4939 [Feynman Liang] Working DCT for 1D Doubles
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implementation and tests for Discrete Cosine Transformer.
Author: Feynman Liang <fliang@databricks.com>
Closes #6894 from feynmanliang/dct-features and squashes the following commits:
433dbc7 [Feynman Liang] Test refactoring
91e9636 [Feynman Liang] Style guide and test helper refactor
b5ac19c [Feynman Liang] Use Vector types, add Java test
530983a [Feynman Liang] Tests for other numeric datatypes
195d7aa [Feynman Liang] Implement support for arbitrary numeric types
95d4939 [Feynman Liang] Working DCT for 1D Doubles
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, extra params get ignored in `PipelineModel.transform`. jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #6622 from mengxr/SPARK-8087 and squashes the following commits:
0e4c8c4 [Xiangrui Meng] fix merge issues
26fc1f0 [Xiangrui Meng] address comments
e607a04 [Xiangrui Meng] merge master
b85b57e [Xiangrui Meng] fix examples/compile
d6f7891 [Xiangrui Meng] rename defaultCopyWithParams to defaultCopy
84ec278 [Xiangrui Meng] remove setter checks due to generics
2cf2ed0 [Xiangrui Meng] snapshot
291814f [Xiangrui Meng] OneVsRest.copy
1dfe3bd [Xiangrui Meng] PipelineModel.copy should copy stages
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Java-friendly APIs added:
* GaussianMixture.run()
* GaussianMixtureModel.predict()
* DistributedLDAModel.javaTopicDistributions()
* StreamingKMeans: trainOn, predictOn, predictOnValues
* Statistics.corr
* params
* added doc to w() since Java docs do not inherit doc
* removed non-Java-friendly w() from StringArrayParam and DoubleArrayParam
* made DoubleArrayParam Java-friendly w() actually Java-friendly
I generated the doc and verified all changes.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #6562 from jkbradley/java-api-1.4 and squashes the following commits:
c16821b [Joseph K. Bradley] Small fixes based on code review.
d955581 [Joseph K. Bradley] unit test fixes
29b6b0d [Joseph K. Bradley] small fixes
fe6dcfe [Joseph K. Bradley] Added several Java-friendly APIs + unit tests: NaiveBayes, GaussianMixture, LDA, StreamingKMeans, Statistics.corr, params
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
accessors
This patch significantly refactors CatalystTypeConverters to both clean up the code and enable these conversions to work with future Project Tungsten features.
At a high level, I've reorganized the code so that all functions dealing with the same type are grouped together into type-specific subclasses of `CatalystTypeConveter`. In addition, I've added new methods that allow the Catalyst Row -> Scala Row conversions to access the Catalyst row's fields through type-specific `getTYPE()` methods rather than the generic `get()` / `Row.apply` methods. This refactoring is a blocker to being able to unit test new operators that I'm developing as part of Project Tungsten, since those operators may output `UnsafeRow` instances which don't support the generic `get()`.
The stricter type usage of types here has uncovered some bugs in other parts of Spark SQL:
- #6217: DescribeCommand is assigned wrong output attributes in SparkStrategies
- #6218: DataFrame.describe() should cast all aggregates to String
- #6400: Use output schema, not relation schema, for data source input conversion
Spark SQL current has undefined behavior for what happens when you try to create a DataFrame from user-specified rows whose values don't match the declared schema. According to the `createDataFrame()` Scaladoc:
> It is important to make sure that the structure of every [[Row]] of the provided RDD matches the provided schema. Otherwise, there will be runtime exception.
Given this, it sounds like it's technically not a break of our API contract to fail-fast when the data types don't match. However, there appear to be many cases where we don't fail even though the types don't match. For example, `JavaHashingTFSuite.hasingTF` passes a column of integers values for a "label" column which is supposed to contain floats. This column isn't actually read or modified as part of query processing, so its actual concrete type doesn't seem to matter. In other cases, there could be situations where we have generic numeric aggregates that tolerate being called with different numeric types than the schema specified, but this can be okay due to numeric conversions.
In the long run, we will probably want to come up with precise semantics for implicit type conversions / widening when converting Java / Scala rows to Catalyst rows. Until then, though, I think that failing fast with a ClassCastException is a reasonable behavior; this is the approach taken in this patch. Note that certain optimizations in the inbound conversion functions for primitive types mean that we'll probably preserve the old undefined behavior in a majority of cases.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #6222 from JoshRosen/catalyst-converters-refactoring and squashes the following commits:
740341b [Josh Rosen] Optimize method dispatch for primitive type conversions
befc613 [Josh Rosen] Add tests to document Option-handling behavior.
5989593 [Josh Rosen] Use new SparkFunSuite base in CatalystTypeConvertersSuite
6edf7f8 [Josh Rosen] Re-add convertToScala(), since a Hive test still needs it
3f7b2d8 [Josh Rosen] Initialize converters lazily so that the attributes are resolved first
6ad0ebb [Josh Rosen] Fix JavaHashingTFSuite ClassCastException
677ff27 [Josh Rosen] Fix null handling bug; add tests.
8033d4c [Josh Rosen] Fix serialization error in UserDefinedGenerator.
85bba9d [Josh Rosen] Fix wrong input data in InMemoryColumnarQuerySuite
9c0e4e1 [Josh Rosen] Remove last use of convertToScala().
ae3278d [Josh Rosen] Throw ClassCastException errors during inbound conversions.
7ca7fcb [Josh Rosen] Comments and cleanup
1e87a45 [Josh Rosen] WIP refactoring of CatalystTypeConverters
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR adds a Java unit test and user guide for `StringIndexer`. I put it before `OneHotEncoder` because they are closely related. jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #6561 from mengxr/SPARK-7582 and squashes the following commits:
4bba4f1 [Xiangrui Meng] fix example
ba1cd1b [Xiangrui Meng] fix style
7fa18d1 [Xiangrui Meng] add user guide for StringIndexer
136cb93 [Xiangrui Meng] add a Java unit test for StringIndexer
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR adds a section in the user guide for `VectorAssembler` with code examples in Python/Java/Scala. It also adds a unit test in Java.
jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #6556 from mengxr/SPARK-7584 and squashes the following commits:
11313f6 [Xiangrui Meng] simplify Java example
0cd47f3 [Xiangrui Meng] update user guide
fd36292 [Xiangrui Meng] update Java unit test
ce61ca0 [Xiangrui Meng] add Java unit test for VectorAssembler
e399942 [Xiangrui Meng] scala/python example code
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Right now `unit-tests.log` are not of much value because we can't tell where the test boundaries are easily. This patch adds log statements before and after each test to outline the test boundaries, e.g.:
```
===== TEST OUTPUT FOR o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' =====
15/05/27 12:36:39.596 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO SparkContext: Starting job: count at KryoSerializerSuite.scala:230
15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Got job 3 (count at KryoSerializerSuite.scala:230) with 4 output partitions (allowLocal=false)
15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Final stage: ResultStage 3(count at KryoSerializerSuite.scala:230)
15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Parents of final stage: List()
15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Missing parents: List()
15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Submitting ResultStage 3 (ParallelCollectionRDD[5] at parallelize at KryoSerializerSuite.scala:230), which has no missing parents
...
15/05/27 12:36:39.624 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO DAGScheduler: Job 3 finished: count at KryoSerializerSuite.scala:230, took 0.028563 s
15/05/27 12:36:39.625 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO KryoSerializerSuite:
***** FINISHED o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' *****
...
```
Author: Andrew Or <andrew@databricks.com>
Closes #6441 from andrewor14/demarcate-tests and squashes the following commits:
879b060 [Andrew Or] Fix compile after rebase
d622af7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
017c8ba [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
7790b6c [Andrew Or] Fix tests after logical merge conflict
c7460c0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
c43ffc4 [Andrew Or] Fix tests?
8882581 [Andrew Or] Fix tests
ee22cda [Andrew Or] Fix log message
fa9450e [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
12d1e1b [Andrew Or] Various whitespace changes (minor)
69cbb24 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
bbce12e [Andrew Or] Fix manual things that cannot be covered through automation
da0b12f [Andrew Or] Add core tests as dependencies in all modules
f7d29ce [Andrew Or] Introduce base abstract class for all test suites
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CC jkbradley
Author: Xusen Yin <yinxusen@gmail.com>
Closes #6451 from yinxusen/SPARK-7577 and squashes the following commits:
e2dc32e [Xusen Yin] rename colums
e350e49 [Xusen Yin] add all demos
006ddf1 [Xusen Yin] add java test
3238481 [Xusen Yin] add bucketizer
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added user guide sections with code examples.
Also added small Java unit tests to test Java example in guide.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #6127 from jkbradley/feature-guide-2 and squashes the following commits:
cd47f4b [Joseph K. Bradley] Updated based on code review
f16bcec [Joseph K. Bradley] Fixed merge issues and update Python examples print calls for Python 3
0a862f9 [Joseph K. Bradley] Added Normalizer, StandardScaler to ml-features doc, plus small Java unit tests
a21c2d6 [Joseph K. Bradley] Updated ml-features.md with IDF
|
|
|
|
|
|
|
|
|
|
| |
We removed `varargs` due to Java compilation issues. That was a false alarm because I didn't run `build/sbt clean`. So this PR reverts the changes. jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #6320 from mengxr/SPARK-7498 and squashes the following commits:
74a7259 [Xiangrui Meng] add varargs back to setDefault
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added VectorIndexer section to ML user guide. Also added javaCategoryMaps() method and Java unit test for it.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #6255 from jkbradley/vector-indexer-guide and squashes the following commits:
dbb8c4c [Joseph K. Bradley] simplified VectorIndexerModel.javaCategoryMaps
f692084 [Joseph K. Bradley] Added VectorIndexer section to ML user guide. Also added javaCategoryMaps() method and Java unit test for it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to be consistent with other string names in MLlib. This PR also updates the implementation to use vals instead of hardcoded strings. jkbradley leahmcguire
Author: Xiangrui Meng <meng@databricks.com>
Closes #6277 from mengxr/SPARK-7752 and squashes the following commits:
f38b662 [Xiangrui Meng] add another case _ back in test
ae5c66a [Xiangrui Meng] model type -> modelType
711d1c6 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7752
40ae53e [Xiangrui Meng] fix Java test suite
264a814 [Xiangrui Meng] add case _ back
3c456a8 [Xiangrui Meng] update NB user guide
17bba53 [Xiangrui Meng] update naive Bayes to use lowercase model type strings
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CC jkbradley.
JIRA [issue](https://issues.apache.org/jira/browse/SPARK-7586).
Author: Xusen Yin <yinxusen@gmail.com>
Closes #6181 from yinxusen/SPARK-7586 and squashes the following commits:
77014c5 [Xusen Yin] comment fix
57a4c07 [Xusen Yin] small fix for docs
1178c8f [Xusen Yin] remove the correctness check in java suite
1c3f389 [Xusen Yin] delete sbt commit
1af152b [Xusen Yin] check python example code
1b5369e [Xusen Yin] add docs of word2vec
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
JIRA [here](https://issues.apache.org/jira/browse/SPARK-7581).
CC jkbradley
Author: Xusen Yin <yinxusen@gmail.com>
Closes #6113 from yinxusen/SPARK-7581 and squashes the following commits:
1a7d80d [Xusen Yin] merge with master
892a8e9 [Xusen Yin] fix python 3 compatibility
ec935bf [Xusen Yin] small fix
3e9fa1d [Xusen Yin] delete note
69fcf85 [Xusen Yin] simplify and add python example
81d21dc [Xusen Yin] add programming guide for Polynomial Expansion
40babfb [Xusen Yin] add java test suite for PolynomialExpansion
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A param instance is strongly attached to an parent in the current implementation. So if we make a copy of an estimator or a transformer in pipelines and other meta-algorithms, it becomes error-prone to copy the params to the copied instances. In this PR, a param is identified by its parent's UID and the param name. So it becomes loosely attached to its parent and all its derivatives. The UID is preserved during copying or fitting. All components now have a default constructor and a constructor that takes a UID as input. I keep the constructors for Param in this PR to reduce the amount of diff and moved `parent` as a mutable field.
This PR still needs some clean-ups, and there are several spark.ml PRs pending. I'll try to get them merged first and then update this PR.
jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #6019 from mengxr/SPARK-7407 and squashes the following commits:
c4c8120 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7407
520f0a2 [Xiangrui Meng] address comments
2569168 [Xiangrui Meng] fix tests
873caca [Xiangrui Meng] fix tests in OneVsRest; fix a racing condition in shouldOwn
409ea08 [Xiangrui Meng] minor updates
83a163c [Xiangrui Meng] update JavaDeveloperApiExample
5db5325 [Xiangrui Meng] update OneVsRest
7bde7ae [Xiangrui Meng] merge master
697fdf9 [Xiangrui Meng] update Bucketizer
7b4f6c2 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7407
629d402 [Xiangrui Meng] fix LRSuite
154516f [Xiangrui Meng] merge master
aa4a611 [Xiangrui Meng] fix examples/compile
a4794dd [Xiangrui Meng] change Param to use to reduce the size of diff
fdbc415 [Xiangrui Meng] all tests passed
c255f17 [Xiangrui Meng] fix tests in ParamsSuite
818e1db [Xiangrui Meng] merge master
e1160cf [Xiangrui Meng] fix tests
fbc39f0 [Xiangrui Meng] pass test:compile
108937e [Xiangrui Meng] pass compile
8726d39 [Xiangrui Meng] use parent uid in Param
eaeed35 [Xiangrui Meng] update Identifiable
|
|
|
|
|
|
|
|
|
|
| |
`RankingMetrics` contains a ClassTag, which is hard to create in Java. This PR adds a factory method `of` for Java users. coderxiang
Author: Xiangrui Meng <meng@databricks.com>
Closes #6098 from mengxr/SPARK-7528 and squashes the following commits:
e5d57ae [Xiangrui Meng] make RankingMetrics Java-friendly
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Minor cleanups discussed with [~mengxr]:
* move OneVsRest from reduction to classification sub-package
* make model constructor private
Some doc cleanups too
CC: harsha2010 Could you please verify this looks OK? Thanks!
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #6097 from jkbradley/onevsrest-cleanup and squashes the following commits:
4ecd48d [Joseph K. Bradley] org imports
430b065 [Joseph K. Bradley] moved OneVsRest from reduction subpackage to classification. small java doc style fixes
9f8b9b9 [Joseph K. Bradley] Small cleanups to OneVsRest. Made model constructor private to ml package.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added feature transformer subsection to spark.ml guide, with HashingTF and Tokenizer. Added JavaHashingTFSuite to test Java examples in new guide.
I've run Scala, Python examples in the Spark/PySpark shells. I ran the Java examples via the test suite (with small modifications for printing).
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #6093 from jkbradley/hashingtf-guide and squashes the following commits:
d5d213f [Joseph K. Bradley] small fix
dd6e91a [Joseph K. Bradley] fixes from code review of user guide
33c3ff9 [Joseph K. Bradley] small fix
bc6058c [Joseph K. Bradley] fix link
361a174 [Joseph K. Bradley] Added subsection for feature transformers to spark.ml guide, with HashingTF and Tokenizer. Added JavaHashingTFSuite to test Java examples in new guide
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
initial cut of one against all. test code is a scaffolding , not fully implemented.
This WIP is to gather early feedback.
Author: Ram Sriharsha <rsriharsha@hw11853.local>
Closes #5830 from harsha2010/reduction and squashes the following commits:
5f4b495 [Ram Sriharsha] Fix Test
386e98b [Ram Sriharsha] Style fix
49b4a17 [Ram Sriharsha] Simplify the test
02279cc [Ram Sriharsha] Output Label Metadata in Prediction Col
bc78032 [Ram Sriharsha] Code Review Updates
8ce4845 [Ram Sriharsha] Merge with Master
2a807be [Ram Sriharsha] Merge branch 'master' into reduction
e21bfcc [Ram Sriharsha] Style Fix
5614f23 [Ram Sriharsha] Style Fix
c75583a [Ram Sriharsha] Cleanup
7a5f136 [Ram Sriharsha] Fix TODOs
804826b [Ram Sriharsha] Merge with Master
1448a5f [Ram Sriharsha] Style Fix
6e47807 [Ram Sriharsha] Style Fix
d63e46b [Ram Sriharsha] Incorporate Code Review Feedback
ced68b5 [Ram Sriharsha] Refactor OneVsAll to implement Predictor
78fa82a [Ram Sriharsha] extra line
0dfa1fb [Ram Sriharsha] Fix inexhaustive match cases that may arise from UnresolvedAttribute
a59a4f4 [Ram Sriharsha] @Experimental
4167234 [Ram Sriharsha] Merge branch 'master' into reduction
868a4fd [Ram Sriharsha] @Experimental
041d905 [Ram Sriharsha] Code Review Fixes
df188d8 [Ram Sriharsha] Style fix
612ec48 [Ram Sriharsha] Style Fix
6ef43d3 [Ram Sriharsha] Prefer Unresolved Attribute to Option: Java APIs are cleaner
6bf6bff [Ram Sriharsha] Update OneHotEncoder to new API
e29cb89 [Ram Sriharsha] Merge branch 'master' into reduction
1c7fa44 [Ram Sriharsha] Fix Tests
ca83672 [Ram Sriharsha] Incorporate Code Review Feedback + Rename to OneVsRestClassifier
221beeed [Ram Sriharsha] Upgrade to use Copy method for cloning Base Classifiers
26f1ddb [Ram Sriharsha] Merge with SPARK-5956 API changes
9738744 [Ram Sriharsha] Merge branch 'master' into reduction
1a3e375 [Ram Sriharsha] More efficient Implementation: Use withColumn to generate label column dynamically
32e0189 [Ram Sriharsha] Restrict reduction to Margin Based Classifiers
ff272da [Ram Sriharsha] Style fix
28771f5 [Ram Sriharsha] Add Tests for Multiclass to Binary Reduction
b60f874 [Ram Sriharsha] Fix Style issues in Test
3191cdf [Ram Sriharsha] Remove this test, accidental commit
23f056c [Ram Sriharsha] Fix Headers for test
1b5e929 [Ram Sriharsha] Fix Style issues and add Header
8752863 [Ram Sriharsha] [SPARK-7015][MLLib][WIP] Multiclass to Binary Reduction: One Against All
|
|
|
|
|
|
|
|
|
|
|
| |
In SPARK-7429 and PR https://github.com/apache/spark/pull/5960, I added the varargs annotation to Params.setDefault which takes a variable number of ParamPairs. It worked locally and on Jenkins for me.
However, mengxr reported issues compiling on his machine. So I'm reverting the change introduced in https://github.com/apache/spark/pull/5960 by removing varargs.
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #6021 from jkbradley/revert-varargs and squashes the following commits:
098ed39 [Joseph K. Bradley] removed varargs annotation from Params.setDefaults taking multiple ParamPairs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Params.setDefault taking a set of ParamPairs should be annotated with varargs. I thought it would not work before, but it apparently does.
CrossValidator.transform should call transformSchema since the underlying Model might be a PipelineModel
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #5960 from jkbradley/params-cleanups and squashes the following commits:
118b158 [Joseph K. Bradley] Params.setDefault taking a set of ParamPairs should be annotated with varargs. I thought it would not work before, but it apparently does. CrossValidator.transform should call transformSchema since the underlying Model might be a PipelineModel
|