path: root/mllib/src/test/scala/org
* [SPARK-11259][ML] Params.validateParams() should be called automatically (Yanbo Liang, 2016-01-04; 1 file, -1/+22)
  See JIRA: https://issues.apache.org/jira/browse/SPARK-11259
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #9224 from yanboliang/spark-11259.
* [SPARK-12424][ML] The implementation of ParamMap#filter is wrong (Kousuke Saruta, 2015-12-29; 1 file, -0/+28)
  ParamMap#filter uses `mutable.Map#filterKeys`. The return type of `filterKeys` is `collection.Map`, not `mutable.Map`, but the result is cast to `mutable.Map` using `asInstanceOf`, so we get a `ClassCastException`. Also, the map returned by `filterKeys` is not Serializable; this is a Scala issue (https://issues.scala-lang.org/browse/SI-6654).
  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
  Closes #10381 from sarutak/SPARK-12424.
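  For context, a minimal sketch of the Scala pitfall described here (the map and its keys are illustrative, not from the Spark source): `filterKeys` returns a lazy `collection.Map` view, so casting it to `mutable.Map` fails at runtime, and the view is not serializable either.

  ```scala
  import scala.collection.mutable

  object FilterKeysPitfall extends App {
    val params: mutable.Map[String, Int] = mutable.Map("maxIter" -> 10, "regParam" -> 1)

    // filterKeys returns a lazy collection.Map view, not a mutable.Map,
    // so this cast throws ClassCastException at runtime:
    // val bad = params.filterKeys(_ == "maxIter").asInstanceOf[mutable.Map[String, Int]]

    // The view is also not Serializable (Scala issue SI-6654).
    // A safe alternative is to materialize a strict map with filter:
    val good: mutable.Map[String, Int] = params.filter { case (k, _) => k == "maxIter" }
    println(good) // Map(maxIter -> 10)
  }
  ```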
* [SPARK-12311][CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property (Kazuaki Ishizaki, 2015-12-24; 4 files, -14/+26)
  Restore the original value of the os.arch property after each test. Since some tests force a specific value for os.arch, we need to restore the original value afterwards.
  Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
  Closes #10289 from kiszk/SPARK-12311.
* [SPARK-12309][ML] Use sqlContext from MLlibTestSparkContext for spark.ml test suites (Yanbo Liang, 2015-12-16; 5 files, -11/+5)
  Use `sqlContext` from `MLlibTestSparkContext` rather than creating a new one for spark.ml test suites. I have checked thoroughly and found four test suites that need updating. cc mengxr jkbradley
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #10279 from yanboliang/spark-12309.
* [SPARK-10991][ML] Logistic regression training summary should handle empty prediction col (Holden Karau, 2015-12-11; 1 file, -0/+11)
  LogisticRegression training summary should still function if the predictionCol is set to an empty string or otherwise unset (related to https://issues.apache.org/jira/browse/SPARK-9718).
  Author: Holden Karau <holden@pigscanfly.ca>
  Author: Holden Karau <holden@us.ibm.com>
  Closes #9037 from holdenk/SPARK-10991-LogisticRegressionTrainingSummary-handle-empty-prediction-col.
* [SPARK-11530][MLLIB] Return eigenvalues with PCA model (Sean Owen, 2015-12-10; 3 files, -5/+15)
  Add `computePrincipalComponentsAndVariance` to also compute PCA's explained variance. CC mengxr
  Author: Sean Owen <sowen@cloudera.com>
  Closes #9736 from srowen/SPARK-11530.
* [SPARK-10299][ML] word2vec should allow users to specify the window size (Holden Karau, 2015-12-09; 1 file, -3/+40)
  Currently word2vec has the window size hard-coded at 5; some users may want different sizes (for example if using n-gram input or similar). User request comes from http://stackoverflow.com/questions/32231975/spark-word2vec-window-size.
  Author: Holden Karau <holden@us.ibm.com>
  Author: Holden Karau <holden@pigscanfly.ca>
  Closes #8513 from holdenk/SPARK-10299-word2vec-should-allow-users-to-specify-the-window-size.
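  A usage sketch, assuming the new mllib setter is named `setWindowSize`; `sc` is an existing SparkContext and the corpus below is illustrative:

  ```scala
  import org.apache.spark.mllib.feature.Word2Vec
  import org.apache.spark.rdd.RDD

  // Assumes an existing SparkContext `sc`; the corpus is illustrative.
  val corpus: RDD[Seq[String]] = sc.parallelize(Seq(
    "spark is a fast engine".split(" ").toSeq,
    "word2vec learns word vectors".split(" ").toSeq))

  val model = new Word2Vec()
    .setVectorSize(50)
    .setMinCount(1)
    .setWindowSize(10) // previously hard-coded at 5
    .fit(corpus)
  ```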
* [SPARK-11605][MLLIB] ML 1.6 QA: API: Java compatibility, docs (Yuhao Yang, 2015-12-08; 1 file, -12/+13)
  jira: https://issues.apache.org/jira/browse/SPARK-11605
  Check Java compatibility for MLlib for this release. Fixes:
  1. `StreamingTest.registerStream` needs a Java-friendly interface.
  2. `GradientBoostedTreesModel.computeInitialPredictionAndError` and `GradientBoostedTreesModel.updatePredictionError` have a Java compatibility issue. Mark them as `developerAPI`.
  TBD: [updated] no fix for now per discussion. `org.apache.spark.mllib.classification.LogisticRegressionModel`'s `public scala.Option<java.lang.Object> getThreshold();` has the wrong return type for Java invocation; `SVMModel` has a similar issue. Yet adding a `scala.Option<java.util.Double> getThreshold()` would result in an overloading error due to the same function signature, and adding a new function with a different name seems unnecessary. cc jkbradley feynmanliang
  Author: Yuhao Yang <hhbyyh@gmail.com>
  Closes #10102 from hhbyyh/javaAPI.
* [SPARK-11439][ML] Optimization of creating sparse feature without dense one (Nakul Jindal, 2015-12-08; 2 files, -96/+124)
  Sparse features generated in LinearDataGenerator no longer create dense vectors as an intermediate step.
  Author: Nakul Jindal <njindal@us.ibm.com>
  Closes #9756 from nakul02/SPARK-11439_sparse_without_creating_dense_feature.
* [SPARK-11994][MLLIB] Word2VecModel load and save cause SparkException when model is bigger than spark.kryoserializer.buffer.max (Antonio Murgia, 2015-12-05; 1 file, -0/+19)
  Author: Antonio Murgia <antonio.murgia2@studio.unibo.it>
  Closes #9989 from tmnd1991/SPARK-11932.
* [SPARK-12112][BUILD] Upgrade to SBT 0.13.9 (Josh Rosen, 2015-12-05; 3 files, -12/+12)
  We should upgrade to SBT 0.13.9, since this is a requirement in order to use SBT's new Maven-style resolution features (which will be done in a separate patch, because it's blocked by some binary compatibility issues in the POM reader plugin). I also upgraded Scalastyle to version 0.8.0, which was necessary in order to fix a Scala 2.10.5 compatibility issue (see https://github.com/scalastyle/scalastyle/issues/156). The newer Scalastyle is slightly stricter about whitespace surrounding tokens, so I fixed the new style violations.
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #10112 from JoshRosen/upgrade-to-sbt-0.13.9.
* [SPARK-11847][ML] Model export/import for spark.ml: LDA (Yuhao Yang, 2015-11-24; 1 file, -2/+42)
  Add read/write support to LDA, similar to ALS. save/load for ml.LocalLDAModel is done. For DistributedLDAModel, I'm not sure if we can invoke save on the mllib.DistributedLDAModel directly. I'll send an update after some testing.
  Author: Yuhao Yang <hhbyyh@gmail.com>
  Closes #9894 from hhbyyh/ldaMLsave.
* [SPARK-11902][ML] Unhandled case in VectorAssembler#transform (BenFradet, 2015-11-22; 1 file, -0/+11)
  There is an unhandled case in the transform method of VectorAssembler if one of the input columns doesn't have one of the supported types (DoubleType, NumericType, BooleanType, or VectorUDT). So, if you try to transform a column of StringType you get a cryptic "scala.MatchError: StringType". This PR fixes that by throwing a SparkException when dealing with an unknown column type.
  Author: BenFradet <benjamin.fradet@gmail.com>
  Closes #9885 from BenFradet/SPARK-11902.
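  A minimal sketch of the fix pattern (not the actual spark.ml source; the VectorUDT case is omitted here since that class is internal to Spark): the default match arm raises a descriptive SparkException instead of leaking a MatchError.

  ```scala
  import org.apache.spark.SparkException
  import org.apache.spark.sql.types._

  // Sketch only: the real check lives inside VectorAssembler#transform.
  def checkColumnType(dataType: DataType): Unit = dataType match {
    case DoubleType | BooleanType => // supported as-is
    case _: NumericType           => // supported: promoted to Double
    case other =>
      // Previously this case was unhandled and surfaced as scala.MatchError.
      throw new SparkException(s"VectorAssembler does not support the $other type")
  }
  ```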
* [SPARK-11912][ML] ml.feature.PCA minor refactor (Yanbo Liang, 2015-11-22; 1 file, -18/+13)
  Like [SPARK-11852](https://issues.apache.org/jira/browse/SPARK-11852), `k` is a param and we should save it under `metadata/` rather than under both `data/` and `metadata/`. Refactor the constructor of `ml.feature.PCAModel` to take only `pc` but construct `mllib.feature.PCAModel` inside `transform`.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #9897 from yanboliang/spark-11912.
* [SPARK-6791][ML] Add read/write for CrossValidator and Evaluators (Joseph K. Bradley, 2015-11-22; 5 files, -13/+231)
  I believe this works for general estimators within CrossValidator, including compound estimators. (See the complex unit test.) Added read/write for all 3 Evaluators as well. CC: mengxr yanboliang
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #9848 from jkbradley/cv-io.
* [SPARK-11852][ML] StandardScaler minor refactor (Yanbo Liang, 2015-11-20; 1 file, -7/+4)
  `withStd` and `withMean` should be params of `StandardScaler` and `StandardScalerModel`.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #9839 from yanboliang/standardScaler-refactor.
* [SPARK-11867] Add save/load for kmeans and naive bayes (Xusen Yin, 2015-11-19; 2 files, -15/+73)
  https://issues.apache.org/jira/browse/SPARK-11867
  Author: Xusen Yin <yinxusen@gmail.com>
  Closes #9849 from yinxusen/SPARK-11867.
* [SPARK-11869][ML] Clean up TempDirectory properly in ML tests (Joseph K. Bradley, 2015-11-19; 1 file, -1/+1)
  We need to remove the parent directory (`className`) rather than just tempDir (`className/random_name`). I tested this with IDFSuite, which has 2 read/write tests, and it fixes the problem. CC: mengxr Can you confirm this is fine? I believe it is, since the same `random_name` is used for all tests in a suite; we basically have an extra unneeded level of nesting.
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #9851 from jkbradley/tempdir-cleanup.
* [SPARK-11829][ML] Add read/write to estimators under ml.feature (II) (Yanbo Liang, 2015-11-19; 4 files, -8/+92)
  Add read/write support to the following estimators under spark.ml:
  * ChiSqSelector
  * PCA
  * VectorIndexer
  * Word2Vec
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #9838 from yanboliang/spark-11829.
* [SPARK-11846] Add save/load for AFTSurvivalRegression and IsotonicRegression (Xusen Yin, 2015-11-19; 2 files, -6/+65)
  https://issues.apache.org/jira/browse/SPARK-11846 mengxr
  Author: Xusen Yin <yinxusen@gmail.com>
  Closes #9836 from yinxusen/SPARK-11846.
* [SPARK-11842][ML] Small cleanups to existing Readers and Writers (Joseph K. Bradley, 2015-11-18; 1 file, -1/+1)
  Updates:
  * Add repartition(1) to save() methods' saving of data for LogisticRegressionModel, LinearRegressionModel.
  * Strengthen privacy to class and companion object for Writers and Readers.
  * Change LogisticRegressionSuite read/write test to fit intercept.
  * Add Since versions for read/write methods in Pipeline, LogisticRegression.
  * Switch from hand-written class names in Readers to using getClass.
  CC: mengxr CC: yanboliang Would you mind taking a look at this PR? mengxr might not be able to soon. Thank you!
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #9829 from jkbradley/ml-io-cleanups.
* [SPARK-11839][ML] refactor save/write traits (Xiangrui Meng, 2015-11-18; 2 files, -15/+16)
  * Add the "ML" prefix to reader/writer/readable/writable to avoid name collision with java.util.*.
  * Define `DefaultParamsReadable/Writable` and use them to save some code.
  * Use `super.load` instead so people can jump directly to the doc of `Readable.load`, which documents the Java compatibility issues.
  jkbradley
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #9827 from mengxr/SPARK-11839.
* [SPARK-6787][ML] add read/write to estimators under ml.feature (1) (Xiangrui Meng, 2015-11-18; 5 files, -22/+129)
  Add read/write support to the following estimators under spark.ml:
  * CountVectorizer
  * IDF
  * MinMaxScaler
  * StandardScaler (a little awkward because we store some params in the spark.mllib model)
  * StringIndexer
  Added some necessary methods for read/write. Maybe we should add `private[ml] trait DefaultParamsReadable` and `DefaultParamsWritable` to save some boilerplate code, though we still need to override `load` for Java compatibility. jkbradley
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #9798 from mengxr/SPARK-6787.
* [SPARK-6789][ML] Add Readable, Writable support for spark.ml ALS, ALSModel (Joseph K. Bradley, 2015-11-18; 1 file, -9/+69)
  Also modifies DefaultParamsWriter.saveMetadata to take optional extra metadata. CC: mengxr yanboliang
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #9786 from jkbradley/als-io.
* [SPARK-6790][ML] Add spark.ml LinearRegression import/export (Wenjian Huang, 2015-11-18; 1 file, -2/+32)
  This replaces [https://github.com/apache/spark/pull/9656] with updates. fayeshine should be the main author when this PR is committed. CC: mengxr fayeshine
  Author: Wenjian Huang <nextrush@163.com>
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #9814 from jkbradley/fayeshine-patch-6790.
* [SPARK-7013][ML][TEST] Add unit test for spark.ml StandardScaler (RoyGaoVLIS, 2015-11-17; 1 file, -0/+108)
  I have added a unit test for ML's StandardScaler by comparing with R's output. Please review. Thanks.
  Author: RoyGaoVLIS <roygao@zju.edu.cn>
  Closes #6665 from RoyGao/7013.
* [SPARK-11764][ML] make Param.jsonEncode/jsonDecode support Vector (Xiangrui Meng, 2015-11-17; 1 file, -4/+18)
  This PR makes the default read/write work with simple transformers/estimators that have params of type `Param[Vector]`. jkbradley
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #9776 from mengxr/SPARK-11764.
* [SPARK-11763][ML] Add save,load to LogisticRegression Estimator (Joseph K. Bradley, 2015-11-17; 5 files, -17/+123)
  Add save/load to the LogisticRegression Estimator, and refactor the tests a little to make it easier to add similar support to other Estimator/Model pairs. Moved LogisticRegressionReader/Writer to within LogisticRegressionModel. CC: mengxr
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #9749 from jkbradley/lr-io-2.
* [SPARK-11769][ML] Add save, load to all basic Transformers (Joseph K. Bradley, 2015-11-17; 16 files, -22/+174)
  This excludes Estimators and ones which include Vector and other non-basic types for Params or data. This adds:
  * Bucketizer
  * DCT
  * HashingTF
  * Interaction
  * NGram
  * Normalizer
  * OneHotEncoder
  * PolynomialExpansion
  * QuantileDiscretizer
  * RFormula
  * SQLTransformer
  * StopWordsRemover
  * StringIndexer
  * Tokenizer
  * VectorAssembler
  * VectorSlicer
  CC: mengxr
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #9755 from jkbradley/transformer-io.
* [SPARK-11766][MLLIB] add toJson/fromJson to Vector/Vectors (Xiangrui Meng, 2015-11-17; 1 file, -0/+17)
  This is to support JSON serialization of Param[Vector] in the pipeline API. It could be used for other purposes too. The schema is the same as `VectorUDT`. jkbradley
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #9751 from mengxr/SPARK-11766.
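  A round-trip usage sketch of the new methods (the JSON text in the comment is illustrative of the VectorUDT-style schema, not guaranteed byte-for-byte):

  ```scala
  import org.apache.spark.mllib.linalg.Vectors

  val v = Vectors.sparse(4, Array(0, 2), Array(1.0, 3.0))
  val json = v.toJson
  // e.g. {"type":0,"size":4,"indices":[0,2],"values":[1.0,3.0]}
  val restored = Vectors.fromJson(json)
  assert(restored == v) // the round trip preserves the vector
  ```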
* [SPARK-11612][ML] Pipeline and PipelineModel persistence (Joseph K. Bradley, 2015-11-16; 2 files, -13/+132)
  Pipeline and PipelineModel extend Readable and Writable. Persistence succeeds only when all stages are Writable. Note: This PR reinstates tests for other read/write functionality. It should probably not get merged until [https://issues.apache.org/jira/browse/SPARK-11672] gets fixed. CC: mengxr
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #9674 from jkbradley/pipeline-io.
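  A usage sketch of the resulting API, assuming a pipeline whose stages are all writable (the save path is a placeholder):

  ```scala
  import org.apache.spark.ml.Pipeline
  import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

  val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
  val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
  val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF))

  // Both stages are writable, so the whole pipeline can be persisted.
  pipeline.write.overwrite().save("/tmp/pipeline") // placeholder path
  val restored = Pipeline.load("/tmp/pipeline")
  ```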
* [MINOR][ML] remove MLlibTestSparkContext from ImpuritySuite (Xiangrui Meng, 2015-11-13; 1 file, -2/+1)
  ImpuritySuite doesn't need a SparkContext.
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #9698 from mengxr/remove-mllib-test-context-in-impurity-suite.
* [SPARK-11672][ML] Set active SQLContext in MLlibTestSparkContext.beforeAll (Xiangrui Meng, 2015-11-13; 1 file, -0/+1)
  Still saw some error messages caused by `SQLContext.getOrCreate`: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3997/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=spark-test/testReport/junit/org.apache.spark.ml.util/JavaDefaultReadWriteSuite/testDefaultReadWrite/
  This PR sets the active SQLContext in beforeAll, which is not automatically set in `new SQLContext`. This makes `SQLContext.getOrCreate` return the right SQLContext. cc: yhuai
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #9694 from mengxr/SPARK-11672.3.
* [SPARK-11672][ML] flaky spark.ml read/write tests (Xiangrui Meng, 2015-11-12; 4 files, -3/+5)
  We set `sqlContext = null` in `afterAll`. However, this doesn't change `SQLContext.activeContext`, and then `SQLContext.getOrCreate` might use the `SparkContext` from a previous test suite, which causes the error. This PR calls `clearActive` in `beforeAll` and `afterAll` to avoid using an old context from other test suites. cc: yhuai
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #9677 from mengxr/SPARK-11672.2.
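  A sketch of the pattern described above (the trait name is illustrative; `beforeAll`/`afterAll` follow ScalaTest's BeforeAndAfterAll, and `SQLContext.clearActive`/`setActive` are internal Spark helpers, so this only compiles inside the Spark codebase):

  ```scala
  import org.apache.spark.SparkContext
  import org.apache.spark.sql.SQLContext
  import org.scalatest.{BeforeAndAfterAll, Suite}

  trait SharedSQLContext extends BeforeAndAfterAll { self: Suite =>
    @transient var sc: SparkContext = _
    @transient var sqlContext: SQLContext = _

    override def beforeAll(): Unit = {
      super.beforeAll()
      sc = new SparkContext("local[2]", "MLlibUnitTest")
      SQLContext.clearActive() // drop any context leaked by a previous suite
      sqlContext = new SQLContext(sc)
      SQLContext.setActive(sqlContext)
    }

    override def afterAll(): Unit = {
      sqlContext = null
      SQLContext.clearActive() // don't leak our context into the next suite
      if (sc != null) sc.stop()
      sc = null
      super.afterAll()
    }
  }
  ```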
* [SPARK-11712][ML] Make spark.ml LDAModel be abstract (Joseph K. Bradley, 2015-11-12; 1 file, -2/+2)
  Per discussion in the initial Pipelines LDA PR [https://github.com/apache/spark/pull/9513], we should make LDAModel abstract and create a LocalLDAModel. This code simplification should be done before the 1.6 release to ensure API compatibility in future releases. CC feynmanliang mengxr
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #9678 from jkbradley/lda-pipelines-2.
* [SPARK-11672][ML] disable spark.ml read/write tests (Xiangrui Meng, 2015-11-11; 3 files, -3/+3)
  Saw several failures on Jenkins, e.g., https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2040/testReport/org.apache.spark.ml.util/JavaDefaultReadWriteSuite/testDefaultReadWrite/. This is the first failure in the master build: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3982/
  I cannot reproduce it locally, so I'm temporarily disabling the tests and will look into the issue under the same JIRA. I'm going to merge the PR after Jenkins passes compile.
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #9641 from mengxr/SPARK-11672.
* [SPARK-6726][ML] Import/export for spark.ml LogisticRegressionModel (Joseph K. Bradley, 2015-11-10; 2 files, -3/+18)
  This PR adds model save/load for spark.ml's LogisticRegressionModel. It also does minor refactoring of the default save/load classes to reuse code. CC: mengxr
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #9606 from jkbradley/logreg-io2.
* [SPARK-5565][ML] LDA wrapper for Pipelines API (Joseph K. Bradley, 2015-11-10; 1 file, -0/+221)
  This adds LDA to spark.ml, the Pipelines API. It follows the design doc in the JIRA: [https://issues.apache.org/jira/browse/SPARK-5565], with one major change:
  * I eliminated doc IDs. These are not necessary with DataFrames since the user can add an ID column as needed.
  Note: This will conflict with [https://github.com/apache/spark/pull/9484], but I'll try to merge [https://github.com/apache/spark/pull/9484] first and then rebase this PR. CC: hhbyyh feynmanliang If you have a chance to make a pass, that'd be really helpful--thanks! Now that I'm done traveling & this PR is almost ready, I'll see about reviewing other PRs critical for 1.6. CC: mengxr
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #9513 from jkbradley/lda-pipelines.
* [SPARK-7316][MLLIB] RDD sliding window with step (unknown, 2015-11-10; 1 file, -4/+7)
  Implementation of step capability for the sliding window function in MLlib's RDD. Though one can use the current sliding window with step 1 and then filter every Nth window, it will take more time and space (N*data.count times more than needed). For example, below are the results for various windows and steps on 10M data points:

  | Window | Step | Time (s) | Windows produced |
  | ------ | ---- | -------- | ---------------- |
  | 128    | 1    | 6.38     | 9999873          |
  | 128    | 10   | 0.9      | 999988           |
  | 128    | 100  | 0.41     | 99999            |
  | 1024   | 1    | 44.67    | 9998977          |
  | 1024   | 10   | 4.74     | 999898           |
  | 1024   | 100  | 0.78     | 99990            |

  ```
  import org.apache.spark.mllib.rdd.RDDFunctions._
  val rdd = sc.parallelize(1 to 10000000, 10)
  rdd.count
  val window = 1024
  val step = 1
  val t = System.nanoTime();
  val windows = rdd.sliding(window, step);
  println(windows.count);
  println((System.nanoTime() - t) / 1e9)
  ```
  Author: unknown <ulanov@ULANOV3.americas.hpqcorp.net>
  Author: Alexander Ulanov <nashb@yandex.ru>
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #5855 from avulanov/SPARK-7316-sliding.
* [SPARK-11069][ML] Add RegexTokenizer option to convert to lowercase (Yuhao Yang, 2015-11-09; 1 file, -5/+17)
  jira: https://issues.apache.org/jira/browse/SPARK-11069
  Quoting the JIRA: Tokenizer converts strings to lowercase automatically, but RegexTokenizer does not. It would be nice to add an option to RegexTokenizer to convert to lowercase. Proposal: call the Boolean Param "toLowercase" and set the default to false (so behavior does not change). Actually, sklearn converts to lowercase before tokenizing too.
  Author: Yuhao Yang <hhbyyh@gmail.com>
  Closes #9092 from hhbyyh/tokenLower.
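  A usage sketch, assuming the setter generated for the new param is `setToLowercase`:

  ```scala
  import org.apache.spark.ml.feature.RegexTokenizer

  val tokenizer = new RegexTokenizer()
    .setInputCol("sentence")
    .setOutputCol("words")
    .setPattern("\\W")      // split on non-word characters
    .setToLowercase(true)   // opt in to lowercasing before tokenizing
  ```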
* [SPARK-6517][MLLIB] Implement the Algorithm of Hierarchical Clustering (Yu ISHIKAWA, 2015-11-09; 1 file, -0/+182)
  I implemented a hierarchical clustering algorithm again. This PR doesn't include examples, documentation, or spark.ml APIs; I am going to send other PRs later. https://issues.apache.org/jira/browse/SPARK-6517
  - This implementation is based on a bisecting k-means clustering.
  - It derives from freeman-lab's implementation.
  - The basic idea is not changed from the previous version (#2906). However, it is 1000x faster than the previous version through parallel processing.
  Thank you for your great cooperation, RJ Nowling (rnowling), Jeremy Freeman (freeman-lab), Xiangrui Meng (mengxr) and Sean Owen (srowen).
  Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
  Author: Xiangrui Meng <meng@databricks.com>
  Author: Yu ISHIKAWA <yu-iskw@users.noreply.github.com>
  Closes #5267 from yu-iskw/new-hierarchical-clustering.
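  A usage sketch, assuming the algorithm is exposed as `mllib.clustering.BisectingKMeans` and an existing SparkContext `sc` (the data points are illustrative):

  ```scala
  import org.apache.spark.mllib.clustering.BisectingKMeans
  import org.apache.spark.mllib.linalg.Vectors

  // Two well-separated groups of points (illustrative data).
  val data = sc.parallelize(Seq(
    Vectors.dense(0.0, 0.0), Vectors.dense(1.0, 1.0),
    Vectors.dense(9.0, 8.0), Vectors.dense(8.0, 9.0)))

  // Bisecting k-means repeatedly splits clusters until k leaves remain.
  val model = new BisectingKMeans().setK(2).run(data)
  model.clusterCenters.foreach(println)
  ```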
* [SPARK-11217][ML] save/load for non-meta estimators and transformers (Xiangrui Meng, 2015-11-06; 3 files, -1/+165)
  This PR implements the default save/load for non-meta estimators and transformers using the JSON serialization of param values. The saved metadata includes:
  * class name
  * uid
  * timestamp
  * paramMap
  The save/load interface is similar to DataFrames. We use the current active context by default, which should be sufficient for most use cases.
  ~~~scala
  instance.save("path")
  instance.write.context(sqlContext).overwrite().save("path")
  Instance.load("path")
  ~~~
  The param handling is different from the design doc: we didn't save default and user-set params separately, and when we load it back, all parameters are user-set. This does cause issues, but modifying the default params would cause other issues too. TODOs:
  * [x] Java test
  * [ ] a follow-up PR to implement default save/load for all non-meta estimators and transformers
  cc jkbradley
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #9454 from mengxr/SPARK-11217.
* [SPARK-10116][CORE] XORShiftRandom.hashSeed is random in high bits (Imran Rashid, 2015-11-06; 3 files, -8/+26)
  https://issues.apache.org/jira/browse/SPARK-10116
  This is really trivial, just happened to notice it: if `XORShiftRandom.hashSeed` is really supposed to have random bits throughout (as the comment implies), it needs to do something for the conversion to `long`. mengxr mkolod
  Author: Imran Rashid <irashid@cloudera.com>
  Closes #8314 from squito/SPARK-10116.
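  A sketch of one way to spread randomness into the high bits: hash the seed bytes twice with different hash seeds and recombine the two 32-bit results. This mirrors the approach described, though the exact Spark implementation may differ.

  ```scala
  import java.nio.ByteBuffer
  import scala.util.hashing.MurmurHash3

  // Hash the seed bytes twice so that both the low and high 32 bits of
  // the result are random, not just the low word.
  def hashSeed(seed: Long): Long = {
    val bytes = ByteBuffer.allocate(8).putLong(seed).array()
    val lowBits = MurmurHash3.bytesHash(bytes)
    val highBits = MurmurHash3.bytesHash(bytes, lowBits)
    (highBits.toLong << 32) | (lowBits.toLong & 0xFFFFFFFFL)
  }
  ```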
* [SPARK-11514][ML] Pass random seed to spark.ml DecisionTree* (Yu ISHIKAWA, 2015-11-05; 2 files, -0/+2)
  cc jkbradley
  Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
  Closes #9486 from yu-iskw/SPARK-11514.
* [SPARK-11473][ML] R-like summary statistics with intercept for OLS via normal equation solver (Yanbo Liang, 2015-11-05; 1 file, -8/+8)
  Following up [SPARK-9836](https://issues.apache.org/jira/browse/SPARK-9836), we should also support summary statistics for the `intercept`.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #9485 from yanboliang/spark-11473.
* [SPARK-11349][ML] Support transform string label for RFormula (Yanbo Liang, 2015-11-03; 1 file, -0/+19)
  Currently `RFormula` can only handle a label of `NumericType` or `BinaryType` (cast to `DoubleType` as the label for Linear Regression training); we should also support a label of `StringType`, which is needed for Logistic Regression (glm with family = "binomial"). For a label of `StringType`, we should use `StringIndexer` to transform it to a 0-based index.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #9302 from yanboliang/spark-11349.
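  A usage sketch of what this change enables (column names and data are illustrative; a `sqlContext` is assumed in scope):

  ```scala
  import org.apache.spark.ml.feature.RFormula

  // With string-label support, a StringType label column can be used
  // directly; RFormula string-indexes it to a 0-based double label.
  val df = sqlContext.createDataFrame(Seq(
    ("yes", 1.0, 2.0), ("no", 3.0, 4.0), ("yes", 5.0, 6.0)
  )).toDF("outcome", "x1", "x2")

  val formula = new RFormula().setFormula("outcome ~ x1 + x2")
  val transformed = formula.fit(df).transform(df)
  transformed.select("features", "label").show()
  ```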
* [MINOR][ML] Fix naming conventions of AFTSurvivalRegression coefficients (Yanbo Liang, 2015-11-03; 1 file, -6/+6)
  Rename `regressionCoefficients` back to `coefficients`, and rename `weights` to `parameters`. See discussion [here](https://github.com/apache/spark/pull/9311/files#diff-e277fd0bc21f825d3196b4551c01fe5fR230). mengxr vectorijk dbtsai
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #9431 from yanboliang/aft-coefficients.
* [SPARK-9836][ML] Provide R-like summary statistics for OLS via normal equation solver (Yanbo Liang, 2015-11-03; 1 file, -0/+129)
  https://issues.apache.org/jira/browse/SPARK-9836
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #9413 from yanboliang/spark-9836.
* [SPARK-10592][ML][PySpark] Deprecate weights and use coefficients instead in ML models (vectorijk, 2015-11-02; 5 files, -174/+186)
  Deprecated in `LogisticRegression` and `LinearRegression`.
  Author: vectorijk <jiangkai@gmail.com>
  Closes #9311 from vectorijk/spark-10592.
* [SPARK-11207][ML] Add test cases for solver selection of LinearRegression as followup (Lewuathe, 2015-10-30; 1 file, -75/+97)
  This is the follow-up work of SPARK-10668.
  * Fix minor style issues.
  * Add a test case for checking whether the solver is selected properly.
  Author: Lewuathe <lewuathe@me.com>
  Author: lewuathe <lewuathe@me.com>
  Closes #9180 from Lewuathe/SPARK-11207.