LIBSVM data source instead of MLUtils
I changed the example code in spark.ml to use the LIBSVM data source instead of MLUtils.
Author: y-shimizu <y.shimizu0429@gmail.com>
Closes #8697 from y-shimizu/SPARK-10518.
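For reference, each line of a LIBSVM-format file holds a label followed by sparse `index:value` pairs, with one-based, ascending indices. The plain-Python sketch below parses one such line to illustrate the format. It is not the LIBSVM data source itself, which spark.ml loads through the DataFrame reader (for example `sqlContext.read.format("libsvm").load(path)`, assuming the 1.6-era API). `parse_libsvm_line` is a hypothetical helper.

```python
from typing import List, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, List[int], List[float]]:
    """Hypothetical helper: parse "<label> <index>:<value> ..." into a label
    plus sparse feature indices and values. LIBSVM indices are one-based;
    this sketch converts them to zero-based indices."""
    tokens = line.split()
    label = float(tokens[0])
    indices: List[int] = []
    values: List[float] = []
    for token in tokens[1:]:
        index, value = token.split(":")
        indices.append(int(index) - 1)  # one-based -> zero-based
        values.append(float(value))
    return label, indices, values


print(parse_libsvm_line("1 1:0.5 3:2.0"))  # (1.0, [0, 2], [0.5, 2.0])
```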
implementing the sufficientResourcesRegistered method
The spark.scheduler.minRegisteredResourcesRatio configuration parameter works in YARN mode but not in Mesos coarse-grained mode.
If the parameter is not specified, the base class sets a default value of 0 for spark.scheduler.minRegisteredResourcesRatio, and this method always returns true.
There are no existing tests for YARN mode either, so no test was added for this change.
Author: Akash Mishra <akash.mishra20@gmail.com>
Closes #8672 from SleepyThread/master.
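The readiness check can be sketched in Python as follows. This is a hypothetical illustration, not Spark's Scala code: it assumes the Mesos coarse-grained backend compares the cores acquired so far against the maximum core count multiplied by the ratio, and the function and parameter names are invented for this sketch. With the default ratio of 0, the comparison is always true, matching the behavior described above.

```python
def sufficient_resources_registered(total_cores_acquired: int,
                                    max_cores: int,
                                    min_registered_ratio: float) -> bool:
    """Hypothetical sketch (names are illustrative, not Spark's API):
    scheduling may start once the acquired cores reach the configured
    fraction of the maximum cores for the application."""
    return total_cores_acquired >= max_cores * min_registered_ratio


# With the default ratio of 0.0 the check passes before any cores arrive.
print(sufficient_resources_registered(0, 10, 0.0))   # True
# With a ratio of 0.5 the backend would wait for 5 of 10 cores.
print(sufficient_resources_registered(4, 10, 0.5))   # False
print(sufficient_resources_registered(5, 10, 0.5))   # True
```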
From JIRA:
Add documentation for tungsten-sort.
From the mailing list: "I saw a new "spark.shuffle.manager=tungsten-sort" implemented in
https://issues.apache.org/jira/browse/SPARK-7081, but I can't find its
corresponding description in
http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html (currently
there are only two options, 'sort' and 'hash')."
Author: Holden Karau <holden@pigscanfly.ca>
Closes #8638 from holdenk/SPARK-10469-document-tungsten-sort.

0.0 (original: 1.0)
Small typo in the example for `LabeledPoint` in the MLlib docs.
Author: Sean Paradiso <seanparadiso@gmail.com>
Closes #8680 from sparadiso/docs_mllib_smalltypo.

jira: https://issues.apache.org/jira/browse/SPARK-10249
Update the user guide now that Python support has been added.
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes #8620 from hhbyyh/swPyDocExample.

about rate limiting and backpressure
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #8656 from tdas/SPARK-10492 and squashes the following commits:
986cdd6 [Tathagata Das] Added information on backpressure

Author: Jacek Laskowski <jacek@japila.pl>
Closes #8629 from jaceklaskowski/docs-fixes.

… main README.
Author: Stephen Hopper <shopper@shopper-osx.local>
Closes #8646 from enragedginger/master.

We introduced the Netty network module for shuffle in Spark 1.2 and have had it on by default for 3 releases. The old ConnectionManager is difficult to maintain. If we merge the patch now, by the time it is released ConnectionManager will have been off by default for 1 year. It's time to remove it.
Author: Reynold Xin <rxin@databricks.com>
Closes #8161 from rxin/SPARK-9767.

guides and python docs
- Fixed information around Python API tags in streaming programming guides
- Added missing content to the Python docs
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #8595 from tdas/SPARK-10440.

Support running pyspark with cluster mode on Mesos!
This doesn't upload any scripts, so running against a remote Mesos cluster requires the user to specify the script via an available URI.
Author: Timothy Chen <tnachen@gmail.com>
Closes #8349 from tnachen/mesos_python.

Author: Tom Graves <tgraves@yahoo-inc.com>
Closes #8585 from tgravescs/SPARK-10432.

SPARK-4223.
Currently we support setting view and modify ACLs, but you have to specify a list of users. It would be nice to support * meaning all users have access.
Manual tests verified that "*" works for any user in:
a. Spark UI: view and kill stage. Done.
b. Spark history server. Done.
c. YARN application killing. Done.
Author: zhuol <zhuol@yahoo-inc.com>
Closes #8398 from zhuoliu/4223.
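The wildcard rule can be sketched in a few lines of Python, assuming the ACL is a comma-separated list of user names like the value of `spark.ui.view.acls`. `parse_acl` and `has_access` are hypothetical helpers for illustration, not Spark functions.

```python
def parse_acl(value: str) -> set:
    """Hypothetical helper: split a comma-separated ACL such as "alice, bob" or "*"."""
    return {entry.strip() for entry in value.split(",") if entry.strip()}


def has_access(user: str, acl: set) -> bool:
    """Grant access if the ACL holds the wildcard "*" or names the user."""
    return "*" in acl or user in acl


explicit = parse_acl("alice, bob")
print(has_access("alice", explicit))        # True
print(has_access("carol", explicit))        # False
print(has_access("carol", parse_acl("*")))  # True
```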
scripts
Migrate Apache download closer.cgi refs to new closer.lua
This is the bit of the change that affects the project docs; I'm implementing the changes to the Apache site separately.
Author: Sean Owen <sowen@cloudera.com>
Closes #8557 from srowen/SPARK-10398.

* The example code was added in 1.2, before `createDataFrame`. This PR switches to `createDataFrame`. Java code still uses JavaBean.
* assume `sqlContext` is available
* fix some minor issues from previous code review
jkbradley srowen feynmanliang
Author: Xiangrui Meng <meng@databricks.com>
Closes #8518 from mengxr/SPARK-10331.

* replace `ML Dataset` by `DataFrame` to unify the abstraction
* ML algorithms -> pipeline components to describe the main concept
* remove Scala API doc links from the main guide
* `Section Title` -> `Section title` to be consistent with other section titles in MLlib guide
* modified lines break at 100 chars or periods
jkbradley feynmanliang
Author: Xiangrui Meng <meng@databricks.com>
Closes #8517 from mengxr/SPARK-10348.

Author: GuoQiang Li <witgo@qq.com>
Closes #8520 from witgo/SPARK-10350.

Author: martinzapletal <zapletal-martin@email.cz>
Closes #8377 from zapletal-martin/SPARK-9910.

This PR updates the MLlib user guide and adds migration guide for 1.4->1.5.
* merge migration guide for `spark.mllib` and `spark.ml` packages
* remove dependency section from `spark.ml` guide
* move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml`
* move Sam's talk to footnote to make the section focus on dependencies
Minor changes to code examples and other wording will be in a separate PR.
jkbradley srowen feynmanliang
Author: Xiangrui Meng <meng@databricks.com>
Closes #8498 from mengxr/SPARK-9671.

jira: https://issues.apache.org/jira/browse/SPARK-9890
Add documentation with Scala and Java examples.
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes #8487 from hhbyyh/cvDoc.

Fix DynamodDB/DynamoDB typo in Kinesis Integration doc
Author: Keiji Yoshida <yoshida.keiji.84@gmail.com>
Closes #8501 from yosssi/patch-1.

* Adds user guide for `LinearRegressionSummary`
* Fixes unresolved issues in #8197
CC jkbradley mengxr
Author: Feynman Liang <fliang@databricks.com>
Closes #8491 from feynmanliang/SPARK-9905.

I added a small note about the different types of evaluator and the metrics used.
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #8304 from MechCoder/multiclass_evaluator.

https://issues.apache.org/jira/browse/SPARK-10287
After porting json to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with others (ORC and Parquet).
Author: Yin Huai <yhuai@databricks.com>
Closes #8469 from yhuai/jsonRefresh.

compatibility test
* Adds user guide for ml.feature.StopWordsRemover, ran code examples on my machine
* Cleans up scaladocs for public methods
* Adds test for Java compatibility
* Follow up Python user guide code example is tracked by SPARK-10249
Author: Feynman Liang <fliang@databricks.com>
Closes #8436 from feynmanliang/SPARK-10230.

User guide for LogisticRegression summaries
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Author: Manoj Kumar <mks542@nyu.edu>
Author: Feynman Liang <fliang@databricks.com>
Closes #8197 from MechCoder/log_summary_user_guide.

jira: https://issues.apache.org/jira/browse/SPARK-9901
The jira covers only the document update. I can further provide example code for QR (like the ones for SVD and PCA) in a separate PR.
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes #8462 from hhbyyh/qrDoc.

https://issues.apache.org/jira/browse/SPARK-10315
This parameter is no longer used, and the current documentation has a mistake: it should be 'akka.remote.watch-failure-detector.threshold'.
Author: CodingCat <zhunansjtu@gmail.com>
Closes #8483 from CodingCat/SPARK_10315.

Author: Michael Armbrust <michael@databricks.com>
Closes #8441 from marmbrus/documentation.

Fix typo in exactly-once semantics
[Semantics of output operations] link
Author: Moussa Taifi <moutai10@gmail.com>
Closes #8468 from moutai/patch-3.

Author: Cheng Lian <lian@databricks.com>
Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5.

* Adds two new sections to LDA's user guide; one for each optimizer/model
* Documents new features added to LDA (e.g. topXXXperXXX, asymmetric priors, hyperparameter optimization)
* Cleans up a TODO and sets a default parameter in LDA code
jkbradley hhbyyh
Author: Feynman Liang <fliang@databricks.com>
Closes #8254 from feynmanliang/SPARK-9888.

jira: https://issues.apache.org/jira/browse/SPARK-8531
Update ML user guide for MinMaxScaler
Author: Yuhao Yang <hhbyyh@gmail.com>
Author: unknown <yuhaoyan@yuhaoyan-MOBL1.ccr.corp.intel.com>
Closes #7211 from hhbyyh/minmaxdoc.

User guide for spark.ml GBTs and Random Forests.
The examples are copied from the decision tree guide and modified to run.
I caught some issues I had somehow missed in the tree guide as well.
I have run all examples, including Java ones. (Of course, I thought I had previously as well...)
CC: mengxr manishamde yanboliang
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #8369 from jkbradley/ml-ensemble-docs.

Update `See the Scala example` to `See the Java example`.
Author: Keiji Yoshida <yoshida.keiji.84@gmail.com>
Closes #8376 from yosssi/patch-1.

Update `lineLengths.persist();` to `lineLengths.persist(StorageLevel.MEMORY_ONLY());` because `JavaRDD#persist` needs a parameter of `StorageLevel`.
Author: Keiji Yoshida <yoshida.keiji.84@gmail.com>
Closes #8372 from yosssi/patch-1.

Add a user guide for `VectorSlicer`, with a Java test suite and a Python version of VectorSlicer.
Note that the Python version does not currently support selecting by names.
Author: Xusen Yin <yinxusen@gmail.com>
Closes #8267 from yinxusen/SPARK-9893.

Added user guide for multilayer perceptron classifier:
- Simplified description of the multilayer perceptron classifier
- Example code for Scala and Java
Author: Alexander Ulanov <nashb@yandex.ru>
Closes #8262 from avulanov/SPARK-9846-mlpc-docs.

mengxr
Author: Eric Liang <ekl@databricks.com>
Closes #8293 from ericl/docs-2.

This allows skipping the code that tries to talk to Hive and HBase to
fetch delegation tokens, in case that somehow conflicts with the application
being run.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #8134 from vanzin/SPARK-9833.

1. Add Python example for mllib FP-growth user guide.
2. Correct mistakes in the Scala and Java examples.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #8279 from yanboliang/spark-10084.

New user guide section ml-decision-tree.md, including code examples.
I have run all examples, including the Java ones.
CC: manishamde yanboliang mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #8244 from jkbradley/ml-dt-docs.

By using `StringIndexer`, we obtain the indexed label in a new column, so a following estimator in the pipeline should use this new column if it wants to use the string-indexed label.
I think it is better to make this explicit in the documentation.
Author: lewuathe <lewuathe@me.com>
Closes #8205 from Lewuathe/SPARK-9977.
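The following Python sketch mimics that behavior on plain lists of dicts rather than a DataFrame: labels are indexed by descending frequency, and the index is written to a new column that a downstream stage would read. The helper names, column names, and the alphabetical tie-break are assumptions for illustration, not Spark's implementation.

```python
from collections import Counter
from typing import Dict, List


def fit_string_indexer(labels: List[str]) -> Dict[str, float]:
    """Hypothetical helper: map labels to indices, most frequent first (0.0).
    Ties are broken alphabetically, which is an assumption of this sketch."""
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform(rows: List[dict], input_col: str, output_col: str,
              mapping: Dict[str, float]) -> List[dict]:
    """Copy each row and add the index under output_col; input_col is kept."""
    return [{**row, output_col: mapping[row[input_col]]} for row in rows]


rows = [{"category": c} for c in ["a", "b", "c", "a", "a", "c"]]
mapping = fit_string_indexer([row["category"] for row in rows])
indexed = transform(rows, "category", "categoryIndex", mapping)

# A downstream estimator should be pointed at "categoryIndex", not "category".
print([row["categoryIndex"] for row in indexed])  # [0.0, 2.0, 1.0, 0.0, 0.0, 1.0]
```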
`Lists.newArrayList` -> `Arrays.asList`
CC jkbradley feynmanliang
Is anybody interested in replacing usages of `Lists.newArrayList` in the examples / source code too? This method isn't useful in Java 7 and beyond.
Author: Sean Owen <sowen@cloudera.com>
Closes #8272 from srowen/SPARK-10070.

Link was broken because it included tick marks.
Author: Bill Chambers <wchambers@ischool.berkeley.edu>
Closes #8302 from anabranch/patch-1.

SPARK-9436 simplifies the Pregel code, so graphx-programming-guide needs to be updated accordingly, since it lists the old Pregel code.
Author: Alexander Ulanov <nashb@yandex.ru>
Closes #7831 from avulanov/SPARK-9508-pregel-doc2.

cc JoshRosen
Author: Davies Liu <davies@databricks.com>
Closes #8245 from davies/python_doc.

mengxr jkbradley
Author: Feynman Liang <fliang@databricks.com>
Closes #8184 from feynmanliang/SPARK-9889-DCT-docs.

Add a new test case in yarn/ClientSuite which checks how the various SparkConf
and ClientArguments propagate into the ApplicationSubmissionContext.
Author: Dennis Huo <dhuo@google.com>
Closes #8072 from dennishuo/dhuo-yarn-application-tags.

See https://issues.apache.org/jira/browse/SPARK-10085
Author: Piotr Migdal <pmigdal@gmail.com>
Closes #8284 from stared/spark-10085.