| Commit message (Collapse) | Author | Age | Files | Lines |
| |
running-on-yarn.md links to the YARN overview,
but the URL points to the YARN alpha documentation.
It should point to the stable documentation.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3196 from sarutak/SPARK-4330 and squashes the following commits:
30baa21 [Kousuke Saruta] Fixed running-on-yarn.md to point proper URL for YARN
(cherry picked from commit 3c07b8f08240bafcdff5d174989fb433f4bc80b6)
Signed-off-by: Matei Zaharia <matei@databricks.com>
| |
This patch attempts to fix SPARK-2546 in `branch-1.0` and `branch-1.1`. The underlying problem is that thread-safety issues in Hadoop Configuration objects may cause Spark tasks to get stuck in infinite loops. The approach taken here is to clone a new copy of the JobConf for each task rather than sharing a single copy between tasks. Note that there are still Configuration thread-safety issues that may affect the driver, but these seem much less likely to occur in practice and will be more complex to fix (see discussion on the SPARK-2546 ticket).
This cloning is guarded by a new configuration option (`spark.hadoop.cloneConf`) and is disabled by default in order to avoid unexpected performance regressions for workloads that are unaffected by the Configuration thread-safety issues.
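A minimal sketch of the guarded cloning described above, in Python rather than the Scala of the actual patch. The function name `conf_for_task` and the use of plain dicts in place of a Hadoop JobConf are illustrative assumptions; only the option name `spark.hadoop.cloneConf` and its disabled-by-default behavior come from this message.

```python
import copy

def conf_for_task(shared_conf, spark_conf):
    """Return the job configuration a single task should use.

    shared_conf: dict standing in for a Hadoop JobConf (hypothetical stand-in).
    spark_conf:  dict of Spark settings.
    """
    # Cloning is opt-in, so unaffected workloads keep sharing one object.
    clone = spark_conf.get("spark.hadoop.cloneConf", "false").lower() == "true"
    if clone:
        # Each task gets a private copy and never mutates shared state.
        return copy.deepcopy(shared_conf)
    return shared_conf

shared = {"mapred.input.dir": "/data/in"}
default_view = conf_for_task(shared, {})
cloned_view = conf_for_task(shared, {"spark.hadoop.cloneConf": "true"})
cloned_view["mapred.input.dir"] = "/data/other"  # does not touch `shared`
```

With the option off, every task sees the same object; with it on, a task's changes stay local, at the cost of one copy per task.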
Author: Josh Rosen <joshrosen@apache.org>
Closes #2684 from JoshRosen/jobconf-fix-backport and squashes the following commits:
f14f259 [Josh Rosen] Add configuration option to control cloning of Hadoop JobConf.
b562451 [Josh Rosen] Remove unused jobConfCacheKey field.
dd25697 [Josh Rosen] [SPARK-2546] [1.0 / 1.1 backport] Clone JobConf for each task.
| |
The duplicate entry was introduced in https://github.com/pwendell/spark/commit/f7e79bc42c1635686c3af01eef147dae92de2529. I'm not sure why we need two spark.executor.memory entries here.
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>
Closes #2745 from WangTaoTheTonic/redundantconfig and squashes the following commits:
e7564dc [WangTao] too long line
fdbdb1f [WangTaoTheTonic] trivial workaround
d06b6e5 [WangTaoTheTonic] remove redundant spark.executor.memory in doc
(cherry picked from commit e7f4ea8a52f0d3d56684b4f9caadce978eac4816)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
| |
There are three [Custom Receiver Guide] links in the streaming doc; the first is wrong.
Author: w00228970 <wangfei1@huawei.com>
Author: wangfei <wangfei1@huawei.com>
Closes #2749 from scwf/streaming-doc and squashes the following commits:
0cd76b7 [wangfei] update link tojump to the Akka-specific section
45b0646 [w00228970] wrong link in streaming doc
(cherry picked from commit 92e017fb894be1e8e2b2b5274fec4c31a7a4412e)
Signed-off-by: Josh Rosen <joshrosen@apache.org>
| |
Author: Brenden Matthews <brenden@diddyinc.com>
Closes #2401 from brndnmtthws/master and squashes the following commits:
4abaa5d [Brenden Matthews] [SPARK-3535][Mesos] Fix resource handling.
(cherry picked from commit a8c52d5343e19731909e73db5de151a324d31cd5)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
| |
Update of PR #997.
With this PR, setting SPARK_CONF_DIR overrides SPARK_HOME/conf (not only spark-defaults.conf and spark-env).
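The lookup order can be sketched as follows. This is a Python illustration, not the Scala or shell code touched by the PR; the function name `resolve_conf_dir` is hypothetical, and returning `None` when neither variable is set is an assumption suggested by the "orElse orNull" commit below.

```python
import os

def resolve_conf_dir(env):
    """Pick the configuration directory from an environment mapping."""
    # SPARK_CONF_DIR, when set, replaces SPARK_HOME/conf as a whole.
    conf_dir = env.get("SPARK_CONF_DIR")
    if conf_dir:
        return conf_dir
    # Otherwise fall back to the conf directory under SPARK_HOME.
    spark_home = env.get("SPARK_HOME")
    if spark_home:
        return os.path.join(spark_home, "conf")
    # Neither variable is set: no configuration directory is known.
    return None
```

Calling `resolve_conf_dir(os.environ)` would apply the same rule to the current process environment.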
Author: EugenCepoi <cepoi.eugen@gmail.com>
Closes #2481 from EugenCepoi/SPARK-2058 and squashes the following commits:
0bb32c2 [EugenCepoi] use orElse orNull and fixing trailing percent in compute-classpath.cmd
77f35d7 [EugenCepoi] SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR
(cherry picked from commit f0811f928e5b608e1a2cba3b6828ba0ed03b701d)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
| |
programming guide.
We have changed the output format of `printSchema`. This PR will update our SQL programming guide to show the updated format. Also, it fixes a typo (the value type of `StructType` in Java API).
Author: Yin Huai <huai@cse.ohio-state.edu>
Closes #2630 from yhuai/sqlDoc and squashes the following commits:
267d63e [Yin Huai] Update the output of printSchema and fix a typo.
(cherry picked from commit 82a6a083a485140858bcd93d73adec59bb5cca64)
Signed-off-by: Michael Armbrust <michael@databricks.com>
| |
https://issues.apache.org/jira/browse/SPARK-3715
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Closes #2567 from WangTaoTheTonic/minortypo and squashes the following commits:
9cc3f7a [WangTaoTheTonic] minor typo
(cherry picked from commit 1f13a40ccd5a869aec62788a1e345dc24fa648c8)
Signed-off-by: Michael Armbrust <michael@databricks.com>
| |
Author: CrazyJvm <crazyjvm@gmail.com>
Closes #2540 from CrazyJvm/standalone-core and squashes the following commits:
66d9fc6 [CrazyJvm] use "--total-executor-cores" rather than "--cores" after spark-shell
(cherry picked from commit 66107f46f374f83729cd79ab260eb59fa123c041)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
| |
Author: Grega Kespret <grega.kespret@gmail.com>
Closes #2479 from gregakespret/patch-1 and squashes the following commits:
dd6b90a [Grega Kespret] Update docs to use jsonRDD instead of wrong jsonRdd.
(cherry picked from commit 56dae30ca70489a62686cb245728b09b2179bb5a)
Signed-off-by: Michael Armbrust <michael@databricks.com>
| |
Extraction guide
Author: RJ Nowling <rnowling@gmail.com>
Closes #2459 from rnowling/tfidf-fix and squashes the following commits:
b370a91 [RJ Nowling] Fix variable name misspelling in MLLib Feature Extraction guide
(cherry picked from commit fec921552ffccc36937214406b3e4a050eb0d8e0)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
This is now supported!
Author: andrewor14 <andrewor14@gmail.com>
Author: Andrew Or <andrewor14@gmail.com>
Closes #2461 from andrewor14/document-standalone-cluster and squashes the following commits:
85c8b9e [andrewor14] Wording change per Patrick
35e30ee [Andrew Or] Fix outdated docs for standalone cluster
(cherry picked from commit 8af2370619a8a6bb1af7df43b8329ab319348ad8)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
| |
https://issues.apache.org/jira/browse/SPARK-3565
"spark.ports.maxRetries" should be "spark.port.maxRetries". Make the configuration key consistent between the documentation and the code.
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Closes #2427 from WangTaoTheTonic/fixPortRetries and squashes the following commits:
c178813 [WangTaoTheTonic] Use blank lines trigger Jenkins
646f3fe [WangTaoTheTonic] also in SparkBuild.scala
3700dba [WangTaoTheTonic] Fix configuration item not consistent with document
(cherry picked from commit 3f169bfe3c322bf4344e13276dbbe34279b59ad0)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
| |
Makes the table of contents read better
Author: Andrew Ash <andrew@andrewash.com>
Closes #2402 from ash211/docs/better-indentation and squashes the following commits:
ea0e130 [Andrew Ash] Move HA subsections to a deeper indentation level
(cherry picked from commit b3830b28f8a70224d87c89d8491c514c4c191d23)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #2434 from marmbrus/patch-1 and squashes the following commits:
67215be [Michael Armbrust] [SQL][DOCS] Improve table caching section
(cherry picked from commit cbf983bb4a550ff26756ed7308fb03db42cffcff)
Signed-off-by: Michael Armbrust <michael@databricks.com>
| |
Taken from liancheng's updates. Merged conflicts with #2316.
Author: Michael Armbrust <michael@databricks.com>
Closes #2384 from marmbrus/sqlDocUpdate and squashes the following commits:
2db6319 [Michael Armbrust] @liancheng's updates
(cherry picked from commit 84073eb1172dc959936149265378f6e24d303685)
Signed-off-by: Michael Armbrust <michael@databricks.com>
| |
* Fixed random typo
* Added in missing description for DecimalType
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Closes #2367 from nchammas/patch-1 and squashes the following commits:
aa528be [Nicholas Chammas] doc fix for SQL DecimalType
3247ac1 [Nicholas Chammas] [SQL] [Docs] typo fixes
(cherry picked from commit a523ceaf159733dabcef84c7adc1463546679f65)
Signed-off-by: Michael Armbrust <michael@databricks.com>
| |
Author: Henry Cook <hcook@eecs.berkeley.edu>
Closes #2316 from hcook/sql-docs and squashes the following commits:
373f94b [Henry Cook] Minor edits to sql programming guide.
(cherry picked from commit 26bc7655de18ab0191ded3f75cb77bc756dc1c03)
Signed-off-by: Michael Armbrust <michael@databricks.com>
| |
See compiled doc at
http://people.apache.org/~rxin/tmp/openstack-swift/_site/storage-openstack-swift.html
This is based on #1010. Closes #1010.
Author: Reynold Xin <rxin@apache.org>
Author: Gil Vernik <gilv@il.ibm.com>
Closes #2298 from rxin/openstack-swift and squashes the following commits:
ff4e394 [Reynold Xin] Two minor comments from Patrick.
279f6de [Reynold Xin] core-sites -> core-site
dfb8fea [Reynold Xin] Updated based on Gil's suggestion.
846f5cb [Reynold Xin] Added a link from overview page.
0447c9f [Reynold Xin] Removed sample code.
e9c3761 [Reynold Xin] Merge pull request #1010 from gilv/master
9233fef [Gil Vernik] Fixed typos
6994827 [Gil Vernik] Merge pull request #1 from rxin/openstack
ac0679e [Reynold Xin] Fixed an unclosed tr.
47ce99d [Reynold Xin] Merge branch 'master' into openstack
cca7192 [Gil Vernik] Removed white spases from pom.xml
99f095d [Reynold Xin] Pending openstack changes.
eb22295 [Reynold Xin] Merge pull request #1010 from gilv/master
39a9737 [Gil Vernik] Spark integration with Openstack Swift
c977658 [Gil Vernik] Merge branch 'master' of https://github.com/gilv/spark
2aba763 [Gil Vernik] Fix to docs/openstack-integration.md
9b625b5 [Gil Vernik] Merge branch 'master' of https://github.com/gilv/spark
eff538d [Gil Vernik] SPARK-938 - Openstack Swift object storage support
ce483d7 [Gil Vernik] SPARK-938 - Openstack Swift object storage support
b6c37ef [Gil Vernik] Openstack Swift support
(cherry picked from commit eddfeddac19870fc265ef406d87e1c3db9b54249)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
| |
Author: Michael Armbrust <michael@databricks.com>
Author: Yin Huai <huai@cse.ohio-state.edu>
Closes #2258 from marmbrus/sqlDocUpdate and squashes the following commits:
f3d450b [Michael Armbrust] fix brackets
bea3bfa [Michael Armbrust] Davies suggestions
3a29fe2 [Michael Armbrust] tighten visibility
a71aa36 [Michael Armbrust] Draft of doc updates
52932c0 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into sqlDocUpdate
1e8c849 [Yin Huai] Update the example used for applySchema.
9457c39 [Yin Huai] Update doc.
31ba240 [Yin Huai] Merge remote-tracking branch 'upstream/master' into dataTypeDoc
29bc668 [Yin Huai] Draft doc for data type and schema APIs.
(cherry picked from commit 39db1bfdab434c867044ad4c70fe93a96fb287ad)
Signed-off-by: Michael Armbrust <michael@databricks.com>
| |
- Improvements to the kinesis integration guide from @cfregly
- More information about unified input dstreams in main guide
Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Chris Fregly <chris@fregly.com>
Closes #2307 from tdas/streaming-doc-fix1 and squashes the following commits:
ec40b5d [Tathagata Das] Updated figure with kinesis
fdb9c5e [Tathagata Das] Fixed style issues with kinesis guide
036d219 [Chris Fregly] updated kinesis docs and added an arch diagram
24f622a [Tathagata Das] More modifications.
(cherry picked from commit baff7e936101635d9bd4245e45335878bafb75e0)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
| |
Updated the main streaming programming guide, and also added source-specific guides for Kafka, Flume, Kinesis.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Jacek Laskowski <jacek@japila.pl>
Closes #2254 from tdas/streaming-doc-fix and squashes the following commits:
e45c6d7 [Jacek Laskowski] More fixes from an old PR
5125316 [Tathagata Das] Fixed links
dc02f26 [Tathagata Das] Refactored streaming kinesis guide and made many other changes.
acbc3e3 [Tathagata Das] Fixed links between streaming guides.
cb7007f [Tathagata Das] Added Streaming + Flume integration guide.
9bd9407 [Tathagata Das] Updated streaming programming guide with additional information from SPARK-2419.
(cherry picked from commit a5224079286d1777864cf9fa77330aadae10cd7b)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
| |
- Include kinesis in the unidocs
- Hide non-public classes from docs
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #2239 from tdas/kinesis-doc-fix and squashes the following commits:
156e20c [Tathagata Das] More fixes, based on PR comments.
e9a6c01 [Tathagata Das] Fixed docs related to kinesis
(cherry picked from commit e9bb12bea9fbef94332fbec88e3cd9197a27b7ad)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
| |
This reverts #1899 and #2163, two patches that modified `spark-ec2` so that clusters are identified using tags instead of security groups. The original motivation for those patches was to allow multiple clusters to run in the same security group.
Unfortunately, tagging is not atomic with launching instances on EC2, so with this approach we have the possibility of `spark-ec2` launching instances and crashing before they can be tagged, effectively orphaning those instances. The orphaned instances won't belong to any cluster, so the `spark-ec2` script will be unable to clean them up.
Since this feature may still be worth supporting, there are several alternative approaches that we might consider, including detecting orphaned instances and logging warnings, or maybe using another mechanism to group instances into clusters. For the 1.1.0 release, though, I propose that we just revert this patch.
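The orphaning problem can be illustrated with a toy model. Everything here is hypothetical (an in-memory list of dicts, the `cluster` tag key, and the helper names); it does not use the EC2 API and only shows why a crash between launching and tagging hides instances from a tag-based lookup.

```python
def launch(instances, group, count):
    """Create `count` untagged instances in security group `group`."""
    new = [{"id": len(instances) + i, "group": group, "tags": {}}
           for i in range(count)]
    instances.extend(new)
    return new

def tag(new_instances, cluster):
    """Second, separate step: attach the cluster tag."""
    for inst in new_instances:
        inst["tags"]["cluster"] = cluster

def find_by_tag(instances, cluster):
    return [i for i in instances if i["tags"].get("cluster") == cluster]

def find_by_group(instances, group):
    return [i for i in instances if i["group"] == group]

fleet = []
launched = launch(fleet, "demo", 2)
# Simulated crash: tag(launched, "demo") is never reached.
orphans_seen_by_tag = find_by_tag(fleet, "demo")   # misses both instances
seen_by_group = find_by_group(fleet, "demo")       # still finds both
```

Because group membership is assigned at launch while tags are applied afterwards, only the group-based lookup survives the simulated crash.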
Author: Josh Rosen <joshrosen@apache.org>
Closes #2225 from JoshRosen/revert-ec2-cluster-naming and squashes the following commits:
0c18e86 [Josh Rosen] Revert "SPARK-2333 - spark_ec2 script should allow option for existing security group"
c2ca2d4 [Josh Rosen] Revert "Spark-3213 Fixes issue with spark-ec2 not detecting slaves created with "Launch More like this""
| |
As [reported on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-1-0-RC2-tp8107p8131.html):
* Code fencing with triple-backticks doesn’t seem to work like it does on GitHub. Newlines are lost. Instead, use 4-space indent to format small code blocks.
* Nested bullets need 2 leading spaces, not 1.
* Spellcheck!
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Author: nchammas <nicholas.chammas@gmail.com>
Closes #2201 from nchammas/sql-doc-fixes and squashes the following commits:
873f889 [Nicholas Chammas] [Docs] fix skip-api flag
5195e0c [Nicholas Chammas] [Docs] SQL doc formatting and typo fixes
3b26c8d [nchammas] [Spark QA] Link to console output on test time out
(cherry picked from commit 53aa8316e88980c6f46d3b9fc90d935a4738a370)
Signed-off-by: Michael Armbrust <michael@databricks.com>
| |
The executors and the driver may not share the same Spark home. There is currently one way to set the executor-side Spark home in Mesos: setting `spark.home`. However, this is neither documented nor intuitive. This PR adds a more specific config, `spark.mesos.executor.home`, and exposes it to the user.
liancheng tnachen
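A rough sketch of how such a lookup could behave, written in Python for illustration. The two configuration keys come from this message; the function name `executor_spark_home`, the dict-based config, and the fallback order (specific key first, then `spark.home`) are assumptions rather than a description of the merged code.

```python
def executor_spark_home(conf):
    """Return the Spark home that Mesos executors should use."""
    # Prefer the Mesos-specific key; fall back to the older generic one.
    home = conf.get("spark.mesos.executor.home") or conf.get("spark.home")
    if home is None:
        raise ValueError("executor Spark home is not set")
    return home
```

For example, `executor_spark_home({"spark.home": "/opt/spark"})` yields `/opt/spark`, while adding `spark.mesos.executor.home` to the same dict makes that value win.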
Author: Andrew Or <andrewor14@gmail.com>
Closes #2166 from andrewor14/mesos-spark-home and squashes the following commits:
b87965e [Andrew Or] Merge branch 'master' of github.com:apache/spark into mesos-spark-home
f6abb2e [Andrew Or] Document spark.mesos.executor.home
ca7846d [Andrew Or] Add more specific configuration for executor Spark home in Mesos
(cherry picked from commit 41dc5987d9abeca6fc0f5935c780d48f517cdf95)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
| |
The only updates are in DecisionTree.
CC: mengxr
Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
Closes #2146 from jkbradley/mllib-migration and squashes the following commits:
5a1f487 [Joseph K. Bradley] small edit to doc
411d6d9 [Joseph K. Bradley] Added migration guide for v1.0 to v1.1. The only updates are in DecisionTree.
(cherry picked from commit 171a41cb034f4ea80f6a3c91a6872970de16a14a)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
1. renamed mllib-basics to mllib-data-types
1. renamed mllib-stats to mllib-statistics
1. moved random data generation to the bottom of mllib-stats
1. updated toc accordingly
atalwalkar
Author: Xiangrui Meng <meng@databricks.com>
Closes #2151 from mengxr/mllib-doc-1.1 and squashes the following commits:
0bd79f3 [Xiangrui Meng] add mllib-data-types
b64a5d7 [Xiangrui Meng] update the content list of basis statistics in mllib-guide
f625cc2 [Xiangrui Meng] move mllib-basics to mllib-data-types
4d69250 [Xiangrui Meng] move random data generation to the bottom of statistics
e64f3ce [Xiangrui Meng] move mllib-stats.md to mllib-statistics.md
(cherry picked from commit 43dfc84f883822ea27b6e312d4353bf301c2e7ef)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
Documentation updated for the Statistics Toolkit of MLlib. mengxr atalwalkar
https://issues.apache.org/jira/browse/SPARK-2839
P.S. Accidentally closed #2123. New commits didn't show up after I reopened the PR. I've opened this instead and closed the old one.
Author: Burak <brkyvz@gmail.com>
Closes #2130 from brkyvz/StatsLib-Docs and squashes the following commits:
a54a855 [Burak] [SPARK-2839][MLlib] Addressed comments
bfc6896 [Burak] [SPARK-2839][MLlib] Added a more specific link to colStats() for pyspark
213fe3f [Burak] [SPARK-2839][MLlib] Modifications made according to review
fec4d9d [Burak] [SPARK-2830][MLlib] Stats Toolkit documentation updated
(cherry picked from commit 1208f72ac78960fe5060187761479b2a9a417c1b)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
to mention `-Pnetlib-lgpl` option. atalwalkar
Author: Xiangrui Meng <meng@databricks.com>
Closes #2128 from mengxr/mllib-native and squashes the following commits:
4cbba57 [Xiangrui Meng] update mllib dependencies
(cherry picked from commit adbd5c1636669fc474ab02b54cd1ced353f68712)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
It should be `spark-env.sh` rather than `spark.env.sh`.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2119 from liancheng/fix-mesos-doc and squashes the following commits:
f360548 [Cheng Lian] Fixed a typo in docs/running-on-mesos.md
(cherry picked from commit 805fec845b7aa8b4763e3e0e34bec6c3872469f4)
Signed-off-by: Josh Rosen <joshrosen@apache.org>
| |
Update the documentation to reflect the fact that we can handle roughly square matrices.
Author: Reza Zadeh <rizlar@gmail.com>
Closes #2070 from rezazadeh/svddocs and squashes the following commits:
826b8fe [Reza Zadeh] left singular vectors
3f34fc6 [Reza Zadeh] PCA is still TS
7ffa2aa [Reza Zadeh] better title
aeaf39d [Reza Zadeh] More docs
788ed13 [Reza Zadeh] add computational cost explanation
6429c59 [Reza Zadeh] Add link to rowmatrix docs
1eeab8b [Reza Zadeh] Update SVD documentation to reflect roughly square
(cherry picked from commit b1b20301b3a1b35564d61e58eb5964d5ad5e4d7d)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
Documentation for newly added feature transformations:
1. TF-IDF
2. StandardScaler
3. Normalizer
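For orientation, the three transformations can be sketched in plain Python. This is not the MLlib API: the function names are made up, lists and dicts stand in for vectors, and the smoothed IDF formula log((m + 1) / (df + 1)) and the sample (n - 1) variance are assumptions about conventions, not statements about the documented implementation.

```python
import math

def normalize(v, p=2.0):
    """Normalizer: scale one vector to unit p-norm."""
    norm = sum(abs(x) ** p for x in v) ** (1.0 / p)
    return [x / norm for x in v] if norm > 0 else list(v)

def standard_scale(rows):
    """StandardScaler: center each column and scale it to unit variance.

    Needs at least two rows because it uses the sample variance.
    """
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(dims)]
    return [[(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
             for j in range(dims)] for r in rows]

def tf_idf(docs):
    """TF-IDF: weight raw term counts by inverse document frequency."""
    m = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    idf = {t: math.log((m + 1) / (d + 1)) for t, d in df.items()}
    weighted = []
    for doc in docs:
        tf = {}
        for term in doc:
            tf[term] = tf.get(term, 0) + 1
        weighted.append({t: c * idf[t] for t, c in tf.items()})
    return weighted
```

Under this formula a term that appears in every document gets weight zero, which is the usual intent of IDF.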
Author: DB Tsai <dbtsai@alpinenow.com>
Closes #2068 from dbtsai/transformer-documentation and squashes the following commits:
109f324 [DB Tsai] address feedback
(cherry picked from commit 572952ae615895efaaabcd509d582262000c0852)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
and Thrift JDBC server is absent in proper document -
The most important points I mentioned in #1885 are as follows.
* People who build Spark are not always programmers.
* If a person who builds Spark is not a programmer, he/she won't read the programmer's guide before building.
So, the instructions on how to build for using the CLI and JDBC server should not be only in the programmer's guide.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #2080 from sarutak/SPARK-2963 and squashes the following commits:
ee07c76 [Kousuke Saruta] Modified regression of the description about building for using Thrift JDBC server and CLI
ed53329 [Kousuke Saruta] Modified description and notaton of proper noun
07c59fc [Kousuke Saruta] Added a description about how to build to use HiveServer and CLI for SparkSQL to building-with-maven.md
6e6645a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2963
c88fa93 [Kousuke Saruta] Added a description about building to use HiveServer and CLI for SparkSQL
| |
Updated DecisionTree documentation, with examples for Java, Python.
Added the same Java example to the code as well.
CC: @mengxr @manishamde @atalwalkar
Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
Closes #2063 from jkbradley/dt-docs and squashes the following commits:
2dd2c19 [Joseph K. Bradley] Last updates based on github review.
9dd1b6b [Joseph K. Bradley] Updated decision tree doc.
d802369 [Joseph K. Bradley] Updates based on comments: cache data, corrected doc text.
b9bee04 [Joseph K. Bradley] Updated DT examples
57eee9f [Joseph K. Bradley] Created JavaDecisionTree example from example in docs, and corrected doc example as needed.
d939a92 [Joseph K. Bradley] Updated DecisionTree documentation. Added Java, Python examples.
(cherry picked from commit 050f8d01e47b9b67b02ce50d83fb7b4e528b7204)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
atalwalkar srowen
Author: Xiangrui Meng <meng@databricks.com>
Closes #2064 from mengxr/als-doc and squashes the following commits:
b2e20ab [Xiangrui Meng] introduced -> discussed
98abdd7 [Xiangrui Meng] add reference
339bd08 [Xiangrui Meng] add a section about regularization parameter in ALS
(cherry picked from commit e0f946265b9ea5bc48849cf7794c2c03d5e29fba)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
Moved TF-IDF before Word2Vec because the former is more basic. I also added a link for Word2Vec. atalwalkar
Author: Xiangrui Meng <meng@databricks.com>
Closes #2061 from mengxr/tfidf-doc and squashes the following commits:
ca04c70 [Xiangrui Meng] address comments
a5ea4b4 [Xiangrui Meng] add tf-idf user guide
(cherry picked from commit e1571874f26c1df2dfd5ac2959612372716cd2d8)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
Currently we have a separate profile called hive-thriftserver. I originally suggested this in case users did not want to bundle the thriftserver, but it has ultimately led to a lot of confusion. Since the thriftserver is only a few classes, I don't see a really good reason to isolate it from the rest of Hive. So let's go ahead and just include it in the same profile to simplify things.
This has been suggested in the past by liancheng.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #2006 from pwendell/hiveserver and squashes the following commits:
742ea40 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into hiveserver
034ad47 [Patrick Wendell] SPARK-3092: Always include the thriftserver when -Phive is enabled.
(cherry picked from commit f2f26c2a1dc6d60078c3be9c3d11a21866d9a24f)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
| |
Author: Ken Takagiwa <ugw.gi.world@gmail.com>
Closes #2042 from giwa/patch-1 and squashes the following commits:
216fe0e [Ken Takagiwa] Fixed wrong links
(cherry picked from commit 8a74e4b2a8c7dab154b406539487cf29d578d208)
Signed-off-by: Reynold Xin <rxin@apache.org>
| |
because NB treats feature values as term frequencies. jkbradley
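A minimal sketch of the kind of check this implies, in Python for illustration. The function name `require_nonnegative` and the error message are hypothetical; the only grounded point is that term-frequency-style features cannot be negative.

```python
def require_nonnegative(features):
    """Reject feature vectors that cannot be read as term frequencies."""
    for index, value in enumerate(features):
        if value < 0:
            # A negative count has no meaning for a frequency-based model.
            raise ValueError(
                "negative feature value %r at index %d" % (value, index))
    return features
```

Failing early on the first bad value keeps the error close to the offending input instead of surfacing later as a meaningless model.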
Author: Xiangrui Meng <meng@databricks.com>
Closes #2038 from mengxr/nb-neg and squashes the following commits:
52c37c3 [Xiangrui Meng] address comments
65f892d [Xiangrui Meng] detect negative values in nb
(cherry picked from commit 068b6fe6a10eb1c6b2102d88832203267f030e85)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
Added a documentation section on StreamingLR to the ``MLlib - Linear Methods`` guide, including a worked example.
mengxr tdas
Author: freeman <the.freeman.lab@gmail.com>
Closes #2047 from freeman-lab/streaming-lr-docs and squashes the following commits:
568d250 [freeman] Tweaks to wording / formatting
05a1139 [freeman] Added documentation and example for StreamingLR
(cherry picked from commit c7252b0097cfacd36f17357d195b12a59e503b35)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
Though we don't use default arguments for methods in RandomRDDs, it is still not easy for Java users to use because the output type is either `RDD[Double]` or `RDD[Vector]`. Java users should expect `JavaDoubleRDD` and `JavaRDD[Vector]`, respectively. We should create dedicated methods for Java users, and allow default arguments in Scala methods in RandomRDDs, to make life easier for both Java and Scala users. This PR also contains documentation for random data generation. brkyvz
Author: Xiangrui Meng <meng@databricks.com>
Closes #2041 from mengxr/stat-doc and squashes the following commits:
fc5eedf [Xiangrui Meng] add missing comma
ffde810 [Xiangrui Meng] address comments
aef6d07 [Xiangrui Meng] add doc for random data generation
b99d94b [Xiangrui Meng] add java-friendly methods to RandomRDDs
(cherry picked from commit 825d4fe47b9c4d48de88622dd48dcf83beb8b80a)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
- Uses the name tag to identify machines in a cluster.
- Allows overriding the security group name so it doesn't need to coincide with the cluster name.
- Outputs the request IDs of up to 10 pending spot instance requests.
Author: Vida Ha <vida@databricks.com>
Closes #1899 from vidaha/vida/ec2-reuse-security-group and squashes the following commits:
c80d5c3 [Vida Ha] wrap retries in a try catch block
b2989d5 [Vida Ha] SPARK-2333: spark_ec2 script should allow option for existing security group
(cherry picked from commit 94053a7b766788bb62e2dbbf352ccbcc75f71fc0)
Signed-off-by: Josh Rosen <joshrosen@apache.org>
| |
Candidate splits were inconsistent with the example.
Author: Matt Forbes <matt@tellapart.com>
Closes #1837 from emef/tree-doc and squashes the following commits:
3be14a1 [Matt Forbes] Fix typo in decision tree docs
(cherry picked from commit cd0720ca77894d481fb73a8b5bb517013843cb1e)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
This definitely needs review as I am not familiar with this part of Spark.
I tested this locally and it did seem to work.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #1937 from pwendell/scheduler and squashes the following commits:
b858e33 [Patrick Wendell] SPARK-3025: Allow JDBC clients to set a fair scheduler pool
(cherry picked from commit 6bca8898a1aa4ca7161492229bac1748b3da2ad7)
Signed-off-by: Michael Armbrust <michael@databricks.com>
| |
mengxr
Documentation for Word2Vec
Author: Liquan Pei <liquanpei@gmail.com>
Closes #2003 from Ishiihara/Word2Vec-doc and squashes the following commits:
4ff11d4 [Liquan Pei] minor fix
8d7458f [Liquan Pei] code reformat
6df0dcb [Liquan Pei] add Word2Vec documentation
(cherry picked from commit eef779b8d631de971d440051cae21040f4de558f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
Fixed markup, separated out sections more clearly, and added more thorough explanations.
Author: Chris Fregly <chris@fregly.com>
Closes #1757 from cfregly/master and squashes the following commits:
9b1c71a [Chris Fregly] better explained why spark checkpoints are disabled in the example (due to no stateful operations being used)
0f37061 [Chris Fregly] SPARK-1981: (Kinesis streaming support) updated streaming-kinesis.md
862df67 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
8e1ae2e [Chris Fregly] Merge remote-tracking branch 'upstream/master'
4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more clear, removed retries around store() method
0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into extras/kinesis-asl
691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with JavaKinesisWordCount during union of streams
0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master'
74e5c7c [Chris Fregly] updated per TD's feedback. simplified examples, updated docs
e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master'
bf614e9 [Chris Fregly] per matei's feedback: moved the kinesis examples into the examples/ dir
d17ca6d [Chris Fregly] per TD's feedback: updated docs, simplified the KinesisUtils api
912640c [Chris Fregly] changed the foundKinesis class to be a publically-avail class
db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master'
21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master'
6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and kinesis client
338997e [Chris Fregly] improve build docs for kinesis
828f8ae [Chris Fregly] more cleanup
e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
cd68c0d [Chris Fregly] fixed typos and backward compatibility
d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support
(cherry picked from commit 99243288b049f4a4fb4ba0505ea2310be5eb4bd2)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
|