spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	[SPARK-4330][Doc] Link to proper URL for YARN overview	Kousuke Saruta	2014-11-10	1	-1/+1
	In running-on-yarn.md, the link to the YARN overview points to the YARN alpha documentation; it should point to the stable documentation.
	Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
	Closes #3196 from sarutak/SPARK-4330 and squashes the following commits:
	30baa21 [Kousuke Saruta] Fixed running-on-yarn.md to point proper URL for YARN
	(cherry picked from commit 3c07b8f08240bafcdff5d174989fb433f4bc80b6)
	Signed-off-by: Matei Zaharia <matei@databricks.com>
*	Update versions for 1.1.1 release	Andrew Or	2014-11-10	1	-2/+2
*	[SPARK-2546] Clone JobConf for each task (branch-1.0 / 1.1 backport)	Josh Rosen	2014-10-19	1	-0/+9
	This patch attempts to fix SPARK-2546 in `branch-1.0` and `branch-1.1`. The underlying problem is that thread-safety issues in Hadoop Configuration objects may cause Spark tasks to get stuck in infinite loops. The approach taken here is to clone a new copy of the JobConf for each task rather than sharing a single copy between tasks.
	Note that there are still Configuration thread-safety issues that may affect the driver, but these seem much less likely to occur in practice and will be more complex to fix (see discussion on the SPARK-2546 ticket).
	This cloning is guarded by a new configuration option (`spark.hadoop.cloneConf`) and is disabled by default in order to avoid unexpected performance regressions for workloads that are unaffected by the Configuration thread-safety issues.
	Author: Josh Rosen <joshrosen@apache.org>
	Closes #2684 from JoshRosen/jobconf-fix-backport and squashes the following commits:
	f14f259 [Josh Rosen] Add configuration option to control cloning of Hadoop JobConf.
	b562451 [Josh Rosen] Remove unused jobConfCacheKey field.
	dd25697 [Josh Rosen] [SPARK-2546] [1.0 / 1.1 backport] Clone JobConf for each task.
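The clone-per-task idea can be sketched outside Spark. This is a minimal Python illustration, not Spark's implementation: a plain dict stands in for the Hadoop JobConf, and `conf_for_task`, `run_tasks`, and the `task.id` key are hypothetical names. The `clone_conf` flag mirrors the role of `spark.hadoop.cloneConf` and, like it, defaults to off.

```python
import copy
import threading


def conf_for_task(shared_conf, clone_conf=False):
    """Return the configuration object a task should use.

    Hypothetical helper: with clone_conf=True (playing the role of
    spark.hadoop.cloneConf) each task gets its own deep copy, so no two
    tasks touch the same mutable object. With the default False, every
    task shares the single object passed in.
    """
    if clone_conf:
        return copy.deepcopy(shared_conf)
    return shared_conf


def run_tasks(shared_conf, n_tasks, clone_conf):
    """Run n_tasks threads that each write into "their" configuration."""
    results = []
    lock = threading.Lock()

    def task(i):
        conf = conf_for_task(shared_conf, clone_conf)
        conf["task.id"] = i  # hypothetical per-task mutation
        with lock:
            results.append(conf)

    threads = [threading.Thread(target=task, args=(i,)) for i in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With cloning enabled each task sees an isolated copy and the shared object is never modified; with it disabled, all tasks write into the same object, which is exactly the sharing that exposes a non-thread-safe configuration class. The deep copy is the cost the default-off flag avoids for unaffected workloads.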
*	[SPARK-3890][Docs]remove redundant spark.executor.memory in doc	WangTaoTheTonic	2014-10-16	1	-12/+4
	Introduced in https://github.com/pwendell/spark/commit/f7e79bc42c1635686c3af01eef147dae92de2529; I'm not sure why we need two spark.executor.memory entries here.
	Author: WangTaoTheTonic <barneystinson@aliyun.com>
	Author: WangTao <barneystinson@aliyun.com>
	Closes #2745 from WangTaoTheTonic/redundantconfig and squashes the following commits:
	e7564dc [WangTao] too long line
	fdbdb1f [WangTaoTheTonic] trivial workaround
	d06b6e5 [WangTaoTheTonic] remove redundant spark.executor.memory in doc
	(cherry picked from commit e7f4ea8a52f0d3d56684b4f9caadce978eac4816)
	Signed-off-by: Andrew Or <andrewor14@gmail.com>
*	[SPARK-3899][Doc]fix wrong links in streaming doc	w00228970	2014-10-12	1	-1/+1
	There are three [Custom Receiver Guide] links in the streaming doc; the first one is wrong.
	Author: w00228970 <wangfei1@huawei.com>
	Author: wangfei <wangfei1@huawei.com>
	Closes #2749 from scwf/streaming-doc and squashes the following commits:
	0cd76b7 [wangfei] update link tojump to the Akka-specific section
	45b0646 [w00228970] wrong link in streaming doc
	(cherry picked from commit 92e017fb894be1e8e2b2b5274fec4c31a7a4412e)
	Signed-off-by: Josh Rosen <joshrosen@apache.org>
*	[SPARK-3535][Mesos] Fix resource handling.	Brenden Matthews	2014-10-03	1	-0/+11
	Author: Brenden Matthews <brenden@diddyinc.com>
	Closes #2401 from brndnmtthws/master and squashes the following commits:
	4abaa5d [Brenden Matthews] [SPARK-3535][Mesos] Fix resource handling.
	(cherry picked from commit a8c52d5343e19731909e73db5de151a324d31cd5)
	Signed-off-by: Andrew Or <andrewor14@gmail.com>
*	SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR	EugenCepoi	2014-10-03	1	-0/+7
	Update of PR #997. With this PR, setting SPARK_CONF_DIR overrides SPARK_HOME/conf (not only spark-defaults.conf and spark-env).
	Author: EugenCepoi <cepoi.eugen@gmail.com>
	Closes #2481 from EugenCepoi/SPARK-2058 and squashes the following commits:
	0bb32c2 [EugenCepoi] use orElse orNull and fixing trailing percent in compute-classpath.cmd
	77f35d7 [EugenCepoi] SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR
	(cherry picked from commit f0811f928e5b608e1a2cba3b6828ba0ed03b701d)
	Signed-off-by: Andrew Or <andrewor14@gmail.com>
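The lookup order described above can be sketched as a small pure function. This is a Python illustration of the precedence only, not the actual launcher scripts; `resolve_conf_dir` and `conf_file` are hypothetical names, and the environment is passed in as a plain mapping so the behavior is easy to test.

```python
import os


def resolve_conf_dir(env):
    """Pick the Spark configuration directory from an environment mapping.

    Sketch of the SPARK-2058 precedence: a non-empty SPARK_CONF_DIR replaces
    SPARK_HOME/conf entirely; otherwise fall back to SPARK_HOME/conf.
    Returns None when neither variable is set.
    """
    conf_dir = env.get("SPARK_CONF_DIR")
    if conf_dir:
        return conf_dir
    spark_home = env.get("SPARK_HOME")
    if spark_home:
        return os.path.join(spark_home, "conf")
    return None


def conf_file(env, name):
    """Resolve any config file (spark-defaults.conf, spark-env.sh, ...) in one place."""
    conf_dir = resolve_conf_dir(env)
    return None if conf_dir is None else os.path.join(conf_dir, name)
```

Routing every file through the same resolver is what makes the override apply to the whole directory rather than to individual files.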
*	[SQL][Docs] Update the output of printSchema and fix a typo in SQL programming guide.	Yin Huai	2014-10-02	1	-7/+7
	We have changed the output format of `printSchema`. This PR updates the SQL programming guide to show the new format. It also fixes a typo (the value type of `StructType` in the Java API).
	Author: Yin Huai <huai@cse.ohio-state.edu>
	Closes #2630 from yhuai/sqlDoc and squashes the following commits:
	267d63e [Yin Huai] Update the output of printSchema and fix a typo.
	(cherry picked from commit 82a6a083a485140858bcd93d73adec59bb5cca64)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-3715][Docs]minor typo	WangTaoTheTonic	2014-09-28	1	-3/+3
	https://issues.apache.org/jira/browse/SPARK-3715
	Author: WangTaoTheTonic <barneystinson@aliyun.com>
	Closes #2567 from WangTaoTheTonic/minortypo and squashes the following commits:
	9cc3f7a [WangTaoTheTonic] minor typo
	(cherry picked from commit 1f13a40ccd5a869aec62788a1e345dc24fa648c8)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	Docs : use "--total-executor-cores" rather than "--cores" after spark-shell	CrazyJvm	2014-09-27	1	-1/+1
	Author: CrazyJvm <crazyjvm@gmail.com>
	Closes #2540 from CrazyJvm/standalone-core and squashes the following commits:
	66d9fc6 [CrazyJvm] use "--total-executor-cores" rather than "--cores" after spark-shell
	(cherry picked from commit 66107f46f374f83729cd79ab260eb59fa123c041)
	Signed-off-by: Andrew Or <andrewor14@gmail.com>
*	Update docs to use jsonRDD instead of wrong jsonRdd.	Grega Kespret	2014-09-22	1	-3/+3
	Author: Grega Kespret <grega.kespret@gmail.com>
	Closes #2479 from gregakespret/patch-1 and squashes the following commits:
	dd6b90a [Grega Kespret] Update docs to use jsonRDD instead of wrong jsonRdd.
	(cherry picked from commit 56dae30ca70489a62686cb245728b09b2179bb5a)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[MLLib] Fix example code variable name misspelling in MLLib Feature Extraction guide	RJ Nowling	2014-09-22	1	-1/+1
	Author: RJ Nowling <rnowling@gmail.com>
	Closes #2459 from rnowling/tfidf-fix and squashes the following commits:
	b370a91 [RJ Nowling] Fix variable name misspelling in MLLib Feature Extraction guide
	(cherry picked from commit fec921552ffccc36937214406b3e4a050eb0d8e0)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	[Docs] Fix outdated docs for standalone cluster	andrewor14	2014-09-19	1	-2/+4
	This is now supported!
	Author: andrewor14 <andrewor14@gmail.com>
	Author: Andrew Or <andrewor14@gmail.com>
	Closes #2461 from andrewor14/document-standalone-cluster and squashes the following commits:
	85c8b9e [andrewor14] Wording change per Patrick
	35e30ee [Andrew Or] Fix outdated docs for standalone cluster
	(cherry picked from commit 8af2370619a8a6bb1af7df43b8329ab319348ad8)
	Signed-off-by: Andrew Or <andrewor14@gmail.com>
*	[SPARK-3565]Fix configuration item not consistent with document	WangTaoTheTonic	2014-09-17	1	-1/+1
	https://issues.apache.org/jira/browse/SPARK-3565
	"spark.ports.maxRetries" should be "spark.port.maxRetries". This makes the configuration key consistent between the documentation and the code.
	Author: WangTaoTheTonic <barneystinson@aliyun.com>
	Closes #2427 from WangTaoTheTonic/fixPortRetries and squashes the following commits:
	c178813 [WangTaoTheTonic] Use blank lines trigger Jenkins
	646f3fe [WangTaoTheTonic] also in SparkBuild.scala
	3700dba [WangTaoTheTonic] Fix configuration item not consistent with document
	(cherry picked from commit 3f169bfe3c322bf4344e13276dbbe34279b59ad0)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	Docs: move HA subsections to a deeper indentation level	Andrew Ash	2014-09-17	1	-2/+2
	This makes the table of contents easier to read.
	Author: Andrew Ash <andrew@andrewash.com>
	Closes #2402 from ash211/docs/better-indentation and squashes the following commits:
	ea0e130 [Andrew Ash] Move HA subsections to a deeper indentation level
	(cherry picked from commit b3830b28f8a70224d87c89d8491c514c4c191d23)
	Signed-off-by: Andrew Or <andrewor14@gmail.com>
*	[SQL][DOCS] Improve table caching section	Michael Armbrust	2014-09-17	1	-4/+4
	Author: Michael Armbrust <michael@databricks.com>
	Closes #2434 from marmbrus/patch-1 and squashes the following commits:
	67215be [Michael Armbrust] [SQL][DOCS] Improve table caching section
	(cherry picked from commit cbf983bb4a550ff26756ed7308fb03db42cffcff)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SQL][DOCS] Improve section on thrift-server	Michael Armbrust	2014-09-16	1	-18/+40
	Taken from liancheng's updates. Resolved merge conflicts with #2316.
	Author: Michael Armbrust <michael@databricks.com>
	Closes #2384 from marmbrus/sqlDocUpdate and squashes the following commits:
	2db6319 [Michael Armbrust] @liancheng's updates
	(cherry picked from commit 84073eb1172dc959936149265378f6e24d303685)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SQL] [Docs] typo fixes	Nicholas Chammas	2014-09-13	1	-2/+1
	* Fixed a random typo
	* Added the missing description for DecimalType
	Author: Nicholas Chammas <nicholas.chammas@gmail.com>
	Closes #2367 from nchammas/patch-1 and squashes the following commits:
	aa528be [Nicholas Chammas] doc fix for SQL DecimalType
	3247ac1 [Nicholas Chammas] [SQL] [Docs] typo fixes
	(cherry picked from commit a523ceaf159733dabcef84c7adc1463546679f65)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	HOTFIX: Changing color on doc menu	Patrick Wendell	2014-09-10	1	-1/+1
*	[SQL] Minor edits to sql programming guide.	Henry Cook	2014-09-08	1	-45/+47
	Author: Henry Cook <hcook@eecs.berkeley.edu>
	Closes #2316 from hcook/sql-docs and squashes the following commits:
	373f94b [Henry Cook] Minor edits to sql programming guide.
	(cherry picked from commit 26bc7655de18ab0191ded3f75cb77bc756dc1c03)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-938][doc] Add OpenStack Swift support	Reynold Xin	2014-09-07	2	-0/+154
	See compiled doc at http://people.apache.org/~rxin/tmp/openstack-swift/_site/storage-openstack-swift.html
	This is based on #1010. Closes #1010.
	Author: Reynold Xin <rxin@apache.org>
	Author: Gil Vernik <gilv@il.ibm.com>
	Closes #2298 from rxin/openstack-swift and squashes the following commits:
	ff4e394 [Reynold Xin] Two minor comments from Patrick.
	279f6de [Reynold Xin] core-sites -> core-site
	dfb8fea [Reynold Xin] Updated based on Gil's suggestion.
	846f5cb [Reynold Xin] Added a link from overview page.
	0447c9f [Reynold Xin] Removed sample code.
	e9c3761 [Reynold Xin] Merge pull request #1010 from gilv/master
	9233fef [Gil Vernik] Fixed typos
	6994827 [Gil Vernik] Merge pull request #1 from rxin/openstack
	ac0679e [Reynold Xin] Fixed an unclosed tr.
	47ce99d [Reynold Xin] Merge branch 'master' into openstack
	cca7192 [Gil Vernik] Removed white spases from pom.xml
	99f095d [Reynold Xin] Pending openstack changes.
	eb22295 [Reynold Xin] Merge pull request #1010 from gilv/master
	39a9737 [Gil Vernik] Spark integration with Openstack Swift
	c977658 [Gil Vernik] Merge branch 'master' of https://github.com/gilv/spark
	2aba763 [Gil Vernik] Fix to docs/openstack-integration.md
	9b625b5 [Gil Vernik] Merge branch 'master' of https://github.com/gilv/spark
	eff538d [Gil Vernik] SPARK-938 - Openstack Swift object storage support
	ce483d7 [Gil Vernik] SPARK-938 - Openstack Swift object storage support
	b6c37ef [Gil Vernik] Openstack Swift support
	(cherry picked from commit eddfeddac19870fc265ef406d87e1c3db9b54249)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	[SQL] Update SQL Programming Guide	Michael Armbrust	2014-09-07	1	-95/+857
	Author: Michael Armbrust <michael@databricks.com>
	Author: Yin Huai <huai@cse.ohio-state.edu>
	Closes #2258 from marmbrus/sqlDocUpdate and squashes the following commits:
	f3d450b [Michael Armbrust] fix brackets
	bea3bfa [Michael Armbrust] Davies suggestions
	3a29fe2 [Michael Armbrust] tighten visibility
	a71aa36 [Michael Armbrust] Draft of doc updates
	52932c0 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into sqlDocUpdate
	1e8c849 [Yin Huai] Update the example used for applySchema.
	9457c39 [Yin Huai] Update doc.
	31ba240 [Yin Huai] Merge remote-tracking branch 'upstream/master' into dataTypeDoc
	29bc668 [Yin Huai] Draft doc for data type and schema APIs.
	(cherry picked from commit 39db1bfdab434c867044ad4c70fe93a96fb287ad)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-2419][Streaming][Docs] More updates to the streaming programming guide	Tathagata Das	2014-09-06	5	-41/+117
	- Improvements to the kinesis integration guide from @cfregly
	- More information about unified input dstreams in the main guide
	Author: Tathagata Das <tathagata.das1565@gmail.com>
	Author: Chris Fregly <chris@fregly.com>
	Closes #2307 from tdas/streaming-doc-fix1 and squashes the following commits:
	ec40b5d [Tathagata Das] Updated figure with kinesis
	fdb9c5e [Tathagata Das] Fixed style issues with kinesis guide
	036d219 [Chris Fregly] updated kinesis docs and added an arch diagram
	24f622a [Tathagata Das] More modifications.
	(cherry picked from commit baff7e936101635d9bd4245e45335878bafb75e0)
	Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
*	[SPARK-2419][Streaming][Docs] Updates to the streaming programming guide	Tathagata Das	2014-09-03	5	-239/+622
	Updated the main streaming programming guide, and also added source-specific guides for Kafka, Flume, and Kinesis.
	Author: Tathagata Das <tathagata.das1565@gmail.com>
	Author: Jacek Laskowski <jacek@japila.pl>
	Closes #2254 from tdas/streaming-doc-fix and squashes the following commits:
	e45c6d7 [Jacek Laskowski] More fixes from an old PR
	5125316 [Tathagata Das] Fixed links
	dc02f26 [Tathagata Das] Refactored streaming kinesis guide and made many other changes.
	acbc3e3 [Tathagata Das] Fixed links between streaming guides.
	cb7007f [Tathagata Das] Added Streaming + Flume integration guide.
	9bd9407 [Tathagata Das] Updated streaming programming guide with additional information from SPARK-2419.
	(cherry picked from commit a5224079286d1777864cf9fa77330aadae10cd7b)
	Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
*	[SPARK-1981][Streaming][Hotfix] Fixed docs related to kinesis	Tathagata Das	2014-09-02	1	-2/+2
	- Include kinesis in the unidocs
	- Hide non-public classes from docs
	Author: Tathagata Das <tathagata.das1565@gmail.com>
	Closes #2239 from tdas/kinesis-doc-fix and squashes the following commits:
	156e20c [Tathagata Das] More fixes, based on PR comments.
	e9a6c01 [Tathagata Das] Fixed docs related to kinesis
	(cherry picked from commit e9bb12bea9fbef94332fbec88e3cd9197a27b7ad)
	Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
*	[SPARK-3332] Revert spark-ec2 patch that identifies clusters using tags	Josh Rosen	2014-09-02	1	-8/+6
	This reverts #1899 and #2163, two patches that modified `spark-ec2` so that clusters are identified using tags instead of security groups. The original motivation for these patches was to allow multiple clusters to run in the same security group.
	Unfortunately, tagging is not atomic with launching instances on EC2, so with this approach `spark-ec2` could launch instances and crash before they are tagged, effectively orphaning those instances. The orphaned instances won't belong to any cluster, so the `spark-ec2` script will be unable to clean them up.
	Since this feature may still be worth supporting, there are several alternative approaches that we might consider, including detecting orphaned instances and logging warnings, or using another mechanism to group instances into clusters. For the 1.1.0 release, though, I propose that we just revert these patches.
	Author: Josh Rosen <joshrosen@apache.org>
	Closes #2225 from JoshRosen/revert-ec2-cluster-naming and squashes the following commits:
	0c18e86 [Josh Rosen] Revert "SPARK-2333 - spark_ec2 script should allow option for existing security group"
	c2ca2d4 [Josh Rosen] Revert "Spark-3213 Fixes issue with spark-ec2 not detecting slaves created with "Launch More like this""
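One of the alternatives mentioned above, detecting orphaned instances and logging warnings, can be sketched without touching EC2. This is a hypothetical Python illustration, not `spark-ec2` code: instances are plain dicts rather than boto objects, and `find_orphans`, the record layout, and the default `Name` tag key are all assumptions.

```python
import logging

log = logging.getLogger("spark_ec2_sketch")


def find_orphans(instances, group_name, cluster_tag="Name"):
    """Return ids of instances in group_name that carry no cluster tag.

    Hypothetical record layout, e.g.
    {"id": "i-1", "groups": ["my-sg"], "tags": {"Name": "c1-master"}}.
    An instance that was launched into the group but never tagged (for
    example because the script crashed between launch and tagging) is
    reported as an orphan, and a warning is logged.
    """
    orphans = [
        inst["id"]
        for inst in instances
        if group_name in inst.get("groups", [])
        and cluster_tag not in inst.get("tags", {})
    ]
    if orphans:
        log.warning("Found %d untagged instance(s) in %s: %s",
                    len(orphans), group_name, ", ".join(orphans))
    return orphans
```

The check only warns; deciding whether an untagged instance is safe to terminate is left to the user, because the race described above means the script cannot tell which cluster the instance was meant for.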
*	[Docs] SQL doc formatting and typo fixes	Nicholas Chammas	2014-08-29	2	-59/+52
	As [reported on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-1-0-RC2-tp8107p8131.html):
	* Code fencing with triple-backticks doesn’t seem to work like it does on GitHub. Newlines are lost. Instead, use 4-space indent to format small code blocks.
	* Nested bullets need 2 leading spaces, not 1.
	* Spellcheck!
	Author: Nicholas Chammas <nicholas.chammas@gmail.com>
	Author: nchammas <nicholas.chammas@gmail.com>
	Closes #2201 from nchammas/sql-doc-fixes and squashes the following commits:
	873f889 [Nicholas Chammas] [Docs] fix skip-api flag
	5195e0c [Nicholas Chammas] [Docs] SQL doc formatting and typo fixes
	3b26c8d [nchammas] [Spark QA] Link to console output on test time out
	(cherry picked from commit 53aa8316e88980c6f46d3b9fc90d935a4738a370)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-3264] Allow users to set executor Spark home in Mesos	Andrew Or	2014-08-28	1	-0/+10
	The executors and the driver may not share the same Spark home. Currently the only way to set the executor-side Spark home in Mesos is through `spark.home`, which is neither documented nor intuitive. This PR adds a more specific config, `spark.mesos.executor.home`, and exposes it to the user.
	liancheng tnachen
	Author: Andrew Or <andrewor14@gmail.com>
	Closes #2166 from andrewor14/mesos-spark-home and squashes the following commits:
	b87965e [Andrew Or] Merge branch 'master' of github.com:apache/spark into mesos-spark-home
	f6abb2e [Andrew Or] Document spark.mesos.executor.home
	ca7846d [Andrew Or] Add more specific configuration for executor Spark home in Mesos
	(cherry picked from commit 41dc5987d9abeca6fc0f5935c780d48f517cdf95)
	Signed-off-by: Andrew Or <andrewor14@gmail.com>
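The relationship between the two settings can be sketched as a lookup with a fallback. This Python sketch assumes that the more specific `spark.mesos.executor.home` takes precedence and that `spark.home` is used only when it is absent; the exact precedence, and any further fallback such as an environment variable, are assumptions here. The config is a plain dict, and `executor_spark_home` is a hypothetical name.

```python
def executor_spark_home(conf):
    """Choose the Spark home used to launch Mesos executors.

    Assumed precedence: spark.mesos.executor.home first, then spark.home,
    so executors can use a different install path than the driver.
    Raises ValueError if neither key has a non-empty value.
    """
    home = conf.get("spark.mesos.executor.home") or conf.get("spark.home")
    if not home:
        raise ValueError("Set spark.mesos.executor.home or spark.home")
    return home
```

Keeping `spark.home` as a fallback means existing deployments that relied on it keep working, while clusters whose executors live at a different path can set the specific key.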
*	[SPARK-3227] [mllib] Added migration guide for v1.0 to v1.1	Joseph K. Bradley	2014-08-27	1	-1/+27
	The only updates are in DecisionTree.
	CC: mengxr
	Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
	Closes #2146 from jkbradley/mllib-migration and squashes the following commits:
	5a1f487 [Joseph K. Bradley] small edit to doc
	411d6d9 [Joseph K. Bradley] Added migration guide for v1.0 to v1.1. The only updates are in DecisionTree.
	(cherry picked from commit 171a41cb034f4ea80f6a3c91a6872970de16a14a)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	[SPARK-2830][MLLIB] doc update for 1.1	Xiangrui Meng	2014-08-27	4	-86/+87
	1. renamed mllib-basics to mllib-data-types
	1. renamed mllib-stats to mllib-statistics
	1. moved random data generation to the bottom of mllib-stats
	1. updated toc accordingly
	atalwalkar
	Author: Xiangrui Meng <meng@databricks.com>
	Closes #2151 from mengxr/mllib-doc-1.1 and squashes the following commits:
	0bd79f3 [Xiangrui Meng] add mllib-data-types
	b64a5d7 [Xiangrui Meng] update the content list of basis statistics in mllib-guide
	f625cc2 [Xiangrui Meng] move mllib-basics to mllib-data-types
	4d69250 [Xiangrui Meng] move random data generation to the bottom of statistics
	e64f3ce [Xiangrui Meng] move mllib-stats.md to mllib-statistics.md
	(cherry picked from commit 43dfc84f883822ea27b6e312d4353bf301c2e7ef)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	Fix unclosed HTML tag in Yarn docs.	Josh Rosen	2014-08-26	1	-1/+1
*	[SPARK-2839][MLlib] Stats Toolkit documentation updated	Burak	2014-08-26	1	-41/+331
	Documentation updated for the Statistics Toolkit of MLlib. mengxr atalwalkar
	https://issues.apache.org/jira/browse/SPARK-2839
	P.S. Accidentally closed #2123. New commits didn't show up after I reopened the PR. I've opened this instead and closed the old one.
	Author: Burak <brkyvz@gmail.com>
	Closes #2130 from brkyvz/StatsLib-Docs and squashes the following commits:
	a54a855 [Burak] [SPARK-2839][MLlib] Addressed comments
	bfc6896 [Burak] [SPARK-2839][MLlib] Added a more specific link to colStats() for pyspark
	213fe3f [Burak] [SPARK-2839][MLlib] Modifications made according to review
	fec4d9d [Burak] [SPARK-2830][MLlib] Stats Toolkit documentation updated
	(cherry picked from commit 1208f72ac78960fe5060187761479b2a9a417c1b)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	[SPARK-3226][MLLIB] doc update for native libraries	Xiangrui Meng	2014-08-26	1	-10/+15
	to mention the `-Pnetlib-lgpl` option.
	atalwalkar
	Author: Xiangrui Meng <meng@databricks.com>
	Closes #2128 from mengxr/mllib-native and squashes the following commits:
	4cbba57 [Xiangrui Meng] update mllib dependencies
	(cherry picked from commit adbd5c1636669fc474ab02b54cd1ced353f68712)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	Fixed a typo in docs/running-on-mesos.md	Cheng Lian	2014-08-25	1	-1/+1
	It should be `spark-env.sh` rather than `spark.env.sh`.
	Author: Cheng Lian <lian.cs.zju@gmail.com>
	Closes #2119 from liancheng/fix-mesos-doc and squashes the following commits:
	f360548 [Cheng Lian] Fixed a typo in docs/running-on-mesos.md
	(cherry picked from commit 805fec845b7aa8b4763e3e0e34bec6c3872469f4)
	Signed-off-by: Josh Rosen <joshrosen@apache.org>
*	[MLlib][SPARK-2997] Update SVD documentation to reflect roughly square	Reza Zadeh	2014-08-24	1	-6/+23
	Update the documentation to reflect the fact that we can handle roughly square matrices.
	Author: Reza Zadeh <rizlar@gmail.com>
	Closes #2070 from rezazadeh/svddocs and squashes the following commits:
	826b8fe [Reza Zadeh] left singular vectors
	3f34fc6 [Reza Zadeh] PCA is still TS
	7ffa2aa [Reza Zadeh] better title
	aeaf39d [Reza Zadeh] More docs
	788ed13 [Reza Zadeh] add computational cost explanation
	6429c59 [Reza Zadeh] Add link to rowmatrix docs
	1eeab8b [Reza Zadeh] Update SVD documentation to reflect roughly square
	(cherry picked from commit b1b20301b3a1b35564d61e58eb5964d5ad5e4d7d)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	[SPARK-2841][MLlib] Documentation for feature transformations	DB Tsai	2014-08-24	1	-2/+107
	Documentation for newly added feature transformations:
	1. TF-IDF
	2. StandardScaler
	3. Normalizer
	Author: DB Tsai <dbtsai@alpinenow.com>
	Closes #2068 from dbtsai/transformer-documentation and squashes the following commits:
	109f324 [DB Tsai] address feedback
	(cherry picked from commit 572952ae615895efaaabcd509d582262000c0852)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	[SPARK-2963] REGRESSION - The description about how to build for using CLI and Thrift JDBC server is absent in proper document	Kousuke Saruta	2014-08-22	1	-4/+7
	The most important points I mentioned in #1885 are as follows:
	* People who build Spark are not always programmers.
	* If a person who builds Spark is not a programmer, they won't read the programmer's guide before building.
	So, the instructions for building Spark to use the CLI and the JDBC server should not appear only in the programmer's guide.
	Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
	Closes #2080 from sarutak/SPARK-2963 and squashes the following commits:
	ee07c76 [Kousuke Saruta] Modified regression of the description about building for using Thrift JDBC server and CLI
	ed53329 [Kousuke Saruta] Modified description and notaton of proper noun
	07c59fc [Kousuke Saruta] Added a description about how to build to use HiveServer and CLI for SparkSQL to building-with-maven.md
	6e6645a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2963
	c88fa93 [Kousuke Saruta] Added a description about building to use HiveServer and CLI for SparkSQL
*	[SPARK-2840] [mllib] DecisionTree doc update (Java, Python examples)	Joseph K. Bradley	2014-08-21	1	-69/+283
	Updated DecisionTree documentation, with examples for Java and Python. Added the same Java example to the code as well.
	CC: @mengxr @manishamde @atalwalkar
	Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
	Closes #2063 from jkbradley/dt-docs and squashes the following commits:
	2dd2c19 [Joseph K. Bradley] Last updates based on github review.
	9dd1b6b [Joseph K. Bradley] Updated decision tree doc.
	d802369 [Joseph K. Bradley] Updates based on comments: cache data, corrected doc text.
	b9bee04 [Joseph K. Bradley] Updated DT examples
	57eee9f [Joseph K. Bradley] Created JavaDecisionTree example from example in docs, and corrected doc example as needed.
	d939a92 [Joseph K. Bradley] Updated DecisionTree documentation. Added Java, Python examples.
	(cherry picked from commit 050f8d01e47b9b67b02ce50d83fb7b4e528b7204)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	[SPARK-2843][MLLIB] add a section about regularization parameter in ALS	Xiangrui Meng	2014-08-20	1	-0/+11
	atalwalkar srowen
	Author: Xiangrui Meng <meng@databricks.com>
	Closes #2064 from mengxr/als-doc and squashes the following commits:
	b2e20ab [Xiangrui Meng] introduced -> discussed
	98abdd7 [Xiangrui Meng] add reference
	339bd08 [Xiangrui Meng] add a section about regularization parameter in ALS
	(cherry picked from commit e0f946265b9ea5bc48849cf7794c2c03d5e29fba)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	[SPARK-3143][MLLIB] add tf-idf user guide	Xiangrui Meng	2014-08-20	1	-3/+80
	Moved TF-IDF before Word2Vec because the former is more basic. I also added a link for Word2Vec.
	atalwalkar
	Author: Xiangrui Meng <meng@databricks.com>
	Closes #2061 from mengxr/tfidf-doc and squashes the following commits:
	ca04c70 [Xiangrui Meng] address comments
	a5ea4b4 [Xiangrui Meng] add tf-idf user guide
	(cherry picked from commit e1571874f26c1df2dfd5ac2959612372716cd2d8)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
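For readers of the TF-IDF guide, the weighting can be sketched in plain Python. This sketch assumes the smoothed inverse document frequency IDF(t) = log((m + 1) / (df(t) + 1)) described in the MLlib feature extraction guide, with raw term counts as TF. It keys weights by term string for readability, whereas MLlib maps terms to vector indices with `HashingTF`. The `tf_idf` function is a hypothetical name, not an MLlib API.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is the raw count of a term in a document; IDF uses the smoothed
    form log((m + 1) / (df(t) + 1)), where m is the number of documents
    and df(t) is the number of documents containing t.
    Returns one {term: weight} dict per document.
    """
    m = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    idf = {t: math.log((m + 1) / (n + 1)) for t, n in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
```

With this smoothing, a term that appears in every document gets weight 0, and the `+ 1` terms keep the logarithm finite.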
*	SPARK-3092 [SQL]: Always include the thriftserver when -Phive is enabled.	Patrick Wendell	2014-08-20	2	-9/+3
	Currently we have a separate profile called hive-thriftserver. I originally suggested this in case users did not want to bundle the thriftserver, but it has ultimately led to a lot of confusion. Since the thriftserver is only a few classes, I don't see a really good reason to isolate it from the rest of Hive. So let's go ahead and just include it in the same profile to simplify things.
	This has been suggested in the past by liancheng.
	Author: Patrick Wendell <pwendell@gmail.com>
	Closes #2006 from pwendell/hiveserver and squashes the following commits:
	742ea40 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into hiveserver
	034ad47 [Patrick Wendell] SPARK-3092: Always include the thriftserver when -Phive is enabled.
	(cherry picked from commit f2f26c2a1dc6d60078c3be9c3d11a21866d9a24f)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	[DOCS] Fixed wrong links	Ken Takagiwa	2014-08-19	1	-2/+2
	Author: Ken Takagiwa <ugw.gi.world@gmail.com>
	Closes #2042 from giwa/patch-1 and squashes the following commits:
	216fe0e [Ken Takagiwa] Fixed wrong links
	(cherry picked from commit 8a74e4b2a8c7dab154b406539487cf29d578d208)
	Signed-off-by: Reynold Xin <rxin@apache.org>
*	[SPARK-3130][MLLIB] detect negative values in naive Bayes	Xiangrui Meng	2014-08-19	1	-1/+2
	because NB treats feature values as term frequencies.
	jkbradley
	Author: Xiangrui Meng <meng@databricks.com>
	Closes #2038 from mengxr/nb-neg and squashes the following commits:
	52c37c3 [Xiangrui Meng] address comments
	65f892d [Xiangrui Meng] detect negative values in nb
	(cherry picked from commit 068b6fe6a10eb1c6b2102d88832203267f030e85)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
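The reasoning above (feature values are read as term frequencies, so a negative value is meaningless) amounts to an input check before training. This Python sketch is a hypothetical stand-in for that check, not MLlib's code: `check_nonnegative` and its error message are illustrative names.

```python
def check_nonnegative(features):
    """Validate one feature vector before multinomial naive Bayes training.

    Naive Bayes here interprets each value as a term frequency, so a
    negative entry has no meaning; fail loudly instead of training on it.
    Returns the vector unchanged when every entry is >= 0.
    """
    for i, v in enumerate(features):
        if v < 0:
            raise ValueError(
                f"Naive Bayes requires nonnegative feature values, but found {v} at index {i}"
            )
    return features
```

Failing fast with the offending index makes the bad input easy to locate, whereas silently training would produce log-probabilities computed from nonsensical counts.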
*	[SPARK-3112][MLLIB] Add documentation and example for StreamingLR	freeman	2014-08-19	1	-0/+75
	Added a documentation section on StreamingLR to the `MLlib - Linear Methods` guide, including a worked example.
	mengxr tdas
	Author: freeman <the.freeman.lab@gmail.com>
	Closes #2047 from freeman-lab/streaming-lr-docs and squashes the following commits:
	568d250 [freeman] Tweaks to wording / formatting
	05a1139 [freeman] Added documentation and example for StreamingLR
	(cherry picked from commit c7252b0097cfacd36f17357d195b12a59e503b35)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	[SPARK-3136][MLLIB] Create Java-friendly methods in RandomRDDs	Xiangrui Meng	2014-08-19	2	-2/+74
	Though we don't use default arguments for methods in RandomRDDs, they are still not easy for Java users to use because the output type is either `RDD[Double]` or `RDD[Vector]`. Java users should expect `JavaDoubleRDD` and `JavaRDD[Vector]`, respectively. We should create dedicated methods for Java users, and allow default arguments in the Scala methods in RandomRDDs, to make life easier for both Java and Scala users.
	This PR also contains documentation for random data generation.
	brkyvz
	Author: Xiangrui Meng <meng@databricks.com>
	Closes #2041 from mengxr/stat-doc and squashes the following commits:
	fc5eedf [Xiangrui Meng] add missing comma
	ffde810 [Xiangrui Meng] address comments
	aef6d07 [Xiangrui Meng] add doc for random data generation
	b99d94b [Xiangrui Meng] add java-friendly methods to RandomRDDs
	(cherry picked from commit 825d4fe47b9c4d48de88622dd48dcf83beb8b80a)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	SPARK-2333 - spark_ec2 script should allow option for existing security group	Vida Ha	2014-08-19	1	-6/+8
	- Uses the name tag to identify machines in a cluster.
	- Allows overriding the security group name so it doesn't need to coincide with the cluster name.
	- Outputs the request IDs of up to 10 pending spot instance requests.
	Author: Vida Ha <vida@databricks.com>
	Closes #1899 from vidaha/vida/ec2-reuse-security-group and squashes the following commits:
	c80d5c3 [Vida Ha] wrap retries in a try catch block
	b2989d5 [Vida Ha] SPARK-2333: spark_ec2 script should allow option for existing security group
	(cherry picked from commit 94053a7b766788bb62e2dbbf352ccbcc75f71fc0)
	Signed-off-by: Josh Rosen <joshrosen@apache.org>
*	Fix typo in decision tree docs	Matt Forbes	2014-08-18	1	-2/+2
	Candidate splits were inconsistent with the example.
	Author: Matt Forbes <matt@tellapart.com>
	Closes #1837 from emef/tree-doc and squashes the following commits:
	3be14a1 [Matt Forbes] Fix typo in decision tree docs
	(cherry picked from commit cd0720ca77894d481fb73a8b5bb517013843cb1e)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	SPARK-3025 [SQL]: Allow JDBC clients to set a fair scheduler pool	Patrick Wendell	2014-08-18	1	-0/+5
	This definitely needs review as I am not familiar with this part of Spark. I tested this locally and it did seem to work.
	Author: Patrick Wendell <pwendell@gmail.com>
	Closes #1937 from pwendell/scheduler and squashes the following commits:
	b858e33 [Patrick Wendell] SPARK-3025: Allow JDBC clients to set a fair scheduler pool
	(cherry picked from commit 6bca8898a1aa4ca7161492229bac1748b3da2ad7)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-2842][MLlib]Word2Vec documentation	Liquan Pei	2014-08-17	1	-1/+62
	mengxr
	Documentation for Word2Vec
	Author: Liquan Pei <liquanpei@gmail.com>
	Closes #2003 from Ishiihara/Word2Vec-doc and squashes the following commits:
	4ff11d4 [Liquan Pei] minor fix
	8d7458f [Liquan Pei] code reformat
	6df0dcb [Liquan Pei] add Word2Vec documentation
	(cherry picked from commit eef779b8d631de971d440051cae21040f4de558f)
	Signed-off-by: Xiangrui Meng <meng@databricks.com>
*	[SPARK-1981] updated streaming-kinesis.md	Chris Fregly	2014-08-17	1	-48/+49
	Fixed markup, separated out sections more clearly, and added more thorough explanations.
	Author: Chris Fregly <chris@fregly.com>
	Closes #1757 from cfregly/master and squashes the following commits:
	9b1c71a [Chris Fregly] better explained why spark checkpoints are disabled in the example (due to no stateful operations being used)
	0f37061 [Chris Fregly] SPARK-1981: (Kinesis streaming support) updated streaming-kinesis.md
	862df67 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
	8e1ae2e [Chris Fregly] Merge remote-tracking branch 'upstream/master'
	4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more clear, removed retries around store() method
	0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into extras/kinesis-asl
	691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with JavaKinesisWordCount during union of streams
	0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master'
	74e5c7c [Chris Fregly] updated per TD's feedback. simplified examples, updated docs
	e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master'
	bf614e9 [Chris Fregly] per matei's feedback: moved the kinesis examples into the examples/ dir
	d17ca6d [Chris Fregly] per TD's feedback: updated docs, simplified the KinesisUtils api
	912640c [Chris Fregly] changed the foundKinesis class to be a publically-avail class
	db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master'
	21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master'
	6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and kinesis client
	338997e [Chris Fregly] improve build docs for kinesis
	828f8ae [Chris Fregly] more cleanup
	e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
	cd68c0d [Chris Fregly] fixed typos and backward compatibility
	d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
	b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support
	(cherry picked from commit 99243288b049f4a4fb4ba0505ea2310be5eb4bd2)
	Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>