aboutsummaryrefslogtreecommitdiff
path: root/docs
Commit message (Collapse)AuthorAgeFilesLines
* Update docs to use jsonRDD instead of wrong jsonRdd.Grega Kespret2014-09-221-3/+3
| | | | | | | | Author: Grega Kespret <grega.kespret@gmail.com> Closes #2479 from gregakespret/patch-1 and squashes the following commits: dd6b90a [Grega Kespret] Update docs to use jsonRDD instead of wrong jsonRdd.
* [MLLib] Fix example code variable name misspelling in MLLib Feature ↵RJ Nowling2014-09-221-1/+1
| | | | | | | | | | Extraction guide Author: RJ Nowling <rnowling@gmail.com> Closes #2459 from rnowling/tfidf-fix and squashes the following commits: b370a91 [RJ Nowling] Fix variable name misspelling in MLLib Feature Extraction guide
* Fix Java example in Streaming Programming GuideSantiago M. Mola2014-09-201-1/+1
| | | | | | | | | | "val conf" was used instead of "SparkConf conf" in Java snippet. Author: Santiago M. Mola <santi@mola.io> Closes #2472 from smola/patch-1 and squashes the following commits: 5bfeb9b [Santiago M. Mola] Fix Java example in Streaming Programming Guide
* [Docs] Fix outdated docs for standalone clusterandrewor142014-09-191-2/+4
| | | | | | | | | | | | This is now supported! Author: andrewor14 <andrewor14@gmail.com> Author: Andrew Or <andrewor14@gmail.com> Closes #2461 from andrewor14/document-standalone-cluster and squashes the following commits: 85c8b9e [andrewor14] Wording change per Patrick 35e30ee [Andrew Or] Fix outdated docs for standalone cluster
* [SPARK-1701] Clarify slice vs partition in the programming guideMatthew Farrellee2014-09-191-4/+4
| | | | | | | | | | | | | | | | | This is a partial solution to SPARK-1701, only addressing the documentation confusion. Additional work can be to actually change the numSlices parameter name across languages, with care required for scala & python to maintain backward compatibility for named parameters. Author: Matthew Farrellee <matt@redhat.com> Closes #2305 from mattf/SPARK-1701 and squashes the following commits: c0af05d [Matthew Farrellee] Further tweak 06f80fc [Matthew Farrellee] Wording tweak from Josh Rosen's review 7b045e0 [Matthew Farrellee] [SPARK-1701] Clarify slice vs partition in the programming guide
* SPARK-3579 Jekyll doc generation is different across environments.Patrick Wendell2014-09-182-6/+15
| | | | | | | | | | | | | | This patch makes some small changes to fix this problem: 1. We document specific versions of Jekyll/Kramdown to use that match those used when building the upstream docs. 2. We add a configuration for a property that for some reason varies across packages of Jekyll/Kramdown even with the same version. Author: Patrick Wendell <pwendell@gmail.com> Closes #2443 from pwendell/jekyll and squashes the following commits: 54ee2ab [Patrick Wendell] SPARK-3579 Jekyll doc generation is different across environments.
* [SPARK-3565]Fix configuration item not consistent with documentWangTaoTheTonic2014-09-171-1/+1
| | | | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-3565 "spark.ports.maxRetries" should be "spark.port.maxRetries". Make the configuration keys in document and code consistent. Author: WangTaoTheTonic <barneystinson@aliyun.com> Closes #2427 from WangTaoTheTonic/fixPortRetries and squashes the following commits: c178813 [WangTaoTheTonic] Use blank lines trigger Jenkins 646f3fe [WangTaoTheTonic] also in SparkBuild.scala 3700dba [WangTaoTheTonic] Fix configuration item not consistent with document
* Docs: move HA subsections to a deeper indentation levelAndrew Ash2014-09-171-2/+2
| | | | | | | | | | Makes the table of contents read better Author: Andrew Ash <andrew@andrewash.com> Closes #2402 from ash211/docs/better-indentation and squashes the following commits: ea0e130 [Andrew Ash] Move HA subsections to a deeper indentation level
* [SQL][DOCS] Improve table caching sectionMichael Armbrust2014-09-171-4/+4
| | | | | | | | Author: Michael Armbrust <michael@databricks.com> Closes #2434 from marmbrus/patch-1 and squashes the following commits: 67215be [Michael Armbrust] [SQL][DOCS] Improve table caching section
* [Docs] Correct spark.files.fetchTimeout default valueviper-kun2014-09-171-2/+2
| | | | | | | | | | | change the value of spark.files.fetchTimeout Author: viper-kun <xukun.xu@huawei.com> Closes #2406 from viper-kun/master and squashes the following commits: ecb0d46 [viper-kun] [Docs] Correct spark.files.fetchTimeout default value 7cf4c7a [viper-kun] Update configuration.md
* Add a Community Projects pageEvan Chan2014-09-162-1/+3
| | | | | | | | | | | | | | | | This adds a new page to the docs listing community projects -- those created outside of Apache Spark that are of interest to the community of Spark users. Anybody can add to it just by submitting a PR. There was a discussion thread about alternatives: * Creating a Github organization for Spark projects - we could not find any sponsors for this, and it would be difficult to organize since many folks just create repos in their company organization or personal accounts * Apache has some place for storing community projects, but it was deemed difficult to work with, and again would be some permissions issues -- not everyone could update it. Author: Evan Chan <velvia@gmail.com> Closes #2219 from velvia/community-projects-page and squashes the following commits: 7316822 [Evan Chan] Point to Spark wiki: supplemental projects page 613b021 [Evan Chan] Add a few more projects a85eaaf [Evan Chan] Add a Community Projects page
* [SPARK-787] Add S3 configuration parameters to the EC2 deploy scriptsDan Osipov2014-09-161-1/+1
| | | | | | | | | | | | | | | When deploying to AWS, there is additional configuration that is required to read S3 files. EMR creates it automatically, there is no reason that the Spark EC2 script shouldn't. This PR requires a corresponding PR to the mesos/spark-ec2 to be merged, as it gets cloned in the process of setting up machines: https://github.com/mesos/spark-ec2/pull/58 Author: Dan Osipov <daniil.osipov@shazam.com> Closes #1120 from danosipov/s3_credentials and squashes the following commits: 758da8b [Dan Osipov] Modify documentation to include the new parameter 71fab14 [Dan Osipov] Use a parameter --copy-aws-credentials to enable S3 credential deployment 7e0da26 [Dan Osipov] Get AWS credentials out of boto connection instance 39bdf30 [Dan Osipov] Add S3 configuration parameters to the EC2 deploy scripts
* [SQL][DOCS] Improve section on thrift-serverMichael Armbrust2014-09-161-18/+40
| | | | | | | | | | Taken from liancheng's updates. Merged conflicts with #2316. Author: Michael Armbrust <michael@databricks.com> Closes #2384 from marmbrus/sqlDocUpdate and squashes the following commits: 2db6319 [Michael Armbrust] @liancheng's updates
* SPARK-3069 [DOCS] Build instructions in README are outdatedSean Owen2014-09-168-10/+31
| | | | | | | | | | | | | | | | | | | Here's my crack at Bertrand's suggestion. The Github `README.md` contains build info that's outdated. It should just point to the current online docs, and reflect that Maven is the primary build now. (Incidentally, the stanza at the end about contributions of original work should go in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark too. It won't hurt to be crystal clear about the agreement to license, given that ICLAs are not required of anyone here.) Author: Sean Owen <sowen@cloudera.com> Closes #2014 from srowen/SPARK-3069 and squashes the following commits: 501507e [Sean Owen] Note that Zinc is for Maven builds too db2bd97 [Sean Owen] sbt -> sbt/sbt and add note about zinc be82027 [Sean Owen] Fix additional occurrences of building-with-maven -> building-spark 91c921f [Sean Owen] Move building-with-maven to building-spark and create a redirect. Update doc links to building-spark.html Add jekyll-redirect-from plugin and make associated config changes (including fixing pygments deprecation). Add example of SBT to README.md 999544e [Sean Owen] Change "Building Spark with Maven" title to "Building Spark"; reinstate tl;dr info about dev/run-tests in README.md; add brief note about building with SBT c18d140 [Sean Owen] Optionally, remove the copy of contributing text from main README.md 8e83934 [Sean Owen] Add CONTRIBUTING.md to trigger notice on new pull request page b1c04a1 [Sean Owen] Refer to current online documentation for building, and remove slightly outdated copy in README.md
* [SPARK-3030] [PySpark] Reuse Python workerDavies Liu2014-09-131-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reuse Python worker to avoid the overhead of fork() Python process for each tasks. It also tracks the broadcasts for each worker, avoid sending repeated broadcasts. This can reduce the time for dummy task from 22ms to 13ms (-40%). It can help to reduce the latency for Spark Streaming. For a job with broadcast (43M after compress): ``` b = sc.broadcast(set(range(30000000))) print sc.parallelize(range(24000), 100).filter(lambda x: x in b.value).count() ``` It will finish in 281s without reused worker, and it will finish in 65s with reused worker(4 CPUs). After reusing the worker, it can save about 9 seconds for transfer and deserialize the broadcast for each tasks. It's enabled by default, could be disabled by `spark.python.worker.reuse = false`. Author: Davies Liu <davies.liu@gmail.com> Closes #2259 from davies/reuse-worker and squashes the following commits: f11f617 [Davies Liu] Merge branch 'master' into reuse-worker 3939f20 [Davies Liu] fix bug in serializer in mllib cf1c55e [Davies Liu] address comments 3133a60 [Davies Liu] fix accumulator with reused worker 760ab1f [Davies Liu] do not reuse worker if there are any exceptions 7abb224 [Davies Liu] refactor: sychronized with itself ac3206e [Davies Liu] renaming 8911f44 [Davies Liu] synchronized getWorkerBroadcasts() 6325fc1 [Davies Liu] bugfix: bid >= 0 e0131a2 [Davies Liu] fix name of config 583716e [Davies Liu] only reuse completed and not interrupted worker ace2917 [Davies Liu] kill python worker after timeout 6123d0f [Davies Liu] track broadcasts for each worker 8d2f08c [Davies Liu] reuse python worker
* [SQL] [Docs] typo fixesNicholas Chammas2014-09-131-2/+1
| | | | | | | | | | | | * Fixed random typo * Added in missing description for DecimalType Author: Nicholas Chammas <nicholas.chammas@gmail.com> Closes #2367 from nchammas/patch-1 and squashes the following commits: aa528be [Nicholas Chammas] doc fix for SQL DecimalType 3247ac1 [Nicholas Chammas] [SQL] [Docs] typo fixes
* [SQL][Docs] Update SQL programming guide to show the correct default value ↵Yin Huai2014-09-121-3/+3
| | | | | | | | | | | | of containsNull in an ArrayType After #1889, the default value of `containsNull` in an `ArrayType` is `true`. Author: Yin Huai <huai@cse.ohio-state.edu> Closes #2374 from yhuai/containsNull and squashes the following commits: dc609a3 [Yin Huai] Update the SQL programming guide to show the correct default value of containsNull in an ArrayType (the default value is true instead of false).
* [SPARK-2558][DOCS] Add --queue example to YARN docMark G. Whitney2014-09-121-0/+1
| | | | | | | | | | | | Put original YARN queue spark-submit arg description in running-on-yarn html table and example command line Author: Mark G. Whitney <mark@whitneyindustries.com> Closes #2218 from kramimus/2258-yarndoc and squashes the following commits: 4b5d808 [Mark G. Whitney] remove yarn queue config f8cda0d [Mark G. Whitney] [SPARK-2558][DOCS] Add spark.yarn.queue description to YARN doc
* SPARK-1713. Use a thread pool for launching executors.Sandy Ryza2014-09-101-0/+7
| | | | | | | | | | This patch copies the approach used in the MapReduce application master for launching containers. Author: Sandy Ryza <sandy@cloudera.com> Closes #663 from sryza/sandy-spark-1713 and squashes the following commits: 036550d [Sandy Ryza] SPARK-1713. [YARN] Use a threadpool for launching executor containers
* [SPARK-3443][MLLIB] update default values of tree:Xiangrui Meng2014-09-081-8/+8
| | | | | | | | | | | | | | | | | | Adjust the default values of decision tree, based on the memory requirement discussed in https://github.com/apache/spark/pull/2125 : 1. maxMemoryInMB: 128 -> 256 2. maxBins: 100 -> 32 3. maxDepth: 4 -> 5 (in some example code) jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #2322 from mengxr/tree-defaults and squashes the following commits: cda453a [Xiangrui Meng] fix tests 5900445 [Xiangrui Meng] update comments 8c81831 [Xiangrui Meng] update default values of tree:
* [SQL] Minor edits to sql programming guide.Henry Cook2014-09-081-45/+47
| | | | | | | | Author: Henry Cook <hcook@eecs.berkeley.edu> Closes #2316 from hcook/sql-docs and squashes the following commits: 373f94b [Henry Cook] Minor edits to sql programming guide.
* [SPARK-938][doc] Add OpenStack Swift supportReynold Xin2014-09-072-0/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | See compiled doc at http://people.apache.org/~rxin/tmp/openstack-swift/_site/storage-openstack-swift.html This is based on #1010. Closes #1010. Author: Reynold Xin <rxin@apache.org> Author: Gil Vernik <gilv@il.ibm.com> Closes #2298 from rxin/openstack-swift and squashes the following commits: ff4e394 [Reynold Xin] Two minor comments from Patrick. 279f6de [Reynold Xin] core-sites -> core-site dfb8fea [Reynold Xin] Updated based on Gil's suggestion. 846f5cb [Reynold Xin] Added a link from overview page. 0447c9f [Reynold Xin] Removed sample code. e9c3761 [Reynold Xin] Merge pull request #1010 from gilv/master 9233fef [Gil Vernik] Fixed typos 6994827 [Gil Vernik] Merge pull request #1 from rxin/openstack ac0679e [Reynold Xin] Fixed an unclosed tr. 47ce99d [Reynold Xin] Merge branch 'master' into openstack cca7192 [Gil Vernik] Removed white spases from pom.xml 99f095d [Reynold Xin] Pending openstack changes. eb22295 [Reynold Xin] Merge pull request #1010 from gilv/master 39a9737 [Gil Vernik] Spark integration with Openstack Swift c977658 [Gil Vernik] Merge branch 'master' of https://github.com/gilv/spark 2aba763 [Gil Vernik] Fix to docs/openstack-integration.md 9b625b5 [Gil Vernik] Merge branch 'master' of https://github.com/gilv/spark eff538d [Gil Vernik] SPARK-938 - Openstack Swift object storage support ce483d7 [Gil Vernik] SPARK-938 - Openstack Swift object storage support b6c37ef [Gil Vernik] Openstack Swift support
* [SPARK-3280] Made sort-based shuffle the default implementationReynold Xin2014-09-071-5/+4
| | | | | | | | | | | | Sort-based shuffle has lower memory usage and seems to outperform hash-based in almost all of our testing. Author: Reynold Xin <rxin@apache.org> Closes #2178 from rxin/sort-shuffle and squashes the following commits: 713d341 [Reynold Xin] Fixed test failures by setting spark.shuffle.compress to the same value as spark.shuffle.spill.compress. 85165e6 [Reynold Xin] Fixed a comment typo. aa0d372 [Reynold Xin] [SPARK-3280] Made sort-based shuffle the default implementation
* [SQL] Update SQL Programming GuideMichael Armbrust2014-09-071-95/+857
| | | | | | | | | | | | | | | | | Author: Michael Armbrust <michael@databricks.com> Author: Yin Huai <huai@cse.ohio-state.edu> Closes #2258 from marmbrus/sqlDocUpdate and squashes the following commits: f3d450b [Michael Armbrust] fix brackets bea3bfa [Michael Armbrust] Davies suggestions 3a29fe2 [Michael Armbrust] tighten visibility a71aa36 [Michael Armbrust] Draft of doc updates 52932c0 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into sqlDocUpdate 1e8c849 [Yin Huai] Update the example used for applySchema. 9457c39 [Yin Huai] Update doc. 31ba240 [Yin Huai] Merge remote-tracking branch 'upstream/master' into dataTypeDoc 29bc668 [Yin Huai] Draft doc for data type and schema APIs.
* [SPARK-2419][Streaming][Docs] More updates to the streaming programming guideTathagata Das2014-09-065-41/+117
| | | | | | | | | | | | | | | - Improvements to the kinesis integration guide from @cfregly - More information about unified input dstreams in main guide Author: Tathagata Das <tathagata.das1565@gmail.com> Author: Chris Fregly <chris@fregly.com> Closes #2307 from tdas/streaming-doc-fix1 and squashes the following commits: ec40b5d [Tathagata Das] Updated figure with kinesis fdb9c5e [Tathagata Das] Fixed style issues with kinesis guide 036d219 [Chris Fregly] updated kinesis docs and added an arch diagram 24f622a [Tathagata Das] More modifications.
* [SPARK-3378] [DOCS] Replace the word "SparkSQL" with right word "Spark SQL"Kousuke Saruta2014-09-041-1/+1
| | | | | | | | | | Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #2251 from sarutak/SPARK-3378 and squashes the following commits: 0bfe234 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3378 bb5938f [Kousuke Saruta] Replaced rest of "SparkSQL" with "Spark SQL" 6df66de [Kousuke Saruta] Replaced "SparkSQL" with "Spark SQL"
* [SPARK-2419][Streaming][Docs] Updates to the streaming programming guideTathagata Das2014-09-035-239/+622
| | | | | | | | | | | | | | | | Updated the main streaming programming guide, and also added source-specific guides for Kafka, Flume, Kinesis. Author: Tathagata Das <tathagata.das1565@gmail.com> Author: Jacek Laskowski <jacek@japila.pl> Closes #2254 from tdas/streaming-doc-fix and squashes the following commits: e45c6d7 [Jacek Laskowski] More fixes from an old PR 5125316 [Tathagata Das] Fixed links dc02f26 [Tathagata Das] Refactored streaming kinesis guide and made many other changes. acbc3e3 [Tathagata Das] Fixed links between streaming guides. cb7007f [Tathagata Das] Added Streaming + Flume integration guide. 9bd9407 [Tathagata Das] Updated streaming programming guide with additional information from SPARK-2419.
* [SPARK-1981][Streaming][Hotfix] Fixed docs related to kinesisTathagata Das2014-09-021-2/+2
| | | | | | | | | | | | - Include kinesis in the unidocs - Hide non-public classes from docs Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #2239 from tdas/kinesis-doc-fix and squashes the following commits: 156e20c [Tathagata Das] More fixes, based on PR comments. e9a6c01 [Tathagata Das] Fixed docs related to kinesis
* [Docs] SQL doc formatting and typo fixesNicholas Chammas2014-08-292-59/+52
| | | | | | | | | | | | | | | | As [reported on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-1-0-RC2-tp8107p8131.html): * Code fencing with triple-backticks doesn’t seem to work like it does on GitHub. Newlines are lost. Instead, use 4-space indent to format small code blocks. * Nested bullets need 2 leading spaces, not 1. * Spellcheck! Author: Nicholas Chammas <nicholas.chammas@gmail.com> Author: nchammas <nicholas.chammas@gmail.com> Closes #2201 from nchammas/sql-doc-fixes and squashes the following commits: 873f889 [Nicholas Chammas] [Docs] fix skip-api flag 5195e0c [Nicholas Chammas] [Docs] SQL doc formatting and typo fixes 3b26c8d [nchammas] [Spark QA] Link to console output on test time out
* [SPARK-3264] Allow users to set executor Spark home in MesosAndrew Or2014-08-281-0/+10
| | | | | | | | | | | | | | The executors and the driver may not share the same Spark home. There is currently one way to set the executor side Spark home in Mesos, through setting `spark.home`. However, this is neither documented nor intuitive. This PR adds a more specific config `spark.mesos.executor.home` and exposes this to the user. liancheng tnachen Author: Andrew Or <andrewor14@gmail.com> Closes #2166 from andrewor14/mesos-spark-home and squashes the following commits: b87965e [Andrew Or] Merge branch 'master' of github.com:apache/spark into mesos-spark-home f6abb2e [Andrew Or] Document spark.mesos.executor.home ca7846d [Andrew Or] Add more specific configuration for executor Spark home in Mesos
* [SPARK-3227] [mllib] Added migration guide for v1.0 to v1.1Joseph K. Bradley2014-08-271-1/+27
| | | | | | | | | | | | | The only updates are in DecisionTree. CC: mengxr Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com> Closes #2146 from jkbradley/mllib-migration and squashes the following commits: 5a1f487 [Joseph K. Bradley] small edit to doc 411d6d9 [Joseph K. Bradley] Added migration guide for v1.0 to v1.1. The only updates are in DecisionTree.
* [SPARK-2830][MLLIB] doc update for 1.1Xiangrui Meng2014-08-274-86/+87
| | | | | | | | | | | | | | | | | | | 1. renamed mllib-basics to mllib-data-types 1. renamed mllib-stats to mllib-statistics 1. moved random data generation to the bottom of mllib-stats 1. updated toc accordingly atalwalkar Author: Xiangrui Meng <meng@databricks.com> Closes #2151 from mengxr/mllib-doc-1.1 and squashes the following commits: 0bd79f3 [Xiangrui Meng] add mllib-data-types b64a5d7 [Xiangrui Meng] update the content list of basis statistics in mllib-guide f625cc2 [Xiangrui Meng] move mllib-basics to mllib-data-types 4d69250 [Xiangrui Meng] move random data generation to the bottom of statistics e64f3ce [Xiangrui Meng] move mllib-stats.md to mllib-statistics.md
* Fix unclosed HTML tag in Yarn docs.Josh Rosen2014-08-261-1/+1
|
* [SPARK-3240] Adding known issue for MESOS-1688Martin Weindel2014-08-261-0/+2
| | | | | | | | | | | | | | When using Mesos with the fine-grained mode, a Spark job can run into a dead lock on low allocatable memory on Mesos slaves. As a work-around 32 MB (= Mesos MIN_MEM) are allocated for each task, to ensure Mesos making new offers after task completion. From my perspective, it would be better to fix this problem in Mesos by dropping the constraint on memory for offers, but as temporary solution this patch helps to avoid the dead lock on current Mesos versions. See [[MESOS-1688] No offers if no memory is allocatable](https://issues.apache.org/jira/browse/MESOS-1688) for details for this problem. Author: Martin Weindel <martin.weindel@gmail.com> Closes #1860 from MartinWeindel/master and squashes the following commits: 5762030 [Martin Weindel] reverting work-around a6bf837 [Martin Weindel] added known issue for issue MESOS-1688 d9d2ca6 [Martin Weindel] work around for problem with Mesos offering semantic (see [https://issues.apache.org/jira/browse/MESOS-1688])
* [SPARK-2839][MLlib] Stats Toolkit documentation updatedBurak2014-08-261-41/+331
| | | | | | | | | | | | | | | | | Documentation updated for the Statistics Toolkit of MLlib. mengxr atalwalkar https://issues.apache.org/jira/browse/SPARK-2839 P.S. Accidentally closed #2123. New commits didn't show up after I reopened the PR. I've opened this instead and closed the old one. Author: Burak <brkyvz@gmail.com> Closes #2130 from brkyvz/StatsLib-Docs and squashes the following commits: a54a855 [Burak] [SPARK-2839][MLlib] Addressed comments bfc6896 [Burak] [SPARK-2839][MLlib] Added a more specific link to colStats() for pyspark 213fe3f [Burak] [SPARK-2839][MLlib] Modifications made according to review fec4d9d [Burak] [SPARK-2830][MLlib] Stats Toolkit documentation updated
* [SPARK-3226][MLLIB] doc update for native librariesXiangrui Meng2014-08-261-10/+15
| | | | | | | | | | to mention `-Pnetlib-lgpl` option. atalwalkar Author: Xiangrui Meng <meng@databricks.com> Closes #2128 from mengxr/mllib-native and squashes the following commits: 4cbba57 [Xiangrui Meng] update mllib dependencies
* Fixed a typo in docs/running-on-mesos.mdCheng Lian2014-08-251-1/+1
| | | | | | | | | | It should be `spark-env.sh` rather than `spark.env.sh`. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #2119 from liancheng/fix-mesos-doc and squashes the following commits: f360548 [Cheng Lian] Fixed a typo in docs/running-on-mesos.md
* [MLlib][SPARK-2997] Update SVD documentation to reflect roughly squareReza Zadeh2014-08-241-6/+23
| | | | | | | | | | | | | | | | Update the documentation to reflect the fact we can handle roughly square matrices. Author: Reza Zadeh <rizlar@gmail.com> Closes #2070 from rezazadeh/svddocs and squashes the following commits: 826b8fe [Reza Zadeh] left singular vectors 3f34fc6 [Reza Zadeh] PCA is still TS 7ffa2aa [Reza Zadeh] better title aeaf39d [Reza Zadeh] More docs 788ed13 [Reza Zadeh] add computational cost explanation 6429c59 [Reza Zadeh] Add link to rowmatrix docs 1eeab8b [Reza Zadeh] Update SVD documentation to reflect roughly square
* [SPARK-2841][MLlib] Documentation for feature transformationsDB Tsai2014-08-241-2/+107
| | | | | | | | | | | | | Documentation for newly added feature transformations: 1. TF-IDF 2. StandardScaler 3. Normalizer Author: DB Tsai <dbtsai@alpinenow.com> Closes #2068 from dbtsai/transformer-documentation and squashes the following commits: 109f324 [DB Tsai] address feedback
* [SPARK-2963] REGRESSION - The description about how to build for using CLI ↵Kousuke Saruta2014-08-221-4/+7
| | | | | | | | | | | | | | | | | | | | | and Thrift JDBC server is absent in proper document - The most important things I mentioned in #1885 is as follows. * People who build Spark is not always programmer. * If a person who build Spark is not a programmer, he/she won't read programmer's guide before building. So, how to build for using CLI and JDBC server is not only in programmer's guide. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #2080 from sarutak/SPARK-2963 and squashes the following commits: ee07c76 [Kousuke Saruta] Modified regression of the description about building for using Thrift JDBC server and CLI ed53329 [Kousuke Saruta] Modified description and notaton of proper noun 07c59fc [Kousuke Saruta] Added a description about how to build to use HiveServer and CLI for SparkSQL to building-with-maven.md 6e6645a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2963 c88fa93 [Kousuke Saruta] Added a description about building to use HiveServer and CLI for SparkSQL
* [SPARK-2840] [mllib] DecisionTree doc update (Java, Python examples)Joseph K. Bradley2014-08-211-69/+283
| | | | | | | | | | | | | | | | | Updated DecisionTree documentation, with examples for Java, Python. Added same Java example to code as well. CC: @mengxr @manishamde @atalwalkar Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com> Closes #2063 from jkbradley/dt-docs and squashes the following commits: 2dd2c19 [Joseph K. Bradley] Last updates based on github review. 9dd1b6b [Joseph K. Bradley] Updated decision tree doc. d802369 [Joseph K. Bradley] Updates based on comments: cache data, corrected doc text. b9bee04 [Joseph K. Bradley] Updated DT examples 57eee9f [Joseph K. Bradley] Created JavaDecisionTree example from example in docs, and corrected doc example as needed. d939a92 [Joseph K. Bradley] Updated DecisionTree documentation. Added Java, Python examples.
* [SPARK-2843][MLLIB] add a section about regularization parameter in ALSXiangrui Meng2014-08-201-0/+11
| | | | | | | | | | | | atalwalkar srowen Author: Xiangrui Meng <meng@databricks.com> Closes #2064 from mengxr/als-doc and squashes the following commits: b2e20ab [Xiangrui Meng] introduced -> discussed 98abdd7 [Xiangrui Meng] add reference 339bd08 [Xiangrui Meng] add a section about regularization parameter in ALS
* [SPARK-3143][MLLIB] add tf-idf user guideXiangrui Meng2014-08-201-3/+80
| | | | | | | | | | | Moved TF-IDF before Word2Vec because the former is more basic. I also added a link for Word2Vec. atalwalkar Author: Xiangrui Meng <meng@databricks.com> Closes #2061 from mengxr/tfidf-doc and squashes the following commits: ca04c70 [Xiangrui Meng] address comments a5ea4b4 [Xiangrui Meng] add tf-idf user guide
* SPARK-3092 [SQL]: Always include the thriftserver when -Phive is enabled.Patrick Wendell2014-08-202-9/+3
| | | | | | | | | | | | | Currently we have a separate profile called hive-thriftserver. I originally suggested this in case users did not want to bundle the thriftserver, but it's ultimately lead to a lot of confusion. Since the thriftserver is only a few classes, I don't see a really good reason to isolate it from the rest of Hive. So let's go ahead and just include it in the same profile to simplify things. This has been suggested in the past by liancheng. Author: Patrick Wendell <pwendell@gmail.com> Closes #2006 from pwendell/hiveserver and squashes the following commits: 742ea40 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into hiveserver 034ad47 [Patrick Wendell] SPARK-3092: Always include the thriftserver when -Phive is enabled.
* [DOCS] Fixed wrong linksKen Takagiwa2014-08-191-2/+2
| | | | | | | | Author: Ken Takagiwa <ugw.gi.world@gmail.com> Closes #2042 from giwa/patch-1 and squashes the following commits: 216fe0e [Ken Takagiwa] Fixed wrong links
* [SPARK-3130][MLLIB] detect negative values in naive BayesXiangrui Meng2014-08-191-1/+2
| | | | | | | | | | | because NB treats feature values as term frequencies. jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #2038 from mengxr/nb-neg and squashes the following commits: 52c37c3 [Xiangrui Meng] address comments 65f892d [Xiangrui Meng] detect negative values in nb
* [SPARK-3112][MLLIB] Add documentation and example for StreamingLRfreeman2014-08-191-0/+75
| | | | | | | | | | | | | Added a documentation section on StreamingLR to the ``MLlib - Linear Methods``, including a worked example. mengxr tdas Author: freeman <the.freeman.lab@gmail.com> Closes #2047 from freeman-lab/streaming-lr-docs and squashes the following commits: 568d250 [freeman] Tweaks to wording / formatting 05a1139 [freeman] Added documentation and example for StreamingLR
* [SPARK-3136][MLLIB] Create Java-friendly methods in RandomRDDsXiangrui Meng2014-08-192-2/+74
| | | | | | | | | | | | | Though we don't use default argument for methods in RandomRDDs, it is still not easy for Java users to use because the output type is either `RDD[Double]` or `RDD[Vector]`. Java users should expect `JavaDoubleRDD` and `JavaRDD[Vector]`, respectively. We should create dedicated methods for Java users, and allow default arguments in Scala methods in RandomRDDs, to make life easier for both Java and Scala users. This PR also contains documentation for random data generation. brkyvz Author: Xiangrui Meng <meng@databricks.com> Closes #2041 from mengxr/stat-doc and squashes the following commits: fc5eedf [Xiangrui Meng] add missing comma ffde810 [Xiangrui Meng] address comments aef6d07 [Xiangrui Meng] add doc for random data generation b99d94b [Xiangrui Meng] add java-friendly methods to RandomRDDs
* SPARK-2333 - spark_ec2 script should allow option for existing security groupVida Ha2014-08-191-6/+8
| | | | | | | | | | | | | - Uses the name tag to identify machines in a cluster. - Allows overriding the security group name so it doesn't need to coincide with the cluster name. - Outputs the request id's of up to 10 pending spot instance requests. Author: Vida Ha <vida@databricks.com> Closes #1899 from vidaha/vida/ec2-reuse-security-group and squashes the following commits: c80d5c3 [Vida Ha] wrap retries in a try catch block b2989d5 [Vida Ha] SPARK-2333: spark_ec2 script should allow option for existing security group
* Fix typo in decision tree docsMatt Forbes2014-08-181-2/+2
| | | | | | | | | | Candidate splits were inconsistent with the example. Author: Matt Forbes <matt@tellapart.com> Closes #1837 from emef/tree-doc and squashes the following commits: 3be14a1 [Matt Forbes] Fix typo in decision tree docs