aboutsummaryrefslogtreecommitdiff
path: root/docs
Commit message (Collapse)AuthorAgeFilesLines
* SPARK-6626 [DOCS]: Corrected Scala:TwitterUtils parametersjayson2015-04-011-1/+1
| | | | | | | | | | Per Sean Owen's request, here is the update call for TwitterUtils using Scala :) Author: jayson <jayson@ziprecruiter.com> Closes #5295 from JaysonSunshine/master and squashes the following commits: df1d056 [jayson] Corrected Scala:TwitterUtils parameters
* [SPARK-4894][mllib] Added Bernoulli option to NaiveBayes model in mllibleahmcguire2015-03-311-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added optional model type parameter for NaiveBayes training. Can be either Multinomial or Bernoulli. When Bernoulli is given the Bernoulli smoothing is used for fitting and for prediction as per: http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html. Default for model is original Multinomial fit and predict. Added additional testing for Bernoulli and Multinomial models. Author: leahmcguire <lmcguire@salesforce.com> Author: Joseph K. Bradley <joseph@databricks.com> Author: Leah McGuire <lmcguire@salesforce.com> Closes #4087 from leahmcguire/master and squashes the following commits: f3c8994 [leahmcguire] changed checks on model type to requires acb69af [leahmcguire] removed enum type and replaces all modelType parameters with strings 2224b15 [Leah McGuire] Merge pull request #2 from jkbradley/leahmcguire-master 9ad89ca [Joseph K. Bradley] removed old code 6a8f383 [Joseph K. Bradley] Added new model save/load format 2.0 for NaiveBayesModel after modelType parameter was added. Updated tests. Also updated ModelType enum-like type. 852a727 [leahmcguire] merged with upstream master a22d670 [leahmcguire] changed NaiveBayesModel modelType parameter back to NaiveBayes.ModelType, made NaiveBayes.ModelType serializable, fixed getter method in NavieBayes 18f3219 [leahmcguire] removed private from naive bayes constructor for lambda only bea62af [leahmcguire] put back in constructor for NaiveBayes 01baad7 [leahmcguire] made fixes from code review fb0a5c7 [leahmcguire] removed typo e2d925e [leahmcguire] fixed nonserializable error that was causing naivebayes test failures 2d0c1ba [leahmcguire] fixed typo in NaiveBayes c298e78 [leahmcguire] fixed scala style errors b85b0c9 [leahmcguire] Merge remote-tracking branch 'upstream/master' 900b586 [leahmcguire] fixed model call so that uses type argument ea09b28 [leahmcguire] Merge remote-tracking branch 'upstream/master' e016569 [leahmcguire] updated test suite with model type fix 85f298f [leahmcguire] Merge remote-tracking branch 'upstream/master' dc65374 [leahmcguire] integrated model type fix 7622b0c [leahmcguire] added comments and fixed style as per rb b93aaf6 [Leah McGuire] Merge pull request #1 from jkbradley/nb-model-type 3730572 [Joseph K. Bradley] modified NB model type to be more Java-friendly b61b5e2 [leahmcguire] added back compatable constructor to NaiveBayesModel to fix MIMA test failure 5a4a534 [leahmcguire] fixed scala style error in NaiveBayes 3891bf2 [leahmcguire] synced with apache spark and resolved merge conflict d9477ed [leahmcguire] removed old inaccurate comment from test suite for mllib naive bayes 76e5b0f [leahmcguire] removed unnecessary sort from test 0313c0c [leahmcguire] fixed style error in NaiveBayes.scala 4a3676d [leahmcguire] Updated changes re-comments. Got rid of verbose populateMatrix method. Public api now has string instead of enumeration. Docs are updated." ce73c63 [leahmcguire] added Bernoulli option to niave bayes model in mllib, added optional model type parameter for training. When Bernoulli is given the Bernoulli smoothing is used for fitting and for prediction http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html
* [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Added documentation explaining ↵Ilya Ganelin2015-03-301-11/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | shuffle I've updated the Spark Programming Guide to add a section on the shuffle operation providing some background on what it does. I've also addressed some of its performance impacts. I've included documentation to address the following issues: https://issues.apache.org/jira/browse/SPARK-5836 https://issues.apache.org/jira/browse/SPARK-3441 https://issues.apache.org/jira/browse/SPARK-5750 https://issues.apache.org/jira/browse/SPARK-4227 is related but can be addressed in a separate PR since it involves updates to the Spark Configuration Guide. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Author: Ilya Ganelin <ilganeli@gmail.com> Closes #5074 from ilganeli/SPARK-5750 and squashes the following commits: 6178e24 [Ilya Ganelin] Update programming-guide.md 7a0b96f [Ilya Ganelin] Update programming-guide.md 2c5df08 [Ilya Ganelin] Merge branch 'SPARK-5750' of github.com:ilganeli/spark into SPARK-5750 dffbd2d [Ilya Ganelin] [SPARK-5750] Slight wording update 1ff4eb4 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-5750 85f9c6e [Ilya Ganelin] Update programming-guide.md 349d1fa [Ilya Ganelin] Added cross linkf or configuration page eeb5a7a [Ilya Ganelin] [SPARK-5750] Added some minor fixes dd5cc9d [Ilya Ganelin] [SPARK-5750] Fixed some factual inaccuracies with regards to shuffle internals. a8adb57 [Ilya Ganelin] [SPARK-5750] Incoporated feedback from Sean Owen 9954bbe [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-5750 159dd1c [Ilya Ganelin] [SPARK-5750] Style fixes from rxin. 75ef67b [Ilya Ganelin] [SPARK-5750][SPARK-3441][SPARK-5836] Added documentation explaining the shuffle operation and included errata from a number of other JIRAs
* [SPARK-6596] fix the instruction on building scaladocCodingCat2015-03-301-2/+2
| | | | | | | | | | | | | | In README.md under docs/ directory, it says that > You can build just the Spark scaladoc by running build/sbt doc from the SPARK_PROJECT_ROOT directory. I guess the right approach is build/sbt unidoc Author: CodingCat <zhunansjtu@gmail.com> Closes #5253 from CodingCat/SPARK-6596 and squashes the following commits: af379ed [CodingCat] fix the instruction on building scaladoc
* [SPARK-6552][Deploy][Doc]expose start-slave.sh to user and update outdated docWangTaoTheTonic2015-03-281-1/+2
| | | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-6552 /cc srowen Author: WangTaoTheTonic <wangtao111@huawei.com> Closes #5205 from WangTaoTheTonic/SPARK-6552 and squashes the following commits: b02263c [WangTaoTheTonic] use less than rather than less equal f0fa408 [WangTaoTheTonic] expose start-slave.sh
* [SPARK-6510][GraphX]: Add Graph#minus method to act as Set#differenceBrennon York2015-03-261-0/+2
| | | | | | | | | | | | | | | | | | | | | Adds a `Graph#minus` method which will return only unique `VertexId`'s from the calling `VertexRDD`. To demonstrate a basic example with pseudocode: ``` Set((0L,0),(1L,1)).minus(Set((1L,1),(2L,2))) > Set((0L,0)) ``` Author: Brennon York <brennon.york@capitalone.com> Closes #5175 from brennonyork/SPARK-6510 and squashes the following commits: 248d5c8 [Brennon York] added minus(VertexRDD[VD]) method to avoid createUsingIndex and updated the mask operations to simplify with andNot call 3fb7cce [Brennon York] updated graphx doc to reflect the addition of minus method 6575d92 [Brennon York] updated mima exclude aaa030b [Brennon York] completed graph#minus functionality 7227c0f [Brennon York] beginning work on minus functionality
* [DOCS][SQL] Fix JDBC exampleMichael Armbrust2015-03-261-1/+1
| | | | | | | | Author: Michael Armbrust <michael@databricks.com> Closes #5192 from marmbrus/fixJDBCDocs and squashes the following commits: b48a33d [Michael Armbrust] [DOCS][SQL] Fix JDBC example
* [SPARK-5987] [MLlib] Save/load for GaussianMixtureModelsMechCoder2015-03-251-0/+8
| | | | | | | | | | | | | | | | Should be self explanatory. Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #4986 from MechCoder/spark-5987 and squashes the following commits: 7d2cd56 [MechCoder] Iterate over dataframe in a better way e7a14cb [MechCoder] Minor 33c84f9 [MechCoder] Store as Array[Data] instead of Data[Array] 505bd57 [MechCoder] Rebased over master and used MatrixUDT 7422bb4 [MechCoder] Store sigmas as Array[Double] instead of Array[Array[Double]] b9794e4 [MechCoder] Minor cb77095 [MechCoder] [SPARK-5987] Save/load for GaussianMixtureModels
* [DOCUMENTATION]Fixed Missing Type Import in DocumentationBill Chambers2015-03-241-1/+2
| | | | | | | | | | | | Needed to import the types specifically, not the more general pyspark.sql Author: Bill Chambers <wchambers@ischool.berkeley.edu> Author: anabranch <wac.chambers@gmail.com> Closes #5179 from anabranch/master and squashes the following commits: 8fa67bf [anabranch] Corrected SqlContext Import 603b080 [Bill Chambers] [DOCUMENTATION]Fixed Missing Type Import in Documentation
* [SPARK-6469] Improving documentation on YARN local directories usageChristophe Préaud2015-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | Clarify the local directories usage in YARN Author: Christophe Préaud <christophe.preaud@kelkoo.com> Closes #5165 from preaudc/yarn-doc-local-dirs and squashes the following commits: 6912b90 [Christophe Préaud] Fix some formatting issues. 4fa8ec2 [Christophe Préaud] Merge remote-tracking branch 'upstream/master' into yarn-doc-local-dirs eaaf519 [Christophe Préaud] Clarify the local directories usage in YARN 436fb7d [Christophe Préaud] Revert "Clarify the local directories usage in YARN" 876ae5e [Christophe Préaud] Clarify the local directories usage in YARN 608dbfa [Christophe Préaud] Merge remote-tracking branch 'upstream/master' a49a2ce [Christophe Préaud] Merge remote-tracking branch 'upstream/master' 9ba89ca [Christophe Préaud] Ensure that files are fetched atomically 54419ae [Christophe Préaud] Merge remote-tracking branch 'upstream/master' c6a5590 [Christophe Préaud] Revert commit 8ea871f8130b2490f1bad7374a819bf56f0ccbbd 7456a33 [Christophe Préaud] Merge remote-tracking branch 'upstream/master' 8ea871f [Christophe Préaud] Ensure that files are fetched atomically
* [ML][docs][minor] Define LabeledDocument/Document classes in CV examplePeter Rudenko2015-03-241-8/+43
| | | | | | | | | | | To easier copy/paste Cross-Validation example code snippet need to define LabeledDocument/Document in it, since they difined in a previous example. Author: Peter Rudenko <petro.rudenko@gmail.com> Closes #5135 from petro-rudenko/patch-3 and squashes the following commits: 5190c75 [Peter Rudenko] Fix primitive types for java examples. 1d35383 [Peter Rudenko] [SQL][docs][minor] Define LabeledDocument/Document classes in CV example
* Update the command to use IPython notebookCong Yue2015-03-241-1/+5
| | | | | | | | | | As for "notebook --pylab inline" is not supported any more, update the related documentation for this. Author: Cong Yue <yuecong1104@gmail.com> Closes #5111 from yuecong/patch-1 and squashes the following commits: 872df76 [Cong Yue] Update the command to use IPython notebook
* [SPARK-6337][Documentation, SQL]Spark 1.3 doc fixesvinodkc2015-03-221-2/+5
| | | | | | | | Author: vinodkc <vinod.kc.in@gmail.com> Closes #5112 from vinodkc/spark_1.3_doc_fixes and squashes the following commits: 2c6aee6 [vinodkc] Spark 1.3 doc fixes
* SPARK-6454 [DOCS] Fix links to pyspark apiKamil Smuga2015-03-225-19/+19
| | | | | | | | | | | | Author: Kamil Smuga <smugakamil@gmail.com> Author: stderr <smugakamil@gmail.com> Closes #5120 from kamilsmuga/master and squashes the following commits: fee3281 [Kamil Smuga] more python api links fixed for docs 13240cb [Kamil Smuga] resolved merge conflicts with upstream/master 6649b3b [Kamil Smuga] fix broken docs links to Python API 92f03d7 [stderr] Fix links to pyspark api
* [SPARK-6025] [MLlib] Add helper method evaluateEachIteration to extract ↵MechCoder2015-03-201-2/+2
| | | | | | | | | | | | | | | | learning curve Added evaluateEachIteration to allow the user to manually extract the error for each iteration of GradientBoosting. The internal optimisation can be dealt with later. Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #4906 from MechCoder/spark-6025 and squashes the following commits: 67146ab [MechCoder] Minor 352001f [MechCoder] Minor 6e8aa10 [MechCoder] Made the following changes Used mapPartition instead of map Refactored computeError and unpersisted broadcast variables bc99ac6 [MechCoder] Refactor the method and stuff dbda033 [MechCoder] [SPARK-6025] Add helper method evaluateEachIteration to extract learning curve
* [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's setJongyoul Lee2015-03-201-5/+3
| | | | | | | | | | | | | - Fixed calculateTotalMemory to use spark.mesos.executor.memoryOverhead - Added testCase Author: Jongyoul Lee <jongyoul@gmail.com> Closes #5099 from jongyoul/SPARK-6423 and squashes the following commits: 6747fce [Jongyoul Lee] [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set - Changed a description of spark.mesos.executor.memoryOverhead 475a7c8 [Jongyoul Lee] [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set - Fit the import rules 453c5a2 [Jongyoul Lee] [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set - Fixed calculateTotalMemory to use spark.mesos.executor.memoryOverhead - Added testCase
* [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT.Marcelo Vanzin2015-03-201-2/+2
| | | | | | | | | | | | | | | | Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #5056 from vanzin/SPARK-6371 and squashes the following commits: 63220df [Marcelo Vanzin] Merge branch 'master' into SPARK-6371 6506f75 [Marcelo Vanzin] Use more fine-grained exclusion. 178ba71 [Marcelo Vanzin] Oops. 75b2375 [Marcelo Vanzin] Exclude VertexRDD in MiMA. a45a62c [Marcelo Vanzin] Work around MIMA warning. 1d8a670 [Marcelo Vanzin] Re-group jetty exclusion. 0e8e909 [Marcelo Vanzin] Ignore ml, don't ignore graphx. cef4603 [Marcelo Vanzin] Indentation. 296cf82 [Marcelo Vanzin] [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT.
* [SPARK-6426][Doc]User could also point the yarn cluster config directory via ↵WangTaoTheTonic2015-03-201-2/+2
| | | | | | | | | | | | | | YARN_CONF_DI... ...R https://issues.apache.org/jira/browse/SPARK-6426 Author: WangTaoTheTonic <wangtao111@huawei.com> Closes #5103 from WangTaoTheTonic/SPARK-6426 and squashes the following commits: e6dd78d [WangTaoTheTonic] User could also point the yarn cluster config directory via YARN_CONF_DIR
* [SPARK-6402][DOC] - Remove some refererences to shark in docs and ec2Pierre Borckmans2015-03-192-5/+3
| | | | | | | | | | | | | | | EC2 script and job scheduling documentation still refered to Shark. I removed these references. I also removed a remaining `SHARK_VERSION` variable from `ec2-variables.sh`. Author: Pierre Borckmans <pierre.borckmans@realimpactanalytics.com> Closes #5083 from pierre-borckmans/remove_refererences_to_shark_in_docs and squashes the following commits: 4e90ffc [Pierre Borckmans] Removed deprecated SHARK_VERSION caea407 [Pierre Borckmans] Remove shark reference from ec2 script doc 196c744 [Pierre Borckmans] Removed references to Shark
* SPARK-6085 Part. 2 Increase default value for memory overheadJongyoul Lee2015-03-181-3/+3
| | | | | | | | | | | | - fixed a description of spark.mesos.executor.memoryOverhead from 7% to 10% - This is a second part of SPARK-6085 Author: Jongyoul Lee <jongyoul@gmail.com> Closes #5065 from jongyoul/SPARK-6085-1 and squashes the following commits: c5af84c [Jongyoul Lee] SPARK-6085 Part. 2 Increase default value for memory overhead - Changed "MiB" to "MB" dbac1c0 [Jongyoul Lee] SPARK-6085 Part. 2 Increase default value for memory overhead - fixed a description of spark.mesos.executor.memoryOverhead from 7% to 10%
* [SPARK-6383][SQL]Fixed compiler and errors in Dataframe examplesTijo Thomas2015-03-171-6/+6
| | | | | | | | | Author: Tijo Thomas <tijoparacka@gmail.com> Closes #5068 from tijoparacka/fix_sql_dataframe_example and squashes the following commits: 6953ac1 [Tijo Thomas] Handled Java and Python example sections 0751a74 [Tijo Thomas] Fixed compiler and errors in Dataframe examples
* [SPARK-6336] LBFGS should document what convergenceTol meanslewuathe2015-03-171-0/+4
| | | | | | | | | | | | LBFGS uses convergence tolerance. This value should be written in document as an argument. Author: lewuathe <lewuathe@me.com> Closes #5033 from Lewuathe/SPARK-6336 and squashes the following commits: e738b33 [lewuathe] Modify text to be more natural ac03c3a [lewuathe] Modify documentations 6ccb304 [lewuathe] [SPARK-6336] LBFGS should document what convergenceTol means
* [SPARK-6313] Add config option to disable file locks/fetchFile cache to ...nemccarthy2015-03-171-0/+12
| | | | | | | | | | | | | | ...support NFS mounts. This is a work around for now with the goal to find a more permanent solution. https://issues.apache.org/jira/browse/SPARK-6313 Author: nemccarthy <nathan@nemccarthy.me> Closes #5036 from nemccarthy/master and squashes the following commits: 2eaaf42 [nemccarthy] [SPARK-6313] Update config wording doc for spark.files.useFetchCache 5de7eb4 [nemccarthy] [SPARK-6313] Add config option to disable file locks/fetchFile cache to support NFS mounts
* [docs] [SPARK-4820] Spark build encounters "File name too long" on some ↵Theodore Vasiloudis2015-03-171-0/+12
| | | | | | | | | | | | encrypted filesystems Added a note instructing users how to build Spark in an encrypted file system. Author: Theodore Vasiloudis <tvas@sics.se> Closes #5041 from thvasilo/patch-2 and squashes the following commits: 09d890b [Theodore Vasiloudis] Workaroung for buiding in an encrypted filesystem
* [SPARK-3619] Part 2. Upgrade to Mesos 0.21 to work around MESOS-1688Jongyoul Lee2015-03-151-4/+1
| | | | | | | | | | | | | | - MESOS_NATIVE_LIBRARY become deprecated - Chagned MESOS_NATIVE_LIBRARY to MESOS_NATIVE_JAVA_LIBRARY Author: Jongyoul Lee <jongyoul@gmail.com> Closes #4361 from jongyoul/SPARK-3619-1 and squashes the following commits: f1ea91f [Jongyoul Lee] Merge branch 'SPARK-3619-1' of https://github.com/jongyoul/spark into SPARK-3619-1 a6a00c2 [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688 - Removed 'Known issues' section 2e15a21 [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688 - MESOS_NATIVE_LIBRARY become deprecated - Chagned MESOS_NATIVE_LIBRARY to MESOS_NATIVE_JAVA_LIBRARY 0dace7b [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688 - MESOS_NATIVE_LIBRARY become deprecated - Chagned MESOS_NATIVE_LIBRARY to MESOS_NATIVE_JAVA_LIBRARY
* [SPARK-6329][Docs]: Minor doc changes for Mesos and TOCBrennon York2015-03-141-3/+5
| | | | | | | | | | Updated the configuration docs from the minor items that Reynold had left over from SPARK-1182; specifically I updated the `running-on-mesos` link to point directly to `running-on-mesos#configuration` and upgraded the `yarn`, `mesos`, etc. bullets to `<h5>` tags in hopes that they'll get pushed into the TOC. Author: Brennon York <brennon.york@capitalone.com> Closes #5022 from brennonyork/SPARK-6329 and squashes the following commits: 42a10a9 [Brennon York] minor doc fixes
* [SPARK-6278][MLLIB] Mention the change of objective in linear regressionXiangrui Meng2015-03-131-0/+2
| | | | | | | | | | | | | As discussed in the RC3 vote thread, we should mention the change of objective in linear regression in the migration guide. srowen Author: Xiangrui Meng <meng@databricks.com> Closes #4978 from mengxr/SPARK-6278 and squashes the following commits: fb3bbe6 [Xiangrui Meng] mention regularization parameter bfd6cff [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-6278 375fd09 [Xiangrui Meng] address Sean's comments f87ae71 [Xiangrui Meng] mention step size change
* [SPARK-5310] [SQL] [DOC] Parquet section for the SQL programming guideCheng Lian2015-03-131-57/+180
| | | | | | | | | | | | | | | | Also fixed a bunch of minor styling issues. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5001) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #5001 from liancheng/parquet-doc and squashes the following commits: 89ad3db [Cheng Lian] Addresses @rxin's comments 7eb6955 [Cheng Lian] Docs for the new Parquet data source 415eefb [Cheng Lian] Some minor formatting improvements
* [SPARK-6275][Documentation]Miss toDF() function in docs/sql-programming-guide.mdzzcclp2015-03-121-1/+1
| | | | | | | | | | Miss `toDF()` function in docs/sql-programming-guide.md Author: zzcclp <xm_zzc@sina.com> Closes #4977 from zzcclp/SPARK-6275 and squashes the following commits: 9a96c7b [zzcclp] Miss toDF()
* [SPARK-5814][MLLIB][GRAPHX] Remove JBLAS from runtimeXiangrui Meng2015-03-121-5/+0
| | | | | | | | | | | | | | | | | The issue is discussed in https://issues.apache.org/jira/browse/SPARK-5669. Replacing all JBLAS usage by netlib-java gives us a simpler dependency tree and less license issues to worry about. I didn't touch the test scope in this PR. The user guide is not modified to avoid merge conflicts with branch-1.3. srowen ankurdave pwendell Author: Xiangrui Meng <meng@databricks.com> Closes #4699 from mengxr/SPARK-5814 and squashes the following commits: 48635c6 [Xiangrui Meng] move netlib-java version to parent pom ca21c74 [Xiangrui Meng] remove jblas from ml-guide 5f7767a [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5814 c5c4183 [Xiangrui Meng] merge master 0f20cad [Xiangrui Meng] add mima excludes e53e9f4 [Xiangrui Meng] remove jblas from mllib runtime ceaa14d [Xiangrui Meng] replace jblas by netlib-java in graphx fa7c2ca [Xiangrui Meng] move jblas to test scope
* [SPARK-6128][Streaming][Documentation] Updates to Spark Streaming ↵Tathagata Das2015-03-114-109/+528
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Programming Guide Updates to the documentation are as follows: - Added information on Kafka Direct API and Kafka Python API - Added joins to the main streaming guide - Improved details on the fault-tolerance semantics Generated docs located here http://people.apache.org/~tdas/spark-1.3.0-temp-docs/streaming-programming-guide.html#fault-tolerance-semantics More things to add: - Configuration for Kafka receive rate - May be add concurrentJobs Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #4956 from tdas/streaming-guide-update-1.3 and squashes the following commits: 819408c [Tathagata Das] Minor fixes. debe484 [Tathagata Das] Added DataFrames and MLlib 380cf8d [Tathagata Das] Fix link 04167a6 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-guide-update-1.3 0b77486 [Tathagata Das] Updates based on Josh's comments. 86c4c2a [Tathagata Das] Updated streaming guides 82de92a [Tathagata Das] Add Kafka to Python api docs
* SPARK-3642. Document the nuances of shared variables.Sandy Ryza2015-03-111-0/+6
| | | | | | | | Author: Sandy Ryza <sandy@cloudera.com> Closes #2490 from sryza/sandy-spark-3642 and squashes the following commits: aae3340 [Sandy Ryza] SPARK-3642. Document the nuances of broadcast variables
* [SPARK-4423] Improve foreach() documentation to avoid confusion between ↵Ilya Ganelin2015-03-111-4/+68
| | | | | | | | | | | | | | | | | | | | | local- and cluster-mode behavior Hi all - I've added a writeup on how closures work within Spark to help clarify the general case for this problem and similar problems. I hope this addresses the issue and would love any feedback. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #4696 from ilganeli/SPARK-4423 and squashes the following commits: c5dc498 [Ilya Ganelin] Fixed typo 07b78e8 [Ilya Ganelin] Updated to fix capitalization 48c1983 [Ilya Ganelin] Updated to fix capitalization and clarify wording 2fd2a07 [Ilya Ganelin] Incoporated a few more minor fixes. Fixed a bug in python code. Added semicolons for java 4772f99 [Ilya Ganelin] Incorporated latest feedback 448bd79 [Ilya Ganelin] Updated some verbage and added section links 5dbbda5 [Ilya Ganelin] Improved some wording d374d3a [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-4423 2600668 [Ilya Ganelin] Minor edits c768ab2 [Ilya Ganelin] Updated documentation to add a section on closures. This helps understand confusing behavior of foreach and map functions when attempting to modify variables outside of the scope of an RDD action or transformation
* [MINOR] [DOCS] Fix map -> mapToPair in Streaming Java exampleSean Owen2015-03-111-4/+4
| | | | | | | | | | Fix map -> mapToPair in Java example. (And zap some unneeded "throws Exception" while here) Author: Sean Owen <sowen@cloudera.com> Closes #4967 from srowen/MapToPairFix and squashes the following commits: ded2bc0 [Sean Owen] Fix map -> mapToPair in Java example. (And zap some unneeded "throws Exception" while here)
* [SPARK-4924] Add a library for launching Spark jobs programmatically.Marcelo Vanzin2015-03-111-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change encapsulates all the logic involved in launching a Spark job into a small Java library that can be easily embedded into other applications. The overall goal of this change is twofold, as described in the bug: - Provide a public API for launching Spark processes. This is a common request from users and currently there's no good answer for it. - Remove a lot of the duplicated code and other coupling that exists in the different parts of Spark that deal with launching processes. A lot of the duplication was due to different code needed to build an application's classpath (and the bootstrapper needed to run the driver in certain situations), and also different code needed to parse spark-submit command line options in different contexts. The change centralizes those as much as possible so that all code paths can rely on the library for handling those appropriately. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #3916 from vanzin/SPARK-4924 and squashes the following commits: 18c7e4d [Marcelo Vanzin] Fix make-distribution.sh. 2ce741f [Marcelo Vanzin] Add lots of quotes. 3b28a75 [Marcelo Vanzin] Update new pom. a1b8af1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 897141f [Marcelo Vanzin] Review feedback. e2367d2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 28cd35e [Marcelo Vanzin] Remove stale comment. b1d86b0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 00505f9 [Marcelo Vanzin] Add blurb about new API in the programming guide. 5f4ddcc [Marcelo Vanzin] Better usage messages. 92a9cfb [Marcelo Vanzin] Fix Win32 launcher, usage. 6184c07 [Marcelo Vanzin] Rename field. 4c19196 [Marcelo Vanzin] Update comment. 7e66c18 [Marcelo Vanzin] Fix pyspark tests. 0031a8e [Marcelo Vanzin] Review feedback. c12d84b [Marcelo Vanzin] Review feedback. And fix spark-submit on Windows. e2d4d71 [Marcelo Vanzin] Simplify some code used to launch pyspark. 43008a7 [Marcelo Vanzin] Don't make builder extend SparkLauncher. b4d6912 [Marcelo Vanzin] Use spark-submit script in SparkLauncher. 28b1434 [Marcelo Vanzin] Add a comment. 304333a [Marcelo Vanzin] Fix propagation of properties file arg. bb67b93 [Marcelo Vanzin] Remove unrelated Yarn change (that is also wrong). 8ec0243 [Marcelo Vanzin] Add missing newline. 95ddfa8 [Marcelo Vanzin] Fix handling of --help for spark-class command builder. 72da7ec [Marcelo Vanzin] Rename SparkClassLauncher. 62978e4 [Marcelo Vanzin] Minor cleanup of Windows code path. 9cd5b44 [Marcelo Vanzin] Make all non-public APIs package-private. e4c80b6 [Marcelo Vanzin] Reorganize the code so that only SparkLauncher is public. e50dc5e [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 de81da2 [Marcelo Vanzin] Fix CommandUtils. 86a87bf [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 2061967 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 46d46da [Marcelo Vanzin] Clean up a test and make it more future-proof. b93692a [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 ad03c48 [Marcelo Vanzin] Revert "Fix a thread-safety issue in "local" mode." 0b509d0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 23aa2a9 [Marcelo Vanzin] Read java-opts from conf dir, not spark home. 7cff919 [Marcelo Vanzin] Javadoc updates. eae4d8e [Marcelo Vanzin] Fix new unit tests on Windows. e570fb5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 44cd5f7 [Marcelo Vanzin] Add package-info.java, clean up javadocs. f7cacff [Marcelo Vanzin] Remove "launch Spark in new thread" feature. 7ed8859 [Marcelo Vanzin] Some more feedback. 54cd4fd [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 61919df [Marcelo Vanzin] Clean leftover debug statement. aae5897 [Marcelo Vanzin] Use launcher classes instead of jars in non-release mode. e584fc3 [Marcelo Vanzin] Rework command building a little bit. 525ef5b [Marcelo Vanzin] Rework Unix spark-class to handle argument with newlines. 8ac4e92 [Marcelo Vanzin] Minor test cleanup. e946a99 [Marcelo Vanzin] Merge PySparkLauncher into SparkSubmitCliLauncher. c617539 [Marcelo Vanzin] Review feedback round 1. fc6a3e2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 f26556b [Marcelo Vanzin] Fix a thread-safety issue in "local" mode. 2f4e8b4 [Marcelo Vanzin] Changes needed to make this work with SPARK-4048. 799fc20 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 bb5d324 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 53faef1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924 a7936ef [Marcelo Vanzin] Fix pyspark tests. 656374e [Marcelo Vanzin] Mima fixes. 4d511e7 [Marcelo Vanzin] Fix tools search code. 7a01e4a [Marcelo Vanzin] Fix pyspark on Yarn. 1b3f6e9 [Marcelo Vanzin] Call SparkSubmit from spark-class launcher for unknown classes. 25c5ae6 [Marcelo Vanzin] Centralize SparkSubmit command line parsing. 27be98a [Marcelo Vanzin] Modify Spark to use launcher lib. 6f70eea [Marcelo Vanzin] [SPARK-4924] Add a library for launching Spark jobs programatically.
* [SPARK-5183][SQL] Update SQL Docs with JDBC and Migration GuideMichael Armbrust2015-03-101-12/+370
| | | | | | | | | | | | | Author: Michael Armbrust <michael@databricks.com> Closes #4958 from marmbrus/sqlDocs and squashes the following commits: 9351dbc [Michael Armbrust] fix parquet example 6877e13 [Michael Armbrust] add sql examples d81b7e7 [Michael Armbrust] rxins comments e393528 [Michael Armbrust] fix order 19c2735 [Michael Armbrust] more on data source load/store 00d5914 [Michael Armbrust] Update SQL Docs with JDBC and Migration Guide
* [SPARK-5310][Doc] Update SQL Programming Guide to include DataFrames.Reynold Xin2015-03-093-122/+286
| | | | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #4954 from rxin/df-docs and squashes the following commits: c592c70 [Reynold Xin] [SPARK-5310][Doc] Update SQL Programming Guide to include DataFrames.
* Fix python typo (+ Scala, Java typos)RobertZK2015-03-071-7/+10
| | | | | | | | | | Author: RobertZK <technoguyrob@gmail.com> Author: Robert Krzyzanowski <technoguyrob@gmail.com> Closes #4840 from robertzk/patch-1 and squashes the following commits: d286215 [RobertZK] lambda fix per @laserson 5937989 [Robert Krzyzanowski] Fix python typo
* SPARK-6085 Increase default value for memory overheadtedyu2015-03-042-2/+2
| | | | | | | | | Author: tedyu <yuzhihong@gmail.com> Closes #4836 from tedyu/master and squashes the following commits: d65b495 [tedyu] SPARK-6085 Increase default value for memory overhead 1fdd4df [tedyu] SPARK-6085 Increase default value for memory overhead
* SPARK-1911 [DOCS] Warn users if their assembly jars are not built with Java 6Sean Owen2015-03-031-0/+4
| | | | | | | | | | | | Add warning about building with Java 7+ and running the JAR on early Java 6. CC andrewor14 Author: Sean Owen <sowen@cloudera.com> Closes #4874 from srowen/SPARK-1911 and squashes the following commits: 79fa2f6 [Sean Owen] Add warning about building with Java 7+ and running the JAR on early Java 6.
* [SPARK-5537][MLlib][Docs] Add user guide for multinomial logistic regressionDB Tsai2015-03-021-0/+10
| | | | | | | | | | Adding more description on top of #4861. Author: DB Tsai <dbtsai@alpinenow.com> Closes #4866 from dbtsai/doc and squashes the following commits: 37e9d07 [DB Tsai] doc
* [SPARK-6097][MLLIB] Support tree model save/load in PySpark/MLlibXiangrui Meng2015-03-022-18/+30
| | | | | | | | | | | | | | | | Similar to `MatrixFactorizaionModel`, we only need wrappers to support save/load for tree models in Python. jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #4854 from mengxr/SPARK-6097 and squashes the following commits: 4586a4d [Xiangrui Meng] fix more typos 8ebcac2 [Xiangrui Meng] fix python style 91172d8 [Xiangrui Meng] fix typos 201b3b9 [Xiangrui Meng] update user guide b5158e2 [Xiangrui Meng] support tree model save/load in PySpark/MLlib
* [SPARK-5537] Add user guide for multinomial logistic regressionXiangrui Meng2015-03-021-61/+217
| | | | | | | | | | | | | | | This is based on #4801 from dbtsai. The linear method guide is re-organized a little bit for this change. Closes #4801 Author: Xiangrui Meng <meng@databricks.com> Author: DB Tsai <dbtsai@alpinenow.com> Closes #4861 from mengxr/SPARK-5537 and squashes the following commits: 47af0ac [Xiangrui Meng] update user guide for multinomial logistic regression cdc2e15 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into AlpineNow-mlor-doc 096d0ca [DB Tsai] first commit
* [SPARK-6048] SparkConf should not translate deprecated configs on setAndrew Or2015-03-021-1/+3
| | | | | | | | | | | | | | | | | There are multiple issues with translating on set outlined in the JIRA. This PR reverts the translation logic added to `SparkConf`. In the future, after the 1.3.0 release we will figure out a way to reorganize the internal structure more elegantly. For now, let's preserve the existing semantics of `SparkConf` since it's a public interface. Unfortunately this means duplicating some code for now, but this is all internal and we can always clean it up later. Author: Andrew Or <andrew@databricks.com> Closes #4799 from andrewor14/conf-set-translate and squashes the following commits: 11c525b [Andrew Or] Move warning to driver 10e77b5 [Andrew Or] Add documentation for deprecation precedence a369cb1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into conf-set-translate c26a9e3 [Andrew Or] Revert all translate logic in SparkConf fef6c9c [Andrew Or] Restore deprecation logic for spark.executor.userClassPathFirst 94b4dfa [Andrew Or] Translate on get, not set
* SPARK-5390 [DOCS] Encourage users to post on Stack Overflow in Community DocsSean Owen2015-03-021-8/+2
| | | | | | | | | | | | | Point "Community" to main Spark Community page; mention SO tag apache-spark. Separately, the Apache site can be updated to mention, under Mailing Lists: "StackOverflow also has an apache-spark tag for Spark Q&A." or similar. Author: Sean Owen <sowen@cloudera.com> Closes #4843 from srowen/SPARK-5390 and squashes the following commits: 3508ac6 [Sean Owen] Point "Community" to main Spark Community page; mention SO tag apache-spark
* aggregateMessages example in graphX docDEBORAH SIEGEL2015-03-021-2/+2
| | | | | | | | | | | | Examples illustrating difference between legacy mapReduceTriplets usage and aggregateMessages usage has type issues on the reduce for both operators. Being just an example- changed example to reduce the message String by concatenation. Although non-optimal for performance. Author: DEBORAH SIEGEL <deborahsiegel@DEBORAHs-MacBook-Pro.local> Closes #4853 from d3borah/master and squashes the following commits: db54173 [DEBORAH SIEGEL] fixed aggregateMessages example in graphX doc
* [SPARK-6083] [MLLib] [DOC] Make Python API example consistent in NaiveBayesMechCoder2015-03-011-10/+16
| | | | | | | | | Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #4834 from MechCoder/spark-6083 and squashes the following commits: 1cdd7b5 [MechCoder] Add parse function 65bbbe9 [MechCoder] [SPARK-6083] Make Python API example consistent in NaiveBayes
* [SPARK-6053][MLLIB] support save/load in PySpark's ALSXiangrui Meng2015-03-011-3/+5
| | | | | | | | | | | | | A simple wrapper to save/load `MatrixFactorizationModel` in Python. jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #4811 from mengxr/SPARK-5991 and squashes the following commits: f135dac [Xiangrui Meng] update save doc 57e5200 [Xiangrui Meng] address comments 06140a4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5991 282ec8d [Xiangrui Meng] support save/load in PySpark's ALS
* [SPARK-4587] [mllib] [docs] Fixed save,load calls in ML guide examplesJoseph K. Bradley2015-02-275-40/+60
| | | | | | | | | | | | | Should pass spark context to save/load CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #4816 from jkbradley/ml-io-doc-fix and squashes the following commits: 83d369d [Joseph K. Bradley] added comment to save,load parts of ML guide examples 2841170 [Joseph K. Bradley] Fixed save,load calls in ML guide examples
* fix spark-6033, clarify the spark.worker.cleanup behavior in standalone mode许鹏2015-02-261-2/+1
| | | | | | | | | | | | | | jira case spark-6033 https://issues.apache.org/jira/browse/SPARK-6033 In standalone deploy mode, the cleanup will only remove the stopped application's directories. The original description about the cleanup behavior is incorrect. Author: 许鹏 <peng.xu@fraudmetrix.cn> Closes #4803 from hseagle/spark-6033 and squashes the following commits: 927a6a0 [许鹏] fix the incorrect description about the spark.worker.cleanup in standalone mode