path: root/docs
Commit message | Author | Age | Files | Lines
* [SPARK-7739] [MLLIB] Improve ChiSqSelector example code in user guide | sethah | 2015-06-30 | 1 | -6/+8
    Author: sethah <seth.hendrickson16@gmail.com>
    Closes #7029 from sethah/working_on_SPARK-7739 and squashes the following commits:
    ef96916 [sethah] Fixing some style issues
    efea1f8 [sethah] adding clarification to ChiSqSelector example
* [SPARK-8615] [DOCUMENTATION] Fixed Sample deprecated code | Tijo Thomas | 2015-06-30 | 1 | -5/+5
    Modified the deprecated jdbc api in the documentation.
    Author: Tijo Thomas <tijoparacka@gmail.com>
    Closes #7039 from tijoparacka/JIRA_8615 and squashes the following commits:
    6e73b8a [Tijo Thomas] Reverted new lines
    4042fcf [Tijo Thomas] updated to sql documentation
    a27949c [Tijo Thomas] Fixed Sample deprecated code
* [SPARK-4127] [MLLIB] [PYSPARK] Python bindings for StreamingLinearRegressionWithSGD | MechCoder | 2015-06-30 | 1 | -0/+52
    Python bindings for StreamingLinearRegressionWithSGD
    Author: MechCoder <manojkumarsivaraj334@gmail.com>
    Closes #6744 from MechCoder/spark-4127 and squashes the following commits:
    d8f6457 [MechCoder] Moved StreamingLinearAlgorithm to pyspark.mllib.regression
    d47cc24 [MechCoder] Inherit from StreamingLinearAlgorithm
    1b4ddd6 [MechCoder] minor
    4de6c68 [MechCoder] Minor refactor
    5e85a3b [MechCoder] Add tests for simultaneous training and prediction
    fb27889 [MechCoder] Add example and docs
    505380b [MechCoder] Add tests
    d42bdae [MechCoder] [SPARK-4127] Python bindings for StreamingLinearRegressionWithSGD
* [SPARK-3629] [YARN] [DOCS]: Improvement of the "Running Spark on YARN" document | Neelesh Srinivas Salian | 2015-06-27 | 1 | -82/+82
    As per the description in the JIRA, I moved the contents of the page and added some additional content.
    Author: Neelesh Srinivas Salian <nsalian@cloudera.com>
    Closes #6924 from nssalian/SPARK-3629 and squashes the following commits:
    944b7a0 [Neelesh Srinivas Salian] Changed the lines about deploy-mode and added backticks to all parameters
    40dbc0b [Neelesh Srinivas Salian] Changed dfs to HDFS, deploy-mode in backticks and updated the master yarn line
    9cbc072 [Neelesh Srinivas Salian] Updated a few lines in the Launching Spark on YARN Section
    8e8db7f [Neelesh Srinivas Salian] Removed the changes in this commit to help clearly distinguish movement from update
    151c298 [Neelesh Srinivas Salian] SPARK-3629: Improvement of the Spark on YARN document
* [SPARK-8639] [DOCS] Fixed Minor Typos in Documentation | Rosstin | 2015-06-27 | 2 | -2/+2
    Ticket: [SPARK-8639](https://issues.apache.org/jira/browse/SPARK-8639)
    fixed minor typos in docs/README.md and docs/api.md
    Author: Rosstin <asterazul@gmail.com>
    Closes #7046 from Rosstin/SPARK-8639 and squashes the following commits:
    6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
* [SPARK-8302] Support heterogeneous cluster install paths on YARN. | Marcelo Vanzin | 2015-06-26 | 1 | -0/+26
    Some users have Hadoop installations on different paths across their cluster. Currently, that makes it hard to set up some configuration in Spark since that requires hardcoding paths to jar files or native libraries, which wouldn't work on such a cluster.
    This change introduces a couple of YARN-specific configurations that instruct the backend to replace certain paths when launching remote processes. That way, if the configuration says the Spark jar is in "/spark/spark.jar", and also says that "/spark" should be replaced with "{{SPARK_INSTALL_DIR}}", YARN will start containers in the NMs with "{{SPARK_INSTALL_DIR}}/spark.jar" as the location of the jar.
    Coupled with YARN's environment whitelist (which allows certain env variables to be exposed to containers), this allows users to support such heterogeneous environments, as long as a single replacement is enough. (Otherwise, this feature would need to be extended to support multiple path replacements.)
    Author: Marcelo Vanzin <vanzin@cloudera.com>
    Closes #6752 from vanzin/SPARK-8302 and squashes the following commits:
    4bff8d4 [Marcelo Vanzin] Add docs, rename configs.
    0aa2a02 [Marcelo Vanzin] Only do replacement for paths that need it.
    2e9cc9d [Marcelo Vanzin] Style.
    a5e1f68 [Marcelo Vanzin] [SPARK-8302] Support heterogeneous cluster install paths on YARN.
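The single path replacement described for SPARK-8302 can be pictured with a minimal Python sketch. This is not Spark's implementation: the function name `replace_install_path` and its parameters are hypothetical, and only the `/spark` prefix and the `{{SPARK_INSTALL_DIR}}` placeholder are taken from the commit message. It assumes the replacement applies only to paths that sit under the configured prefix.

```python
def replace_install_path(path, local_prefix, placeholder):
    """Swap a leading install prefix for a placeholder (hypothetical helper).

    Paths outside the prefix are returned unchanged, mirroring the
    "only do replacement for paths that need it" note in the commit list.
    """
    prefix = local_prefix.rstrip("/")
    # Match the prefix only on a path-component boundary, so "/sparkling"
    # is not mistaken for something under "/spark".
    if path == prefix or path.startswith(prefix + "/"):
        return placeholder + path[len(prefix):]
    return path


launch_path = replace_install_path(
    "/spark/spark.jar", "/spark", "{{SPARK_INSTALL_DIR}}"
)
print(launch_path)
```

In this sketch the placeholder is left unexpanded; per the commit message, resolving it on each node would rely on an environment variable exposed to the containers.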
* [SPARK-8506] Add packages to R context created through init. | Holden Karau | 2015-06-24 | 1 | -4/+13
    Author: Holden Karau <holden@pigscanfly.ca>
    Closes #6928 from holdenk/SPARK-8506-sparkr-does-not-provide-an-easy-way-to-depend-on-spark-packages-when-performing-init-from-inside-of-r and squashes the following commits:
    b60dd63 [Holden Karau] Add an example with the spark-csv package
    fa8bc92 [Holden Karau] typo: sparm -> spark
    865a90c [Holden Karau] strip spaces for comparison
    c7a4471 [Holden Karau] Add some documentation
    c1a9233 [Holden Karau] refactor for testing
    c818556 [Holden Karau] Add packages to R
* [SPARK-8139] [SQL] Updates docs and comments of data sources and Parquet output committer options | Cheng Lian | 2015-06-23 | 1 | -1/+29
    This PR only applies to master branch (1.5.0-SNAPSHOT) since it references `org.apache.parquet` classes which only appear in Parquet 1.7.0.
    Author: Cheng Lian <lian@databricks.com>
    Closes #6683 from liancheng/output-committer-docs and squashes the following commits:
    b4648b8 [Cheng Lian] Removes spark.sql.sources.outputCommitterClass as it's not a public option
    ee63923 [Cheng Lian] Updates docs and comments of data sources and Parquet output committer options
* [DOC] [SQL] Adds Hive metastore Parquet table conversion section | Cheng Lian | 2015-06-23 | 1 | -6/+88
    This PR adds a section about Hive metastore Parquet table conversion. It documents:
    1. Schema reconciliation rules introduced in #5214 (see [this comment] [1] in #5188)
    2. Metadata refreshing requirement introduced in #5339
    [1]: https://github.com/apache/spark/pull/5188#issuecomment-86531248
    Author: Cheng Lian <lian@databricks.com>
    Closes #5348 from liancheng/sql-doc-parquet-conversion and squashes the following commits:
    42ae0d0 [Cheng Lian] Adds Python `refreshTable` snippet
    4c9847d [Cheng Lian] Resorts to SQL for Python metadata refreshing snippet
    756e660 [Cheng Lian] Adds Python snippet for metadata refreshing
    50675db [Cheng Lian] Adds Hive metastore Parquet table conversion section
* [SPARK-7715] [MLLIB] [ML] [DOC] Updated MLlib programming guide for release 1.4 | Joseph K. Bradley | 2015-06-21 | 4 | -33/+65
    Reorganized docs a bit. Added migration guides.
    **Q**: Do we want to say more for the 1.3 -> 1.4 migration guide for ```spark.ml```? It would be a lot.
    CC: mengxr
    Author: Joseph K. Bradley <joseph@databricks.com>
    Closes #6897 from jkbradley/ml-guide-1.4 and squashes the following commits:
    4bf26d6 [Joseph K. Bradley] tiny fix
    8085067 [Joseph K. Bradley] fixed spacing/layout issues in ml guide from previous commit in this PR
    6cd5c78 [Joseph K. Bradley] Updated MLlib programming guide for release 1.4
* [SPARK-8390] [STREAMING] [KAFKA] fix docs related to HasOffsetRanges | cody koeninger | 2015-06-19 | 1 | -21/+49
    Author: cody koeninger <cody@koeninger.org>
    Closes #6863 from koeninger/SPARK-8390 and squashes the following commits:
    26a06bd [cody koeninger] Merge branch 'master' into SPARK-8390
    3744492 [cody koeninger] [Streaming][Kafka][SPARK-8390] doc changes per TD, test to make sure approach shown in docs actually compiles + runs
    b108c9d [cody koeninger] [Streaming][Kafka][SPARK-8390] further doc fixes, clean up spacing
    bb4336b [cody koeninger] [Streaming][Kafka][SPARK-8390] fix docs related to HasOffsetRanges, cleanup
    3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api
* [SPARK-4118] [MLLIB] [PYSPARK] Python bindings for StreamingKMeans | MechCoder | 2015-06-19 | 1 | -4/+44
    Python bindings for StreamingKMeans
    Will change status to MRG once docs, tests and examples are updated.
    Author: MechCoder <manojkumarsivaraj334@gmail.com>
    Closes #6499 from MechCoder/spark-4118 and squashes the following commits:
    7722d16 [MechCoder] minor style fixes
    51052d3 [MechCoder] Doc fixes
    2061a76 [MechCoder] Add tests for simultaneous training and prediction Minor style fixes
    81482fd [MechCoder] minor
    5d9fe61 [MechCoder] predictOn should take into account the latest model
    8ab9e89 [MechCoder] Fix Python3 error
    a9817df [MechCoder] Better tests and minor fixes
    c80e451 [MechCoder] Add ignore_unicode_prefix
    ee8ce16 [MechCoder] Update tests, doc and examples
    4b1481f [MechCoder] Some changes and tests
    d8b066a [MechCoder] [SPARK-4118] [MLlib] [PySpark] Python bindings for StreamingKMeans
* [SPARK-5836] [DOCS] [STREAMING] Clarify what may cause long-running Spark apps to preserve shuffle files | Sean Owen | 2015-06-19 | 1 | -3/+5
    Clarify what may cause long-running Spark apps to preserve shuffle files
    Author: Sean Owen <sowen@cloudera.com>
    Closes #6901 from srowen/SPARK-5836 and squashes the following commits:
    a9faef0 [Sean Owen] Clarify what may cause long-running Spark apps to preserve shuffle files
* [SPARK-7265] Improving documentation for Spark SQL Hive support | Jihong MA | 2015-06-19 | 1 | -1/+6
    Please review this pull request.
    Author: Jihong MA <linlin200605@gmail.com>
    Closes #5933 from JihongMA/SPARK-7265 and squashes the following commits:
    dfaa971 [Jihong MA] SPARK-7265 minor fix of the content
    ace454d [Jihong MA] SPARK-7265 take out PySpark on YARN limitation
    9ea0832 [Jihong MA] Merge remote-tracking branch 'upstream/master'
    d5bf3f5 [Jihong MA] Merge remote-tracking branch 'upstream/master'
    7b842e6 [Jihong MA] Merge remote-tracking branch 'upstream/master'
    9c84695 [Jihong MA] SPARK-7265 address review comment
    a399aa6 [Jihong MA] SPARK-7265 Improving documentation for Spark SQL Hive support
* [SPARK-8462] [DOCS] Documentation fixes for Spark SQL | Lars Francke | 2015-06-18 | 1 | -14/+14
    This fixes various minor documentation issues on the Spark SQL page
    Author: Lars Francke <lars.francke@gmail.com>
    Closes #6890 from lfrancke/SPARK-8462 and squashes the following commits:
    dd7e302 [Lars Francke] Merge branch 'master' into SPARK-8462
    34eff2c [Lars Francke] Minor documentation fixes
* [SPARK-8376] [DOCS] Add common lang3 to the Spark Flume Sink doc | zsxwing | 2015-06-18 | 1 | -0/+6
    Commons Lang 3 has been added as one of the dependencies of Spark Flume Sink since #5703. This PR updates the doc for it.
    Author: zsxwing <zsxwing@gmail.com>
    Closes #6829 from zsxwing/flume-sink-dep and squashes the following commits:
    f8617f0 [zsxwing] Add common lang3 to the Spark Flume Sink doc
* [SPARK-8353] [DOCS] Show anchor links when hovering over documentation headers | Josh Rosen | 2015-06-18 | 4 | -28/+18
    This patch uses [AnchorJS](https://bryanbraun.github.io/anchorjs/) to show deep anchor links when hovering over headers in the Spark documentation. For example:
    ![image](https://cloud.githubusercontent.com/assets/50748/8240800/1502f85c-15ba-11e5-819a-97b231370a39.png)
    This makes it easier for users to link to specific sections of the documentation.
    I also removed some dead Javascript which isn't used in our current docs (it was introduced for the old AMPCamp training, but isn't used anymore).
    Author: Josh Rosen <joshrosen@databricks.com>
    Closes #6808 from JoshRosen/SPARK-8353 and squashes the following commits:
    e59d8a7 [Josh Rosen] Suppress underline on hover
    f518b6a [Josh Rosen] Turn on for all headers, since we use H1s in a bunch of places
    a9fec01 [Josh Rosen] Add anchor links when hovering over headers; remove some dead JS code
* [SPARK-8320] [STREAMING] Add example in streaming programming guide that shows union of multiple input streams | Neelesh Srinivas Salian | 2015-06-18 | 1 | -0/+8
    Added python code to https://spark.apache.org/docs/latest/streaming-programming-guide.html to the Level of Parallelism in Data Receiving section.
    Please review and let me know if there are any additional changes that are needed.
    Thank you.
    Author: Neelesh Srinivas Salian <nsalian@cloudera.com>
    Closes #6862 from nssalian/SPARK-8320 and squashes the following commits:
    4bfd126 [Neelesh Srinivas Salian] Changed loop structure to be more in line with Python style
    e5345de [Neelesh Srinivas Salian] Changes to kafka append, for loop and show to print()
    3fc5c6d [Neelesh Srinivas Salian] SPARK-8320
* [SPARK-7961][SQL]Refactor SQLConf to display better error message | zsxwing | 2015-06-17 | 1 | -2/+2
    1. Add `SQLConfEntry` to store the information about a configuration. For those configurations that cannot be found in `sql-programming-guide.md`, I left the doc as `<TODO>`.
    2. Verify the value when setting a configuration if this is in SQLConf.
    3. Use `SET -v` to display all public configurations.
    Author: zsxwing <zsxwing@gmail.com>
    Closes #6747 from zsxwing/sqlconf and squashes the following commits:
    7d09bad [zsxwing] Use SQLConfEntry in HiveContext
    49f6213 [zsxwing] Add getConf, setConf to SQLContext and HiveContext
    e014f53 [zsxwing] Merge branch 'master' into sqlconf
    93dad8e [zsxwing] Fix the unit tests
    cf950c1 [zsxwing] Fix the code style and tests
    3c5f03e [zsxwing] Add unsetConf(SQLConfEntry) and fix the code style
    a2f4add [zsxwing] getConf will return the default value if a config is not set
    037b1db [zsxwing] Add schema to SetCommand
    0520c3c [zsxwing] Merge branch 'master' into sqlconf
    7afb0ec [zsxwing] Fix the configurations about HiveThriftServer
    7e728e3 [zsxwing] Add doc for SQLConfEntry and fix 'toString'
    5e95b10 [zsxwing] Add enumConf
    c6ba76d [zsxwing] setRawString => setConfString, getRawString => getConfString
    4abd807 [zsxwing] Fix the test for 'set -v'
    6e47e56 [zsxwing] Fix the compilation error
    8973ced [zsxwing] Remove floatConf
    1fc3a8b [zsxwing] Remove the 'conf' command and use 'set -v' instead
    99c9c16 [zsxwing] Fix tests that use SQLConfEntry as a string
    88a03cc [zsxwing] Add new lines between confs and return types
    ce7c6c8 [zsxwing] Remove seqConf
    f3c1b33 [zsxwing] Refactor SQLConf to display better error message
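The three points listed for SPARK-7961 (an entry object per configuration, validation on set, and a listing of all entries) can be illustrated with a small Python sketch. It is an assumption-laden toy, not the Scala `SQLConfEntry` code: the classes `ConfEntry` and `ConfRegistry`, the helper `parse_boolean`, and the key `example.flag` are all hypothetical names invented for illustration.

```python
class ConfEntry:
    """Hypothetical record describing one configuration."""

    def __init__(self, key, default, parse, doc="<TODO>"):
        self.key = key
        self.default = default
        self.parse = parse  # converts a raw string; raises ValueError if invalid
        self.doc = doc


def parse_boolean(raw):
    """Accept only 'true' or 'false' (case-insensitive)."""
    lowered = raw.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError("expected 'true' or 'false'")
    return lowered == "true"


class ConfRegistry:
    """Hypothetical holder of registered entries and the values set so far."""

    def __init__(self):
        self._entries = {}
        self._values = {}

    def register(self, entry):
        self._entries[entry.key] = entry
        return entry

    def set_conf_string(self, key, raw):
        # Validate only keys that have a registered entry; unknown keys pass through.
        entry = self._entries.get(key)
        if entry is not None:
            try:
                entry.parse(raw)
            except ValueError as err:
                raise ValueError(f"{key}: invalid value {raw!r} ({err})") from None
        self._values[key] = raw

    def get_conf(self, entry):
        # Fall back to the entry's default when nothing has been set.
        raw = self._values.get(entry.key)
        return entry.default if raw is None else entry.parse(raw)

    def describe_all(self):
        # Rough analogue of listing every registered configuration with its doc.
        ordered = sorted(self._entries.values(), key=lambda e: e.key)
        return [(e.key, str(e.default), e.doc) for e in ordered]


registry = ConfRegistry()
flag = registry.register(ConfEntry("example.flag", True, parse_boolean))
registry.set_conf_string("example.flag", "false")
print(registry.get_conf(flag))
```

Validating at set time, rather than at first use, is what lets a bad value be reported together with the key it was assigned to.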
* [SPARK-7605] [MLLIB] [PYSPARK] Python API for ElementwiseProduct | MechCoder | 2015-06-17 | 1 | -0/+22
    Python API for org.apache.spark.mllib.feature.ElementwiseProduct
    Author: MechCoder <manojkumarsivaraj334@gmail.com>
    Closes #6346 from MechCoder/spark-7605 and squashes the following commits:
    79d1ef5 [MechCoder] Consistent and support list / array types
    5f81d81 [MechCoder] [SPARK-7605] [MLlib] Python API for ElementwiseProduct
* [SPARK-8395] [DOCS] start-slave.sh docs incorrect | Sean Owen | 2015-06-17 | 1 | -1/+1
    start-slave.sh no longer takes a worker # param in 1.4+
    Author: Sean Owen <sowen@cloudera.com>
    Closes #6855 from srowen/SPARK-8395 and squashes the following commits:
    300278e [Sean Owen] start-slave.sh no longer takes a worker # param in 1.4+
* [SQL] [DOC] improved a comment | Radek Ostrowski | 2015-06-16 | 1 | -1/+1
    [SQL][DOC] I found it a bit confusing when I came across it for the first time in the docs
    Author: Radek Ostrowski <dest.hawaii@gmail.com>
    Author: radek <radek@radeks-MacBook-Pro-2.local>
    Closes #6332 from radek1st/master and squashes the following commits:
    dae3347 [Radek Ostrowski] fixed typo
    c76bb3a [radek] improved a comment
* [SPARK-DOCS] [SPARK-SQL] Update sql-programming-guide.md | Moussa Taifi | 2015-06-16 | 1 | -1/+1
    Typo in thriftserver section
    Author: Moussa Taifi <moutai10@gmail.com>
    Closes #6847 from moutai/patch-1 and squashes the following commits:
    1bd29df [Moussa Taifi] Update sql-programming-guide.md
* fix read/write mixup | Peter Hoffmann | 2015-06-14 | 1 | -2/+2
    Author: Peter Hoffmann <ph@peter-hoffmann.com>
    Closes #6815 from hoffmann/patch-1 and squashes the following commits:
    2abb6da [Peter Hoffmann] fix read/write mixup
* [Spark-8343] [Streaming] [Docs] Improve Spark Streaming Guides. | Mike Dusenberry | 2015-06-13 | 2 | -140/+133
    This improves the Spark Streaming Guides by fixing broken links, rewording confusing sections, fixing typos, adding missing words, etc.
    Author: Mike Dusenberry <dusenberrymw@gmail.com>
    Closes #6801 from dusenberrymw/SPARK-8343_Improve_Spark_Streaming_Guides_MERGED and squashes the following commits:
    6688090 [Mike Dusenberry] Improvements to the Spark Streaming Custom Receiver Guide, including slight rewording of confusing sections, and fixing typos & missing words.
    436fbd8 [Mike Dusenberry] Bunch of improvements to the Spark Streaming Guide, including fixing broken links, slight rewording of confusing sections, fixing typos & missing words, etc.
* [SPARK-7284] [STREAMING] Updated streaming documentation | Tathagata Das | 2015-06-12 | 3 | -56/+50
    - Kinesis API updated
    - Kafka version updated, and Python API for Direct Kafka added
    - Added SQLContext.getOrCreate()
    - Added information on how to get partitionId in foreachRDD
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Closes #6781 from tdas/SPARK-7284 and squashes the following commits:
    aac7be0 [Tathagata Das] Added information on how to get partition id
    a66ec22 [Tathagata Das] Complete the line incomplete line,
    a92ca39 [Tathagata Das] Updated streaming documentation
* [SPARK-6511] [docs] Fix example command in hadoop-provided docs. | Marcelo Vanzin | 2015-06-11 | 1 | -1/+1
    Author: Marcelo Vanzin <vanzin@cloudera.com>
    Closes #6766 from vanzin/SPARK-6511 and squashes the following commits:
    49f0f67 [Marcelo Vanzin] [SPARK-6511] [docs] Fix example command in hadoop-provided docs.
* [SPARK-8282] [SPARKR] Make number of threads used in RBackend configurable | Hossein | 2015-06-10 | 1 | -0/+12
    Read number of threads for RBackend from configuration.
    [SPARK-8282] #comment Linking with JIRA
    Author: Hossein <hossein@databricks.com>
    Closes #6730 from falaki/SPARK-8282 and squashes the following commits:
    33b3d98 [Hossein] Documented new config parameter
    70f2a9c [Hossein] Fixing import
    ec44225 [Hossein] Read number of threads for RBackend from configuration
* [SQL] [MINOR] Fixes a minor Java example error in SQL programming guide | Cheng Lian | 2015-06-10 | 1 | -1/+1
    Author: Cheng Lian <lian@databricks.com>
    Closes #6749 from liancheng/java-sample-fix and squashes the following commits:
    5b44585 [Cheng Lian] Fixes a minor Java example error in SQL programming guide
* [SPARK-6511] [DOCUMENTATION] Explain how to use Hadoop provided builds | Patrick Wendell | 2015-06-09 | 2 | -3/+33
    This provides preliminary documentation pointing out how to use the Hadoop free builds. I am hoping over time this list can grow to include most of the popular Hadoop distributions. Getting more people using these builds will help us long term reduce the number of binaries we build.
    Author: Patrick Wendell <patrick@databricks.com>
    Closes #6729 from pwendell/hadoop-provided and squashes the following commits:
    1113b76 [Patrick Wendell] [SPARK-6511] [Documentation] Explain how to use Hadoop provided builds
* [SPARK-8274] [DOCUMENTATION-MLLIB] Fix wrong URLs in MLlib Frequent Pattern Mining Documentation | FavioVazquez | 2015-06-09 | 1 | -2/+2
    There is a mistake in the URLs of the Scala section of FP-Growth in the MLlib Frequent Pattern Mining documentation. The URL points to https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/fpm/FPGrowth.html which is the Java's API, the link should point to the Scala API https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.fpm.FPGrowth
    There's another mistake in the FP-GrowthModel in the same section, the link points, again, to the Java's API https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/fpm/FPGrowthModel.html, the link should point to the Scala API https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.fpm.FPGrowthModel
    Author: FavioVazquez <favio.vazquezp@gmail.com>
    Closes #6722 from FavioVazquez/fix-wrog-urls-mllib-fpgrowth and squashes the following commits:
    e1ca54d [FavioVazquez] - Fixed wrong URLs in MLlib Frequent Pattern Mining, FP-Growth Scala section
    ad882a3 [FavioVazquez] Merge remote-tracking branch 'upstream/master'
    f27a20b [FavioVazquez] Merge remote-tracking branch 'upstream/master'
    9af7074 [FavioVazquez] Merge remote-tracking branch 'upstream/master'
    edab1ef [FavioVazquez] Merge remote-tracking branch 'upstream/master'
    b2e2f8c [FavioVazquez] Merge remote-tracking branch 'upstream/master'
* [STREAMING] [DOC] Remove duplicated description about WAL | Kousuke Saruta | 2015-06-09 | 1 | -1/+1
    I noticed there is a duplicated description about WAL.
    ```
    To ensure zero-data loss, you have to additionally enable Write Ahead Logs in Spark Streaming.
    To ensure zero data loss, enable the Write Ahead Logs (introduced in Spark 1.2).
    ```
    Let's remove the duplication.
    I don't file this issue in JIRA because it's minor.
    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    Closes #6719 from sarutak/remove-multiple-description and squashes the following commits:
    cc9bb21 [Kousuke Saruta] Removed duplicated description about WAL
* [SPARK-7939] [SQL] Add conf to enable/disable partition column type inference | Liang-Chi Hsieh | 2015-06-08 | 1 | -1/+5
    JIRA: https://issues.apache.org/jira/browse/SPARK-7939
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    Closes #6503 from viirya/disable_partition_type_inference and squashes the following commits:
    3e90470 [Liang-Chi Hsieh] Default to enable type inference and update docs.
    455edb1 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into disable_partition_type_inference
    9a57933 [Liang-Chi Hsieh] Add conf to enable/disable partition column type inference.
* [SPARK-4761] [DOC] [SQL] kryo default setting in SQL Thrift server | Daoyuan Wang | 2015-06-08 | 1 | -2/+5
    this is a follow up of #3621
    /cc liancheng pwendell
    Author: Daoyuan Wang <daoyuan.wang@intel.com>
    Closes #6639 from adrian-wang/kryodoc and squashes the following commits:
    3c4b1cf [Daoyuan Wang] [DOC] kryo default setting in SQL Thrift server
* [SPARK-7733] [CORE] [BUILD] Update build, code to use Java 7 for 1.5.0+ | Sean Owen | 2015-06-07 | 3 | -7/+3
    Update build to use Java 7, and remove some comments and special-case support for Java 6.
    Author: Sean Owen <sowen@cloudera.com>
    Closes #6265 from srowen/SPARK-7733 and squashes the following commits:
    59bda4e [Sean Owen] Update build to use Java 7, and remove some comments and special-case support for Java 6
* [DOC] [TYPO] Fix typo in standalone deploy scripts description | Yijie Shen | 2015-06-07 | 1 | -1/+1
    Author: Yijie Shen <henry.yijieshen@gmail.com>
    Closes #6691 from yijieshen/patch-2 and squashes the following commits:
    b40a4b0 [Yijie Shen] [DOC][TYPO] Fix typo in standalone deploy scripts description
* [SPARK-7955] [CORE] Ensure executors with cached RDD blocks are not re… | Hari Shreedharan | 2015-06-06 | 1 | -0/+9
    …moved if dynamic allocation is enabled.
    This is a work in progress. This patch ensures that an executor that has cached RDD blocks is not removed, but makes no attempt to find another executor to remove. This is meant to get some feedback on the current approach, and if it makes sense then I will look at choosing another executor to remove. No testing has been done either.
    Author: Hari Shreedharan <hshreedharan@apache.org>
    Closes #6508 from harishreedharan/dymanic-caching and squashes the following commits:
    dddf1eb [Hari Shreedharan] Minor configuration description update.
    10130e2 [Hari Shreedharan] Fix compile issue.
    5417b53 [Hari Shreedharan] Add documentation for new config. Remove block from cachedBlocks when it is dropped.
    875916a [Hari Shreedharan] Make some code more readable.
    39940ca [Hari Shreedharan] Handle the case where the executor has not yet registered.
    90ad711 [Hari Shreedharan] Remove unused imports and unused methods.
    063985c [Hari Shreedharan] Send correct message instead of recursively calling same method.
    ec2fd7e [Hari Shreedharan] Add file missed in last commit
    5d10fad [Hari Shreedharan] Update cached blocks status using local info, rather than doing an RPC.
    193af4c [Hari Shreedharan] WIP. Use local state rather than via RPC.
    ae932ff [Hari Shreedharan] Fix config param name.
    272969d [Hari Shreedharan] Fix seconds to millis bug.
    5a1993f [Hari Shreedharan] Add timeout for cache executors. Ignore broadcast blocks while checking if there are cached blocks.
    57fefc2 [Hari Shreedharan] [SPARK-7955][Core] Ensure executors with cached RDD blocks are not removed if dynamic allocation is enabled.
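As a rough illustration of the behavior described for SPARK-7955, the Python sketch below picks which idle executors may be removed, giving executors that hold cached RDD blocks their own (possibly infinite) timeout and ignoring broadcast blocks, as the squashed commits mention. It is not Spark's allocation code: the `Executor` class, the `block_kinds` field, and both function names are hypothetical, and the two timeouts are plain parameters rather than real configuration keys.

```python
from dataclasses import dataclass


@dataclass
class Executor:
    """Hypothetical snapshot of one executor's state."""
    executor_id: str
    idle_seconds: float
    block_kinds: tuple = ()  # e.g. ("rdd", "broadcast")


def has_cached_rdd_blocks(executor):
    # Broadcast blocks are ignored; only RDD blocks count as cached data.
    return any(kind == "rdd" for kind in executor.block_kinds)


def removable_executors(executors, idle_timeout, cached_idle_timeout):
    """Return ids of executors idle for at least their applicable timeout."""
    removable = []
    for executor in executors:
        if has_cached_rdd_blocks(executor):
            timeout = cached_idle_timeout
        else:
            timeout = idle_timeout
        if executor.idle_seconds >= timeout:
            removable.append(executor.executor_id)
    return removable


cluster = [
    Executor("a", 120.0, ("broadcast",)),
    Executor("b", 120.0, ("rdd",)),
    Executor("c", 10.0),
]
# An infinite timeout keeps executors with cached RDD blocks alive indefinitely.
print(removable_executors(cluster, 60.0, float("inf")))
```

As in the commit description, this sketch makes no attempt to choose a different executor to remove in place of one that is protected.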
* [SPARK-7747] [SQL] [DOCS] spark.sql.planner.externalSort | Luca Martinetti | 2015-06-05 | 1 | -0/+7
    Add documentation for spark.sql.planner.externalSort
    Author: Luca Martinetti <luca@luca.io>
    Closes #6272 from lucamartinetti/docs-externalsort and squashes the following commits:
    985661b [Luca Martinetti] [SPARK-7747] [SQL] [DOCS] Add documentation for spark.sql.planner.externalSort
* [STREAMING] Update streaming-kafka-integration.md | Akhil Das | 2015-06-05 | 1 | -5/+5
    Fixed the broken links (Examples) in the documentation.
    Author: Akhil Das <akhld@darktech.ca>
    Closes #6666 from akhld/patch-2 and squashes the following commits:
    2228b83 [Akhil Das] Update streaming-kafka-integration.md
* [HOTFIX] History Server API docs error fix. | Hari Shreedharan | 2015-06-03 | 1 | -1/+1
    Minor error in the monitoring docs. Also made indentation changes in `ApiRootResource`
    Author: Hari Shreedharan <hshreedharan@apache.org>
    Closes #6628 from harishreedharan/eventlog-formatting and squashes the following commits:
    a12553d [Hari Shreedharan] Javadoc updates.
    ca399b6 [Hari Shreedharan] [HOTFIX] History Server API docs error fix.
* [SPARK-7161] [HISTORY SERVER] Provide REST api to download event logs fro... | Hari Shreedharan | 2015-06-03 | 1 | -0/+8
    ...m History Server
    This PR adds a new API that allows the user to download event logs for an application as a zip file. APIs have been added to download all logs for a given application or just for a specific attempt.
    This also adds an additional method to the ApplicationHistoryProvider to get the raw files, zipped.
    Author: Hari Shreedharan <hshreedharan@apache.org>
    Closes #5792 from harishreedharan/eventlog-download and squashes the following commits:
    221cc26 [Hari Shreedharan] Update docs with new API information.
    a131be6 [Hari Shreedharan] Fix style issues.
    5528bd8 [Hari Shreedharan] Merge branch 'master' into eventlog-download
    6e8156e [Hari Shreedharan] Simplify tests, use Guava stream copy methods.
    d8ddede [Hari Shreedharan] Remove unnecessary case in EventLogDownloadResource.
    ffffb53 [Hari Shreedharan] Changed interface to use zip stream. Added more tests.
    1100b40 [Hari Shreedharan] Ensure that `Path` does not appear in interfaces, by refactoring interfaces.
    5a5f3e2 [Hari Shreedharan] Fix test ordering issue.
    0b66948 [Hari Shreedharan] Minor formatting/import fixes.
    4fc518c [Hari Shreedharan] Fix rat failures.
    a48b91f [Hari Shreedharan] Refactor to make attemptId optional in the API. Also added tests.
    0fc1424 [Hari Shreedharan] File download now works for individual attempts and the entire application.
    350d7e8 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into eventlog-download
    fd6ab00 [Hari Shreedharan] Fix style issues
    32b7662 [Hari Shreedharan] Use UIRoot directly in ApiRootResource. Also, use `Response` class to set headers.
    7b362b2 [Hari Shreedharan] Almost working.
    3d18ebc [Hari Shreedharan] [WIP] Try getting the event log download to work.
* [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 | Patrick Wendell | 2015-06-03 | 1 | -2/+2
    Author: Patrick Wendell <patrick@databricks.com>
    Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
    2f42d02 [Patrick Wendell] A few more excludes
    4bebcf0 [Patrick Wendell] Update to RC4
    61aaf46 [Patrick Wendell] Using new release candidate
    55f1610 [Patrick Wendell] Another exclude
    04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
    36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
* [SPARK-8043] [MLLIB] [DOC] update NaiveBayes and SVM examples in doc | Yuhao Yang | 2015-06-02 | 3 | -18/+14
    jira: https://issues.apache.org/jira/browse/SPARK-8043
    I found some issues during testing the save/load examples in markdown Documents, as a part of 1.4 QA plan
    Author: Yuhao Yang <hhbyyh@gmail.com>
    Closes #6584 from hhbyyh/naiveDocExample and squashes the following commits:
    a01a206 [Yuhao Yang] fix for Gaussian mixture
    2fb8b96 [Yuhao Yang] update NaiveBayes and SVM examples in doc
* [SPARK-7985] [ML] [MLlib] [Docs] Remove "fittingParamMap" references. Updating ML Doc "Estimator, Transformer, and Param" examples. | Mike Dusenberry | 2015-06-02 | 1 | -4/+4
    Updating ML Doc's *"Estimator, Transformer, and Param"* example to use `model.extractParamMap` instead of `model.fittingParamMap`, which no longer exists.
    mengxr, I believe this addresses (part of) the *update documentation* TODO list item from [PR 5820](https://github.com/apache/spark/pull/5820).
    Author: Mike Dusenberry <dusenberrymw@gmail.com>
    Closes #6514 from dusenberrymw/Fix_ML_Doc_Estimator_Transformer_Param_Example and squashes the following commits:
    6366e1f [Mike Dusenberry] Updating instances of model.extractParamMap to model.parent.extractParamMap, since the Params of the parent Estimator could possibly differ from those of the Model.
    d850e0e [Mike Dusenberry] Removing all references to "fittingParamMap" throughout Spark, since it has been removed.
    0480304 [Mike Dusenberry] Updating the ML Doc "Estimator, Transformer, and Param" Java example to use model.extractParamMap() instead of model.fittingParamMap(), which no longer exists.
    7d34939 [Mike Dusenberry] Updating ML Doc "Estimator, Transformer, and Param" example to use model.extractParamMap instead of model.fittingParamMap, which no longer exists.
* [SPARK-7582] [MLLIB] user guide for StringIndexer | Xiangrui Meng | 2015-06-01 | 1 | -0/+116
    This PR adds a Java unit test and user guide for `StringIndexer`. I put it before `OneHotEncoder` because they are closely related. jkbradley
    Author: Xiangrui Meng <meng@databricks.com>
    Closes #6561 from mengxr/SPARK-7582 and squashes the following commits:
    4bba4f1 [Xiangrui Meng] fix example
    ba1cd1b [Xiangrui Meng] fix style
    7fa18d1 [Xiangrui Meng] add user guide for StringIndexer
    136cb93 [Xiangrui Meng] add a Java unit test for StringIndexer
* [SPARK-7584] [MLLIB] User guide for VectorAssembler | Xiangrui Meng | 2015-06-01 | 1 | -0/+114
    This PR adds a section in the user guide for `VectorAssembler` with code examples in Python/Java/Scala. It also adds a unit test in Java. jkbradley
    Author: Xiangrui Meng <meng@databricks.com>
    Closes #6556 from mengxr/SPARK-7584 and squashes the following commits:
    11313f6 [Xiangrui Meng] simplify Java example
    0cd47f3 [Xiangrui Meng] update user guide
    fd36292 [Xiangrui Meng] update Java unit test
    ce61ca0 [Xiangrui Meng] add Java unit test for VectorAssembler
    e399942 [Xiangrui Meng] scala/python example code
* [DOC] Minor modification to Streaming docs with regards to parallel data receiving | Nishkam Ravi | 2015-06-01 | 1 | -4/+4
    pwendell tdas
    Author: Nishkam Ravi <nravi@cloudera.com>
    Author: nishkamravi2 <nishkamravi@gmail.com>
    Author: nravi <nravi@c1704.halxg.cloudera.com>
    Closes #6544 from nishkamravi2/master_nravi and squashes the following commits:
    46e8c03 [Nishkam Ravi] Slight modification to streaming docs
* [SPARK-7949] [MLLIB] [DOC] update document with some missing save/load | Yuhao Yang | 2015-05-31 | 3 | -6/+38
    add save load for examples:
    KMeansModel
    PowerIterationClusteringModel
    Word2VecModel
    IsotonicRegressionModel
    Author: Yuhao Yang <hhbyyh@gmail.com>
    Closes #6498 from hhbyyh/docSaveLoad and squashes the following commits:
    7f9f06d [Yuhao Yang] add missing imports
    c604cad [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docSaveLoad
    1dd77cc [Yuhao Yang] update document with some missing save/load
* Update documentation for the new DataFrame reader/writer interface. | Reynold Xin | 2015-05-30 | 1 | -60/+66
    Author: Reynold Xin <rxin@databricks.com>
    Closes #6522 from rxin/sql-doc-1.4 and squashes the following commits:
    c227be7 [Reynold Xin] Updated link.
    040b6d7 [Reynold Xin] Update documentation for the new DataFrame reader/writer interface.
* [SPARK-7920] [MLLIB] Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example). | Mike Dusenberry | 2015-05-30 | 1 | -0/+1
    The MLlib ChiSqSelector class is not serializable, and so the example in the ChiSqSelector documentation fails. Also, that example is missing the import of ChiSqSelector.
    This PR makes ChiSqSelector extend Serializable in MLlib, and adds the ChiSqSelector import statement to the associated example in the documentation.
    Author: Mike Dusenberry <dusenberrymw@gmail.com>
    Closes #6462 from dusenberrymw/Make_ChiSqSelector_Serializable_and_Fix_Related_Docs_Example and squashes the following commits:
    9cb2f94 [Mike Dusenberry] Make MLlib ChiSqSelector Serializable.
    d9003bf [Mike Dusenberry] Add missing import in MLlib ChiSqSelector Docs Scala example.