path: root/docs
* [SPARK-12570][ML][DOC] DecisionTreeRegressor: provide variance of prediction: user guide update (Yanbo Liang, 2016-01-05, 1 file changed, -1/+10)
  Update the user guide doc for `DecisionTreeRegressor` providing variance of prediction. cc jkbradley. Author: Yanbo Liang <ybliang8@gmail.com>. Closes #10594 from yanboliang/spark-12570.
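  A minimal Scala sketch of the feature the updated guide covers, assuming the variance output column is configured via `setVarianceCol` and that training/test DataFrames with "label" and "features" columns already exist:
```scala
import org.apache.spark.ml.regression.DecisionTreeRegressor

// Sketch only: the variance column setter and column names here are assumptions
// for illustration, not text from this commit.
val dt = new DecisionTreeRegressor()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setVarianceCol("variance") // per-prediction variance output described in the guide

// val model = dt.fit(training)
// model.transform(test).select("prediction", "variance").show()
```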
* [SPARKR][DOC] minor doc update for version in migration guide (felixcheung, 2016-01-05, 1 file changed, -3/+3)
  Checked that the change is in Spark 1.6.0. cc shivaram. Author: felixcheung <felixcheung_m@hotmail.com>. Closes #10574 from felixcheung/rwritemodedoc.
* [SPARK-12579][SQL] Force user-specified JDBC driver to take precedence (Josh Rosen, 2016-01-04, 1 file changed, -3/+1)
  Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection.
  In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.
  This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly). If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different).
  This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).
  Author: Josh Rosen <joshrosen@databricks.com>. Closes #10519 from JoshRosen/jdbc-driver-precedence.
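  To illustrate the driver-selection pattern described above, a minimal Scala sketch (not Spark's internal code; the function name and error handling are illustrative):
```scala
import java.sql.{Connection, Driver, DriverManager}
import java.util.Properties
import scala.collection.JavaConverters._

// Prefer the user-specified driver class over whatever DriverManager would
// resolve for the URL on its own; otherwise fall back to getDriver(url).
def connect(url: String, userDriverClass: Option[String], props: Properties): Connection = {
  val driver: Driver = userDriverClass match {
    case Some(cls) =>
      Class.forName(cls) // ensure the class is loaded and registered
      DriverManager.getDrivers.asScala
        .find(_.getClass.getName == cls)
        .getOrElse(throw new IllegalStateException(s"Driver $cls is not registered"))
    case None =>
      DriverManager.getDriver(url)
  }
  driver.connect(url, props)
}
```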
* [SPARK-12588] Remove HttpBroadcast in Spark 2.0. (Reynold Xin, 2015-12-30, 2 files changed, -28/+4)
  We switched to TorrentBroadcast in Spark 1.1, and HttpBroadcast has been undocumented since then. It's time to remove it in Spark 2.0. Author: Reynold Xin <rxin@databricks.com>. Closes #10531 from rxin/SPARK-12588.
* [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming (Shixiong Zhu, 2015-12-22, 2 files changed, -3/+168)
  This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpointing. Author: Shixiong Zhu <shixiong@databricks.com>. Closes #10385 from zsxwing/accumulator-broadcast-example.
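  The checkpointing-friendly pattern these examples document is a lazily instantiated singleton, so the broadcast variable and accumulator can be re-created after the driver restarts from a checkpoint. A minimal Scala sketch with illustrative names:
```scala
import org.apache.spark.{Accumulator, SparkContext}
import org.apache.spark.broadcast.Broadcast

// Lazily created so that a driver recovered from a checkpoint can rebuild it.
object WordBlacklist {
  @volatile private var instance: Broadcast[Seq[String]] = null
  def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.broadcast(Seq("a", "b", "c")) // illustrative contents
        }
      }
    }
    instance
  }
}

object DroppedWordsCounter {
  @volatile private var instance: Accumulator[Long] = null
  def getInstance(sc: SparkContext): Accumulator[Long] = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.accumulator(0L, "DroppedWordsCounter")
        }
      }
    }
    instance
  }
}
```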
* [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler (Shixiong Zhu, 2015-12-22, 1 file changed, -0/+3)
  Author: Shixiong Zhu <shixiong@databricks.com>. Closes #10439 from zsxwing/kafka-message-handler-doc.
* [SPARK-11807] Remove support for Hadoop < 2.2 (Reynold Xin, 2015-12-21, 1 file changed, -14/+4)
  i.e., Hadoop 1 and Hadoop 2.0. Author: Reynold Xin <rxin@databricks.com>. Closes #10404 from rxin/SPARK-11807.
* [SPARK-12388] Change default compression to lz4 (Davies Liu, 2015-12-21, 1 file changed, -1/+1)
  According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy. After changing the compressor to LZ4, I saw a 20% improvement in end-to-end time for a TPCDS query (Q4). [1] https://github.com/ning/jvm-compressor-benchmark/wiki cc rxin. Author: Davies Liu <davies@databricks.com>. Closes #10342 from davies/lz4.
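  A minimal Scala sketch of pinning the codec explicitly via `spark.io.compression.codec` (for example, to keep the previous Snappy default after this change):
```scala
import org.apache.spark.SparkConf

// The application name is illustrative; valid codec values include "lz4",
// "snappy" and "lzf".
val conf = new SparkConf()
  .setAppName("codec-example")
  .set("spark.io.compression.codec", "snappy")
```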
* [SPARK-11808] Remove Bagel. (Reynold Xin, 2015-12-19, 2 files changed, -160/+0)
  Author: Reynold Xin <rxin@databricks.com>. Closes #10395 from rxin/SPARK-11808.
* Bump master version to 2.0.0-SNAPSHOT. (Reynold Xin, 2015-12-19, 1 file changed, -2/+2)
  Author: Reynold Xin <rxin@databricks.com>. Closes #10387 from rxin/version-bump.
* [SPARK-12091] [PYSPARK] Deprecate the JAVA-specific deserialized storage levels (gatorsmile, 2015-12-18, 2 files changed, -7/+10)
  The current default storage level of the Python persist API is MEMORY_ONLY_SER. This is different from the default level MEMORY_ONLY in the official document and RDD APIs. davies, is this inconsistency intentional? Thanks!
  Updates: since the data is always serialized on the Python side, the JAVA-specific deserialized storage levels, such as MEMORY_ONLY, are not removed.
  Updates: based on the reviewers' feedback. In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html) library, so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`, `MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, `DISK_ONLY_2` and `OFF_HEAP`.
  Author: gatorsmile <gatorsmile@gmail.com>. Closes #10092 from gatorsmile/persistStorageLevel.
* [SPARK-11985][STREAMING][KINESIS][DOCS] Update Kinesis docs (Burak Yavuz, 2015-12-18, 1 file changed, -9/+45)
  - Provide an example of the `message handler`
  - Provide a bit on KPL record de-aggregation
  - Fix typos
  Author: Burak Yavuz <brkyvz@gmail.com>. Closes #9970 from brkyvz/kinesis-docs.
* [SPARK-11608][MLLIB][DOC] Added migration guide for MLlib 1.6 (Joseph K. Bradley, 2015-12-16, 2 files changed, -15/+42)
  No known breaking changes, but some deprecations and changes of behavior. CC: mengxr. Author: Joseph K. Bradley <joseph@databricks.com>. Closes #10235 from jkbradley/mllib-guide-update-1.6.
* [SPARK-6518][MLLIB][EXAMPLE][DOC] Add example code and user guide for bisecting k-means (Yu ISHIKAWA, 2015-12-16, 2 files changed, -0/+36)
  This PR includes only the example code in order to finish it quickly. I'll send another PR for the docs soon. Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>. Closes #9952 from yu-iskw/SPARK-6518.
* [SPARK-12215][ML][DOC] User guide section for KMeans in spark.ml (Yu ISHIKAWA, 2015-12-16, 1 file changed, -0/+71)
  cc jkbradley. Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>. Closes #10244 from yu-iskw/SPARK-12215.
* [SPARK-12318][SPARKR] Save mode in SparkR should be error by default (Jeff Zhang, 2015-12-16, 1 file changed, -1/+8)
  shivaram, please help review. Author: Jeff Zhang <zjffdu@apache.org>. Closes #10290 from zjffdu/SPARK-12318.
* [SPARK-12324][MLLIB][DOC] Fixes the sidebar in the ML documentation (Timothy Hunter, 2015-12-16, 3 files changed, -33/+141)
  This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow. Credit goes to the original author Titan-C (mentioned in the NOTICE). Note that I am not a CSS expert, so I can only address comments up to some extent.
  Default view: <img width="936" alt="screen shot 2015-12-14 at 12 46 39 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png">
  When collapsed manually by the user: <img width="1004" alt="screen shot 2015-12-14 at 12 54 02 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png">
  Disappears when column is too narrow: <img width="697" alt="screen shot 2015-12-14 at 12 47 22 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png">
  Can still be opened by the user if necessary: <img width="651" alt="screen shot 2015-12-14 at 12 51 15 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png">
  Author: Timothy Hunter <timhunter@databricks.com>. Closes #10297 from thunterdb/12324.
* [SPARK-10123][DEPLOY] Support specifying deploy mode from configuration (jerryshao, 2015-12-15, 1 file changed, -3/+12)
  Please help to review, thanks a lot. Author: jerryshao <sshao@hortonworks.com>. Closes #10195 from jerryshao/SPARK-10123.
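  A minimal Scala sketch of setting the deploy mode through configuration rather than the `--deploy-mode` flag, assuming the property introduced here is `spark.submit.deployMode`:
```scala
import org.apache.spark.SparkConf

// The application name is illustrative; valid deploy modes are "client" and "cluster".
val conf = new SparkConf()
  .setAppName("deploy-mode-example")
  .set("spark.submit.deployMode", "cluster")
```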
* [SPARK-12351][MESOS] Add documentation about submitting Spark with Mesos cluster mode (Timothy Chen, 2015-12-15, 2 files changed, -6/+35)
  Adding more documentation about submitting jobs with Mesos cluster mode. Author: Timothy Chen <tnachen@gmail.com>. Closes #10086 from tnachen/mesos_supervise_docs.
* [MINOR][DOC] Fix broken word2vec link (BenFradet, 2015-12-14, 1 file changed, -1/+1)
  Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193, where a broken link has been left as is. Author: BenFradet <benjamin.fradet@gmail.com>. Closes #10282 from BenFradet/SPARK-12199.
* [SPARK-12199][DOC] Follow-up: Refine example code in ml-features.md (Xusen Yin, 2015-12-12, 1 file changed, -11/+11)
  jira: https://issues.apache.org/jira/browse/SPARK-12199. Follow-up PR of SPARK-11551; fixes some errors in ml-features.md. cc mengxr. Author: Xusen Yin <yinxusen@gmail.com>. Closes #10193 from yinxusen/SPARK-12199.
* [SPARK-12217][ML] Document invalid handling for StringIndexer (BenFradet, 2015-12-11, 1 file changed, -0/+36)
  Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation. I wonder if I should also add a snippet to the code example, input welcome. Author: BenFradet <benjamin.fradet@gmail.com>. Closes #10257 from BenFradet/SPARK-12217.
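  A minimal Scala sketch of the option being documented; `setHandleInvalid("skip")` drops rows whose labels were not seen during fitting, while the default `"error"` throws. The column names and DataFrames are assumptions for illustration:
```scala
import org.apache.spark.ml.feature.StringIndexer

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .setHandleInvalid("skip") // drop rows with labels unseen at fit time

// val indexed = indexer.fit(trainingDf).transform(dfWithUnseenLabels)
```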
* [SPARK-11964][DOCS][ML] Add in Pipeline Import/Export Documentation (anabranch, 2015-12-11, 1 file changed, -0/+13)
  Adding in Pipeline Import and Export Documentation. Author: anabranch <wac.chambers@gmail.com>. Author: Bill Chambers <wchambers@ischool.berkeley.edu>. Closes #10179 from anabranch/master.
* [STREAMING][DOC][MINOR] Update the description of direct Kafka stream doc (jerryshao, 2015-12-10, 1 file changed, -1/+1)
  With the merge of [SPARK-8337](https://issues.apache.org/jira/browse/SPARK-8337), the Python API now has the same functionality as Scala/Java, so this changes the description to make it more precise. zsxwing, tdas, please review, thanks a lot. Author: jerryshao <sshao@hortonworks.com>. Closes #10246 from jerryshao/direct-kafka-doc-update.
* [SPARK-12251] Document and improve off-heap memory configurations (Josh Rosen, 2015-12-10, 1 file changed, -0/+16)
  This patch adds documentation for Spark configurations that affect off-heap memory and makes some naming and validation improvements for those configs.
  - Change `spark.memory.offHeapSize` to `spark.memory.offHeap.size`. This is fine because this configuration has not shipped in any Spark release yet (it's new in Spark 1.6).
  - Deprecated `spark.unsafe.offHeap` in favor of a new `spark.memory.offHeap.enabled` configuration. The motivation behind this change is to gather all memory-related configurations under the same prefix.
  - Add a check which prevents users from setting `spark.memory.offHeap.enabled=true` when `spark.memory.offHeap.size == 0`. After SPARK-11389 (#9344), which was committed in Spark 1.6, Spark enforces a hard limit on the amount of off-heap memory that it will allocate to tasks. As a result, enabling off-heap execution memory without setting `spark.memory.offHeap.size` will lead to immediate OOMs. The new configuration validation makes this scenario easier to diagnose, helping to avoid user confusion.
  - Document these configurations on the configuration page.
  Author: Josh Rosen <joshrosen@databricks.com>. Closes #10237 from JoshRosen/SPARK-12251.
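  A minimal Scala sketch of the renamed configuration pair; per the validation described above, enabling off-heap memory without a non-zero size is rejected. The size value is illustrative:
```scala
import org.apache.spark.SparkConf

// Enable off-heap execution memory and give it an explicit budget (in bytes).
val conf = new SparkConf()
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", (2L * 1024 * 1024 * 1024).toString) // 2 GB
```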
* [SPARK-11563][CORE][REPL] Use RpcEnv to transfer REPL-generated classes. (Marcelo Vanzin, 2015-12-10, 2 files changed, -16/+0)
  This avoids bringing up yet another HTTP server on the driver, and instead reuses the file server already managed by the driver's RpcEnv. As a bonus, the repl now inherits the security features of the network library. There's also a small change to create the directory for storing classes under the root temp dir for the application (instead of directly under java.io.tmpdir). Author: Marcelo Vanzin <vanzin@cloudera.com>. Closes #9923 from vanzin/SPARK-11563.
* [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation (Timothy Hunter, 2015-12-10, 31 files changed, -1793/+149)
  Replaces a number of occurrences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in Spark). It also removes some files that I forgot to delete with #10207. Author: Timothy Hunter <timhunter@databricks.com>. Closes #10234 from thunterdb/12212.
* [SPARK-11678][SQL][DOCS] Document basePath in the programming guide (Yin Huai, 2015-12-09, 1 file changed, -0/+7)
  This PR adds documentation for `basePath`, which is a new parameter used by `HadoopFsRelation`. The compiled doc is shown below. ![image](https://cloud.githubusercontent.com/assets/2072857/11673132/1ba01192-9dcb-11e5-98d9-ac0b4e92e98c.png) JIRA: https://issues.apache.org/jira/browse/SPARK-11678. Author: Yin Huai <yhuai@databricks.com>. Closes #10211 from yhuai/basePathDoc.
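  A minimal Scala sketch of the `basePath` option, assuming an illustrative partitioned layout like /data/table/year=2015/month=12/... and an existing SparkContext `sc` (as in spark-shell):
```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Reading only a subdirectory while keeping the partition columns requires
// pointing basePath at the table root; paths here are illustrative.
val df = sqlContext.read
  .option("basePath", "/data/table")
  .parquet("/data/table/year=2015")
```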
* [SPARK-12211][DOC][GRAPHX] Fix version number in graphx doc for migration from 1.1 (Andrew Ray, 2015-12-09, 1 file changed, -1/+1)
  The "Migrating from Spark 1.1" section added to the GraphX doc in 1.2.0 (see https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#migrating-from-spark-11) uses {{site.SPARK_VERSION}} as the version where changes were introduced; it should be just 1.2. Author: Andrew Ray <ray.andrew@gmail.com>. Closes #10206 from aray/graphx-doc-1.1-migration.
* [SPARK-11551][DOC] Replace example code in ml-features.md using include_example (Xusen Yin, 2015-12-09, 1 file changed, -1061/+51)
  PR on behalf of somideshmukh, thanks! Author: Xusen Yin <yinxusen@gmail.com>. Author: somideshmukh <somilde@us.ibm.com>. Closes #10219 from yinxusen/SPARK-11551.
* [SPARK-8517][ML][DOC] Reorganizes the spark.ml user guide (Timothy Hunter, 2015-12-08, 8 files changed, -81/+1752)
  This PR moves pieces of the spark.ml user guide to reflect suggestions in SPARK-8517. It does not introduce new content, as requested. <img width="192" alt="screen shot 2015-12-08 at 11 36 00 am" src="https://cloud.githubusercontent.com/assets/7594753/11666166/e82b84f2-9d9f-11e5-8904-e215424d8444.png"> Author: Timothy Hunter <timhunter@databricks.com>. Closes #10207 from thunterdb/spark-8517.
* [SPARK-12069][SQL] Update documentation with Datasets (Michael Armbrust, 2015-12-08, 3 files changed, -100/+172)
  Author: Michael Armbrust <michael@databricks.com>. Closes #10060 from marmbrus/docs.
* [SPARK-12159][ML] Add user guide section for IndexToString transformer (BenFradet, 2015-12-08, 1 file changed, -16/+88)
  Documentation regarding the `IndexToString` label transformer with code snippets in Scala/Java/Python. Author: BenFradet <benjamin.fradet@gmail.com>. Closes #10166 from BenFradet/SPARK-12159.
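  A minimal Scala sketch of the round trip the new section covers, pairing `StringIndexer` with `IndexToString`; the column names and input DataFrame are assumptions for illustration:
```scala
import org.apache.spark.ml.feature.{IndexToString, StringIndexer}

// StringIndexer maps string labels to indices; IndexToString maps indices back
// to the original label strings (typically applied to a prediction column).
val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")

val converter = new IndexToString()
  .setInputCol("categoryIndex")
  .setOutputCol("originalCategory")

// val indexed = indexer.fit(df).transform(df)
// val converted = converter.transform(indexed)
```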
* [SPARK-11551][DOC][EXAMPLE] Revert PR #10002 (Cheng Lian, 2015-12-08, 1 file changed, -51/+1058)
  This reverts PR #10002, commit 78209b0ccaf3f22b5e2345dfb2b98edfdb746819. The original PR wasn't tested on Jenkins before being merged. Author: Cheng Lian <lian@databricks.com>. Closes #10200 from liancheng/revert-pr-10002.
* [SPARK-11958][SPARK-11957][ML][DOC] SQLTransformer user guide and example code (Yanbo Liang, 2015-12-07, 1 file changed, -0/+59)
  Add the `SQLTransformer` user guide and example code, and make the Scala API doc clearer. Author: Yanbo Liang <ybliang8@gmail.com>. Closes #10006 from yanboliang/spark-11958.
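  A minimal Scala sketch of the transformer being documented; `__THIS__` stands for the underlying table of the input DataFrame, and the columns v1/v2 are assumptions for illustration:
```scala
import org.apache.spark.ml.feature.SQLTransformer

// Applies a SQL statement to the input DataFrame, deriving new columns.
val sqlTrans = new SQLTransformer()
  .setStatement("SELECT *, (v1 + v2) AS v3, (v1 * v2) AS v4 FROM __THIS__")

// val transformed = sqlTrans.transform(df) // df has numeric columns v1 and v2
```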
* [SPARK-11551][DOC][EXAMPLE] Replace example code in ml-features.md using include_example (somideshmukh, 2015-12-07, 1 file changed, -1058/+51)
  Made a new patch containing only the markdown examples moved to the example folder. Only three Java code examples were not shifted since they contained compilation errors; these classes are 1) StandardScale 2) NormalizerExample 3) VectorIndexer. Author: Xusen Yin <yinxusen@gmail.com>. Author: somideshmukh <somilde@us.ibm.com>. Closes #10002 from somideshmukh/SomilBranch1.33.
* [SPARK-11963][DOC] Add docs for QuantileDiscretizer (Xusen Yin, 2015-12-07, 1 file changed, -0/+65)
  https://issues.apache.org/jira/browse/SPARK-11963. Author: Xusen Yin <yinxusen@gmail.com>. Closes #9962 from yinxusen/SPARK-11963.
* [SPARK-12080][CORE] Kryo - Support multiple user registrators (rotems, 2015-12-04, 1 file changed, -2/+2)
  Author: rotems <roter>. Closes #10078 from Botnaim/KryoMultipleCustomRegistrators.
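  A minimal Scala sketch of the behavior this change enables, assuming `spark.kryo.registrator` now accepts a comma-separated list; the registrator class names are hypothetical:
```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Multiple KryoRegistrator implementations, comma-separated (hypothetical classes).
  .set("spark.kryo.registrator",
    "com.example.FirstRegistrator,com.example.SecondRegistrator")
```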
* [SPARK-12116][SPARKR][DOCS] Document how to work around function name conflicts with dplyr (felixcheung, 2015-12-03, 1 file changed, -1/+2)
  cc shivaram. Author: felixcheung <felixcheung_m@hotmail.com>. Closes #10119 from felixcheung/rdocdplyrmasked.
* [DOCUMENTATION][MLLIB] Typo in mllib doc (Jeff Zhang, 2015-12-03, 1 file changed, -1/+1)
  cc mengxr. Author: Jeff Zhang <zjffdu@apache.org>. Closes #10093 from zjffdu/mllib_typo.
* [SPARK-12081] Make unified memory manager work with small heaps (Andrew Or, 2015-12-01, 2 files changed, -3/+3)
  The existing `spark.memory.fraction` (default 0.75) gives the system 25% of the space to work with. For small heaps, this is not enough: e.g. the default 1GB leaves only 250MB of system memory. This is especially a problem in local mode, where the driver and executor are crammed into the same JVM. Members of the community have reported driver OOMs in such cases.
  **New proposal.** We now reserve 300MB before taking the 75%. For 1GB JVMs, this leaves `(1024 - 300) * 0.75 = 543MB` for execution and storage. This is proposal (1) listed in the [JIRA](https://issues.apache.org/jira/browse/SPARK-12081).
  Author: Andrew Or <andrew@databricks.com>. Closes #10081 from andrewor14/unified-memory-small-heaps.
* [SPARK-11961][DOC] Add docs of ChiSqSelector (Xusen Yin, 2015-12-01, 1 file changed, -0/+50)
  https://issues.apache.org/jira/browse/SPARK-11961. Author: Xusen Yin <yinxusen@gmail.com>. Closes #9965 from yinxusen/SPARK-11961.
* [SPARK-11821] Propagate Kerberos keytab for all environments (woj-i, 2015-12-01, 2 files changed, -5/+6)
  cc andrewor14, harishreedharan: the same PR as in branch 1.5. Author: woj-i <wojciechindyk@gmail.com>. Closes #9859 from woj-i/master.
* [HOTFIX][SPARK-12000] Add missing quotes in Jekyll API docs plugin. (Josh Rosen, 2015-11-30, 1 file changed, -1/+1)
  I accidentally omitted these as part of #10049.
* [SPARK-12035] Add more debug information in include_example tag of Jekyll (Xusen Yin, 2015-11-30, 1 file changed, -4/+6)
  https://issues.apache.org/jira/browse/SPARK-12035. When debugging lots of example code files, as in https://github.com/apache/spark/pull/10002, it's hard to know which file causes errors due to the limited information in `include_example.rb`. With their filenames, we can locate bugs easily. Author: Xusen Yin <yinxusen@gmail.com>. Closes #10026 from yinxusen/SPARK-12035.
* [SPARK-12000] Fix API doc generation issues (Josh Rosen, 2015-11-30, 1 file changed, -3/+3)
  This pull request fixes multiple issues with API doc generation.
  - Modify the Jekyll plugin so that the entire doc build fails if API docs cannot be generated. This will make it easy to detect when the doc build breaks, since this will now trigger Jenkins failures.
  - Change how we handle the `-target` compiler option flag in order to fix `javadoc` generation.
  - Incorporate doc changes from thunterdb (in #10048).
  Closes #10048. Author: Josh Rosen <joshrosen@databricks.com>. Author: Timothy Hunter <timhunter@databricks.com>. Closes #10049 from JoshRosen/fix-doc-build.
* [SPARK-11960][MLLIB][DOC] User guide for streaming tests (Feynman Liang, 2015-11-30, 2 files changed, -0/+26)
  CC jkbradley mengxr josepablocam. Author: Feynman Liang <feynman.liang@gmail.com>. Closes #10005 from feynmanliang/streaming-test-user-guide.
* [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml (Yuhao Yang, 2015-11-30, 3 files changed, -1/+34)
  jira: https://issues.apache.org/jira/browse/SPARK-11689. Add a simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include example code in the user guide markdown; check SPARK-11606 for instructions. The original PR (https://github.com/apache/spark/pull/9722) was reverted due to a document build error. mengxr feynmanliang yinxusen, sorry for the trouble. Author: Yuhao Yang <hhbyyh@gmail.com>. Closes #9974 from hhbyyh/ldaMLExample.
* [MINOR][DOCS] Fixed list display in ml-ensembles (BenFradet, 2015-11-30, 1 file changed, -0/+1)
  The list in ml-ensembles.md wasn't properly formatted and, as a result, was looking like this: ![old](http://i.imgur.com/2ZhELLR.png) This PR aims to make it look like this: ![new](http://i.imgur.com/0Xriwd2.png) Author: BenFradet <benjamin.fradet@gmail.com>. Closes #10025 from BenFradet/ml-ensembles-doc.
* Doc typo: "classificaion" -> "classification" (muxator, 2015-11-26, 1 file changed, -1/+1)
  Author: muxator <muxator@users.noreply.github.com>. Closes #10008 from muxator/patch-1.