path: root/docs/configuration.md
* [SPARK-1680][DOCS] Explain environment variables for running on YARN in cluster mode (Andrew, 2016-01-27, 1 file, -0/+2)
  JIRA 1680 added a property called spark.yarn.appMasterEnv. This PR draws users' attention to this special case by adding an explanation in configuration.html#environment-variables. Author: Andrew <weiner.andrew.j@gmail.com> Closes #10869 from weineran/branch-yarn-docs.
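  As a hedged illustration of the special case this commit documents: in YARN cluster mode the driver runs inside the ApplicationMaster, so its environment is set through the `spark.yarn.appMasterEnv.[EnvironmentVariableName]` prefix rather than spark-env.sh. The variable name and path below are examples only, not values this commit prescribes.

  ```scala
  import org.apache.spark.SparkConf

  // Sets JAVA_HOME (illustrative) in the ApplicationMaster's environment when
  // running on YARN in cluster mode.
  val conf = new SparkConf()
    .setAppName("yarn-env-example")
    .set("spark.yarn.appMasterEnv.JAVA_HOME", "/usr/lib/jvm/java-8")  // example path
  ```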
* [SPARK-7997][CORE] Remove Akka from Spark Core and Streaming (Shixiong Zhu, 2016-01-22, 1 file, -61/+4)
  - Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
  - Remove HttpFileServer.
  - Remove Akka configs from SparkConf and SSLOptions.
  - Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth keeping this config because using `DirectTaskResult` or `IndirectTaskResult` depends on it.
  - Update comments and docs.
  Author: Shixiong Zhu <shixiong@databricks.com> Closes #10854 from zsxwing/remove-akka.
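  For illustration, a minimal sketch of the renamed setting; the value is in MB, and 128 is only an example rather than a recommendation from this commit.

  ```scala
  import org.apache.spark.SparkConf

  // spark.rpc.message.maxSize replaces the old spark.akka.frameSize; it caps the size
  // (in MB) of messages such as map output statuses and results sent via DirectTaskResult.
  val conf = new SparkConf()
    .set("spark.rpc.message.maxSize", "128")  // example value, in MB
  ```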
* [SPARK-12534][DOC] Update documentation to list command line equivalent to properties (felixcheung, 2016-01-21, 1 file, -5/+5)
  Several Spark properties equivalent to Spark submit command line options are missing. Author: felixcheung <felixcheung_m@hotmail.com> Closes #10491 from felixcheung/sparksubmitdoc.
* [SPARK-2750][WEB UI] Add https support to the Web UI (scwf, 2016-01-19, 1 file, -0/+22)
  Author: scwf <wangfei1@huawei.com> Author: Marcelo Vanzin <vanzin@cloudera.com> Author: WangTaoTheTonic <wangtao111@huawei.com> Author: w00228970 <wangfei1@huawei.com> Closes #10238 from vanzin/SPARK-2750.
* [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming (Shixiong Zhu, 2016-01-07, 1 file, -0/+18)
  /cc tdas brkyvz Author: Shixiong Zhu <shixiong@databricks.com> Closes #10453 from zsxwing/streaming-conf.
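  A non-authoritative sketch of how these write-ahead-log settings are typically applied; the fully qualified property names below are my assumption of what the Streaming configuration table documents, not names quoted in this commit.

  ```scala
  import org.apache.spark.SparkConf

  // Assumed property names: closing the WAL file after each write is mainly useful on
  // file systems such as S3 that do not support flush(), and batching amortises writes.
  val conf = new SparkConf()
    .set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")
    .set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")
    .set("spark.streaming.driver.writeAheadLog.allowBatching", "true")
  ```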
* [DOC] Fix 'spark.memory.offHeap.enabled' default value to false (zzcclp, 2016-01-06, 1 file, -1/+1)
  Correct the documented default value of 'spark.memory.offHeap.enabled' to false. Author: zzcclp <xm_zzc@sina.com> Closes #10633 from zzcclp/fix_spark.memory.offHeap.enabled_default_value.
* [SPARK-7689] Remove TTL-based metadata cleaning in Spark 2.0 (Josh Rosen, 2016-01-06, 1 file, -11/+0)
  This PR removes `spark.cleaner.ttl` and the associated TTL-based metadata cleaning code. Now that we have the `ContextCleaner` and a timer to trigger periodic GCs, I don't think that `spark.cleaner.ttl` is necessary anymore. The TTL-based cleaning isn't enabled by default, isn't included in our end-to-end tests, and has been a source of user confusion when it is misconfigured. If the TTL is set too low, data which is still being used may be evicted / deleted, leading to hard-to-diagnose bugs. For all of these reasons, I think that we should remove this functionality in Spark 2.0. Additional benefits of doing this include marginally reduced memory usage, since we no longer need to store timestamps in hashmaps, and a handful fewer threads. Author: Josh Rosen <joshrosen@databricks.com> Closes #10534 from JoshRosen/remove-ttl-based-cleaning.
* [SPARK-12588] Remove HttpBroadcast in Spark 2.0 (Reynold Xin, 2015-12-30, 1 file, -17/+2)
  We switched to TorrentBroadcast in Spark 1.1, and HttpBroadcast has been undocumented since then. It's time to remove it in Spark 2.0. Author: Reynold Xin <rxin@databricks.com> Closes #10531 from rxin/SPARK-12588.
* [SPARK-12388] Change default compression to lz4 (Davies Liu, 2015-12-21, 1 file, -1/+1)
  According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy. After changing the compressor to LZ4, I saw 20% improvement on end-to-end time for a TPCDS query (Q4). [1] https://github.com/ning/jvm-compressor-benchmark/wiki cc rxin Author: Davies Liu <davies@databricks.com> Closes #10342 from davies/lz4.
* [SPARK-12091] [PYSPARK] Deprecate the JAVA-specific deserialized storage levels (gatorsmile, 2015-12-18, 1 file, -3/+4)
  The current default storage level of the Python persist API is MEMORY_ONLY_SER. This is different from the default level MEMORY_ONLY in the official document and RDD APIs. davies Is this inconsistency intentional? Thanks! Updates: Since the data is always serialized on the Python side, the storage levels of JAVA-specific deserialization are not removed, such as MEMORY_ONLY. Updates: Based on the reviewers' feedback. In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html) library, so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`, `MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, `DISK_ONLY_2` and `OFF_HEAP`. Author: gatorsmile <gatorsmile@gmail.com> Closes #10092 from gatorsmile/persistStorageLevel.
* [SPARK-10123][DEPLOY] Support specifying deploy mode from configuration (jerryshao, 2015-12-15, 1 file, -3/+12)
  Please help to review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #10195 from jerryshao/SPARK-10123.
* [SPARK-12251] Document and improve off-heap memory configurations (Josh Rosen, 2015-12-10, 1 file, -0/+16)
  This patch adds documentation for Spark configurations that affect off-heap memory and makes some naming and validation improvements for those configs.
  - Change `spark.memory.offHeapSize` to `spark.memory.offHeap.size`. This is fine because this configuration has not shipped in any Spark release yet (it's new in Spark 1.6).
  - Deprecate `spark.unsafe.offHeap` in favor of a new `spark.memory.offHeap.enabled` configuration. The motivation behind this change is to gather all memory-related configurations under the same prefix.
  - Add a check which prevents users from setting `spark.memory.offHeap.enabled=true` when `spark.memory.offHeap.size == 0`. After SPARK-11389 (#9344), which was committed in Spark 1.6, Spark enforces a hard limit on the amount of off-heap memory that it will allocate to tasks. As a result, enabling off-heap execution memory without setting `spark.memory.offHeap.size` will lead to immediate OOMs. The new configuration validation makes this scenario easier to diagnose, helping to avoid user confusion.
  - Document these configurations on the configuration page.
  Author: Josh Rosen <joshrosen@databricks.com> Closes #10237 from JoshRosen/SPARK-12251.
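  A minimal sketch of the validated combination described above, assuming off-heap execution memory is actually wanted; the 1 GB size is an arbitrary example.

  ```scala
  import org.apache.spark.SparkConf

  // Enabling off-heap memory requires a positive size; with the size left at 0 the hard
  // limit introduced by SPARK-11389 would make tasks OOM immediately.
  val conf = new SparkConf()
    .set("spark.memory.offHeap.enabled", "true")
    .set("spark.memory.offHeap.size", (1024L * 1024 * 1024).toString)  // 1 GB, in bytes
  ```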
* [SPARK-11563][CORE][REPL] Use RpcEnv to transfer REPL-generated classes (Marcelo Vanzin, 2015-12-10, 1 file, -8/+0)
  This avoids bringing up yet another HTTP server on the driver, and instead reuses the file server already managed by the driver's RpcEnv. As a bonus, the repl now inherits the security features of the network library. There's also a small change to create the directory for storing classes under the root temp dir for the application (instead of directly under java.io.tmpdir). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9923 from vanzin/SPARK-11563.
* [SPARK-12080][CORE] Kryo - Support multiple user registrators (rotems, 2015-12-04, 1 file, -2/+2)
  Author: rotems <roter> Closes #10078 from Botnaim/KryoMultipleCustomRegistrators.
* [SPARK-12081] Make unified memory manager work with small heaps (Andrew Or, 2015-12-01, 1 file, -2/+2)
  The existing `spark.memory.fraction` (default 0.75) gives the system 25% of the space to work with. For small heaps, this is not enough: e.g. the default 1GB leaves only 250MB of system memory. This is especially a problem in local mode, where the driver and executor are crammed in the same JVM. Members of the community have reported driver OOMs in such cases. **New proposal.** We now reserve 300MB before taking the 75%. For 1GB JVMs, this leaves `(1024 - 300) * 0.75 = 543MB` for execution and storage. This is proposal (1) listed in the [JIRA](https://issues.apache.org/jira/browse/SPARK-12081). Author: Andrew Or <andrew@databricks.com> Closes #10081 from andrewor14/unified-memory-small-heaps.
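  A small worked version of the arithmetic above, assuming the 300MB reserve and the default `spark.memory.fraction` of 0.75; values are in MB and purely illustrative.

  ```scala
  // Heap split described in the commit, for a 1 GB JVM (all values in MB).
  val heap = 1024
  val reserved = 300                 // fixed reservation taken before the fraction applies
  val memoryFraction = 0.75          // default spark.memory.fraction at the time
  val unified = (heap - reserved) * memoryFraction
  println(s"unified execution+storage region: $unified MB")  // ≈ 543 MB
  ```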
* [DOCUMENTATION] Fix minor doc error (Jeff Zhang, 2015-11-25, 1 file, -1/+1)
  Author: Jeff Zhang <zjffdu@apache.org> Closes #9956 from zjffdu/dev_typo.
* [SPARK-11140][CORE] Transfer files using network lib when using NettyRpcEnv (Marcelo Vanzin, 2015-11-23, 1 file, -0/+2)
  This change abstracts the code that serves jars / files to executors so that each RpcEnv can have its own implementation; the akka version uses the existing HTTP-based file serving mechanism, while the netty version uses the new stream support added to the network lib, which makes file transfers benefit from the easier security configuration of the network library and should also reduce overhead overall. The change includes a small fix to TransportChannelHandler so that it propagates user events to downstream handlers. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9530 from vanzin/SPARK-11140.
* [SPARK-11710] Document new memory management model (Andrew Or, 2015-11-16, 1 file, -5/+8)
  Author: Andrew Or <andrew@databricks.com> Closes #9676 from andrewor14/memory-management-docs.
* [MINOR][DOCS] Typo in docs/configuration.md (Kai Jiang, 2015-11-14, 1 file, -5/+5)
  `<\code>` end tag missing backslash in docs/configuration.md (L308-L339); ref #8795. Author: Kai Jiang <jiangkai@gmail.com> Closes #9715 from vectorijk/minor-typo-docs.
* [SPARK-11305][DOCS] Remove Third-Party Hadoop Distributions Doc Page (Sean Owen, 2015-11-01, 1 file, -0/+15)
  Remove Hadoop third party distro page, and move Hadoop cluster config info to configuration page. CC pwendell Author: Sean Owen <sowen@cloudera.com> Closes #9298 from srowen/SPARK-11305.
* [SPARK-10971][SPARKR] RRunner should allow setting path to Rscript (Sun Rui, 2015-10-23, 1 file, -0/+18)
  Add a new spark conf option "spark.sparkr.r.driver.command" to specify the executable for an R script in client modes. The existing spark conf option "spark.sparkr.r.command" is used to specify the executable for an R script in cluster modes for both driver and workers. See also [launch R worker script](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L395). BTW, [environment variable "SPARKR_DRIVER_R"](https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L275) is used to locate the R shell on the local host. For your information, PySpark has two environment variables serving a similar purpose:
  - PYSPARK_PYTHON: Python binary executable to use for PySpark in both driver and workers (default is `python`).
  - PYSPARK_DRIVER_PYTHON: Python binary executable to use for PySpark in the driver only (default is PYSPARK_PYTHON).
  PySpark uses the code [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L41) to determine the python executable for a python script. Author: Sun Rui <rui.sun@intel.com> Closes #9179 from sun-rui/SPARK-10971.
* [SPARK-10708] Consolidate sort shuffle implementations (Josh Rosen, 2015-10-22, 1 file, -5/+2)
  There's a lot of duplication between SortShuffleManager and UnsafeShuffleManager. Given that these now provide the same set of functionality, now that UnsafeShuffleManager supports large records, I think that we should replace SortShuffleManager's serialized shuffle implementation with UnsafeShuffleManager's and should merge the two managers together. Author: Josh Rosen <joshrosen@databricks.com> Closes #8829 from JoshRosen/consolidate-sort-shuffle-implementations.
* [SPARK-11039][Documentation][Web UI] Document additional ui configurations (Nick Pritchard, 2015-10-15, 1 file, -0/+14)
  Add documentation for configurations:
  - spark.sql.ui.retainedExecutions
  - spark.streaming.ui.retainedBatches
  Author: Nick Pritchard <nicholas.pritchard@falkonry.com> Closes #9052 from pnpritchard/SPARK-11039.
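  A hedged sketch of how these UI retention limits might be tuned; both cap how much completed-job metadata the driver keeps for the web UI, and the values below are illustrative rather than defaults asserted by this commit.

  ```scala
  import org.apache.spark.SparkConf

  // Lowering these bounds reduces driver memory spent on UI history.
  val conf = new SparkConf()
    .set("spark.sql.ui.retainedExecutions", "100")     // SQL executions kept in the SQL tab
    .set("spark.streaming.ui.retainedBatches", "100")  // completed batches kept in the Streaming tab
  ```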
* [SPARK-10983] Unified memory manager (Andrew Or, 2015-10-13, 1 file, -29/+70)
  This patch unifies the memory management of the storage and execution regions such that either side can borrow memory from each other. When memory pressure arises, storage will be evicted in favor of execution. To avoid regressions in cases where storage is crucial, we dynamically allocate a fraction of space for storage that execution cannot evict. Several configurations are introduced:
  - **spark.memory.fraction (default 0.75)**: fraction of the heap space used for execution and storage. The lower this is, the more frequently spills and cached data eviction occur. The purpose of this config is to set aside memory for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records.
  - **spark.memory.storageFraction (default 0.5)**: size of the storage region within the space set aside by `spark.memory.fraction`. Cached data may only be evicted if total storage exceeds this region.
  - **spark.memory.useLegacyMode (default false)**: whether to use the memory management that existed in Spark 1.5 and before. This is mainly for backward compatibility.
  For a detailed description of the design, see [SPARK-10000](https://issues.apache.org/jira/browse/SPARK-10000). This patch builds on top of the `MemoryManager` interface introduced in #9000. Author: Andrew Or <andrew@databricks.com> Closes #9084 from andrewor14/unified-memory-manager.
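  As a non-authoritative sketch, the three settings above can be combined like this; the values shown simply restate the defaults quoted in the commit, spelled out for illustration.

  ```scala
  import org.apache.spark.SparkConf

  // Unified memory manager knobs; setting them explicitly only restates the defaults.
  val conf = new SparkConf()
    .set("spark.memory.fraction", "0.75")        // execution + storage share of the heap
    .set("spark.memory.storageFraction", "0.5")  // part of that region protected for storage
    .set("spark.memory.useLegacyMode", "false")  // keep the new unified manager
  ```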
* Akka framesize units should be specified (admackin, 2015-10-08, 1 file, -1/+1)
  The 1.4 docs noted that the units were MB; I have assumed this is still the case. Author: admackin <admackin@users.noreply.github.com> Closes #9025 from admackin/master.
* Add doc for spark.streaming.stopGracefullyOnShutdown (Bin Wang, 2015-09-27, 1 file, -0/+8)
  Author: Bin Wang <wbin00@gmail.com> Closes #8898 from wb14123/doc.
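  For illustration, a hedged example of enabling the graceful-shutdown behaviour this doc entry covers.

  ```scala
  import org.apache.spark.SparkConf

  // When true, Spark Streaming stops the StreamingContext gracefully on JVM shutdown,
  // letting already-received data be processed instead of stopping immediately.
  val conf = new SparkConf()
    .set("spark.streaming.stopGracefullyOnShutdown", "true")
  ```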
* [SPARK-10676] [DOCS] Add documentation for SASL encryption options (Marcelo Vanzin, 2015-09-21, 1 file, -0/+16)
  Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8803 from vanzin/SPARK-10676.
* [SPARK-10662] [DOCS] Code snippets are not properly formatted in tables (Jacek Laskowski, 2015-09-21, 1 file, -49/+48)
  - Backticks are processed properly in Spark Properties table
  - Removed unnecessary spaces
  - See http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/running-on-yarn.html
  Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8795 from jaceklaskowski/docs-yarn-formatting.
* [SPARK-10710] Remove ability to disable spilling in core and SQL (Josh Rosen, 2015-09-19, 1 file, -11/+3)
  It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`. This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling. Author: Josh Rosen <joshrosen@databricks.com> Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.
* [SPARK-9808] Remove hash shuffle file consolidation (Reynold Xin, 2015-09-18, 1 file, -10/+0)
  Author: Reynold Xin <rxin@databricks.com> Closes #8812 from rxin/SPARK-9808-1.
* [SPARK-10514] [MESOS] Wait for minimum number of total cores acquired by Spark by implementing the sufficientResourcesRegistered method (Akash Mishra, 2015-09-10, 1 file, -2/+3)
  The spark.scheduler.minRegisteredResourcesRatio configuration parameter works for YARN mode but not for Mesos coarse-grained mode. If the parameter is not specified, the default value of 0 is set for spark.scheduler.minRegisteredResourcesRatio in the base class and this method will always return true. There are no existing tests for YARN mode either, hence no test was added for this. Author: Akash Mishra <akash.mishra20@gmail.com> Closes #8672 from SleepyThread/master.
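  A hedged sketch of the setting involved: with a ratio like the one below, scheduling waits until that share of the requested cores has registered, subject to a separate waiting-time limit whose property name here is my assumption.

  ```scala
  import org.apache.spark.SparkConf

  // Illustrative values: wait for 80% of requested cores, but never longer than 30 seconds.
  val conf = new SparkConf()
    .set("spark.scheduler.minRegisteredResourcesRatio", "0.8")
    .set("spark.scheduler.maxRegisteredResourcesWaitingTime", "30s")  // assumed companion setting
  ```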
* [SPARK-10469] [DOC] Try and document the three options (Holden Karau, 2015-09-10, 1 file, -3/+6)
  From JIRA: Add documentation for tungsten-sort. From the mailing list: "I saw a new "spark.shuffle.manager=tungsten-sort" implemented in https://issues.apache.org/jira/browse/SPARK-7081, but its corresponding description can't be found in http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html (currently there are only the 'sort' and 'hash' options)." Author: Holden Karau <holden@pigscanfly.ca> Closes #8638 from holdenk/SPARK-10469-document-tungsten-sort.
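  A minimal illustration of selecting the newly documented option; at the time the three recognised values for this property were "sort" (the default), "hash", and "tungsten-sort".

  ```scala
  import org.apache.spark.SparkConf

  // Switch to the serialized shuffle path introduced by SPARK-7081.
  val conf = new SparkConf()
    .set("spark.shuffle.manager", "tungsten-sort")
  ```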
* [SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure (Tathagata Das, 2015-09-08, 1 file, -0/+13)
  Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8656 from tdas/SPARK-10492 and squashes the following commits: 986cdd6 [Tathagata Das] Added information on backpressure
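  A hedged sketch of the two mechanisms the updated docs contrast; both property names below are my understanding of the Streaming settings, and the numbers are examples only.

  ```scala
  import org.apache.spark.SparkConf

  // Backpressure lets receivers adapt their ingestion rate to processing speed; the static
  // rate limit still acts as an upper bound while backpressure is warming up or disabled.
  val conf = new SparkConf()
    .set("spark.streaming.backpressure.enabled", "true")
    .set("spark.streaming.receiver.maxRate", "10000")  // records per second per receiver
  ```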
* [SPARK-9767] Remove ConnectionManager (Reynold Xin, 2015-09-07, 1 file, -11/+0)
  We introduced the Netty network module for shuffle in Spark 1.2, and it has been on by default for 3 releases. The old ConnectionManager is difficult to maintain. If we merge the patch now, by the time it is released, it would be 1 yr for which ConnectionManager is off by default. It's time to remove it. Author: Reynold Xin <rxin@databricks.com> Closes #8161 from rxin/SPARK-9767.
* [SPARK-10432] spark.port.maxRetries documentation is unclear (Tom Graves, 2015-09-03, 1 file, -1/+5)
  Author: Tom Graves <tgraves@yahoo-inc.com> Closes #8585 from tgravescs/SPARK-10432.
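  For context, a hedged sketch of the retry behaviour the clarified docs describe: if a configured port is busy, Spark retries successive ports up to this many times. The value 16 is what I understand the default to be; treat it as an assumption.

  ```scala
  import org.apache.spark.SparkConf

  // If port 4040 is taken, the UI tries 4041, 4042, ... up to spark.port.maxRetries attempts.
  val conf = new SparkConf()
    .set("spark.ui.port", "4040")
    .set("spark.port.maxRetries", "16")  // assumed default; raise on hosts running many drivers
  ```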
* [SPARK-4223] [CORE] Support * in acls (zhuol, 2015-09-01, 1 file, -3/+6)
  SPARK-4223. Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access. Manual tests to verify that "*" works for any user in:
  a. Spark ui: view and kill stage. Done.
  b. Spark history server. Done.
  c. Yarn application killing. Done.
  Author: zhuol <zhuol@yahoo-inc.com> Closes #8398 from zhuoliu/4223.
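  An illustration of the wildcard; the ACL property names below are the standard ones as I understand them, and the particular combination (everyone may view, only named users may modify) is an assumed example rather than something this commit mandates.

  ```scala
  import org.apache.spark.SparkConf

  // "*" replaces an explicit user list: any user may view the UI and history server.
  val conf = new SparkConf()
    .set("spark.acls.enable", "true")
    .set("spark.ui.view.acls", "*")
    .set("spark.modify.acls", "alice,bob")  // hypothetical users allowed to kill stages/jobs
  ```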
* [SPARK-10315] Remove document on spark.akka.failure-detector.threshold (CodingCat, 2015-08-27, 1 file, -10/+0)
  https://issues.apache.org/jira/browse/SPARK-10315 This parameter is no longer used, and the current document contains a mistake: the name should be 'akka.remote.watch-failure-detector.threshold'. Author: CodingCat <zhunansjtu@gmail.com> Closes #8483 from CodingCat/SPARK_10315.
* [SPARK-9705] [DOC] Fix docs about Python version (Davies Liu, 2015-08-18, 1 file, -1/+5)
  cc JoshRosen Author: Davies Liu <davies@databricks.com> Closes #8245 from davies/python_doc.
* [SPARK-9934] Deprecate NIO ConnectionManager (Reynold Xin, 2015-08-14, 1 file, -1/+2)
  Deprecate NIO ConnectionManager in Spark 1.5.0, before removing it in Spark 1.6.0. Author: Reynold Xin <rxin@databricks.com> Closes #8162 from rxin/SPARK-9934.
* [SPARK-9641] [DOCS] spark.shuffle.service.port is not documented (Sean Owen, 2015-08-06, 1 file, -0/+19)
  Document spark.shuffle.service.{enabled,port}. CC sryza tgravescs This is pretty minimal; is there more to say here about the service? Author: Sean Owen <sowen@cloudera.com> Closes #7991 from srowen/SPARK-9641 and squashes the following commits: 3bb946e [Sean Owen] Add link to docs for setup and config of external shuffle service 2302e01 [Sean Owen] Document spark.shuffle.service.{enabled,port}
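  A hedged sketch of the two documented settings. This only tells executors to use an external shuffle service; the service itself must be running on each worker or NodeManager, and 7337 is the port I understand to be the default, shown here only for illustration.

  ```scala
  import org.apache.spark.SparkConf

  // Executors fetch shuffle blocks from the external shuffle service (required for
  // dynamic allocation) instead of from each other.
  val conf = new SparkConf()
    .set("spark.shuffle.service.enabled", "true")
    .set("spark.shuffle.service.port", "7337")  // assumed default port
  ```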
* [SPARK-9202] Capping maximum number of executor & driver information kept in Worker (CodingCat, 2015-07-31, 1 file, -0/+14)
  https://issues.apache.org/jira/browse/SPARK-9202 Author: CodingCat <zhunansjtu@gmail.com> Closes #7714 from CodingCat/SPARK-9202 and squashes the following commits: 23977fb [CodingCat] add comments about why we don't synchronize finishedExecutors & finishedDrivers dc9772d [CodingCat] addressing the comments e125241 [CodingCat] stylistic fix 80bfe52 [CodingCat] fix JsonProtocolSuite d7d9485 [CodingCat] styistic fix and respect insert ordering 031755f [CodingCat] add license info & stylistic fix c3b5361 [CodingCat] test cases and docs c557b3a [CodingCat] applications are fine 9cac751 [CodingCat] application is fine... ad87ed7 [CodingCat] trimFinishedExecutorsAndDrivers
* [SPARK-9327] [DOCS] Fix documentation about classpath config options (Marcelo Vanzin, 2015-07-28, 1 file, -2/+2)
  Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7651 from vanzin/SPARK-9327 and squashes the following commits: 2923e23 [Marcelo Vanzin] [SPARK-9327] [docs] Fix documentation about classpath config options.
* [SPARK-9144] Remove DAGScheduler.runLocallyWithinThread and spark.localExecution.enabled (Josh Rosen, 2015-07-22, 1 file, -9/+0)
  Spark has an option called spark.localExecution.enabled; according to the docs:
  > Enables Spark to run certain jobs, such as first() or take() on the driver, without sending tasks to the cluster. This can make certain jobs execute very quickly, but may require shipping a whole partition of data to the driver.
  This feature ends up adding quite a bit of complexity to DAGScheduler, especially in the runLocallyWithinThread method, but as far as I know nobody uses this feature (I searched the mailing list and haven't seen any recent mentions of the configuration nor stacktraces including the runLocally method). As a step towards scheduler complexity reduction, I propose that we remove this feature and all code related to it for Spark 1.5. This pull request simply brings #7484 up to date. Author: Josh Rosen <joshrosen@databricks.com> Author: Reynold Xin <rxin@databricks.com> Closes #7585 from rxin/remove-local-exec and squashes the following commits: 84bd10e [Reynold Xin] Python fix. 1d9739a [Reynold Xin] Merge pull request #7484 from JoshRosen/remove-localexecution eec39fa [Josh Rosen] Remove allowLocal(); deprecate user-facing uses of it. b0835dc [Josh Rosen] Remove local execution code in DAGScheduler 8975d96 [Josh Rosen] Remove local execution tests. ffa8c9b [Josh Rosen] Remove documentation for configuration
* [SPARK-9244] Increase some memory defaults (Matei Zaharia, 2015-07-22, 1 file, -9/+7)
  There are a few memory limits that people hit often and that we could make higher, especially now that memory sizes have grown.
  - spark.akka.frameSize: This defaults at 10 but is often hit for map output statuses in large shuffles. This memory is not fully allocated up-front, so we can just make this larger and still not affect jobs that never sent a status that large. We increase it to 128.
  - spark.executor.memory: Defaults at 512m, which is really small. We increase it to 1g.
  Author: Matei Zaharia <matei@databricks.com> Closes #7586 from mateiz/configs and squashes the following commits: ce0038a [Matei Zaharia] [SPARK-9244] Increase some memory defaults
* [SPARK-9010] [DOCUMENTATION] Improve the Spark Configuration document about `spark.kryoserializer.buffer` (zhaishidan, 2015-07-14, 1 file, -1/+1)
  The meaning of spark.kryoserializer.buffer should be "Initial size of Kryo's serialization buffer. Note that there will be one buffer per core on each worker. This buffer will grow up to spark.kryoserializer.buffer.max if needed." The spark.kryoserializer.buffer.max.mb property is out-of-date in Spark 1.4. Author: zhaishidan <zhaishidan@haizhi.com> Closes #7393 from stanzhai/master and squashes the following commits: 69729ef [zhaishidan] fix document error about spark.kryoserializer.buffer.max.mb
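  A hedged example of sizing the Kryo buffer as described; in Spark 1.4+ both properties take a size with a unit suffix, and the values below are believed to be the usual defaults, shown only for illustration.

  ```scala
  import org.apache.spark.SparkConf

  // Initial per-core Kryo buffer and the ceiling it may grow to for large objects.
  val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryoserializer.buffer", "64k")      // initial size
    .set("spark.kryoserializer.buffer.max", "64m")  // upper bound; larger objects fail to serialize
  ```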
* [SPARK-8958] Dynamic allocation: change cached timeout to infinity (Andrew Or, 2015-07-10, 1 file, -2/+2)
  pwendell and I discussed this a little more offline and concluded that it would be good to keep it more conservative. Losing cached blocks may be very expensive and we should only allow it if the user knows what he/she is doing. FYI harishreedharan sryza. Author: Andrew Or <andrew@databricks.com> Closes #7329 from andrewor14/da-cached-timeout and squashes the following commits: cef0b4e [Andrew Or] Change timeout to infinity
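  For context, the setting involved is, to my understanding, `spark.dynamicAllocation.cachedExecutorIdleTimeout`; the property name and the override value below are assumptions for illustration, not values stated in the commit.

  ```scala
  import org.apache.spark.SparkConf

  // After this change, executors holding cached blocks are kept indefinitely by default;
  // set a finite timeout only if losing cached data is acceptable.
  val conf = new SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "30min")  // example override
  ```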
* [SPARK-8927] [DOCS] Format wrong for some config descriptions (Jonathan Alter, 2015-07-09, 1 file, -2/+2)
  A couple descriptions were not inside `<td></td>` and were being displayed immediately under the section title instead of in their row. Author: Jonathan Alter <jonalter@users.noreply.github.com> Closes #7292 from jonalter/docs-config and squashes the following commits: 5ce1570 [Jonathan Alter] [DOCS] Format wrong for some config descriptions
* [SPARK-3071] Increase default driver memory (Ilya Ganelin, 2015-07-01, 1 file, -2/+2)
  I've updated default values in comments, documentation, and in the command line builder to be 1g based on comments in the JIRA. I've also updated most usages to point at a single variable defined in the Utils.scala and JavaUtils.java files. This wasn't possible in all cases (R, shell scripts etc.) but usage in most code is now pointing at the same place. Please let me know if I've missed anything. Will the spark-shell use the value within the command line builder during instantiation? Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #7132 from ilganeli/SPARK-3071 and squashes the following commits: 4074164 [Ilya Ganelin] String fix 271610b [Ilya Ganelin] Merge branch 'SPARK-3071' of github.com:ilganeli/spark into SPARK-3071 273b6e9 [Ilya Ganelin] Test fix fd67721 [Ilya Ganelin] Update JavaUtils.java 26cc177 [Ilya Ganelin] test fix e5db35d [Ilya Ganelin] Fixed test failure 39732a1 [Ilya Ganelin] merge fix a6f7deb [Ilya Ganelin] Created default value for DRIVER MEM in Utils that's now used in almost all locations instead of setting manually in each 09ad698 [Ilya Ganelin] Update SubmitRestProtocolSuite.scala 19b6f25 [Ilya Ganelin] Missed one doc update 2698a3d [Ilya Ganelin] Updated default value for driver memory
* [SQL] [DOC] Improved a comment (Radek Ostrowski, 2015-06-16, 1 file, -1/+1)
  [SQL][DOC] I found it a bit confusing when I came across it for the first time in the docs. Author: Radek Ostrowski <dest.hawaii@gmail.com> Author: radek <radek@radeks-MacBook-Pro-2.local> Closes #6332 from radek1st/master and squashes the following commits: dae3347 [Radek Ostrowski] fixed typo c76bb3a [radek] improved a comment
* [SPARK-8282] [SPARKR] Make number of threads used in RBackend configurable (Hossein, 2015-06-10, 1 file, -0/+12)
  Read number of threads for RBackend from configuration. [SPARK-8282] #comment Linking with JIRA Author: Hossein <hossein@databricks.com> Closes #6730 from falaki/SPARK-8282 and squashes the following commits: 33b3d98 [Hossein] Documented new config parameter 70f2a9c [Hossein] Fixing import ec44225 [Hossein] Read number of threads for RBackend from configuration
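  As a final hedged sketch, the documented property is, to my understanding, `spark.r.numRBackendThreads`; both the name and the value below are assumptions offered only for illustration.

  ```scala
  import org.apache.spark.SparkConf

  // Number of server threads the RBackend uses to handle calls from the SparkR process.
  val conf = new SparkConf()
    .set("spark.r.numRBackendThreads", "4")  // assumed property name; default believed to be 2
  ```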