| Commit message | Author | Age | Files | Lines |
JIRA link: https://issues.apache.org/jira/browse/SPARK-11729
Author: Xusen Yin <yinxusen@gmail.com>
Closes #9713 from yinxusen/SPARK-11729.
This PR adds a new option `spark.sql.hive.thriftServer.singleSession` for disabling multi-session support in the Thrift server.
Note that this option is added as a Spark configuration (retrieved from `SparkConf`) rather than a Spark SQL configuration (retrieved from `SQLConf`). This is because all SQL configurations are per-session. Since multi-session support is on by default, no JDBC connection can modify global configurations like the newly added one.
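Since the option is read from `SparkConf`, it has to be set before the Thrift server starts; a minimal sketch (the placement and value are illustrative):

```
# spark-defaults.conf (illustrative): disable multi-session support
spark.sql.hive.thriftServer.singleSession  true
```

It could presumably also be passed as `--conf spark.sql.hive.thriftServer.singleSession=true` when launching the server.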
Author: Cheng Lian <lian@databricks.com>
Closes #9740 from liancheng/spark-11089.single-session-option.
MESOS_NATIVE_LIBRARY was renamed in favor of MESOS_NATIVE_JAVA_LIBRARY. This commit fixes the reference in the documentation.
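A sketch of the corrected environment setting (the library path is an assumption and varies by installation):

```shell
# Point Spark at the Mesos native library; the path is illustrative
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
```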
Author: Philipp Hoffmann <mail@philipphoffmann.de>
Closes #9768 from philipphoffmann/patch-2.
In the **[Task Launching Overheads](http://spark.apache.org/docs/latest/streaming-programming-guide.html#task-launching-overheads)** section,
>Task Serialization: Using Kryo serialization for serializing tasks can reduce the task sizes, and therefore reduce the time taken to send them to the slaves.
As we know, task serialization is configured by the **spark.closure.serializer** parameter, but currently only the Java serializer is supported. If we set **spark.closure.serializer** to **org.apache.spark.serializer.KryoSerializer**, an exception is thrown.
Author: yangping.wu <wyphao.2007@163.com>
Closes #9734 from 397090770/397090770-patch-1.
Author: Andrew Or <andrew@databricks.com>
Closes #9676 from andrewor14/memory-management-docs.
`<\code>` end tags in docs/configuration.md{L308-L339} should be `</code>` (backslash instead of slash).
ref #8795
Author: Kai Jiang <jiangkai@gmail.com>
Closes #9715 from vectorijk/minor-typo-docs.
https://issues.apache.org/jira/browse/SPARK-11336
mengxr I added a hyperlink to Spark on GitHub and, in each code example, a hint that the full example exists in the Spark code repo. I removed the config key for changing the example code dir, since we assume all examples should be in spark/examples.
The hyperlink cannot be used yet, since Spark v1.6.0 has not been released, but it will work after the release, so it is not a problem.
I added some screenshots so you can get an instant feel for the change.
<img width="949" alt="screen shot 2015-10-27 at 10 47 18 pm" src="https://cloud.githubusercontent.com/assets/2637239/10780634/bd20e072-7cfc-11e5-8960-def4fc62a8ea.png">
<img width="1144" alt="screen shot 2015-10-27 at 10 47 31 pm" src="https://cloud.githubusercontent.com/assets/2637239/10780636/c3f6e180-7cfc-11e5-80b2-233589f4a9a3.png">
Author: Xusen Yin <yinxusen@gmail.com>
Closes #9320 from yinxusen/SPARK-11336.
MLUtils.loadLibSVMFile to load DataFrame
Use the LibSVM data source rather than MLUtils.loadLibSVMFile to load DataFrames. This includes:
* Use the libSVM data source for all example code under examples/ml, and remove unused imports.
* Use the libSVM data source for the user guides under ml-*** that were omitted by #8697.
* Fix bug: we should use ```sqlContext.read().format("libsvm").load(path)``` on the Java side, but the API doc and user guides incorrectly show ```sqlContext.read.format("libsvm").load(path)```.
* Code cleanup.
mengxr
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9690 from yanboliang/spark-11723.
include_example
I have made the required changes and tested them.
Kindly review the changes.
Author: Rishabh Bhardwaj <rbnext29@gmail.com>
Closes #9407 from rishabhbhardwaj/SPARK-11445.
Perceptron Classification
Add Python example code for Multilayer Perceptron Classification, and make example code in user guide document testable. mengxr
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9594 from yanboliang/spark-11629.
managers
Author: Andrew Or <andrew@databricks.com>
Closes #9637 from andrewor14/update-da-docs.
<img width="931" alt="screen shot 2015-11-11 at 1 53 21 pm" src="https://cloud.githubusercontent.com/assets/2133137/11108261/35d183d4-889a-11e5-9572-85e9d6cebd26.png">
Author: Andrew Or <andrew@databricks.com>
Closes #9638 from andrewor14/fix-kryo-docs.
offset ranges for a KafkaRDD
tdas koeninger
This updates the Spark Streaming + Kafka Integration Guide doc with a working method to access the offsets of a `KafkaRDD` through Python.
Author: Nick Evans <me@nicolasevans.org>
Closes #9289 from manygrams/update_kafka_direct_python_docs.
classes
This patch modifies Spark's closure cleaner (and a few other places) to use ASM 5, which is necessary in order to support cleaning of closures that were compiled by Java 8.
In order to avoid ASM dependency conflicts, Spark excludes ASM from all of its dependencies and uses a shaded version of ASM 4 that comes from `reflectasm` (see [SPARK-782](https://issues.apache.org/jira/browse/SPARK-782) and #232). This patch updates Spark to use a shaded version of ASM 5.0.4 that was published by the Apache XBean project; the POM used to create the shaded artifact can be found at https://github.com/apache/geronimo-xbean/blob/xbean-4.4/xbean-asm5-shaded/pom.xml.
http://movingfulcrum.tumblr.com/post/80826553604/asm-framework-50-the-missing-migration-guide was a useful resource while upgrading the code to use the new ASM5 opcodes.
I also added a new regression test in the `java8-tests` subproject; the existing tests were insufficient to catch this bug, which only affected Scala 2.11 user code compiled targeting Java 8.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #9512 from JoshRosen/SPARK-6152.
include_example
Author: Pravin Gadakh <pravingadakh177@gmail.com>
Closes #9516 from pravingadakh/SPARK-11550.
include_example
https://issues.apache.org/jira/browse/SPARK-11382
BTW, I fixed an error in naive_bayes_example.py.
Author: Xusen Yin <yinxusen@gmail.com>
Closes #9596 from yinxusen/SPARK-11382.
This fix is to add one line to explain the current behavior of Spark SQL when writing Parquet files. All columns are forced to be nullable for compatibility reasons.
Author: gatorsmile <gatorsmile@gmail.com>
Closes #9314 from gatorsmile/lossNull.
mllib-collaborative-filtering.md using include_example
Kindly review the changes.
Author: Rishabh Bhardwaj <rbnext29@gmail.com>
Closes #9519 from rishabhbhardwaj/SPARK-11337.
include_example]
I have tested it locally and it is working fine; please review.
Author: sachin aggarwal <different.sachin@gmail.com>
Closes #9539 from agsachin/SPARK-11552-real.
Author: Bharat Lal <bharat.iisc@gmail.com>
Closes #9560 from bharatl/SPARK-11581.
1) kafkaStreams is a list. The list should be unpacked when passing it into the streaming context union method, which accepts a variable number of streams.
2) print() should be pprint() for pyspark.
This contribution is my original work, and I license the work to the project under the project's open source license.
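Fix 1 comes down to Python argument unpacking; a minimal sketch with a stand-in `union` function (the real method lives on `StreamingContext` — this simplified signature is only illustrative):

```python
# Stand-in for StreamingContext.union(*streams): it accepts a
# variable number of streams, not a single list of streams.
def union(*streams):
    merged = []
    for stream in streams:
        merged.extend(stream)
    return merged

kafka_streams = [["a", "b"], ["c"], ["d", "e"]]  # list of per-receiver streams

# Wrong: union(kafka_streams) would treat the whole list as one "stream".
# Right: unpack the list so each stream becomes a separate argument.
unioned = union(*kafka_streams)
print(unioned)  # ['a', 'b', 'c', 'd', 'e']
```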
Author: chriskang90 <jckang@uchicago.edu>
Closes #9545 from c-kang/streaming_python_typo.
Add user guide and example code for ```AFTSurvivalRegression```.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9491 from yanboliang/spark-10689.
It doesn't show up as a hyperlink currently. It will show up as a hyperlink after this change.
Author: Rohit Agarwal <mindprince@gmail.com>
Closes #9544 from mindprince/patch-2.
Doc change to align with the HiveConf default regarding where the `warehouse` directory is created.
Author: xin Wu <xinwu@us.ibm.com>
Closes #9365 from xwu0226/spark-10046-commit.
This snippet seems to be mistakenly introduced at two places in #5348.
Author: Rohit Agarwal <mindprince@gmail.com>
Closes #9540 from mindprince/patch-1.
generation documentation
Fix Python example to use normalRDD as advertised
Author: Sean Owen <sowen@cloudera.com>
Closes #9529 from srowen/SPARK-11476.
We should use ```coefficients``` rather than ```weights``` in the user guide so that newcomers learn the conventional name from the outset. mengxr vectorijk
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9493 from yanboliang/docs-coefficients.
Spark should build against Scala 2.10.5, since that includes a fix for Scaladoc that will fix doc snapshot publishing: https://issues.scala-lang.org/browse/SI-8479
Author: Josh Rosen <joshrosen@databricks.com>
Closes #9450 from JoshRosen/upgrade-to-scala-2.10.5.
Author: Wenchen Fan <wenchen@databricks.com>
Closes #9467 from cloud-fan/doc.
The trim_codeblock(lines) function in include_example.rb removes some blank lines in the code.
Author: Xusen Yin <yinxusen@gmail.com>
Closes #9400 from yinxusen/SPARK-11443.
using include_example
Author: Pravin Gadakh <pravingadakh177@gmail.com>
Author: Pravin Gadakh <prgadakh@in.ibm.com>
Closes #9340 from pravingadakh/SPARK-11380.
Author: lewuathe <lewuathe@me.com>
Author: Lewuathe <lewuathe@me.com>
Closes #9394 from Lewuathe/missing-link-to-R-dataframe.
![image](https://cloud.githubusercontent.com/assets/8969467/10871746/612ba44a-80a4-11e5-99a0-40b9931dee52.png)
(This is without css, but you get the idea)
shivaram
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #9401 from felixcheung/rstudioprogrammingguide.
mllib-naive-bayes.md/mllib-isotonic-regression.md using include_example
I have made the required changes in mllib-naive-bayes.md/mllib-isotonic-regression.md and also verified them.
Kindly review it.
Author: Rishabh Bhardwaj <rbnext29@gmail.com>
Closes #9353 from rishabhbhardwaj/SPARK-11383.
Remove Hadoop third party distro page, and move Hadoop cluster config info to configuration page
CC pwendell
Author: Sean Owen <sowen@cloudera.com>
Closes #9298 from srowen/SPARK-11305.
from R programmatically or from RStudio
Mapping spark.driver.memory from sparkEnvir to spark-submit command-line arguments.
shivaram suggested that we possibly add other spark.driver.* properties - do we want to add all of those? I thought those could be set in SparkConf?
sun-rui
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #9290 from felixcheung/rdrivermem.
Author: tedyu <yuzhihong@gmail.com>
Closes #9281 from tedyu/master.
The recall-by-threshold snippet was using "precisionByThreshold".
Author: Mageswaran.D <mageswaran1989@gmail.com>
Closes #9333 from Mageswaran1989/Typo_in_mllib-evaluation-metrics.md.
mengxr https://issues.apache.org/jira/browse/SPARK-11297
Add new code tags to hold the same look and feel with previous documents.
Author: Xusen Yin <yinxusen@gmail.com>
Closes #9265 from yinxusen/SPARK-11297.
include_example
mengxr https://issues.apache.org/jira/browse/SPARK-11289
I made some changes to the ML feature extractors, i.e. TF-IDF, Word2Vec, and CountVectorizer. I added new example code in spark/examples; I hope it is the right place for those examples.
Author: Xusen Yin <yinxusen@gmail.com>
Closes #9266 from yinxusen/SPARK-11289.
The SQL programming guide's link to the DataFrame functions reference points to the wrong location; this patch fixes that.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #9269 from JoshRosen/SPARK-11299.
Add a new spark conf option "spark.sparkr.r.driver.command" to specify the executable for an R script in client modes.
The existing spark conf option "spark.sparkr.r.command" is used to specify the executable for an R script in cluster modes for both driver and workers. See also [launch R worker script](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L395).
BTW, the [environment variable "SPARKR_DRIVER_R"](https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L275) is used to locate the R shell on the local host.
For your information, PySpark has two environment variables serving a similar purpose:
PYSPARK_PYTHON — Python binary executable to use for PySpark in both driver and workers (default is `python`).
PYSPARK_DRIVER_PYTHON — Python binary executable to use for PySpark in the driver only (default is PYSPARK_PYTHON).
PySpark uses the code [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L41) to determine the Python executable for a Python script.
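A sketch of how the two R options might be set together, mirroring the PySpark pair above (the Rscript paths are assumptions):

```
# spark-defaults.conf (illustrative)
# Executable for R scripts on driver and workers in cluster modes
spark.sparkr.r.command         /usr/bin/Rscript
# Executable for the R script on the driver in client modes (new option)
spark.sparkr.r.driver.command  /usr/local/bin/Rscript
```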
Author: Sun Rui <rui.sun@intel.com>
Closes #9179 from sun-rui/SPARK-10971.
A POC code for making example code in user guide testable.
mengxr We still need to talk about the labels in code.
Author: Xusen Yin <yinxusen@gmail.com>
Closes #9109 from yinxusen/SPARK-10382.
Removed typo on line 8 in markdown : "Received" -> "Receiver"
Author: Rohan Bhanderi <rohan.bhanderi@sjsu.edu>
Closes #9242 from RohanBhanderi/patch-1.
There's a lot of duplication between SortShuffleManager and UnsafeShuffleManager. Given that these now provide the same set of functionality, now that UnsafeShuffleManager supports large records, I think that we should replace SortShuffleManager's serialized shuffle implementation with UnsafeShuffleManager's and should merge the two managers together.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #8829 from JoshRosen/consolidate-sort-shuffle-implementations.
Currently the log4j.properties file is not uploaded to executors, which leads them to use the default values. This fix makes sure that the file is always uploaded to the distributed cache so that executors use the latest settings.
If the user specifies log configurations through --files, then executors will pick up configs from --files instead of $SPARK_CONF_DIR/log4j.properties
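An illustrative command showing the --files route (the master, paths, class, and jar names are placeholders):

```
# Ship a custom log4j.properties to executors via the distributed cache
spark-submit \
  --master yarn \
  --files /path/to/log4j.properties \
  --class com.example.App app.jar
```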
Author: vundela <vsr@cloudera.com>
Author: Srinivasa Reddy Vundela <vsr@cloudera.com>
Closes #9118 from vundela/master.
This patch fixes a small typo in the GraphX programming guide
Author: Lukasz Piepiora <lpiepiora@gmail.com>
Closes #9160 from lpiepiora/11174-fix-typo-in-graphx-programming-guide.
Author: Britta Weber <britta.weber@elasticsearch.com>
Closes #9136 from brwe/typo-bellow.
Add documentation for configuration:
- spark.sql.ui.retainedExecutions
- spark.streaming.ui.retainedBatches
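A sketch of the documented settings in spark-defaults.conf (the values shown are, to the best of my knowledge, the defaults):

```
# How many SQL executions and streaming batches the web UI keeps in memory
spark.sql.ui.retainedExecutions     1000
spark.streaming.ui.retainedBatches  1000
```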
Author: Nick Pritchard <nicholas.pritchard@falkonry.com>
Closes #9052 from pnpritchard/SPARK-11039.
This patch unifies the memory management of the storage and execution regions such that either side can borrow memory from each other. When memory pressure arises, storage will be evicted in favor of execution. To avoid regressions in cases where storage is crucial, we dynamically allocate a fraction of space for storage that execution cannot evict. Several configurations are introduced:
- **spark.memory.fraction (default 0.75)**: fraction of the heap space used for execution and storage. The lower this is, the more frequently spills and cached data eviction occur. The purpose of this config is to set aside memory for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records.
- **spark.memory.storageFraction (default 0.5)**: size of the storage region within the space set aside by `spark.memory.fraction`. Cached data may only be evicted if total storage exceeds this region.
- **spark.memory.useLegacyMode (default false)**: whether to use the memory management that existed in Spark 1.5 and before. This is mainly for backward compatibility.
For a detailed description of the design, see [SPARK-10000](https://issues.apache.org/jira/browse/SPARK-10000). This patch builds on top of the `MemoryManager` interface introduced in #9000.
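The three knobs above can be tuned together in spark-defaults.conf; a sketch using the defaults stated in this change:

```
# Fraction of heap shared by execution and storage; the rest is left
# for internal metadata, user data structures, and size-estimation slack
spark.memory.fraction        0.75
# Portion of that space reserved for storage that execution cannot evict
spark.memory.storageFraction 0.5
# Fall back to the pre-1.6 static memory management if true
spark.memory.useLegacyMode   false
```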
Author: Andrew Or <andrew@databricks.com>
Closes #9084 from andrewor14/unified-memory-manager.