| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sorts all configuration options present on the `configuration.md` page to ease readability.
Author: Brennon York <brennon.york@capitalone.com>
Closes #3863 from brennonyork/SPARK-1182 and squashes the following commits:
5696f21 [Brennon York] fixed merge conflict with port comments
81a7b10 [Brennon York] capitalized A in Allocation
e240486 [Brennon York] moved all spark.mesos properties into the running-on-mesos doc
7de5f75 [Brennon York] moved serialization from application to compression and serialization section
a16fec0 [Brennon York] moved shuffle settings from network to shuffle
f8fa286 [Brennon York] sorted encryption category
1023f15 [Brennon York] moved initialExecutors
e9d62aa [Brennon York] fixed akka.heartbeat.interval
25e6f6f [Brennon York] moved spark.executer.user*
4625ade [Brennon York] added spark.executor.extra* items
4ee5648 [Brennon York] fixed merge conflicts
1b49234 [Brennon York] sorting mishap
2b5758b [Brennon York] sorting mishap
6fbdf42 [Brennon York] sorting mishap
55dc6f8 [Brennon York] sorted security
ec34294 [Brennon York] sorted dynamic allocation
2a7c4a3 [Brennon York] sorted scheduling
aa9acdc [Brennon York] sorted networking
a4380b8 [Brennon York] sorted execution behavior
27f3919 [Brennon York] sorted compression and serialization
80a5bbb [Brennon York] sorted spark ui
3f32e5b [Brennon York] sorted shuffle behavior
6c51b38 [Brennon York] sorted runtime environment
efe9d6f [Brennon York] sorted application properties
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DataFrame.explain return wrong result when the query is DDL command.
For example, the following two queries should print out the same execution plan, but it not.
sql("create table tb as select * from src where key > 490").explain(true)
sql("explain extended create table tb as select * from src where key > 490")
This is because DataFrame.explain leverage logicalPlan which had been forced executed, we should use the unexecuted plan queryExecution.logical.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #4707 from yanboliang/spark-5926 and squashes the following commits:
fa6db63 [Yanbo Liang] logicalPlan is not lazy
0e40a1b [Yanbo Liang] make DataFrame.explain leverage queryExecution.logical
|
|
|
|
|
|
|
|
| |
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #4760 from viirya/dup_literal and squashes the following commits:
06e7516 [Liang-Chi Hsieh] Remove duplicate Literal matching block.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`ReadContext.init` calls `InitContext.getMergedKeyValueMetadata`, which doesn't know how to merge conflicting user defined key-value metadata and throws exception. In our case, when dealing with different but compatible schemas, we have different Spark SQL schema JSON strings in different Parquet part-files, thus causes this problem. Reading similar Parquet files generated by Hive doesn't suffer from this issue.
In this PR, we manually merge the schemas before passing it to `ReadContext` to avoid the exception.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4768)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes #4768 from liancheng/spark-6010 and squashes the following commits:
9002f0a [Cheng Lian] Fixes SPARK-6010
|
|
|
|
|
|
|
|
|
|
|
| |
use RELEASE_VERSION when building the Python API docs
Author: Davies Liu <davies@databricks.com>
Closes #4731 from davies/api_version and squashes the following commits:
c9744c9 [Davies Liu] Update create-release.sh
08cbc3f [Davies Liu] fix python docs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This metric is incomplete, because the files are memory mapped, so much of the read from disk occurs later as tasks actually read the file's data.
This should be merged into 1.3, so that we never expose this incorrect metric to users.
CC pwendell ksakellis sryza
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes #4749 from kayousterhout/SPARK-5982 and squashes the following commits:
9737b5e [Kay Ousterhout] More fixes
a1eb300 [Kay Ousterhout] Removed one more use of local read time
cf13497 [Kay Ousterhout] [SPARK-5982] Remove incorrectwq Local Read Time Metric
|
|
|
|
|
|
|
|
|
|
| |
Fixes the issue whereby when VertexRDD's are `diff`ed, `innerJoin`ed, or `leftJoin`ed and have different partition sizes they fail under the `zipPartitions` method. This fix tests whether the partitions are equal or not and, if not, will repartition the other to match the partition size of the calling VertexRDD.
Author: Brennon York <brennon.york@capitalone.com>
Closes #4705 from brennonyork/SPARK-1955 and squashes the following commits:
0882590 [Brennon York] updated to properly handle differently-partitioned vertexRDDs
|
|
|
|
|
|
|
|
|
|
|
|
| |
for automatic deletion.
As documented in createDirectory, the result of createDirectory is not registered for automatic removal. Currently there are 4 directories left in `/tmp` after just running `pyspark`.
Author: Milan Straka <fox@ucw.cz>
Closes #4759 from foxik/remove-tmp-dirs and squashes the following commits:
280450d [Milan Straka] Use createTempDir in getOrCreateLocalRootDirs...
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clarify default max wait in spark.shuffle.io.retryWait docs
CC andrewor14
Author: Sean Owen <sowen@cloudera.com>
Closes #4769 from srowen/SPARK-5930 and squashes the following commits:
ae2792b [Sean Owen] Clarify default max wait in spark.shuffle.io.retryWait docs
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #4757 from marmbrus/udtConversions and squashes the following commits:
3714aad [Michael Armbrust] [SPARK-5996][SQL] Fix specialized outbound conversions
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Web Page always be 0 if sc.stop() is called
In Standalone mode, the number of cores in Completed Applications of the Master Web Page will always be zero, if sc.stop() is called.
But the number will always be right, if sc.stop() is not called.
The reason maybe:
after sc.stop() is called, the function removeExecutor of class ApplicationInfo will be called, thus reduce the variable coresGranted to zero. The variable coresGranted is used to display the number of Cores on the Web Page.
Author: guliangliang <guliangliang@qiyi.com>
Closes #4567 from marsishandsome/Spark5771 and squashes the following commits:
694796e [guliangliang] remove duplicate code
a20e390 [guliangliang] change to Cores Using & Requested
0c19c95 [guliangliang] change Cores to Cores (max)
cfbd97d [guliangliang] [SPARK-5771] Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
|
|
|
|
|
|
|
|
|
|
| |
Corrected 3 Typos in the GraphX programming guide. I hope this is the correct way to contribute.
Author: Benedikt Linse <benedikt.linse@gmail.com>
Closes #4766 from 1123/master and squashes the following commits:
8a63812 [Benedikt Linse] fixing 3 typos in the graphx programming guide
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
modified to adhere to accepted coding standards as pointed by tdas in PR #3844
Author: prabs <prabsmails@gmail.com>
Author: Prabeesh K <prabsmails@gmail.com>
Closes #4178 from prabeesh/master and squashes the following commits:
bd2cb49 [Prabeesh K] adress the comment
ccc0765 [prabs] adress the comment
46f9619 [prabs] adress the comment
c035bdc [prabs] adress the comment
22dd7f7 [prabs] address the comments
0cc67bd [prabs] adress the comment
838c38e [prabs] adress the comment
cd57029 [prabs] address the comments
66919a3 [Prabeesh K] changed MqttDefaultFilePersistence to MemoryPersistence
5857989 [prabs] modified to adhere to accepted coding standards
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
select empty should NOT be the same as select. make sure selectExpr is behaving the same.
join param documentation
link to source doesn't work in jekyll generated file
cross reference of columns (i.e. enabling linking)
show(): move df example before df.show()
move tests in SQLContext out of docstring otherwise doc is too long
Column.desc and .asc doesn't have any documentation
in documentation, sort functions.*)
Author: Davies Liu <davies@databricks.com>
Closes #4756 from davies/df_docs and squashes the following commits:
f30502c [Davies Liu] fix doc
32f0d46 [Davies Liu] fix DataFrame docs
|
|
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-5286
Author: Yin Huai <yhuai@databricks.com>
Closes #4755 from yhuai/SPARK-5286-throwable and squashes the following commits:
4c0c450 [Yin Huai] Catch Throwable instead of Exception.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Published Kafka-assembly JAR was empty in 1.3.0-RC1
This is because the maven build generated two Jars-
1. an empty JAR file (since kafka-assembly has no code of its own)
2. a assembly JAR file containing everything in a different location as 1
The maven publishing plugin uploaded 1 and not 2.
Instead if 2 is not configure to generate in a different location, there is only 1 jar containing everything, which gets published.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #4753 from tdas/SPARK-5993 and squashes the following commits:
c390db8 [Tathagata Das] Fix assembly jar location of kafka-assembly
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also added desc/asc function for constructing sorting expressions more conveniently. And added a small fix to lift alias out of cast expression.
Author: Reynold Xin <rxin@databricks.com>
Closes #4752 from rxin/SPARK-5985 and squashes the following commits:
aeda5ae [Reynold Xin] Added Experimental flag to ColumnName.
047ad03 [Reynold Xin] Lift alias out of cast.
c9cf17c [Reynold Xin] [SPARK-5985][SQL] DataFrame sortBy -> orderBy in Python.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added a new test suite to make sure Java DF programs can use varargs properly.
Also moved all suites into test.org.apache.spark package to make sure the suites also test for method visibility.
Author: Reynold Xin <rxin@databricks.com>
Closes #4751 from rxin/df-tests and squashes the following commits:
1e8b8e4 [Reynold Xin] Fixed imports and renamed JavaAPISuite.
a6ca53b [Reynold Xin] [SPARK-5904][SQL] DataFrame Java API test suites.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
**NOTICE** Do NOT merge this, as we're waiting for #3881 to be merged.
`HiveThriftServer2Suite` has been notorious for its flakiness for a while. This was mostly due to spawning and communicate with external server processes. This PR revamps this test suite for better robustness:
1. Fixes a racing condition occurred while using `tail -f` to check log file
It's possible that the line we are looking for has already been printed into the log file before we start the `tail -f` process. This PR uses `tail -n +0 -f` to ensure all lines are checked.
2. Retries up to 3 times if the server fails to start
In most of the cases, the server fails to start because of port conflict. This PR no longer asks the system to choose an available TCP port, but uses a random port first, and retries up to 3 times if the server fails to start.
3. A server instance is reused among all test cases within a single suite
The original `HiveThriftServer2Suite` is splitted into two test suites, `HiveThriftBinaryServerSuite` and `HiveThriftHttpServerSuite`. Each suite starts a `HiveThriftServer2` instance and reuses it for all of its test cases.
**TODO**
- [ ] Starts the Thrift server in foreground once #3881 is merged (adding `--foreground` flag to `spark-daemon.sh`)
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4720)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes #4720 from liancheng/revamp-thrift-server-tests and squashes the following commits:
d6c80eb [Cheng Lian] Relaxes server startup timeout
6f14eb1 [Cheng Lian] Revamped HiveThriftServer2Suite for robustness
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One can early stop if the decrease in error rate is lesser than a certain tol or if the error increases if the training data is overfit.
This introduces a new method runWithValidation which takes in a pair of RDD's , one for the training data and the other for the validation.
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #4677 from MechCoder/spark-5436 and squashes the following commits:
1bb21d4 [MechCoder] Combine regression and classification tests into a single one
e4d799b [MechCoder] Addresses indentation and doc comments
b48a70f [MechCoder] COSMIT
b928a19 [MechCoder] Move validation while training section under usage tips
fad9b6e [MechCoder] Made the following changes 1. Add section to documentation 2. Return corresponding to bestValidationError 3. Allow negative tolerance.
55e5c3b [MechCoder] One liner for prevValidateError
3e74372 [MechCoder] TST: Add test for classification
77549a9 [MechCoder] [SPARK-5436] Validate GradientBoostedTrees using runWithValidation
|
|
|
|
|
|
|
|
|
|
| |
Author: Davies Liu <davies@databricks.com>
Closes #4745 from davies/fix_zip and squashes the following commits:
2124b2c [Davies Liu] Update tests.py
b5c828f [Davies Liu] increase the number of records
c1e40fd [Davies Liu] fix zip with two RDDs with AutoBatchedSerializer
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #4746 from marmbrus/hiveLock and squashes the following commits:
8b871cf [Michael Armbrust] [SPARK-5952][SQL] Lock when using hive metastore client
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add Slf4jSink to Spark Metrics using Coda Hale's SlfjReporter.
This sends metrics to log4j, allowing spark users to reuse log4j pipeline for metrics collection.
Reviewed existing unit tests and didn't see any sink-related tests. Please advise on if tests should be added.
Author: Judy <judynash@microsoft.com>
Author: judynash <judynash@microsoft.com>
Closes #4644 from judynash/master and squashes the following commits:
57ef214 [judynash] doc clarification and indent fixes
a751a66 [Judy] Spark-5708: Add Slf4jSink to Spark Metrics
|
|
|
|
|
|
|
|
|
|
| |
Variance is calculated on labels/responses.
Author: Xiangrui Meng <meng@databricks.com>
Closes #4740 from mengxr/patch-1 and squashes the following commits:
673317b [Xiangrui Meng] [MLLIB] Change x_i to y_i in Variance's user guide
|
|
|
|
|
|
|
|
|
|
|
| |
For screenshot see: https://issues.apache.org/jira/browse/SPARK-5965
This was caused by 20a6013106b56a1a1cc3e8cda092330ffbe77cc3.
Author: Andrew Or <andrew@databricks.com>
Closes #4739 from andrewor14/user-jar-blocker and squashes the following commits:
23c4a9e [Andrew Or] Use right argument
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch should be self-explanatory
pwendell JoshRosen
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #4741 from tdas/SPARK-5967 and squashes the following commits:
653b5bb [Tathagata Das] Fixed the fix and added test
e2de972 [Tathagata Das] Clear stages which have no corresponding active jobs.
|
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #4738 from marmbrus/udtRepart and squashes the following commits:
c06d7b5 [Michael Armbrust] fix compilation
91c8829 [Michael Armbrust] [SQL][SPARK-5532] Repartition should not use external rdd representation
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #4736 from marmbrus/asExprs and squashes the following commits:
5ba97e4 [Michael Armbrust] [SPARK-5910][SQL] Support for as in selectExpr
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Please refer to the [JIRA ticket] [1] for the motivation.
[1]: https://issues.apache.org/jira/browse/SPARK-5968
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4744)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes #4744 from liancheng/spark-5968 and squashes the following commits:
caac6a8 [Cheng Lian] Suppresses ParquetOutputCommitter WARN logs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Removed SVD code from examples.
* Corrected Java API doc link.
* Updated variable names: `AtransposeA` -> `ata`.
* Minor changes.
brkyvz
Author: Xiangrui Meng <meng@databricks.com>
Closes #4737 from mengxr/update-block-matrix-user-guide and squashes the following commits:
70f53ac [Xiangrui Meng] update block matrix user guide
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #4684 from marmbrus/explainAnalysis and squashes the following commits:
afbaa19 [Michael Armbrust] fix python
d93278c [Michael Armbrust] fix hive
e5fa0a4 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explainAnalysis
52119f2 [Michael Armbrust] more tests
82a5431 [Michael Armbrust] fix tests
25753d2 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explainAnalysis
aee1e6a [Michael Armbrust] fix hive
b23a844 [Michael Armbrust] newline
de8dc51 [Michael Armbrust] more comments
acf620a [Michael Armbrust] [SPARK-5873][SQL] Show partially analyzed plans in query execution
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
JIRA: https://issues.apache.org/jira/browse/SPARK-5935
Author: Yin Huai <yhuai@databricks.com>
Author: Yin Huai <huai@cse.ohio-state.edu>
Closes #4710 from yhuai/jsonMapType and squashes the following commits:
3e40390 [Yin Huai] Remove unnecessary changes.
f8e6267 [Yin Huai] Fix test.
baa36e3 [Yin Huai] Accept MapType in the schema provided to jsonFile/jsonRDD.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes:
* typo in Scala example
* Removed comment "usually applied on sparse data" since that is debatable
* small edits to text for clarity
CC: avulanov I noticed a typo post-hoc and ended up making a few small edits. Do the changes look OK?
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #4732 from jkbradley/chisqselector-docs and squashes the following commits:
9656a3b [Joseph K. Bradley] added Java example for ChiSqSelector to guide
3f3f9f4 [Joseph K. Bradley] small fixes to ChiSqSelector docs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added description of ChiSqSelector and few words about feature selection in general. I could add a code example, however it would not look reasonable in the absence of feature discretizer or a dataset in the `data` folder that has redundant features.
Author: Alexander Ulanov <nashb@yandex.ru>
Closes #4709 from avulanov/SPARK-5912 and squashes the following commits:
19a8a4e [Alexander Ulanov] Addressing reviewers comments @jkbradley
58d9e4d [Alexander Ulanov] Addressing reviewers comments @jkbradley
eb6b9fe [Alexander Ulanov] Typo
2921a1d [Alexander Ulanov] ChiSqSelector example of use
c845350 [Alexander Ulanov] ChiSqSelector docs
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add parameter parsing in FPGrowth example app in Scala and Java
And a sample data file is added in data/mllib folder
Author: Jacky Li <jacky.likun@huawei.com>
Closes #4714 from jackylk/parameter and squashes the following commits:
8c478b3 [Jacky Li] fix according to comments
3bb74f6 [Jacky Li] make FPGrowth exampl app take parameters
f0e4d10 [Jacky Li] make FPGrowth exampl app take parameters
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-5724
In AkkaUtil, we set several failure detector related the parameters as following
```
al akkaConf = ConfigFactory.parseMap(conf.getAkkaConf.toMap[String, String])
.withFallback(akkaSslConfig).withFallback(ConfigFactory.parseString(
s"""
|akka.daemonic = on
|akka.loggers = [""akka.event.slf4j.Slf4jLogger""]
|akka.stdout-loglevel = "ERROR"
|akka.jvm-exit-on-fatal-error = off
|akka.remote.require-cookie = "$requireCookie"
|akka.remote.secure-cookie = "$secureCookie"
|akka.remote.transport-failure-detector.heartbeat-interval = $akkaHeartBeatInterval s
|akka.remote.transport-failure-detector.acceptable-heartbeat-pause = $akkaHeartBeatPauses s
|akka.remote.transport-failure-detector.threshold = $akkaFailureDetector
|akka.actor.provider = "akka.remote.RemoteActorRefProvider"
|akka.remote.netty.tcp.transport-class = "akka.remote.transport.netty.NettyTransport"
|akka.remote.netty.tcp.hostname = "$host"
|akka.remote.netty.tcp.port = $port
|akka.remote.netty.tcp.tcp-nodelay = on
|akka.remote.netty.tcp.connection-timeout = $akkaTimeout s
|akka.remote.netty.tcp.maximum-frame-size = ${akkaFrameSize}B
|akka.remote.netty.tcp.execution-pool-size = $akkaThreads
|akka.actor.default-dispatcher.throughput = $akkaBatchSize
|akka.log-config-on-start = $logAkkaConfig
|akka.remote.log-remote-lifecycle-events = $lifecycleEvents
|akka.log-dead-letters = $lifecycleEvents
|akka.log-dead-letters-during-shutdown = $lifecycleEvents
""".stripMargin))
```
Actually, we do not have any parameter naming "akka.remote.transport-failure-detector.threshold"
see: http://doc.akka.io/docs/akka/2.3.4/general/configuration.html
what we have is "akka.remote.watch-failure-detector.threshold"
Author: CodingCat <zhunansjtu@gmail.com>
Closes #4512 from CodingCat/SPARK-5724 and squashes the following commits:
bafe56e [CodingCat] fix the grammar in configuration doc
338296e [CodingCat] remove failure-detector related info
8bfcfd4 [CodingCat] fix the misconfiguration in AkkaUtils
|
|
|
|
|
|
|
|
| |
Author: Saisai Shao <saisai.shao@intel.com>
Closes #4722 from jerryshao/SPARK-5943 and squashes the following commits:
1b01233 [Saisai Shao] Update the test to use new API to reduce the warning
|
|
|
|
|
|
|
|
| |
Author: Makoto Fukuhara <fukuo33@gmail.com>
Closes #4724 from fukuo33/fix-typo and squashes the following commits:
8c806b9 [Makoto Fukuhara] fix typo.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
longer used
Instead of storing a strong reference to accumulators, I've replaced this with a weak reference and updated any code that uses these accumulators to check whether the reference resolves before using the accumulator. A weak reference will be cleared when there is no longer an existing copy of the variable versus using a soft reference in which case accumulators would only be cleared when the GC explicitly ran out of memory.
Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
Closes #4021 from ilganeli/SPARK-3885 and squashes the following commits:
4ba9575 [Ilya Ganelin] Fixed error in test suite
8510943 [Ilya Ganelin] Extra code
bb76ef0 [Ilya Ganelin] File deleted somehow
283a333 [Ilya Ganelin] Added cleanup method for accumulators to remove stale references within Accumulators.original to accumulators that are now out of scope
345fd4f [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3885
7485a82 [Ilya Ganelin] Fixed build error
c8e0f2b [Ilya Ganelin] Added working test for accumulator garbage collection
94ce754 [Ilya Ganelin] Still not being properly garbage collected
8722b63 [Ilya Ganelin] Fixing gc test
7414a9c [Ilya Ganelin] Added test for accumulator garbage collection
18d62ec [Ilya Ganelin] Updated to throw Exception when accessing a GCd accumulator
9a81928 [Ilya Ganelin] Reverting permissions changes
28f705c [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3885
b820ab4b [Ilya Ganelin] reset
d78f4bf [Ilya Ganelin] Removed obsolete comment
0746e61 [Ilya Ganelin] Updated DAGSchedulerSUite to fix bug
3350852 [Ilya Ganelin] Updated DAGScheduler and Suite to correctly use new implementation of WeakRef Accumulator storage
c49066a [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3885
cbb9023 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3885
a77d11b [Ilya Ganelin] Updated Accumulators class to store weak references instead of strong references to allow garbage collection of old accumulators
|
|
|
|
|
|
|
|
|
|
| |
...th RangePartitioner
Author: Aaron Josephs <ajoseph4@binghamton.edu>
Closes #1381 from aaronjosephs/PLAT-911 and squashes the following commits:
e30ade5 [Aaron Josephs] [SPARK-911] allow efficient queries for a range if RDD is partitioned with RangePartitioner
|
|
|
|
|
|
|
|
| |
Author: Cheng Hao <hao.cheng@intel.com>
Closes #4717 from chenghao-intel/typo1 and squashes the following commits:
858d7b0 [Cheng Hao] update the typo
|
|
|
|
|
|
|
|
|
|
|
|
| |
MapReduce API
This looks like a simple typo ```SparkContext.newHadoopRDD``` instead of ```SparkContext.newAPIHadoopRDD``` as in actual http://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.SparkContext
Author: Alexander <abezzubov@nflabs.com>
Closes #4718 from bzz/hadoop-InputFormats-doc-fix and squashes the following commits:
680a4c4 [Alexander] Fix typo in docs on custom Hadoop InputFormats
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit exists to close the following pull requests on Github:
Closes #3490 (close requested by 'andrewor14')
Closes #4646 (close requested by 'srowen')
Closes #3591 (close requested by 'andrewor14')
Closes #3656 (close requested by 'andrewor14')
Closes #4553 (close requested by 'JoshRosen')
Closes #4202 (close requested by 'srowen')
Closes #4497 (close requested by 'marmbrus')
Closes #4150 (close requested by 'andrewor14')
Closes #2409 (close requested by 'andrewor14')
Closes #4221 (close requested by 'srowen')
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
partitions
Fix a overflow bug in JdbcRDD when calculating partitions for large BIGINT ids
Author: Evan Yu <ehotou@gmail.com>
Closes #4701 from hotou/SPARK-5860 and squashes the following commits:
9e038d1 [Evan Yu] [SPARK-5860][CORE] Prevent overflowing at the length level
7883ad9 [Evan Yu] [SPARK-5860][CORE] Prevent overflowing at the length level
c88755a [Evan Yu] [SPARK-5860][CORE] switch to BigInt instead of BigDecimal
4e9ff4f [Evan Yu] [SPARK-5860][CORE] JdbcRDD overflow on large range with high number of partitions
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
class is used in t...
...ests.
Without this SparkHadoopUtil is used by the Client instead of YarnSparkHadoopUtil.
Author: Hari Shreedharan <hshreedharan@apache.org>
Closes #4711 from harishreedharan/SPARK-5937 and squashes the following commits:
d154de6 [Hari Shreedharan] Use System.clearProperty() instead of setting the value of SPARK_YARN_MODE to empty string.
f729f70 [Hari Shreedharan] Fix ClientSuite to set YARN mode, so that the correct class is used in tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Continue to see IllegalStateException in YARN cluster mode. Adding a simple workaround for now.
Author: Nishkam Ravi <nravi@cloudera.com>
Author: nishkamravi2 <nishkamravi@gmail.com>
Author: nravi <nravi@c1704.halxg.cloudera.com>
Closes #4690 from nishkamravi2/master_nravi and squashes the following commits:
d453197 [nishkamravi2] Update NewHadoopRDD.scala
6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
0ce2c32 [nishkamravi2] Update HadoopRDD.scala
f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of removeShutDownHook. Deletion of semi-redundant occurrences of expensive operation inShutDown.
71d0e17 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
494d8c0 [nishkamravi2] Update DiskBlockManager.scala
3c5ddba [nishkamravi2] Update DiskBlockManager.scala
f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by recent changes to BlockManager.stop
79ea8b4 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
b446edc [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
535295a [nishkamravi2] Update TaskSetManager.scala
3e1b616 [Nishkam Ravi] Modify test for maxResultSize
9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0)
5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
636a9ff [nishkamravi2] Update YarnAllocator.scala
8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
5ac2ec1 [Nishkam Ravi] Remove out
dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
1cf2d1e [nishkamravi2] Update YarnAllocator.scala
ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
|
|
|
|
|
|
|
|
|
|
| |
fix typo: it should be "default:" instead of "default;"
Author: Jacky Li <jackylk@users.noreply.github.com>
Closes #4713 from jackylk/patch-10 and squashes the following commits:
15daf2e [Jacky Li] [MLlib] fix typo
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tuple/list
Fix createDataFrame() from pandas DataFrame (not tested by jenkins, depends on SPARK-5693).
It also support to create DataFrame from plain tuple/list without column names, `_1`, `_2` will be used as column names.
Author: Davies Liu <davies@databricks.com>
Closes #4679 from davies/pandas and squashes the following commits:
c0cbe0b [Davies Liu] fix tests
8466d1d [Davies Liu] fix create DataFrame from pandas
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For SPARK-5867:
* The spark.ml programming guide needs to be updated to use the new SQL DataFrame API instead of the old SchemaRDD API.
* It should also include Python examples now.
For SPARK-5892:
* Fix Python docs
* Various other cleanups
BTW, I accidentally merged this with master. If you want to compile it on your own, use this branch which is based on spark/branch-1.3 and cherry-picks the commits from this PR: [https://github.com/jkbradley/spark/tree/doc-review-1.3-check]
CC: mengxr (ML), davies (Python docs)
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #4675 from jkbradley/doc-review-1.3 and squashes the following commits:
f191bb0 [Joseph K. Bradley] small cleanups
e786efa [Joseph K. Bradley] small doc corrections
6b1ab4a [Joseph K. Bradley] fixed python lint test
946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example. Changed spark.ml Java examples to use DataFrames API instead of sql()
da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into doc-review-1.3
629dbf5 [Joseph K. Bradley] Updated based on code review: * made new page for old migration guides * small fixes * moved inherit_doc in python
b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for string interpolation
34b067f [Joseph K. Bradley] small doc correction
da16aef [Joseph K. Bradley] Fixed python mllib docs
8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc
695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class hierarchies in python docs
a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs
b05a80d [Joseph K. Bradley] organize imports. doc cleanups
e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Follow-on to https://github.com/apache/spark/pull/4591
Document isEmpty / take / parallelize and their interaction with (an empty) RDD[Nothing] and RDD[Null]. Also, fix a marginally related minor issue with histogram() and EmptyRDD.
CC rxin since you reviewed the last one although I imagine this is an uncontroversial resolution.
Author: Sean Owen <sowen@cloudera.com>
Closes #4698 from srowen/SPARK-5744.2 and squashes the following commits:
9b2a811 [Sean Owen] 2 extra javadoc fixes
d1b9fba [Sean Owen] Document isEmpty / take / parallelize and their interaction with (an empty) RDD[Nothing] and RDD[Null]. Also, fix a marginally related minor issue with histogram() and EmptyRDD.
|