| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
This reverts commit 5a514b61bbfb609c505d8d65f2483068a56f1f70.
|
|
|
|
|
|
|
|
| |
This commit is to resolve SPARK-12396.
Author: echo2mei <534384876@qq.com>
Closes #10354 from echoTomei/master.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR makes JSON parser and schema inference handle more cases where we have unparsed records. It is based on #10043. The last commit fixes the failed test and updates the logic of schema inference.
Regarding the schema inference change, if we have something like
```
{"f1":1}
[1,2,3]
```
Originally, we would get a DF without any columns.
After this change, we will get a DF with columns `f1` and `_corrupt_record`. Basically, for the second row, `[1,2,3]` will be the value of `_corrupt_record`.
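A minimal sketch of the behavior described above, in plain Python rather than Spark. It assumes a simplified model in which any line that is not a JSON object is kept as raw text under `_corrupt_record`; the helper names `parse_records` and `infer_columns` are hypothetical and are not Spark APIs.

```python
import json

CORRUPT_COLUMN = "_corrupt_record"  # column name taken from the commit message


def parse_records(lines):
    """Parse JSON lines; keep non-object or malformed lines as corrupt records."""
    rows = []
    for line in lines:
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            rows.append(value)
        else:
            # Valid JSON that is not an object (such as an array) and invalid
            # JSON both end up in the corrupt-record column as raw text.
            rows.append({CORRUPT_COLUMN: line})
    return rows


def infer_columns(rows):
    """Union of the keys seen across all rows, in first-seen order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns


rows = parse_records(['{"f1":1}', '[1,2,3]'])
columns = infer_columns(rows)
# One list per input line, with None where a column has no value.
table = [[row.get(c) for c in columns] for row in rows]
```

In this toy model the second input line no longer erases the schema: `f1` is still inferred from the first line, and the array survives as text in the extra column.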
When merging this PR, please make sure that the author is simplyianm.
JIRA: https://issues.apache.org/jira/browse/SPARK-12057
Closes #10043
Author: Ian Macalinao <me@ian.pw>
Author: Yin Huai <yhuai@databricks.com>
Closes #10288 from yhuai/handleCorruptJson.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
when invFunc is None
When invFunc is None, `reduceByKeyAndWindow(func, None, winsize, slidesize)` is equivalent to
reduceByKey(func).window(winsize, slidesize).reduceByKey(func)
and no checkpoint is necessary. The corresponding Scala code does exactly that, but the Python code always creates a windowed stream with obligatory checkpointing. The patch fixes this.
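The equivalence can be illustrated without Spark. The sketch below models a stream as a list of batches and measures window size and slide in batches rather than durations, which is a simplification. The helpers `reduce_by_key` and `window` are hypothetical stand-ins, not the PySpark API, and the equivalence relies on `func` being associative.

```python
from operator import add


def reduce_by_key(records, func):
    """Combine values that share a key, like reduceByKey on a single batch."""
    result = {}
    for key, value in records:
        result[key] = func(result[key], value) if key in result else value
    return sorted(result.items())


def window(batches, size, slide):
    """Yield the records of the last `size` batches, every `slide` batches."""
    for end in range(slide, len(batches) + 1, slide):
        start = max(0, end - size)
        yield [record for batch in batches[start:end] for record in batch]


batches = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4)],
    [("b", 5), ("b", 6)],
    [("a", 7), ("b", 8)],
]

# Reduce every window over the raw records.
direct = [reduce_by_key(w, add) for w in window(batches, 2, 1)]

# Pre-reduce each batch, window the reduced batches, then reduce again.
pre_reduced = [reduce_by_key(b, add) for b in batches]
two_step = [reduce_by_key(w, add) for w in window(pre_reduced, 2, 1)]
```

Both paths give the same per-window totals, and neither keeps state from one window to the next, which is consistent with the claim that no checkpoint is needed in this case.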
I do not know how to unit-test this.
Author: David Tolpin <david.tolpin@gmail.com>
Closes #9888 from dtolpin/master.
|
|
|
|
|
|
|
|
| |
No change in functionality is intended. This only changes internal API.
Author: Andrew Or <andrew@databricks.com>
Closes #10343 from andrewor14/clean-bm-serializer.
|
|
|
|
|
|
| |
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10339 from vanzin/SPARK-12386.
|
|
|
|
|
|
|
|
| |
string when redirecting.
Author: Rohit Agarwal <rohita@qubole.com>
Closes #10180 from mindprince/SPARK-12186.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Runtime.getRuntime.addShutdownHook() is called
SPARK-9886 fixed ExternalBlockStore.scala
This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook()
Author: tedyu <yuzhihong@gmail.com>
Closes #10325 from ted-yu/master.
|
|
|
|
|
|
|
|
|
|
| |
`DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs). However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception.
This was suggested by mateiz on https://github.com/apache/spark/pull/7699. It may have already turned up an issue in "zero split job".
Author: Imran Rashid <irashid@cloudera.com>
Closes #8466 from squito/SPARK-10248.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit exists to close the following pull requests on GitHub:
Closes #1217 (requested by ankurdave, srowen)
Closes #4650 (requested by andrewor14)
Closes #5307 (requested by vanzin)
Closes #5664 (requested by andrewor14)
Closes #5713 (requested by marmbrus)
Closes #5722 (requested by andrewor14)
Closes #6685 (requested by srowen)
Closes #7074 (requested by srowen)
Closes #7119 (requested by andrewor14)
Closes #7997 (requested by jkbradley)
Closes #8292 (requested by srowen)
Closes #8975 (requested by andrewor14, vanzin)
Closes #8980 (requested by andrewor14, davies)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
```
Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
```
Author: Andrew Or <andrew@databricks.com>
Closes #10334 from andrewor14/rpc-typo.
|
|
|
|
|
|
|
|
| |
MLlib should use SQLContext.getOrCreate() instead of creating new SQLContext.
Author: Davies Liu <davies@databricks.com>
Closes #10338 from davies/create_context.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extend CrossValidator with HasSeed in PySpark.
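To illustrate why a seed matters for cross-validation, here is a small plain-Python sketch of seeded fold assignment. It is not how Spark splits data; `k_fold_indices` is a hypothetical helper that only shows that a fixed seed makes the folds reproducible.

```python
import random


def k_fold_indices(n_rows, n_folds, seed):
    """Shuffle row indices with a seeded generator and deal them into folds."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Fold i takes every n_folds-th index, starting at position i.
    return [indices[i::n_folds] for i in range(n_folds)]


folds_a = k_fold_indices(10, 3, seed=42)
folds_b = k_fold_indices(10, 3, seed=42)
all_rows = sorted(i for fold in folds_a for i in fold)
```

Two runs with the same seed produce identical folds, so model selection results can be repeated.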
This PR replaces [https://github.com/apache/spark/pull/7997]
CC: yanboliang thunterdb mmenestret Would one of you mind taking a look? Thanks!
Author: Joseph K. Bradley <joseph@databricks.com>
Author: Martin MENESTRET <mmenestret@ippon.fr>
Closes #10268 from jkbradley/pyspark-cv-seed.
|
|
|
|
|
|
|
|
|
|
|
| |
pushed down.
Currently ORC filters are not tested properly. All the tests pass even if the filters are not pushed down or are disabled. In this PR, I add some logic to check this.
Since ORC does not fully filter record by record, the tests check the count of the result and whether it contains the expected values.
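A rough sketch of why such a test checks the count and the expected values instead of exact equality. It assumes a toy reader that skips whole chunks of rows (called stripes here) based on their maximum value and returns the remaining chunks unfiltered; `read_with_pushdown` is a hypothetical function, not the ORC reader.

```python
def read_with_pushdown(stripes, lower_bound):
    """Return all rows of every stripe that may hold a value > lower_bound."""
    rows = []
    for stripe in stripes:
        # max(stripe) stands in for stripe-level min/max statistics.
        if max(stripe) > lower_bound:
            rows.extend(stripe)
    return rows


stripes = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
total = sum(len(stripe) for stripe in stripes)

# Predicate being pushed down: value > 4.
result = read_with_pushdown(stripes, 4)
expected = {5, 6, 7, 8, 9}
```

Here the first stripe is skipped, so fewer rows come back than the file holds, but the row `4` still slips through because its stripe also contains matching values. A test can therefore assert that the count dropped and that every expected value is present.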
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #9687 from HyukjinKwon/SPARK-11677.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on the suggestions from marmbrus and cloud-fan in https://github.com/apache/spark/pull/10165 , this PR prints the decoded values (user objects) in `Dataset.show`.
```scala
implicit val kryoEncoder = Encoders.kryo[KryoClassData]
val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), KryoClassData("c", 3)).toDS()
ds.show(20, false);
```
The current output is like
```
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 97, 2]|
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 98, 4]|
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 99, 6]|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
After the fix, it will be like the below if and only if the users override the `toString` function in the class `KryoClassData`
```scala
override def toString: String = s"KryoClassData($a, $b)"
```
```
+-------------------+
|value              |
+-------------------+
|KryoClassData(a, 1)|
|KryoClassData(b, 2)|
|KryoClassData(c, 3)|
+-------------------+
```
If users do not override the `toString` function, the results will be like
```
+---------------------------------------+
|value                                  |
+---------------------------------------+
|org.apache.spark.sql.KryoClassData@68ef|
|org.apache.spark.sql.KryoClassData@6915|
|org.apache.spark.sql.KryoClassData@693b|
+---------------------------------------+
```
Question: Should we add another optional parameter to the function `show` that decides whether it displays the hex values or the object values?
Author: gatorsmile <gatorsmile@gmail.com>
Closes #10215 from gatorsmile/showDecodedValue.
|
|
|
|
|
|
|
|
| |
for Tuple encoder
Author: Wenchen Fan <wenchen@databricks.com>
Closes #10293 from cloud-fan/err-msg.
|
|
|
|
|
|
|
|
|
|
| |
We have a DataFrame example for SparkR; we also need to add an ML example under ```examples/src/main/r```.
cc mengxr jkbradley shivaram
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10324 from yanboliang/spark-12364.
|
|
|
|
|
|
|
|
|
|
| |
No known breaking changes, but some deprecations and changes of behavior.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #10235 from jkbradley/mllib-guide-update-1.6.
|
|
|
|
|
|
|
|
|
|
| |
tests
Although this patch still doesn't solve the issue of why the return code is 0 (see JIRA description), it resolves the issue of the Python version mismatch.
Author: Jeff Zhang <zjffdu@apache.org>
Closes #10322 from zjffdu/SPARK-12361.
|
|
|
|
|
|
|
|
|
|
|
|
| |
test suites
Use ```sqlContext``` from ```MLlibTestSparkContext``` rather than creating a new one for spark.ml test suites. I have checked thoroughly and found four test cases that need to be updated.
cc mengxr jkbradley
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10279 from yanboliang/spark-12309.
|
|
|
|
|
|
|
|
| |
Add random seed Param to Scala CrossValidator
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9108 from yanboliang/spark-9694.
|
|
|
|
|
|
|
|
|
|
|
| |
bisecting k-means
This PR includes only example code in order to finish it quickly.
I'll send another PR for the docs soon.
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #9952 from yu-iskw/SPARK-6518.
|
|
|
|
|
|
|
|
|
|
|
|
| |
cluster mode.
SPARK_HOME is now causing problems with Mesos cluster mode, since the spark-submit script was recently changed so that, when running spark-class scripts, SPARK_HOME takes precedence if it is defined.
We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration but should use spark.executor.home instead.
Author: Timothy Chen <tnachen@gmail.com>
Closes #10332 from tnachen/scheduler_ui.
|
|
|
|
|
|
|
|
| |
cc jkbradley
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #10244 from yu-iskw/SPARK-12215.
|
|
|
|
|
|
|
|
| |
Add ```write.json``` and ```write.parquet``` for SparkR, and deprecate ```saveAsParquetFile```.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10281 from yanboliang/spark-12310.
|
|
|
|
|
|
|
|
| |
shivaram Please help review.
Author: Jeff Zhang <zjffdu@apache.org>
Closes #10290 from zjffdu/SPARK-12318.
|
|
|
|
|
|
|
|
| |
cc rxin
Author: Davies Liu <davies@databricks.com>
Closes #10316 from davies/remove_generate_projection.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow.
Credit goes to the original author Titan-C (mentioned in the NOTICE).
Note that I am not a CSS expert, so I can only address comments up to some extent.
Default view:
<img width="936" alt="screen shot 2015-12-14 at 12 46 39 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png">
When collapsed manually by the user:
<img width="1004" alt="screen shot 2015-12-14 at 12 54 02 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png">
Disappears when column is too narrow:
<img width="697" alt="screen shot 2015-12-14 at 12 47 22 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png">
Can still be opened by the user if necessary:
<img width="651" alt="screen shot 2015-12-14 at 12 51 15 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png">
Author: Timothy Hunter <timhunter@databricks.com>
Closes #10297 from thunterdb/12324.
|
|
|
|
| |
This reverts commit 31b391019ff6eb5a483f4b3e62fd082de7ff8416.
|
|
|
|
| |
This reverts commit 840bd2e008da5b22bfa73c587ea2c57666fffc60.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-12315
`IsNotNull` filter is not being pushed down for JDBC datasource.
It looks like it is part of the SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support this.
In this PR, I simply added the case for `IsNotNull` filter to produce a proper filter string.
Author: hyukjinkwon <gurwls223@gmail.com>
This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>
Closes #10287 from HyukjinKwon/SPARK-12315.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-12314
`IsNull` filter is not being pushed down for JDBC datasource.
It looks like it is part of the SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support this.
In this PR, I simply added the case for `IsNull` filter to produce a proper filter string.
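As an illustration of what "producing a proper filter string" can look like, here is a minimal Python sketch covering the `IsNull` filter from this commit and the `IsNotNull` filter from the commit listed just before it. The dataclasses and the functions `compile_filter` and `where_clause` are hypothetical; they only mirror the idea and are not Spark's JDBC code. Column names are emitted unquoted for brevity, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IsNull:
    attribute: str


@dataclass(frozen=True)
class IsNotNull:
    attribute: str


@dataclass(frozen=True)
class StringContains:
    """Example of a filter this sketch does not know how to push down."""
    attribute: str
    value: str


def compile_filter(f) -> Optional[str]:
    """Turn a filter into a SQL fragment, or None if it cannot be pushed down."""
    if isinstance(f, IsNull):
        return f"{f.attribute} IS NULL"
    if isinstance(f, IsNotNull):
        return f"{f.attribute} IS NOT NULL"
    return None  # unsupported filters are left for the engine to evaluate


def where_clause(filters):
    """Join the fragments of all supported filters into one WHERE clause."""
    parts = [sql for sql in map(compile_filter, filters) if sql is not None]
    return "WHERE " + " AND ".join(parts) if parts else ""


clause = where_clause([IsNotNull("name"), IsNull("age"), StringContains("name", "x")])
```

Filters without a case simply produce no fragment, which matches the situation described above: before a case exists, the filter is not pushed down at all.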
Author: hyukjinkwon <gurwls223@gmail.com>
This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>
Closes #10286 from HyukjinKwon/SPARK-12314.
|
|
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-12249
Currently `!=` operator is not pushed down correctly.
I simply added a case for this.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #10233 from HyukjinKwon/SPARK-12249.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
…endly Receiver graphs
Currently, the Spark Streaming web UI uses the same maxY when displaying 'Input Rate Times & Histograms' and 'Per-Receiver Times & Histograms'.
This may lead to somewhat unfriendly graphs: once we have tens of Receivers or more, every 'Per-Receiver Times' line almost hits the ground.
This issue proposes to calculate a new maxY, separate from the original one, that is shared among all the 'Per-Receiver Times & Histograms' graphs.
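The arithmetic behind the proposal can be sketched in a few lines of Python. The receiver names and rates below are made-up sample data, and `max_y` is a hypothetical helper, not code from the web UI.

```python
def max_y(series_list):
    """Largest value across several series; 0.0 when there is no data."""
    return max((v for series in series_list for v in series), default=0.0)


# Events per second for each receiver over three batches (sample data).
receiver_rates = {
    "receiver-0": [10.0, 12.0, 11.0],
    "receiver-1": [9.0, 14.0, 10.0],
    "receiver-2": [11.0, 10.0, 13.0],
}

# The overall input rate is the sum over receivers for each batch.
total_rate = [sum(values) for values in zip(*receiver_rates.values())]

input_max_y = max_y([total_rate])                # axis limit for the total graph
receiver_max_y = max_y(receiver_rates.values())  # shared by per-receiver graphs
```

With only three receivers the total already peaks at 36 while no single receiver exceeds 14, so drawing the per-receiver lines against the total's axis would flatten them. The gap grows with the number of receivers.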
Before:
![before-5](https://cloud.githubusercontent.com/assets/15843379/11761362/d790c356-a0fa-11e5-860e-4b834603de1d.png)
After:
![after-5](https://cloud.githubusercontent.com/assets/15843379/11761361/cfabf692-a0fa-11e5-97d0-4ad124aaca2a.png)
Author: proflin <proflin.me@gmail.com>
Closes #10318 from proflin/SPARK-12304.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Spark on Yarn handle AM being told command from RM
When the RM throws ApplicationAttemptNotFoundException for an allocate
invocation, make the ApplicationMaster finish immediately without any
retries.
Author: Devaraj K <devaraj@apache.org>
Closes #10129 from devaraj-kavali/SPARK-4117.
|
|
|
|
|
|
| |
Author: Wenchen Fan <cloud0fan@outlook.com>
Closes #8645 from cloud-fan/test.
|
|
|
|
|
|
|
|
| |
This change builds the event history of completed apps asynchronously, so the RPC thread is not blocked and new workers can still register or be removed when the event log history is very large and takes a long time to rebuild.
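The general idea can be sketched in Python with a thread pool. The `Master` class and its methods below are hypothetical and only model the pattern of handing slow work to a background thread so that the message-handling thread stays free. They do not reflect Spark's Master implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Master:
    """Toy message handler that rebuilds app history off the calling thread."""

    def __init__(self):
        self.workers = []
        self._pool = ThreadPoolExecutor(max_workers=1)

    def on_app_completed(self, app_id, rebuild):
        # Hand the slow rebuild to the pool and return immediately.
        return self._pool.submit(rebuild, app_id)

    def register_worker(self, worker_id):
        self.workers.append(worker_id)

    def shutdown(self):
        self._pool.shutdown(wait=True)


gate = threading.Event()


def slow_rebuild(app_id):
    """Stand-in for replaying a very large event log."""
    gate.wait(timeout=5)
    return f"ui-for-{app_id}"


master = Master()
pending = master.on_app_completed("app-1", slow_rebuild)
master.register_worker("worker-1")  # handled while the rebuild is still running
registered_before_done = (not pending.done()) and master.workers == ["worker-1"]

gate.set()                          # let the rebuild finish
ui = pending.result(timeout=5)
master.shutdown()
```

The worker registration goes through while the rebuild is still pending, which is the property the change is after.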
Author: Bryan Cutler <bjcutler@us.ibm.com>
Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
|
|
|
|
|
|
|
|
| |
ExternalBlockStore.scala
Author: Naveen <naveenminchu@gmail.com>
Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
|
|
|
|
|
|
|
|
| |
Please help to review, thanks a lot.
Author: jerryshao <sshao@hortonworks.com>
Closes #10195 from jerryshao/SPARK-10123.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
AsyncRDDActions to support non-blocking operation
These changes rework the implementations of `SimpleFutureAction`, `ComplexFutureAction`, `JobWaiter`, and `AsyncRDDActions` such that asynchronous callbacks on the generated `Futures` NEVER block waiting for a job to complete. A small amount of mutex synchronization is necessary to protect the internal fields that manage cancellation, but these locks are only held very briefly and in practice should almost never cause any blocking to occur. The existing blocking APIs of these classes are retained, but they simply delegate to the underlying non-blocking API and `Await` the results with indefinite timeouts.
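A loose Python analogue of the design described above, using `concurrent.futures.Future`. The `JobWaiter` class here is a hypothetical simplification that borrows only the name: callbacks fire when the last task reports in, a lock is held only while counters are updated, and the blocking call simply waits on the same future. It assumes at least one task per job.

```python
import threading
from concurrent.futures import Future


class JobWaiter:
    """Completes a future once every task of a job has reported a result."""

    def __init__(self, total_tasks):
        self._remaining = total_tasks
        self._results = []
        self._lock = threading.Lock()
        self.future = Future()

    def task_succeeded(self, result):
        with self._lock:  # held only long enough to update the bookkeeping
            self._results.append(result)
            self._remaining -= 1
            finished = self._remaining == 0
            snapshot = list(self._results)
        if finished:
            # Completing the future runs the registered callbacks right here.
            self.future.set_result(snapshot)

    def await_result(self):
        """Blocking API: delegates to the non-blocking future and waits."""
        return self.future.result()


waiter = JobWaiter(total_tasks=2)
seen = []
waiter.future.add_done_callback(lambda f: seen.append(sum(f.result())))

waiter.task_succeeded(1)
seen_after_first = list(seen)   # callback has not run yet
waiter.task_succeeded(2)
seen_after_second = list(seen)  # callback ran when the last task finished
final = waiter.await_result()
```

No thread sits in a blocking wait on behalf of the callback; it runs as a side effect of the last task completing.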
Associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-9026
Also fixes: https://issues.apache.org/jira/browse/SPARK-4514
This pull request contains all my own original work, which I release to the Spark project under its open source license.
Author: Richard W. Eggert II <richard.eggert@gmail.com>
Closes #9264 from reggert/fix-futureaction.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-9516
- [x] new look of Thread Dump Page
- [x] click column title to sort
- [x] grep
- [x] search as you type
squito JoshRosen It's ready for the review now
Author: CodingCat <zhunansjtu@gmail.com>
Closes #7910 from CodingCat/SPARK-9516.
|
|
|
|
|
|
|
|
|
|
| |
cluster mode.
Adding more documentation about submitting jobs with Mesos cluster mode.
Author: Timothy Chen <tnachen@gmail.com>
Closes #10086 from tnachen/mesos_supervise_docs.
|
|
|
|
|
|
|
|
|
|
| |
ExternalShuffleBlockResolver
Replace shuffleManagerClassName with shortShuffleMgrName to reduce the time spent on string comparison, and put the sort comparison first. cc JoshRosen andrewor14
Author: Lianhui Wang <lianhuiwang09@gmail.com>
Closes #10131 from lianhuiwang/spark-12130.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
setConf
This is a continuation of SPARK-12056, where the change is applied to SqlNewHadoopRDD.scala
andrewor14
FYI
Author: tedyu <yuzhihong@gmail.com>
Closes #10164 from tedyu/master.
|
| |
|
|
|
|
|
|
| |
Author: Jean-Baptiste Onofré <jbonofre@apache.org>
Closes #10130 from jbonofre/SPARK-12105.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pushed down
https://issues.apache.org/jira/browse/SPARK-12236
Currently JDBC filters are not tested properly. All the tests pass even if the filters are not pushed down due to Spark-side filtering.
In this PR,
Firstly, I corrected the tests to properly check the pushed down filters by removing Spark-side filtering.
Also, `!=` was being tested even though it is actually not pushed down, so I removed those tests.
Lastly, I moved the `stripSparkFilter()` function to `SQLTestUtils`, as this function will be shared by all tests for pushed-down filters. It will also be shared with the ORC datasource, as the filters for that are not being tested properly either.
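The testing problem can be shown with a toy model in Python. The functions `scan` and `query` are hypothetical; `strip_engine_filter` plays the role the text gives to `stripSparkFilter()`, but the sketch is not Spark code.

```python
def scan(rows, pushed_filter, honor_pushdown):
    """Data source scan; a broken source ignores the pushed-down filter."""
    if honor_pushdown:
        return [row for row in rows if pushed_filter(row)]
    return list(rows)


def query(rows, predicate, honor_pushdown, strip_engine_filter=False):
    """Run a filtered query, optionally without the engine-side re-filtering."""
    scanned = scan(rows, predicate, honor_pushdown)
    if strip_engine_filter:
        return scanned
    # The engine applies the predicate again, hiding a missing pushdown.
    return [row for row in scanned if predicate(row)]


rows = [1, 2, 3, 4, 5]


def greater_than_three(row):
    return row > 3


# With engine-side filtering, working and broken sources look the same.
masked_ok = query(rows, greater_than_three, honor_pushdown=True)
masked_broken = query(rows, greater_than_three, honor_pushdown=False)

# With the engine filter stripped, the broken source is exposed.
stripped_ok = query(rows, greater_than_three, True, strip_engine_filter=True)
stripped_broken = query(rows, greater_than_three, False, strip_engine_filter=True)
```

A test that compares only the masked results cannot fail, while one that compares the stripped results catches the source that ignored the filter.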
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #10221 from HyukjinKwon/SPARK-12236.
|
|
|
|
|
|
|
|
| |
schemas.
Author: Nong Li <nong@databricks.com>
Closes #10260 from nongli/spark-11271.
|
|
|
|
|
|
|
|
|
|
| |
Rename ```weights``` to ```coefficients``` for examples/DeveloperApiExample.
cc mengxr jkbradley
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10280 from yanboliang/spark-coefficients.
|