Hide the error logs for 'SQLListenerMemoryLeakSuite' to avoid noise. Most of the changes are whitespace-only.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10363 from zsxwing/hide-log.
recovering from checkpoint data
Add a transient flag `DStream.restoredFromCheckpointData` to control the restore processing in DStream and avoid duplicate work: `DStream.restoreCheckpointData` checks this flag first and runs the restore process only when it is `false`.
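A minimal sketch of the guard-flag pattern described above (the class body is a simplified stand-in, and `doRestore` is a hypothetical helper, not the actual DStream code):
```scala
abstract class DStream[T] {
  // Transient so the flag itself is not written into the checkpoint.
  @transient private var restoredFromCheckpointData = false

  // Runs the (expensive) restore at most once per deserialized instance.
  def restoreCheckpointData(): Unit = {
    if (!restoredFromCheckpointData) {
      doRestore() // hypothetical helper standing in for the real restore logic
      restoredFromCheckpointData = true
    }
  }

  protected def doRestore(): Unit
}
```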
Author: jhu-chang <gt.hu.chang@gmail.com>
Closes #9765 from jhu-chang/SPARK-11749.
This PR removes Hive window functions from Spark and replaces them with (native) Spark ones. The PR is on par with Hive in terms of features.
This has the following advantages:
* Better memory management.
* The ability to use Spark UDAFs in window functions (see the sketch below).
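A small usage sketch of window functions over a DataFrame; `df` with `dept`, `name`, and `salary` columns is an assumption for illustration:
```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val w = Window.partitionBy("dept").orderBy("salary")
// Built-in window functions and aggregate functions over the same window spec:
df.select(col("name"), rank().over(w), avg(col("salary")).over(w)).show()
```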
cc rxin / yhuai
Author: Herman van Hovell <hvanhovell@questtec.nl>
Closes #9819 from hvanhovell/SPARK-8641-2.
assertOrderInvariantEquals method
org.apache.spark.streaming.Java8APISuite.java is failing because the assertOrderInvariantEquals method tries to sort an immutable list.
Author: Evan Chen <chene@us.ibm.com>
Closes #10336 from evanyc15/SPARK-12376-StreamingJavaAPISuite.
found
Point users to spark-packages.org to find them.
Author: Reynold Xin <rxin@databricks.com>
Closes #10351 from rxin/SPARK-12397.
String.split accepts a regular expression, so we should escape "." and "|".
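A quick illustration of why the escaping matters (Java's `String.split`, shown in Scala):
```scala
// "." is a regex metacharacter matching any character, so every character is
// treated as a delimiter and the resulting trailing empty fields are dropped:
"a.b.c".split(".")    // Array()
// Escaped, the delimiters behave as intended:
"a.b.c".split("\\.")  // Array(a, b, c)
"a|b|c".split("\\|")  // Array(a, b, c)
```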
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10361 from zsxwing/reg-bug.
Fix a problem with #10332; this one should fix cluster mode on Mesos.
Author: Iulian Dragos <jaguarul@gmail.com>
Closes #10359 from dragos/issue/fix-spark-12345-one-more-time.
characters
This PR encodes and decodes the file name to fix the issue.
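A sketch of the general encode/decode approach (illustrative only; the file name and the exact calls used by the patch are assumptions):
```scala
import java.net.{URLDecoder, URLEncoder}

val fileName = "my checkpoint#1.txt"
val encoded = URLEncoder.encode(fileName, "UTF-8") // my+checkpoint%231.txt
val decoded = URLDecoder.decode(encoded, "UTF-8")  // my checkpoint#1.txt
```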
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10208 from zsxwing/uri.
Since we renamed the column from ```text``` to ```value``` for DataFrames loaded by ```SQLContext.read.text```, we need to update the docs.
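For reference, a minimal check of the new column name (any plain-text file path works; `README.md` is just an example):
```scala
val df = sqlContext.read.text("README.md")
df.printSchema()
// root
//  |-- value: string (nullable = true)
```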
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10349 from yanboliang/text-value.
For the API DataFrame.join(right, usingColumns, joinType), if the joinType is right_outer or full_outer, the resulting join columns could be wrong (they will be null).
The order of columns has been changed to match that of MySQL and PostgreSQL [1].
This PR also fixes the nullability of the output for outer joins.
[1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html
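A small repro sketch (assuming `sqlContext.implicits._` is in scope for `toDF`):
```scala
import sqlContext.implicits._

val left = Seq((1, "a")).toDF("id", "l")
val right = Seq((1, "b"), (2, "c")).toDF("id", "r")
// With right_outer, the join column must be taken from the right side;
// otherwise the row that exists only on the right ends up with a null id.
left.join(right, Seq("id"), "right_outer").show()
```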
Author: Davies Liu <davies@databricks.com>
Closes #10353 from davies/fix_join.
This reverts commit 5a514b61bbfb609c505d8d65f2483068a56f1f70.
This commit resolves SPARK-12396.
Author: echo2mei <534384876@qq.com>
Closes #10354 from echoTomei/master.
This PR makes the JSON parser and schema inference handle more cases where we have unparsed records. It is based on #10043. The last commit fixes the failing test and updates the schema-inference logic.
Regarding the schema inference change, if we have something like
```
{"f1":1}
[1,2,3]
```
Originally, we would get a DF without any columns.
After this change, we will get a DF with columns `f1` and `_corrupt_record`. Basically, for the second row, `[1,2,3]` will be the value of `_corrupt_record`.
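A quick way to see the new behavior (a sketch; `sc` is the SparkContext, and the rendered column order is an assumption):
```scala
val rdd = sc.parallelize(Seq("""{"f1":1}""", "[1,2,3]"))
val df = sqlContext.read.json(rdd)
df.show()
// +---------------+----+
// |_corrupt_record|  f1|
// +---------------+----+
// |           null|   1|
// |        [1,2,3]|null|
// +---------------+----+
```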
When merging this PR, please make sure that the author is simplyianm.
JIRA: https://issues.apache.org/jira/browse/SPARK-12057
Closes #10043
Author: Ian Macalinao <me@ian.pw>
Author: Yin Huai <yhuai@databricks.com>
Closes #10288 from yhuai/handleCorruptJson.
when invFunc is None
When invFunc is None, `reduceByKeyAndWindow(func, None, winsize, slidesize)` is equivalent to
reduceByKey(func).window(winsize, slidesize).reduceByKey(func)
and no checkpoint is necessary. The corresponding Scala code does exactly that, but the Python code always creates a windowed stream with obligatory checkpointing. The patch fixes this.
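In Scala terms, the equivalence looks like this (a sketch, assuming a `DStream[(String, Int)]` named `pairs`):
```scala
import org.apache.spark.streaming.Seconds

// A windowed reduction without an inverse function...
val a = pairs.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
// ...is equivalent to reducing, windowing, then reducing again,
// and needs no checkpointing:
val b = pairs.reduceByKey(_ + _).window(Seconds(30), Seconds(10)).reduceByKey(_ + _)
```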
I do not know how to unit-test this.
Author: David Tolpin <david.tolpin@gmail.com>
Closes #9888 from dtolpin/master.
No change in functionality is intended. This only changes internal API.
Author: Andrew Or <andrew@databricks.com>
Closes #10343 from andrewor14/clean-bm-serializer.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10339 from vanzin/SPARK-12386.
string when redirecting.
Author: Rohit Agarwal <rohita@qubole.com>
Closes #10180 from mindprince/SPARK-12186.
Runtime.getRuntime.addShutdownHook() is called
SPARK-9886 fixed ExternalBlockStore.scala.
This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook().
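A sketch of the replacement pattern (Spark's `ShutdownHookManager` is `private[spark]`, and `cleanup()` is a hypothetical stand-in for the work a hook does):
```scala
// Raw JVM API, which gives no control over hook ordering:
Runtime.getRuntime.addShutdownHook(new Thread {
  override def run(): Unit = cleanup()
})

// Spark-internal manager, which runs hooks in priority order:
org.apache.spark.util.ShutdownHookManager.addShutdownHook { () =>
  cleanup()
}
```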
Author: tedyu <yuzhihong@gmail.com>
Closes #10325 from ted-yu/master.
`DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs). However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception.
This was suggested by mateiz on https://github.com/apache/spark/pull/7699. It may have already turned up an issue in "zero split job".
Author: Imran Rashid <irashid@cloudera.com>
Closes #8466 from squito/SPARK-10248.
This commit exists to close the following pull requests on GitHub:
Closes #1217 (requested by ankurdave, srowen)
Closes #4650 (requested by andrewor14)
Closes #5307 (requested by vanzin)
Closes #5664 (requested by andrewor14)
Closes #5713 (requested by marmbrus)
Closes #5722 (requested by andrewor14)
Closes #6685 (requested by srowen)
Closes #7074 (requested by srowen)
Closes #7119 (requested by andrewor14)
Closes #7997 (requested by jkbradley)
Closes #8292 (requested by srowen)
Closes #8975 (requested by andrewor14, vanzin)
Closes #8980 (requested by andrewor14, davies)
```
Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
```
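The literal `${timeout.duration}` in the message points to a missing `s` string-interpolator prefix (the likely typo being fixed); a minimal illustration:
```scala
import scala.concurrent.duration._

val timeout = 120.seconds
// Without the `s` prefix, the placeholder is emitted literally:
println("Cannot receive any reply in ${timeout}")  // Cannot receive any reply in ${timeout}
println(s"Cannot receive any reply in ${timeout}") // Cannot receive any reply in 120 seconds
```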
Author: Andrew Or <andrew@databricks.com>
Closes #10334 from andrewor14/rpc-typo.
MLlib should use SQLContext.getOrCreate() instead of creating new SQLContext.
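For reference, the pattern being adopted (`sc` is the SparkContext):
```scala
import org.apache.spark.sql.SQLContext

// Reuses the active SQLContext for this SparkContext if one exists,
// instead of constructing a second one:
val sqlContext = SQLContext.getOrCreate(sc)
```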
Author: Davies Liu <davies@databricks.com>
Closes #10338 from davies/create_context.
Extend CrossValidator with HasSeed in PySpark.
This PR replaces [https://github.com/apache/spark/pull/7997]
CC: yanboliang thunterdb mmenestret. Would one of you mind taking a look? Thanks!
Author: Joseph K. Bradley <joseph@databricks.com>
Author: Martin MENESTRET <mmenestret@ippon.fr>
Closes #10268 from jkbradley/pyspark-cv-seed.
pushed down.
Currently the ORC filters are not tested properly: all the tests pass even if the filters are not pushed down or are disabled. In this PR, I add some logic for this.
Since ORC does not fully filter record by record, the tests check the row count of the result and whether it contains the expected values.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #9687 from HyukjinKwon/SPARK-11677.
Based on the suggestions from marmbrus and cloud-fan in https://github.com/apache/spark/pull/10165, this PR prints the decoded values (user objects) in `Dataset.show`:
```scala
implicit val kryoEncoder = Encoders.kryo[KryoClassData]
val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), KryoClassData("c", 3)).toDS()
ds.show(20, false);
```
The current output is like
```
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 97, 2]|
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 98, 4]|
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 99, 6]|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
After the fix, the output will be like the one below, if and only if the user overrides the `toString` function in the class `KryoClassData`:
```scala
override def toString: String = s"KryoClassData($a, $b)"
```
```
+-------------------+
|value |
+-------------------+
|KryoClassData(a, 1)|
|KryoClassData(b, 2)|
|KryoClassData(c, 3)|
+-------------------+
```
If users do not override the `toString` function, the results will look like:
```
+---------------------------------------+
|value |
+---------------------------------------+
|org.apache.spark.sql.KryoClassData68ef|
|org.apache.spark.sql.KryoClassData6915|
|org.apache.spark.sql.KryoClassData693b|
+---------------------------------------+
```
Question: should we add another optional parameter to the `show` function that decides whether it displays the hex values or the object values?
Author: gatorsmile <gatorsmile@gmail.com>
Closes #10215 from gatorsmile/showDecodedValue.
for Tuple encoder
Author: Wenchen Fan <wenchen@databricks.com>
Closes #10293 from cloud-fan/err-msg.
We have a DataFrame example for SparkR; we also need to add an ML example under ```examples/src/main/r```.
cc mengxr jkbradley shivaram
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10324 from yanboliang/spark-12364.
No known breaking changes, but some deprecations and changes of behavior.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #10235 from jkbradley/mllib-guide-update-1.6.
tests
Although this patch still doesn't solve the issue of why the return code is 0 (see the JIRA description), it resolves the Python version mismatch.
Author: Jeff Zhang <zjffdu@apache.org>
Closes #10322 from zjffdu/SPARK-12361.
test suites
Use ```sqlContext``` from ```MLlibTestSparkContext``` rather than creating a new one for the spark.ml test suites. I have checked thoroughly and found four test cases that need to be updated.
cc mengxr jkbradley
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10279 from yanboliang/spark-12309.
Add random seed Param to Scala CrossValidator
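With the new Param, the seed can be set like any other (a sketch, assuming `pipeline`, `evaluator`, and `paramGrid` are already defined; the `setSeed` setter name follows the usual Param convention):
```scala
import org.apache.spark.ml.tuning.CrossValidator

val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)
  .setSeed(42L) // new: makes the fold splitting reproducible
```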
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9108 from yanboliang/spark-9694.
bisecting k-means
This PR includes only the example code, in order to finish it quickly.
I'll send another PR for the docs soon.
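For context, a minimal use of the algorithm being demonstrated (toy data; the real example lives under `examples/`):
```scala
import org.apache.spark.mllib.clustering.BisectingKMeans
import org.apache.spark.mllib.linalg.Vectors

val data = sc.parallelize(Seq(
  Vectors.dense(0.1, 0.1), Vectors.dense(0.2, 0.2),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))
val model = new BisectingKMeans().setK(2).run(data)
model.clusterCenters.foreach(println)
```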
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #9952 from yu-iskw/SPARK-6518.
cluster mode.
SPARK_HOME is now causing problems with Mesos cluster mode, since the spark-submit script was recently changed to look in SPARK_HOME when running spark-class scripts, giving it precedence if it is defined.
We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration but should use spark.executor.home instead.
Author: Timothy Chen <tnachen@gmail.com>
Closes #10332 from tnachen/scheduler_ui.
cc jkbradley
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #10244 from yu-iskw/SPARK-12215.
Add ```write.json``` and ```write.parquet``` for SparkR, and deprecate ```saveAsParquetFile```.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10281 from yanboliang/spark-12310.
shivaram Please help review.
Author: Jeff Zhang <zjffdu@apache.org>
Closes #10290 from zjffdu/SPARK-12318.
cc rxin
Author: Davies Liu <davies@databricks.com>
Closes #10316 from davies/remove_generate_projection.
This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow.
Credit goes to the original author Titan-C (mentioned in the NOTICE).
Note that I am not a CSS expert, so I can only address comments to a certain extent.
Default view:
<img width="936" alt="screen shot 2015-12-14 at 12 46 39 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png">
When collapsed manually by the user:
<img width="1004" alt="screen shot 2015-12-14 at 12 54 02 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png">
Disappears when column is too narrow:
<img width="697" alt="screen shot 2015-12-14 at 12 47 22 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png">
Can still be opened by the user if necessary:
<img width="651" alt="screen shot 2015-12-14 at 12 51 15 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png">
Author: Timothy Hunter <timhunter@databricks.com>
Closes #10297 from thunterdb/12324.
This reverts commit 31b391019ff6eb5a483f4b3e62fd082de7ff8416.
This reverts commit 840bd2e008da5b22bfa73c587ea2c57666fffc60.
https://issues.apache.org/jira/browse/SPARK-12315
`IsNotNull` filter is not being pushed down for JDBC datasource.
It looks like this is SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support it.
In this PR, I simply added the case for `IsNotNull` filter to produce a proper filter string.
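A sketch of the kind of case being added to the JDBC filter compilation (simplified and illustrative; the analogous `IsNull` case from the sibling PR below is shown for contrast, and the real logic lives in the JDBC datasource):
```scala
import org.apache.spark.sql.sources._

// Turns a pushed-down source filter into a SQL predicate string, if supported.
def compileFilter(f: Filter): Option[String] = f match {
  case IsNotNull(attr) => Some(s"$attr IS NOT NULL")
  case IsNull(attr)    => Some(s"$attr IS NULL")
  case _               => None
}
```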
Author: hyukjinkwon <gurwls223@gmail.com>
This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>
Closes #10287 from HyukjinKwon/SPARK-12315.
https://issues.apache.org/jira/browse/SPARK-12314
`IsNull` filter is not being pushed down for JDBC datasource.
It looks like this is SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support it.
In this PR, I simply added the case for `IsNull` filter to produce a proper filter string.
Author: hyukjinkwon <gurwls223@gmail.com>
This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>
Closes #10286 from HyukjinKwon/SPARK-12314.
https://issues.apache.org/jira/browse/SPARK-12249
Currently `!=` operator is not pushed down correctly.
I simply added a case for this.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #10233 from HyukjinKwon/SPARK-12249.
…endly Receiver graphs
Currently, the Spark Streaming web UI uses the same maxY when displaying the 'Input Rate Times & Histograms' and 'Per-Receiver Times & Histograms' graphs.
This may lead to somewhat unfriendly graphs: once we have tens of Receivers or more, every 'Per-Receiver Times' line almost hits the ground.
This issue proposes to calculate a new maxY from the original one, which is shared among all the 'Per-Receiver Times & Histograms' graphs.
Before:
![before-5](https://cloud.githubusercontent.com/assets/15843379/11761362/d790c356-a0fa-11e5-860e-4b834603de1d.png)
After:
![after-5](https://cloud.githubusercontent.com/assets/15843379/11761361/cfabf692-a0fa-11e5-97d0-4ad124aaca2a.png)
Author: proflin <proflin.me@gmail.com>
Closes #10318 from proflin/SPARK-12304.
Spark on Yarn handle AM being told command from RM
When the RM throws an ApplicationAttemptNotFoundException for an allocate invocation, make the ApplicationMaster finish immediately without any retries.
Author: Devaraj K <devaraj@apache.org>
Closes #10129 from devaraj-kavali/SPARK-4117.
Author: Wenchen Fan <cloud0fan@outlook.com>
Closes #8645 from cloud-fan/test.
This change builds the event history of completed apps asynchronously, so the RPC thread will not be blocked and new workers can register/remove even if the event log history is very large and takes a long time to rebuild.
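A sketch of the idea (names are illustrative stand-ins, not the Master's actual members):
```scala
import scala.concurrent.{ExecutionContext, Future}

implicit val rebuildContext: ExecutionContext = ExecutionContext.global

// Rebuilding a large event history can take minutes; doing it in a Future
// keeps the RPC thread free to handle worker register/remove messages.
def rebuildUiAsync(appId: String): Future[Unit] = Future {
  // ... replay the event log for `appId` and attach the rebuilt UI ...
}
```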
Author: Bryan Cutler <bjcutler@us.ibm.com>
Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
ExternalBlockStore.scala
Author: Naveen <naveenminchu@gmail.com>
Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
Please help review; thanks a lot.
Author: jerryshao <sshao@hortonworks.com>
Closes #10195 from jerryshao/SPARK-10123.