No known breaking changes, but some deprecations and changes of behavior.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #10235 from jkbradley/mllib-guide-update-1.6.
tests
Although this patch still doesn't solve why the return code is 0 (see the JIRA description), it resolves the Python version mismatch issue.
Author: Jeff Zhang <zjffdu@apache.org>
Closes #10322 from zjffdu/SPARK-12361.
test suites
Use ```sqlContext``` from ```MLlibTestSparkContext``` rather than creating a new one for spark.ml test suites. I have checked thoroughly and found four test cases that need to be updated.
cc mengxr jkbradley
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10279 from yanboliang/spark-12309.
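As a hedged sketch of the updated pattern (suite and column names hypothetical), a suite mixes in MLlibTestSparkContext and uses its shared sqlContext instead of constructing a fresh SQLContext:
```scala
import org.apache.spark.SparkFunSuite
import org.apache.spark.mllib.util.MLlibTestSparkContext

// Hypothetical suite; `sqlContext` is initialized once by the mixin.
class MyEstimatorSuite extends SparkFunSuite with MLlibTestSparkContext {
  test("uses the shared SQLContext") {
    val df = sqlContext.createDataFrame(Seq((1.0, 2.0), (3.0, 4.0)))
      .toDF("label", "feature")
    assert(df.count() === 2)
  }
}
```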
Add random seed Param to Scala CrossValidator
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9108 from yanboliang/spark-9694.
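A minimal usage sketch of the new Param (estimator, evaluator, and param grid setup elided):
```scala
import org.apache.spark.ml.tuning.CrossValidator

// Fixing the seed makes the random k-fold splits reproducible: the same
// seed yields the same fold assignment across runs.
val cv = new CrossValidator()
  .setNumFolds(3)
  .setSeed(42L)
```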
bisecting k-means
This PR includes only the example code, in order to finish it quickly.
I'll send another PR for the docs soon.
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #9952 from yu-iskw/SPARK-6518.
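For reference, a minimal sketch of the spark.mllib API the example exercises (data values are illustrative; assumes an existing SparkContext `sc`):
```scala
import org.apache.spark.mllib.clustering.BisectingKMeans
import org.apache.spark.mllib.linalg.Vectors

// Two well-separated groups of points, so k = 2 has an obvious answer.
val data = sc.parallelize(Seq(
  Vectors.dense(0.1, 0.1), Vectors.dense(0.2, 0.0),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.2, 8.8)))

val model = new BisectingKMeans().setK(2).run(data)
model.clusterCenters.foreach(println)
```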
cluster mode.
SPARK_HOME is now causing a problem with Mesos cluster mode, since the spark-submit script was recently changed so that spark-class scripts give precedence to SPARK_HOME when it is defined.
We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration but should use spark.executor.home instead.
Author: Timothy Chen <tnachen@gmail.com>
Closes #10332 from tnachen/scheduler_ui.
cc jkbradley
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #10244 from yu-iskw/SPARK-12215.
Add ```write.json``` and ```write.parquet``` for SparkR, and deprecate ```saveAsParquetFile```.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10281 from yanboliang/spark-12310.
shivaram Please help review.
Author: Jeff Zhang <zjffdu@apache.org>
Closes #10290 from zjffdu/SPARK-12318.
cc rxin
Author: Davies Liu <davies@databricks.com>
Closes #10316 from davies/remove_generate_projection.
This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow.
Credit goes to the original author Titan-C (mentioned in the NOTICE).
Note that I am not a CSS expert, so I can only address comments to some extent.
Default view:
![screen shot 2015-12-14 at 12 46 39 pm](https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png)
When collapsed manually by the user:
![screen shot 2015-12-14 at 12 54 02 pm](https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png)
Disappears when the column is too narrow:
![screen shot 2015-12-14 at 12 47 22 pm](https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png)
Can still be opened by the user if necessary:
![screen shot 2015-12-14 at 12 51 15 pm](https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png)
Author: Timothy Hunter <timhunter@databricks.com>
Closes #10297 from thunterdb/12324.
This reverts commit 31b391019ff6eb5a483f4b3e62fd082de7ff8416.
This reverts commit 840bd2e008da5b22bfa73c587ea2c57666fffc60.
https://issues.apache.org/jira/browse/SPARK-12315
`IsNotNull` filter is not being pushed down for JDBC datasource.
This appears to be standard SQL according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support it.
In this PR, I simply added the case for `IsNotNull` filter to produce a proper filter string.
Author: hyukjinkwon <gurwls223@gmail.com>
This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>
Closes #10287 from HyukjinKwon/SPARK-12315.
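As a hedged sketch (method shape simplified; the real code also quotes values and signals filters it cannot compile), the kind of case added here, and extended the same way by the IsNull and `!=` commits below, looks like:
```scala
import org.apache.spark.sql.sources._

// Simplified filter-to-SQL compilation: each supported Filter becomes a
// WHERE-clause fragment; unsupported filters fall back to Spark-side
// evaluation.
def compileFilter(f: Filter): String = f match {
  case EqualTo(attr, value)      => s"$attr = $value"
  case IsNotNull(attr)           => s"$attr IS NOT NULL" // added in this PR
  case IsNull(attr)              => s"$attr IS NULL"     // SPARK-12314 below
  case Not(EqualTo(attr, value)) => s"$attr != $value"   // SPARK-12249 below
  case _                         => null
}
```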
https://issues.apache.org/jira/browse/SPARK-12314
`IsNull` filter is not being pushed down for JDBC datasource.
This appears to be standard SQL according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support it.
In this PR, I simply added the case for `IsNull` filter to produce a proper filter string.
Author: hyukjinkwon <gurwls223@gmail.com>
This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>
Closes #10286 from HyukjinKwon/SPARK-12314.
https://issues.apache.org/jira/browse/SPARK-12249
Currently the `!=` operator is not pushed down correctly.
I simply added a case for this.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #10233 from HyukjinKwon/SPARK-12249.
…endly Receiver graphs
Currently, the Spark Streaming web UI uses the same maxY when displaying 'Input Rate Times & Histograms' and 'Per-Receiver Times & Histograms'.
This may lead to somewhat unfriendly graphs: once we have tens of receivers or more, every 'Per-Receiver Times' line almost hits the ground.
This PR proposes to calculate a new maxY, shared among all the 'Per-Receiver Times & Histograms' graphs, instead of reusing the original one.
Before:
![before-5](https://cloud.githubusercontent.com/assets/15843379/11761362/d790c356-a0fa-11e5-860e-4b834603de1d.png)
After:
![after-5](https://cloud.githubusercontent.com/assets/15843379/11761361/cfabf692-a0fa-11e5-97d0-4ad124aaca2a.png)
Author: proflin <proflin.me@gmail.com>
Closes #10318 from proflin/SPARK-12304.
Spark on Yarn handle AM being told command from RM
When the RM throws ApplicationAttemptNotFoundException for an allocate invocation, make the ApplicationMaster finish immediately without any retries.
Author: Devaraj K <devaraj@apache.org>
Closes #10129 from devaraj-kavali/SPARK-4117.
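A hedged sketch of the described handling (the `allocator` and `finish` helpers stand in for the real ApplicationMaster internals):
```scala
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus
import org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException

try {
  allocator.allocateResources() // periodic allocate() heartbeat to the RM
} catch {
  case e: ApplicationAttemptNotFoundException =>
    // The RM has forgotten this attempt, so retrying can never succeed:
    // finish immediately instead of counting it toward max failures.
    finish(FinalApplicationStatus.FAILED, e.getMessage)
}
```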
Author: Wenchen Fan <cloud0fan@outlook.com>
Closes #8645 from cloud-fan/test.
This change rebuilds the event history of completed apps asynchronously, so the RPC thread is not blocked and new workers can still register or be removed even when the event log history is very large and takes a long time to rebuild.
Author: Bryan Cutler <bjcutler@us.ibm.com>
Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
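A minimal sketch of the general technique (helper names hypothetical): the RPC handler returns immediately and a background thread performs the expensive replay.
```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// One background thread keeps rebuilds serialized but off the RPC thread.
implicit val rebuildExecutor: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor())

def rebuildSparkUI(appId: String): Unit = {
  // Replay the (possibly huge) event log and attach the completed app's UI.
}

def onApplicationCompletion(appId: String): Unit = {
  Future(rebuildSparkUI(appId)).failed.foreach(t =>
    println(s"Failed to rebuild UI for $appId: $t"))
}
```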
ExternalBlockStore.scala
Author: Naveen <naveenminchu@gmail.com>
Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
Please help to review, thanks a lot.
Author: jerryshao <sshao@hortonworks.com>
Closes #10195 from jerryshao/SPARK-10123.
AsyncRDDActions to support non-blocking operation
These changes rework the implementations of `SimpleFutureAction`, `ComplexFutureAction`, `JobWaiter`, and `AsyncRDDActions` such that asynchronous callbacks on the generated `Futures` NEVER block waiting for a job to complete. A small amount of mutex synchronization is necessary to protect the internal fields that manage cancellation, but these locks are only held very briefly and in practice should almost never cause any blocking to occur. The existing blocking APIs of these classes are retained, but they simply delegate to the underlying non-blocking API and `Await` the results with indefinite timeouts.
Associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-9026
Also fixes: https://issues.apache.org/jira/browse/SPARK-4514
This pull request contains all my own original work, which I release to the Spark project under its open source license.
Author: Richard W. Eggert II <richard.eggert@gmail.com>
Closes #9264 from reggert/fix-futureaction.
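A minimal sketch of the core pattern (heavily simplified from the actual JobWaiter): scheduler events fulfill a Promise, so Future callbacks fire without any thread parked in a wait loop.
```scala
import scala.concurrent.{Future, Promise}

// Simplified stand-in for JobWaiter: clients attach callbacks to
// `completionFuture` instead of blocking until the job finishes.
class SimpleJobWaiter[T] {
  private val promise = Promise[T]()
  def taskSucceeded(result: T): Unit = promise.trySuccess(result)
  def jobFailed(exception: Exception): Unit = promise.tryFailure(exception)
  def completionFuture: Future[T] = promise.future
}
```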
https://issues.apache.org/jira/browse/SPARK-9516
- [x] new look of Thread Dump Page
- [x] click column title to sort
- [x] grep
- [x] search as you type
squito JoshRosen It's ready for review now.
Author: CodingCat <zhunansjtu@gmail.com>
Closes #7910 from CodingCat/SPARK-9516.
cluster mode.
Adding more documentation about submitting jobs with Mesos cluster mode.
Author: Timothy Chen <tnachen@gmail.com>
Closes #10086 from tnachen/mesos_supervise_docs.
ExternalShuffleBlockResolver
Replacing shuffleManagerClassName with shortShuffleMgrName reduces the cost of string comparisons, and the sort comparison is moved to the front. cc JoshRosen andrewor14
Author: Lianhui Wang <lianhuiwang09@gmail.com>
Closes #10131 from lianhuiwang/spark-12130.
setConf
This is a continuation of SPARK-12056; here the same change is applied to SqlNewHadoopRDD.scala.
andrewor14
FYI
Author: tedyu <yuzhihong@gmail.com>
Closes #10164 from tedyu/master.
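A hedged sketch of the pattern being carried over from SPARK-12056 (the wrapper function is hypothetical): a newly instantiated Hadoop InputFormat that implements Configurable must be handed the configuration before a RecordReader is created.
```scala
import org.apache.hadoop.conf.{Configurable, Configuration}
import org.apache.hadoop.mapreduce.InputFormat

// Hypothetical helper illustrating the fix: without setConf, a
// Configurable InputFormat sees a null conf when creating readers.
def configureInputFormat[K, V](format: InputFormat[K, V],
                               conf: Configuration): InputFormat[K, V] = {
  format match {
    case configurable: Configurable => configurable.setConf(conf)
    case _ => // non-Configurable formats need no extra wiring
  }
  format
}
```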
Author: Jean-Baptiste Onofré <jbonofre@apache.org>
Closes #10130 from jbonofre/SPARK-12105.
pushed down
https://issues.apache.org/jira/browse/SPARK-12236
Currently JDBC filters are not tested properly. All the tests pass even if the filters are not pushed down due to Spark-side filtering.
In this PR,
Firstly, I corrected the tests to properly check the pushed down filters by removing Spark-side filtering.
Also, `!=` was being tested even though it is not actually pushed down, so I removed those tests.
Lastly, I moved the `stripSparkFilter()` function to `SQLTestUtils`, as this function will be shared by all tests for pushed-down filters. It will also be shared with the ORC datasource, as its filters are not being tested properly either.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #10221 from HyukjinKwon/SPARK-12236.
schemas.
Author: Nong Li <nong@databricks.com>
Closes #10260 from nongli/spark-11271.
Rename ```weights``` to ```coefficients``` for examples/DeveloperApiExample.
cc mengxr jkbradley
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10280 from yanboliang/spark-coefficients.
cc tdas zsxwing, please review. Thanks a lot.
Author: jerryshao <sshao@hortonworks.com>
Closes #10305 from jerryshao/fix-typo-state-impl.
Fix a minor typo (unbalanced bracket) in ResetSystemProperties.
Author: Holden Karau <holden@us.ibm.com>
Closes #10303 from holdenk/SPARK-12332-trivial-typo-in-ResetSystemProperties-comment.
Support UnsafeRow for the Coalesce/Except/Intersect.
Could you review if my code changes are ok? davies Thank you!
Author: gatorsmile <gatorsmile@gmail.com>
Closes #10285 from gatorsmile/unsafeSupportCIE.
Dataset APIs
marmbrus This PR is to address your comment. Thanks for your review!
Author: gatorsmile <gatorsmile@gmail.com>
Closes #10214 from gatorsmile/followup12188.
I think it was a mistake, and we had not caught it until https://github.com/apache/spark/pull/10260, which began to check whether the `fromRowExpression` is resolved.
Author: Wenchen Fan <wenchen@databricks.com>
Closes #10263 from cloud-fan/encoder.
cc yhuai felixcheung shaneknapp
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #10300 from shivaram/comment-lintr-disable.
JIRA: https://issues.apache.org/jira/browse/SPARK-12016
We should not use Word2VecModel directly in pyspark; we need to wrap it in a Word2VecModelWrapper when loading it in pyspark.
Author: Liang-Chi Hsieh <viirya@appier.com>
Closes #10100 from viirya/fix-load-py-wordvecmodel.
Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193 where a broken link has been left as is.
Author: BenFradet <benjamin.fradet@gmail.com>
Closes #10282 from BenFradet/SPARK-12199.
When SparkStrategies.BasicOperators's `case BroadcastHint(child) => apply(child)` is hit, it only recursively invokes BasicOperators.apply on this `child`. That leaves many strategies with no chance to process the plan, which probably leads to the "No plan" issue, so we use planLater to go through all strategies; see the sketch below.
https://issues.apache.org/jira/browse/SPARK-12275
Author: yucai <yucai.yu@intel.com>
Closes #10265 from yucai/broadcast_hint.
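A hedged sketch of the one-line change (planner context elided; shown as before/after in comments):
```scala
// Before: recursing into BasicOperators alone meant no other strategy
// could ever plan `child`, which can end in "No plan" for some queries.
//   case BroadcastHint(child) => apply(child)

// After: planLater re-enters the full strategy list for `child`.
//   case BroadcastHint(child) => planLater(child) :: Nil
```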
Currently, we can generate different plans for a query with a single distinct aggregation (depending on spark.sql.specializeSingleDistinctAggPlanning): one works better on low-cardinality columns, the other (the default) works better on high-cardinality columns.
This PR changes it to generate a single plan (three aggregations and two exchanges) that works well in both cases, so we can safely remove the flag `spark.sql.specializeSingleDistinctAggPlanning` (introduced in 1.6).
A query like `SELECT COUNT(DISTINCT a) FROM table` will be planned as:
```
AGG-4 (count distinct)
Shuffle to a single reducer
Partial-AGG-3 (count distinct, no grouping)
Partial-AGG-2 (grouping on a)
Shuffle by a
Partial-AGG-1 (grouping on a)
```
This PR also includes a large refactoring of aggregation (removing 500+ lines of code).
cc yhuai nongli marmbrus
Author: Davies Liu <davies@databricks.com>
Closes #10228 from davies/single_distinct.
shutdown hook
1. Make sure workers and masters exit so that no worker or master will still be running when triggering the shutdown hook.
2. Set ExecutorState to FAILED if it's still RUNNING when executing the shutdown hook.
This should fix the potential exceptions when exiting a local cluster
```
java.lang.AssertionError: assertion failed: executor 4 state transfer from RUNNING to RUNNING is illegal
at scala.Predef$.assert(Predef.scala:179)
at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
```
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10269 from zsxwing/executor-state.
disconnection message
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10261 from zsxwing/SPARK-12267.
https://issues.apache.org/jira/browse/SPARK-12199
Follow-up PR of SPARK-11551. Fix some errors in ml-features.md
mengxr
Author: Xusen Yin <yinxusen@gmail.com>
Closes #10193 from yinxusen/SPARK-12199.
order to avoid ClassCastException due to KryoSerializer in KinesisReceiver
Author: Jean-Baptiste Onofré <jbonofre@apache.org>
Closes #10203 from jbonofre/SPARK-11193.
The existing sample functions are missing the `seed` parameter, even though the corresponding function interface in `generics` declares one. Thus, although callers can pass a `seed`, the value is not used.
This could cause SparkR unit tests to fail. For example, I hit it in another PR:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull
Author: gatorsmile <gatorsmile@gmail.com>
Closes #10160 from gatorsmile/sampleR.
Modifies the String overload to call the Column overload and ensures this is called in a test.
Author: Ankur Dave <ankurdave@gmail.com>
Closes #10271 from ankurdave/SPARK-12298.
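A generic, hedged illustration of this bug class (class and method names hypothetical, not the actual API touched by this PR): a String overload that re-invokes itself loops forever, so it must convert its arguments and delegate to the Column overload.
```scala
import org.apache.spark.sql.Column

class Table {
  def order(cols: Column*): Table = this // stand-in for the real work

  // Buggy shape: `order(first, rest: _*)` resolves back to this same
  // String overload, so the call never terminates.
  //   def order(first: String, rest: String*): Table = order(first, rest: _*)

  // Fixed shape: converting to Column selects the Column overload.
  def order(first: String, rest: String*): Table =
    order((first +: rest).map(new Column(_)): _*)
}
```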
dataframe_example.py
Since ```Dataset``` has a new meaning in Spark 1.6, we should rename it to avoid confusion.
#9873 finished the Scala example; here we focus on the Python one.
Move dataset_example.py to ```examples/ml``` and rename it to ```dataframe_example.py```.
Also, fix minor issues left over from #9873.
cc mengxr
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9957 from yanboliang/SPARK-11978.