| Commit message | Author | Age | Files | Lines |
| |
I looked at each case individually and it looks like they can all be removed. The only one I had to think twice about was toArray (I even thought about un-deprecating it, until I realized it was a problem in Java to have toArray return java.util.List).
Author: Reynold Xin <rxin@databricks.com>
Closes #10569 from rxin/SPARK-12615.
| |
JIRA: https://issues.apache.org/jira/browse/SPARK-12643
Without setting the lib directory for ANTLR, updates to imported grammar files cannot be detected, so SparkSqlParser.g will not be rebuilt automatically.
Since it is a minor update, no JIRA ticket was opened. Let me know if one is needed. Thanks.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #10571 from viirya/antlr-build.
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #10559 from rxin/remove-deprecated-sql.
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #10561 from rxin/update-mima.
| |
and reflection that supported 1.x
Remove use of deprecated Hadoop APIs now that 2.2+ is required
Author: Sean Owen <sowen@cloudera.com>
Closes #10446 from srowen/SPARK-12481.
| |
This PR inlines the Hive SQL parser in Spark SQL.
The previous (merged) incarnation of this PR passed all tests, but had and still has problems with the build. These problems are caused by the fact that, for some reason, in some cases the ANTLR-generated code is not included in the compilation phase.
This PR is a WIP and should not be merged until we have sorted out the build issues.
Author: Herman van Hovell <hvanhovell@questtec.nl>
Author: Nong Li <nong@databricks.com>
Author: Nong Li <nongli@gmail.com>
Closes #10525 from hvanhovell/SPARK-12362.
| |
setupEndpointRef
### Remove AkkaRpcEnv
Keep `SparkEnv.actorSystem` because Streaming still uses it. Will remove it and AkkaUtils after refactoring Streaming actorStream API.
### Remove systemName
There are 2 places using `systemName`:
* `RpcEnvConfig.name`. Actually, although it's used as `systemName` in `AkkaRpcEnv`, `NettyRpcEnv` uses it as the service name to output the log `Successfully started service *** on port ***`. Since the service name in log is useful, I keep `RpcEnvConfig.name`.
* `def setupEndpointRef(systemName: String, address: RpcAddress, endpointName: String)`. Each `ActorSystem` has a `systemName`. Akka requires `systemName` in its URI and will refuse a connection if `systemName` is not matched. However, `NettyRpcEnv` doesn't use it. So we can remove `systemName` from `setupEndpointRef` since we are removing `AkkaRpcEnv`.
### Remove RpcEnv.uriOf
`uriOf` exists because Akka uses different URI formats with and without authentication, e.g., `akka.ssl.tcp://...` and `akka.tcp://...`. But `NettyRpcEnv` uses the same format, so it's not necessary after removing `AkkaRpcEnv`.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10459 from zsxwing/remove-akka-rpc-env.
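The `setupEndpointRef` change described above can be sketched as follows; everything except the names quoted in the text (`setupEndpointRef`, `RpcAddress`) is a simplified assumption, not Spark's actual internals:

```scala
// Simplified sketch of the API change: the Akka-era overload needed a
// systemName to build an Akka URI; the Netty-only version does not.
case class RpcAddress(host: String, port: Int)

trait RpcEndpointRef

trait RpcEnv {
  // Before: systemName was required because Akka refuses connections
  // whose URI systemName does not match the remote ActorSystem's name.
  //   def setupEndpointRef(systemName: String, address: RpcAddress,
  //                        endpointName: String): RpcEndpointRef

  // After: NettyRpcEnv never used systemName, so the parameter is dropped.
  def setupEndpointRef(address: RpcAddress, endpointName: String): RpcEndpointRef
}
```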
| |
We switched to TorrentBroadcast in Spark 1.1, and HttpBroadcast has been undocumented since then. It's time to remove it in Spark 2.0.
Author: Reynold Xin <rxin@databricks.com>
Closes #10531 from rxin/SPARK-12588.
| |
This reverts commit b600bccf41a7b1958e33d8301a19214e6517e388 due to non-deterministic build breaks.
| |
This is a WIP. The PR has been taken over from nongli (see https://github.com/apache/spark/pull/10420). I have removed some additional dead code, and fixed a few issues which were caused by the fact that the inlined Hive parser is newer than the Hive parser we currently use in Spark.
I am submitting this PR in order to get some feedback and testing done. There is quite a bit of work to do:
- [ ] Get it to pass jenkins build/test.
- [ ] Acknowledge the Hive project for using their parser.
- [ ] Refactorings between HiveQl and the java classes.
- [ ] Create our own ASTNode and integrate the current implicit extensions.
- [ ] Move remaining ```SemanticAnalyzer``` and ```ParseUtils``` functionality to ```HiveQl```.
- [ ] Removing Hive dependencies from the parser. This will require some edits in the grammar files.
- [ ] Introduce our own context which needs to contain a ```TokenRewriteStream```.
- [ ] Add ```useSQL11ReservedKeywordsForIdentifier``` and ```allowQuotedId``` to the catalyst or sql configuration.
- [ ] Remove ```HiveConf``` from the grammar files and HiveQl, and pass in our own configuration.
- [ ] Moving the parser into sql/core.
cc nongli rxin
Author: Herman van Hovell <hvanhovell@questtec.nl>
Author: Nong Li <nong@databricks.com>
Author: Nong Li <nongli@gmail.com>
Closes #10509 from hvanhovell/SPARK-12362.
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #10394 from rxin/SPARK-2331.
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #10395 from rxin/SPARK-11808.
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #10387 from rxin/version-bump.
| |
Add `computePrincipalComponentsAndVariance` to also compute PCA's explained variance.
CC mengxr
Author: Sean Owen <sowen@cloudera.com>
Closes #9736 from srowen/SPARK-11530.
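A hedged usage sketch of the new method (requires the spark-mllib dependency; the method name follows the PR description above and may differ in the released API):

```scala
// Illustrative only: compute the top-k principal components of a RowMatrix
// together with the variance each component explains.
import org.apache.spark.mllib.linalg.{Matrix, Vector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

def pcaWithVariance(mat: RowMatrix, k: Int): (Matrix, Vector) = {
  // Returns (principal components, explained variance per component);
  // the method name here is taken from the PR description, not verified
  // against a released Spark version.
  mat.computePrincipalComponentsAndVariance(k)
}
```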
| |
The json endpoint for stages doesn't include information on the stage duration that is present in the UI. This looks like a simple oversight; the metrics should be included, e.g., at api/v1/applications/<appId>/stages.
Metrics I've added are: submissionTime, firstTaskLaunchedTime and completionTime
Author: Xin Ren <iamshrek@126.com>
Closes #10107 from keypointt/SPARK-11155.
| |
We should upgrade to SBT 0.13.9, since this is a requirement in order to use SBT's new Maven-style resolution features (which will be done in a separate patch, because it's blocked by some binary compatibility issues in the POM reader plugin).
I also upgraded Scalastyle to version 0.8.0, which was necessary in order to fix a Scala 2.10.5 compatibility issue (see https://github.com/scalastyle/scalastyle/issues/156). The newer Scalastyle is slightly stricter about whitespace surrounding tokens, so I fixed the new style violations.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10112 from JoshRosen/upgrade-to-sbt-0.13.9.
| |
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10147 from vanzin/SPARK-11314.
| |
Across Different Languages
I have tried to address all the comments in pull request https://github.com/apache/spark/pull/2447.
Note that the second commit (using the new method in all internal code of all components) is quite intrusive and could be omitted.
Author: Jeroen Schot <jeroen.schot@surfsara.nl>
Closes #9767 from schot/master.
| |
This PR backports PR #10039 to master
Author: Cheng Lian <lian@databricks.com>
Closes #10063 from liancheng/spark-12046.doc-fix.master.
| |
This pull request fixes multiple issues with API doc generation.
- Modify the Jekyll plugin so that the entire doc build fails if API docs cannot be generated. This will make it easy to detect when the doc build breaks, since this will now trigger Jenkins failures.
- Change how we handle the `-target` compiler option flag in order to fix `javadoc` generation.
- Incorporate doc changes from thunterdb (in #10048).
Closes #10048.
Author: Josh Rosen <joshrosen@databricks.com>
Author: Timothy Hunter <timhunter@databricks.com>
Closes #10049 from JoshRosen/fix-doc-build.
| |
support SBT pom reader only.
Author: Prashant Sharma <scrapcodes@gmail.com>
Closes #10012 from ScrapCodes/minor-build-comment.
| |
In the previous implementation, the driver needs to know the executor listening address to send the thread dump request. However, in Netty RPC, the executor doesn't listen to any port, so the executor thread dump feature is broken.
This patch makes the driver use the endpointRef stored in BlockManagerMasterEndpoint to send the thread dump request to fix it.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #9976 from zsxwing/executor-thread-dump.
| |
Spark 2.0."
Also fixed some documentation as I saw them.
Author: Reynold Xin <rxin@databricks.com>
Closes #9930 from rxin/SPARK-11947.
| |
This patch removes `spark.driver.allowMultipleContexts=true` from our test configuration. The multiple SparkContexts check was originally disabled because certain tests suites in SQL needed to create multiple contexts. As far as I know, this configuration change is no longer necessary, so we should remove it in order to make it easier to find test cleanup bugs.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #9865 from JoshRosen/SPARK-4424.
| |
accept a VoidFunction<...>
Currently the streaming foreachRDD Java API uses a function prototype requiring a return value of null. This PR deprecates the old method and uses VoidFunction to allow a more concise declaration. Also added VoidFunction2 to the Java API for use in Streaming methods. A unit test is added for using foreachRDD with VoidFunction, and the changes have been tested with Java 7 and with Java 8 using lambdas.
Author: Bryan Cutler <bjcutler@us.ibm.com>
Closes #9488 from BryanCutler/foreachRDD-VoidFunction-SPARK-4557.
| |
Fixed the merge conflicts in #7410
Closes #7410
Author: Shixiong Zhu <shixiong@databricks.com>
Author: jerryshao <saisai.shao@intel.com>
Author: jerryshao <sshao@hortonworks.com>
Closes #9742 from zsxwing/pr7410.
| |
This adds an extra filter for private or protected classes. We only filter for package private right now.
Author: Timothy Hunter <timhunter@databricks.com>
Closes #9697 from thunterdb/spark-11732.
| |
This is to support JSON serialization of Param[Vector] in the pipeline API. It could be used for other purposes too. The schema is the same as `VectorUDT`. jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #9751 from mengxr/SPARK-11766.
| |
Remove some old YARN-related build code. Please review, thanks a lot.
Author: jerryshao <sshao@hortonworks.com>
Closes #9625 from jerryshao/remove-old-module.
| |
classes
This patch modifies Spark's closure cleaner (and a few other places) to use ASM 5, which is necessary in order to support cleaning of closures that were compiled by Java 8.
In order to avoid ASM dependency conflicts, Spark excludes ASM from all of its dependencies and uses a shaded version of ASM 4 that comes from `reflectasm` (see [SPARK-782](https://issues.apache.org/jira/browse/SPARK-782) and #232). This patch updates Spark to use a shaded version of ASM 5.0.4 that was published by the Apache XBean project; the POM used to create the shaded artifact can be found at https://github.com/apache/geronimo-xbean/blob/xbean-4.4/xbean-asm5-shaded/pom.xml.
http://movingfulcrum.tumblr.com/post/80826553604/asm-framework-50-the-missing-migration-guide was a useful resource while upgrading the code to use the new ASM5 opcodes.
I also added new regression tests in the `java8-tests` subproject; the existing tests were insufficient to catch this bug, which only affected Scala 2.11 user code compiled targeting Java 8.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #9512 from JoshRosen/SPARK-6152.
| |
This patch re-enables tests for the Docker JDBC data source. These tests were reverted in #4872 due to transitive dependency conflicts introduced by the `docker-client` library. This patch should avoid those problems by using a version of `docker-client` which shades its transitive dependencies and by performing some build-magic to work around problems with that shaded JAR.
In addition, I significantly refactored the tests to simplify the setup and teardown code and to fix several Docker networking issues which caused problems when running in `boot2docker`.
Closes #8101.
Author: Josh Rosen <joshrosen@databricks.com>
Author: Yijie Shen <henry.yijieshen@gmail.com>
Closes #9503 from JoshRosen/docker-jdbc-tests.
| |
This patch modifies Spark's SBT build so that it no longer uses `retrieveManaged` / `lib_managed` to store its dependencies. The motivations for this change are nicely described on the JIRA ticket ([SPARK-7841](https://issues.apache.org/jira/browse/SPARK-7841)); my personal interest in doing this stems from the fact that `lib_managed` has caused me some pain while debugging dependency issues in another PR of mine.
Removing our use of `lib_managed` would be trivial except for one snag: the Datanucleus JARs, required by Spark SQL's Hive integration, cannot be included in assembly JARs due to problems with merging OSGI `plugin.xml` files. As a result, several places in the packaging and deployment pipeline assume that these Datanucleus JARs are copied to `lib_managed/jars`. In the interest of maintaining compatibility, I have chosen to retain the `lib_managed/jars` directory _only_ for these Datanucleus JARs and have added custom code to `SparkBuild.scala` to automatically copy those JARs to that folder as part of the `assembly` task.
`dev/mima` also depended on `lib_managed` in a hacky way in order to set classpaths when generating MiMa excludes; I've updated this to obtain the classpaths directly from SBT instead.
/cc dragos marmbrus pwendell srowen
Author: Josh Rosen <joshrosen@databricks.com>
Closes #9575 from JoshRosen/SPARK-7841.
| |
I looked at the other endpoints, and they don't seem to be missing any fields.
Added fields:
![image](https://cloud.githubusercontent.com/assets/613879/10948801/58159982-82e4-11e5-86dc-62da201af910.png)
Author: Charles Yeh <charlesyeh@dropbox.com>
Closes #9472 from CharlesYeh/api_vars.
| |
various dialects as private.
Author: Reynold Xin <rxin@databricks.com>
Closes #9511 from rxin/SPARK-11541.
| |
sbt's version resolution code always picks the most recent version, and we
don't want that for guava.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #9508 from vanzin/SPARK-11538.
| |
This reverts commit 9cf56c96b7d02a14175d40b336da14c2e1c88339.
| |
Spark should build against Scala 2.10.5, since that includes a fix for Scaladoc that will fix doc snapshot publishing: https://issues.scala-lang.org/browse/SI-8479
Author: Josh Rosen <joshrosen@databricks.com>
Closes #9450 from JoshRosen/upgrade-to-scala-2.10.5.
| |
These two classes should be public, since they are used in public code.
Author: Reynold Xin <rxin@databricks.com>
Closes #9445 from rxin/SPARK-11485.
| |
Like ml ```LinearRegression```, ```LogisticRegression``` should provide a training summary including feature names and their coefficients.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9303 from yanboliang/spark-9492.
| |
This is the first task (https://issues.apache.org/jira/browse/SPARK-11469) of https://issues.apache.org/jira/browse/SPARK-11438
Author: Yin Huai <yhuai@databricks.com>
Closes #9393 from yhuai/udfNondeterministic.
| |
Since we do not need to preserve a page before calling compute(), MapPartitionsWithPreparationRDD is not needed anymore.
This PR basically revert #8543, #8511, #8038, #8011
Author: Davies Liu <davies@databricks.com>
Closes #9381 from davies/remove_prepare2.
| |
There's a lot of duplication between SortShuffleManager and UnsafeShuffleManager. Now that UnsafeShuffleManager supports large records, the two provide the same set of functionality, so I think we should replace SortShuffleManager's serialized shuffle implementation with UnsafeShuffleManager's and merge the two managers together.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #8829 from JoshRosen/consolidate-sort-shuffle-implementations.
| |
…redNodeLocationData
Author: Jacek Laskowski <jacek.laskowski@deepsense.io>
Closes #8976 from jaceklaskowski/SPARK-10921.
| |
Shows that an error is actually due to a fatal warning.
Author: Jakob Odersky <jodersky@gmail.com>
Closes #9128 from jodersky/fatalwarnings.
| |
Modify the SBT build script to include GitHub source links for generated Scaladocs, on releases only (no snapshots).
Author: Jakob Odersky <jodersky@gmail.com>
Closes #9110 from jodersky/unidoc.
| |
This PR improves session management by replacing the thread-local-based approach with one SQLContext per session, and introduces separate temporary tables and UDFs/UDAFs for each session.
A new session of SQLContext could be created by:
1) create a new SQLContext
2) call newSession() on an existing SQLContext
For HiveContext, in order to reduce the cost for each session, the classloader and Hive client are shared across multiple sessions (created by newSession).
CacheManager is also shared by multiple sessions, so caching a table multiple times in different sessions will not create multiple copies of the in-memory cache.
Added jars are still shared by all the sessions, because SparkContext does not support sessions.
cc marmbrus yhuai rxin
Author: Davies Liu <davies@databricks.com>
Closes #8909 from davies/sessions.
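A minimal sketch of the resulting session isolation (assumes a 1.6-era `SQLContext`; everything other than `newSession()` is just illustration):

```scala
// Sketch: two sessions share the SparkContext, cached data, and added jars,
// but each has its own temporary tables and UDFs.
import org.apache.spark.sql.SQLContext

def demo(sqlContext: SQLContext): Unit = {
  // Cheap to create: the classloader and Hive client are shared.
  val other = sqlContext.newSession()

  sqlContext.range(10).registerTempTable("t")  // visible only in this session
  sqlContext.udf.register("plusOne", (i: Long) => i + 1)

  // other.table("t") would fail: temp tables and UDFs are per-session,
  // while a table cached in either session is stored in memory only once.
}
```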
| |
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #8775 from vanzin/SPARK-10300.
| |
This PR removes the typeId in the columnar cache, which is not needed anymore; it also removes DATE and TIMESTAMP (INT/LONG are used instead).
Author: Davies Liu <davies@databricks.com>
Closes #8989 from davies/refactor_cache.
| |
In many modeling applications, data points are not necessarily sampled with equal probabilities. Linear regression should support weighting, which accounts for over- or under-sampling.
work in progress.
Author: Meihua Wu <meihuawu@umich.edu>
Closes #8631 from rotationsymmetry/SPARK-9642.
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #8812 from rxin/SPARK-9808-1.