Commit message | Author | Age | Files | Lines
...
* [SPARK-12599][MLLIB][SQL] Remove the use of callUDF in MLlib | Reynold Xin | 2016-01-02 | 2 | -2/+16
| callUDF has been deprecated. However, we do not have an alternative for users to specify the output data type without type tags. This pull request introduces a new API for that and replaces the invocation of the deprecated callUDF with it.
|
| Author: Reynold Xin <rxin@databricks.com>
| Closes #10547 from rxin/SPARK-12599.
* [SPARK-12481][CORE][STREAMING][SQL] Remove usage of Hadoop deprecated APIs and reflection that supported 1.x | Sean Owen | 2016-01-02 | 46 | -441/+150
| Remove use of deprecated Hadoop APIs now that 2.2+ is required
|
| Author: Sean Owen <sowen@cloudera.com>
| Closes #10446 from srowen/SPARK-12481.
* [SPARK-10180][SQL] JDBC datasource are not processing EqualNullSafe filter | hyukjinkwon | 2016-01-02 | 2 | -2/+7
| This PR is a follow-up to https://github.com/apache/spark/pull/8391.
| The previous PR fixed JDBCRDD to support null-safe equality comparison for the JDBC datasource. This PR fixes the problem that the comparison can actually return null, which causes an error when the value of that comparison is used.
|
| Author: hyukjinkwon <gurwls223@gmail.com>
| Author: HyukjinKwon <gurwls223@gmail.com>
| Closes #8743 from HyukjinKwon/SPARK-10180.
* [SPARK-12362][SQL][WIP] Inline Hive Parser | Herman van Hovell | 2016-01-01 | 18 | -73/+5443
| This PR inlines the Hive SQL parser in Spark SQL.
|
| The previous (merged) incarnation of this PR passed all tests, but had and still has problems with the build. These problems are caused by the fact that, for some reason, the ANTLR-generated code is in some cases not included in the compilation phase.
|
| This PR is a WIP and should not be merged until we have sorted out the build issues.
|
| Author: Herman van Hovell <hvanhovell@questtec.nl>
| Author: Nong Li <nong@databricks.com>
| Author: Nong Li <nongli@gmail.com>
| Closes #10525 from hvanhovell/SPARK-12362.
* Revert "[SPARK-12286][SPARK-12290][SPARK-12294][SPARK-12284][SQL] always output UnsafeRow" | Reynold Xin | 2016-01-01 | 34 | -74/+574
| This reverts commit 0da7bd50ddf0fb9e0e8aeadb9c7fb3edf6f0ee6e.
* [SPARK-12286][SPARK-12290][SPARK-12294][SPARK-12284][SQL] always output UnsafeRow | Davies Liu | 2016-01-01 | 34 | -574/+74
| It's confusing that some operators output UnsafeRow and some do not, which makes mistakes easy.
|
| This PR changes all operators (SparkPlan) to output only UnsafeRow and removes the rule that inserted Unsafe/Safe conversions. Operators that can't output UnsafeRow directly now use an UnsafeProjection.
|
| Closes #10330
|
| cc JoshRosen rxin
|
| Author: Davies Liu <davies@databricks.com>
| Closes #10511 from davies/unsafe_row.
* Disable test-dependencies.sh. | Reynold Xin | 2016-01-01 | 1 | -2/+3
* [SPARK-12592][SQL][TEST] Don't mute Spark loggers in TestHive.reset() | Cheng Lian | 2016-01-01 | 1 | -1/+4
| There's a hack in `TestHive.reset()` that was intended to mute noisy Hive loggers. However, Spark testing loggers are also muted.
|
| Author: Cheng Lian <lian@databricks.com>
| Closes #10540 from liancheng/spark-12592.dont-mute-spark-loggers.
* [SPARK-12409][SPARK-12387][SPARK-12391][SQL] Refactor filter pushdown for JDBCRDD and add few filters | Liang-Chi Hsieh | 2016-01-01 | 2 | -31/+45
| This patch refactors the filter pushdown for JDBCRDD and also adds a few filters.
|
| The added filters are basically from #10468 with some refactoring. Test cases are from #10468.
|
| Author: Liang-Chi Hsieh <viirya@gmail.com>
| Closes #10470 from viirya/refactor-jdbc-filter.
* [SPARK-3873][MLLIB] Import order fixes. | Marcelo Vanzin | 2015-12-31 | 95 | -169/+160
| A slight adjustment to the checker configuration was needed; there is a handful of warnings still left, but those are because of a bug in the checker that I'll fix separately (before enabling errors for the checker, of course).
|
| Author: Marcelo Vanzin <vanzin@cloudera.com>
| Closes #10535 from vanzin/SPARK-3873-mllib.
* [SPARK-11743][SQL] Move the test for arrayOfUDT | Liang-Chi Hsieh | 2015-12-31 | 1 | -13/+2
| A follow-up PR for #9712. Move the test for arrayOfUDT.
|
| Author: Liang-Chi Hsieh <viirya@gmail.com>
| Closes #10538 from viirya/move-udt-test.
* [SPARK-10359][PROJECT-INFRA] Multiple fixes to dev/test-dependencies.sh script | Josh Rosen | 2015-12-31 | 2 | -2/+9
| This patch includes multiple fixes for the `dev/test-dependencies.sh` script (which was introduced in #10461):
|
| - Use `build/mvn --force` instead of `mvn` in one additional place.
| - Explicitly set a zero exit code on success.
| - Set `LC_ALL=C` to make `sort` results agree across machines (see https://stackoverflow.com/questions/28881/).
| - Set `should_run_build_tests=True` for `build` module (this somehow got lost).
|
| Author: Josh Rosen <joshrosen@databricks.com>
| Closes #10543 from JoshRosen/dep-script-fixes.
* [SPARK-3873][STREAMING] Import order fixes for streaming. | Marcelo Vanzin | 2015-12-31 | 73 | -180/+181
| Also included a few miscellaneous other modules that had very few violations.
|
| Author: Marcelo Vanzin <vanzin@cloudera.com>
| Closes #10532 from vanzin/SPARK-3873-streaming.
* [SPARK-12039][SQL] Re-enable HiveSparkSubmitSuite's SPARK-9757 Persist Parquet relation with decimal column | Yin Huai | 2015-12-31 | 1 | -1/+1
| https://issues.apache.org/jira/browse/SPARK-12039
|
| Since we do not support hadoop1, we can re-enable this test in master.
|
| Author: Yin Huai <yhuai@databricks.com>
| Closes #10533 from yhuai/SPARK-12039-enable.
* [SPARK-7995][SPARK-6280][CORE] Remove AkkaRpcEnv and remove systemName from setupEndpointRef | Shixiong Zhu | 2015-12-31 | 29 | -1120/+90
| ### Remove AkkaRpcEnv
|
| Keep `SparkEnv.actorSystem` because Streaming still uses it. Will remove it and AkkaUtils after refactoring Streaming actorStream API.
|
| ### Remove systemName
|
| There are 2 places using `systemName`:
| * `RpcEnvConfig.name`. Actually, although it's used as `systemName` in `AkkaRpcEnv`, `NettyRpcEnv` uses it as the service name to output the log `Successfully started service *** on port ***`. Since the service name in log is useful, I keep `RpcEnvConfig.name`.
| * `def setupEndpointRef(systemName: String, address: RpcAddress, endpointName: String)`. Each `ActorSystem` has a `systemName`. Akka requires `systemName` in its URI and will refuse a connection if `systemName` is not matched. However, `NettyRpcEnv` doesn't use it. So we can remove `systemName` from `setupEndpointRef` since we are removing `AkkaRpcEnv`.
|
| ### Remove RpcEnv.uriOf
|
| `uriOf` exists because Akka uses different URI formats with and without authentication, e.g., `akka.ssl.tcp...` and `akka.tcp://...`. But `NettyRpcEnv` uses the same format. So it's not necessary after removing `AkkaRpcEnv`.
|
| Author: Shixiong Zhu <shixiong@databricks.com>
| Closes #10459 from zsxwing/remove-akka-rpc-env.
* [SPARK-12585] [SQL] move numFields to constructor of UnsafeRow | Davies Liu | 2015-12-30 | 16 | -137/+86
| Right now, numFields is passed in by pointTo(), and bitSetWidthInBytes is then calculated, making pointTo() a little heavy.
|
| It should be part of the constructor of UnsafeRow.
|
| Author: Davies Liu <davies@databricks.com>
| Closes #10528 from davies/numFields.
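The idea in SPARK-12585 can be illustrated with a minimal sketch. `RowSketch` is a hypothetical stand-in, not Spark's `UnsafeRow`; the bitset formula (one 8-byte word per 64 fields) and the fixed 8-byte slot per field are assumptions made for illustration only. The point is that the width is derived once, at construction, so `pointTo` only records where the bytes live.

```scala
// Hypothetical, simplified stand-in for UnsafeRow; not Spark's actual class.
final class RowSketch(val numFields: Int) {
  require(numFields >= 0, "numFields must be non-negative")

  // Assumed layout: one 8-byte word of null bits per 64 fields.
  // Computed once here instead of on every pointTo() call.
  val bitSetWidthInBytes: Int = ((numFields + 63) / 64) * 8

  private var baseOffset: Long = 0L
  private var sizeInBytes: Int = 0

  // pointTo() now only records the location and size of the row's bytes.
  def pointTo(offset: Long, size: Int): Unit = {
    baseOffset = offset
    sizeInBytes = size
  }

  // Assumed layout: fixed 8-byte slots follow the null bitset.
  def fieldOffset(ordinal: Int): Long = {
    require(ordinal >= 0 && ordinal < numFields, "ordinal out of range")
    baseOffset + bitSetWidthInBytes + ordinal * 8L
  }
}
```

A row object built this way can be re-pointed at many buffers in a loop without recomputing anything that depends only on the schema.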
* House cleaning: close old pull requests. | Reynold Xin | 2015-12-30 | 0 | -0/+0
| Closes #5400
| Closes #5408
| Closes #5423
| Closes #5668
| Closes #6757
| Closes #6745
| Closes #6613
* Closes #10386 since it was superseded by #10468. | Reynold Xin | 2015-12-30 | 0 | -0/+0
* House cleaning: close open pull requests created before June 1st, 2015 | Reynold Xin | 2015-12-30 | 0 | -0/+0
| Closes #5358
| Closes #3744
| Closes #3677
| Closes #3536
| Closes #3249
| Closes #3221
| Closes #2446
| Closes #3794
| Closes #3815
| Closes #3816
| Closes #3866
| Closes #4286
| Closes #5184
| Closes #5170
| Closes #5142
| Closes #5025
| Closes #5005
| Closes #4897
| Closes #4887
| Closes #4849
| Closes #4632
| Closes #4622
| Closes #4456
| Closes #4449
| Closes #4417
| Closes #5483
| Closes #5325
| Closes #6545
| Closes #6449
| Closes #6433
| Closes #6416
| Closes #6403
| Closes #6386
| Closes #6263
| Closes #6245
| Closes #6213
| Closes #6155
| Closes #6133
| Closes #6018
| Closes #5978
| Closes #5869
| Closes #5852
| Closes #5848
| Closes #5754
| Closes #5598
| Closes #5503
| Closes #4380
* [SPARK-12561] Remove JobLogger in Spark 2.0. | Reynold Xin | 2015-12-30 | 1 | -277/+0
| It was research code and has been deprecated since 1.0.0. No one really uses it since they can just use event logging.
|
| Author: Reynold Xin <rxin@databricks.com>
| Closes #10530 from rxin/SPARK-12561.
* [SPARK-3873][GRAPHX] Import order fixes. | Marcelo Vanzin | 2015-12-30 | 23 | -50/+33
| There's one warning left, caused by a bug in the checker.
|
| Author: Marcelo Vanzin <vanzin@cloudera.com>
| Closes #10537 from vanzin/SPARK-3873-graphx.
* [SPARK-3873][YARN] Fix import ordering. | Marcelo Vanzin | 2015-12-30 | 12 | -26/+23
| Author: Marcelo Vanzin <vanzin@cloudera.com>
| Closes #10536 from vanzin/SPARK-3873-yarn.
* [SPARK-12588] Remove HttpBroadcast in Spark 2.0. | Reynold Xin | 2015-12-30 | 12 | -491/+22
| We switched to TorrentBroadcast in Spark 1.1, and HttpBroadcast has been undocumented since then. It's time to remove it in Spark 2.0.
|
| Author: Reynold Xin <rxin@databricks.com>
| Closes #10531 from rxin/SPARK-12588.
* [SPARK-8641][SPARK-12455][SQL] Native Spark Window functions - Follow-up (docs & tests) | Herman van Hovell | 2015-12-30 | 3 | -3/+162
| This PR is a follow-up for PR https://github.com/apache/spark/pull/9819. It adds documentation for the window functions and a couple of NULL tests.
|
| The documentation was largely based on the documentation in (the source of) Hive and Presto:
| * https://prestodb.io/docs/current/functions/window.html
| * https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
|
| I am not sure if we need to add the licenses of these two projects to the licenses directory. They are both under the ASL. srowen any thoughts?
|
| cc yhuai
|
| Author: Herman van Hovell <hvanhovell@questtec.nl>
| Closes #10402 from hvanhovell/SPARK-8641-docs.
* [SPARK-12399] Display correct error message when accessing REST API with an unknown app Id | Carson Wang | 2015-12-30 | 1 | -2/+14
| I got an exception when accessing the REST API below with an unknown application Id.
| `http://<server-url>:18080/api/v1/applications/xxx/jobs`
| Instead of an exception, I expect the error message "no such app: xxx", which is similar to the error message I get when I access `/api/v1/applications/xxx`.
| ```
| org.spark-project.guava.util.concurrent.UncheckedExecutionException: java.util.NoSuchElementException: no app with key xxx
| at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
| at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
| at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
| at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
| at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:116)
| at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:226)
| at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:46)
| at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
| ```
|
| Author: Carson Wang <carson.wang@intel.com>
| Closes #10352 from carsonwang/unknownAppFix.
* [SPARK-12409][SPARK-12387][SPARK-12391][SQL] Support AND/OR/IN/LIKE push-down filters for JDBC | Takeshi YAMAMURO | 2015-12-30 | 2 | -2/+35
| This is a rework of #10386 that adds more tests and LIKE push-down support.
|
| Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
| Closes #10468 from maropu/SupportMorePushdownInJdbc.
* [SPARK-10359] Enumerate dependencies in a file and diff against it for new pull requests | Josh Rosen | 2015-12-30 | 10 | -120/+512
| This patch adds a new build check which enumerates Spark's resolved runtime classpath and saves it to a file, then diffs against that file to detect whether pull requests have introduced dependency changes. The aim of this check is to make it simpler to reason about whether pull requests which modify the build have introduced new dependencies or changed transitive dependencies in a way that affects the final classpath.
|
| This supplants the checks added in SPARK-4123 / #5093, which are currently disabled due to bugs.
|
| This patch is based on pwendell's work in #8531.
|
| Closes #8531.
|
| Author: Josh Rosen <joshrosen@databricks.com>
| Author: Patrick Wendell <patrick@databricks.com>
| Closes #10461 from JoshRosen/SPARK-10359.
* [SPARK-12300] [SQL] [PYSPARK] fix schema inferance on local collections | Holden Karau | 2015-12-30 | 2 | -7/+14
| Current schema inference for local Python collections halts as soon as there are no NullTypes. This differs from what happens when we specify a sampling ratio of 1.0 on a distributed collection, and it could result in incomplete schema information.
|
| Author: Holden Karau <holden@us.ibm.com>
| Closes #10275 from holdenk/SPARK-12300-fix-schmea-inferance-on-local-collections.
* [SPARK-12495][SQL] use true as default value for propagateNull in NewInstance | Wenchen Fan | 2015-12-30 | 7 | -37/+38
| In most cases we should propagate null when calling `NewInstance`, and so far there is only one case where we should stop null propagation: creating a product/Java bean. So I think it makes more sense to propagate null by default.
|
| This also fixes a bug when encoding a null array/map, which was first discovered in https://github.com/apache/spark/pull/10401
|
| Author: Wenchen Fan <wenchen@databricks.com>
| Closes #10443 from cloud-fan/encoder.
* [SPARK-12263][DOCS] IllegalStateException: Memory can't be 0 for SPARK_WORKER_MEMORY without unit | Neelesh Srinivas Salian | 2015-12-30 | 1 | -1/+1
| Updated the Worker Unit IllegalStateException message to indicate no values less than 1MB instead of 0 to help solve this.
| Requesting review
|
| Author: Neelesh Srinivas Salian <nsalian@cloudera.com>
| Closes #10483 from nssalian/SPARK-12263.
* Revert "[SPARK-12362][SQL][WIP] Inline Hive Parser" | Reynold Xin | 2015-12-30 | 18 | -5402/+72
| This reverts commit b600bccf41a7b1958e33d8301a19214e6517e388 due to non-deterministic build breaks.
* [SPARK-12564][SQL] Improve missing column AnalysisException | gatorsmile | 2015-12-29 | 2 | -2/+2
| ```
| org.apache.spark.sql.AnalysisException: cannot resolve 'value' given input columns text;
| ```
|
| Let's put a `:` after `columns` and put the columns in `[]` so that they match the toString of DataFrame.
|
| Author: gatorsmile <gatorsmile@gmail.com>
| Closes #10518 from gatorsmile/improveAnalysisExceptionMsg.
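The message shape requested in SPARK-12564 can be sketched as below. `MissingColumnMessage.unresolved` is a hypothetical helper, not a Spark API, and the `", "` separator between column names is an assumption; only the `:` after `columns` and the surrounding `[]` come from the commit description.

```scala
object MissingColumnMessage {
  // Hypothetical helper: builds the message in the shape the commit asks for,
  // with a ':' after "columns" and the column list wrapped in '[' and ']'.
  def unresolved(name: String, inputColumns: Seq[String]): String =
    s"cannot resolve '$name' given input columns: [${inputColumns.mkString(", ")}]"
}
```

With a single input column `text`, the result reads `cannot resolve 'value' given input columns: [text]`, which makes the boundary between the sentence and the column list unambiguous.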
* [SPARK-12490][CORE] Limit the css style scope to fix the Streaming UI | Shixiong Zhu | 2015-12-29 | 3 | -3/+5
| #10441 broke the Streaming UI because of the new CSS style.
|
| <img width="503" alt="screen shot 2015-12-29 at 4 49 04 pm" src="https://cloud.githubusercontent.com/assets/1000778/12044763/1efce0fe-ae4c-11e5-9f8b-39df08426bf8.png">
|
| This PR just added a class for the new style and only applied them to the paged tables.
|
| Author: Shixiong Zhu <shixiong@databricks.com>
| Closes #10517 from zsxwing/fix-streaming-ui.
* [SPARK-12362][SQL][WIP] Inline Hive Parser | Nong Li | 2015-12-29 | 18 | -72/+5402
| This is a WIP. The PR has been taken over from nongli (see https://github.com/apache/spark/pull/10420). I have removed some additional dead code, and fixed a few issues which were caused by the fact that the inlined Hive parser is newer than the Hive parser we currently use in Spark.
|
| I am submitting this PR in order to get some feedback and testing done. There is quite a bit of work to do:
| - [ ] Get it to pass jenkins build/test.
| - [ ] Acknowledge Hive-project for using their parser.
| - [ ] Refactorings between HiveQl and the java classes.
|   - [ ] Create our own ASTNode and integrate the current implicit extensions.
|   - [ ] Move remaining ```SemanticAnalyzer``` and ```ParseUtils``` functionality to ```HiveQl```.
| - [ ] Removing Hive dependencies from the parser. This will require some edits in the grammar files.
|   - [ ] Introduce our own context which needs to contain a ```TokenRewriteStream```.
|   - [ ] Add ```useSQL11ReservedKeywordsForIdentifier``` and ```allowQuotedId``` to the catalyst or sql configuration.
|   - [ ] Remove ```HiveConf``` from grammar files & HiveQl, and pass in our own configuration.
| - [ ] Moving the parser into sql/core.
|
| cc nongli rxin
|
| Author: Herman van Hovell <hvanhovell@questtec.nl>
| Author: Nong Li <nong@databricks.com>
| Author: Nong Li <nongli@gmail.com>
| Closes #10509 from hvanhovell/SPARK-12362.
* [SPARK-12549][SQL] Take Option[Seq[DataType]] in UDF input type specification. | Reynold Xin | 2015-12-29 | 5 | -68/+75
| In Spark we allow a UDF to declare its expected input types in order to apply type coercion. The expected input type parameter takes a Seq[DataType] and uses Nil when no type coercion is applied. It makes more sense to take Option[Seq[DataType]] instead, so we can differentiate a no-arg function from a function with no expected input type specified.
|
| Author: Reynold Xin <rxin@databricks.com>
| Closes #10504 from rxin/SPARK-12549.
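The distinction described in SPARK-12549 can be made concrete with a small sketch. The `DataType` trait and the `describe` function here are hypothetical stand-ins, not Spark's classes; they only show why `Option[Seq[...]]` has one more state than a bare `Seq` where `Nil` has to mean two things.

```scala
object UdfInputTypes {
  // Hypothetical stand-ins for Spark's DataType hierarchy.
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType

  // None      => the UDF declared nothing, so no coercion is attempted.
  // Some(Nil) => the UDF explicitly takes zero arguments.
  // Some(ts)  => arguments should be coerced to the listed types.
  def describe(inputTypes: Option[Seq[DataType]]): String = inputTypes match {
    case None        => "no expected input types declared; skip coercion"
    case Some(Seq()) => "zero-argument function"
    case Some(types) => s"coerce arguments to: ${types.mkString(", ")}"
  }
}
```

With a plain `Seq[DataType]`, the first two cases would both be `Nil` and could not be told apart.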
* [SPARK-12349][SPARK-12349][ML] Fix typo in Spark version regex introduced in / PR 10327 | Sean Owen | 2015-12-29 | 1 | -1/+1
| Sorry jkbradley
| Ref: https://github.com/apache/spark/pull/10327#discussion_r48502942
|
| Author: Sean Owen <sowen@cloudera.com>
| Closes #10508 from srowen/SPARK-12349.2.
* [SPARK-11199][SPARKR] Improve R context management story and add getOrCreate | Hossein | 2015-12-29 | 2 | -1/+5
| * Changes api.r.SQLUtils to use ```SQLContext.getOrCreate``` instead of creating a new context.
| * Adds a simple test
|
| [SPARK-11199] #comment link with JIRA
|
| Author: Hossein <hossein@databricks.com>
| Closes #9185 from falaki/SPARK-11199.
* [SPARK-12530][BUILD] Fix build break at Spark-Master-Maven-Snapshots from #1293 | Kazuaki Ishizaki | 2015-12-29 | 1 | -3/+4
| The compilation error is caused by string concatenations that are not constants.
| Use a raw string literal to avoid string concatenations.
|
| https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Master-Maven-Snapshots/1293/
|
| Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
| Closes #10488 from kiszk/SPARK-12530.
* [SPARK-12526][SPARKR] ifelse`, `when`, `otherwise` unable to take Column as value | Forest Fang | 2015-12-29 | 3 | -7/+18
| `ifelse`, `when`, `otherwise` are unable to take `Column` typed S4 objects as values.
|
| For example:
| ```r
| ifelse(lit(1) == lit(1), lit(2), lit(3))
| ifelse(df$mpg > 0, df$mpg, 0)
| ```
| will both fail with
| ```r
| attempt to replicate an object of type 'environment'
| ```
|
| The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid attempts to vectorize (i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency, because `ifelse` in base R is vectorized, but I cannot foresee any scenario in which these functions would need to be vectorized in SparkR.
|
| For reference, added test cases which trigger failures:
| ```r
| . Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
| error in evaluating the argument 'x' in selecting a method for function 'collect':
|   error in evaluating the argument 'col' in selecting a method for function 'select':
|   attempt to replicate an object of type 'environment'
| Calls: when -> when -> ifelse -> ifelse
|
| 1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
| 2: eval(code, new_test_environment)
| 3: eval(expr, envir, enclos)
| 4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
| 5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
| 6: condition(object)
| 7: compare(actual, expected, ...)
| 8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
| Error: Test failures
| Execution halted
| ```
|
| Author: Forest Fang <forest.fang@outlook.com>
| Closes #10481 from saurfang/spark-12526.
* [SPARK-11394][SQL] Throw IllegalArgumentException for unsupported types in postgresql | Takeshi YAMAMURO | 2015-12-28 | 2 | -0/+5
| If a DataFrame has BYTE types, an exception is thrown:
| org.postgresql.util.PSQLException: ERROR: type "byte" does not exist
|
| Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
| Closes #9350 from maropu/FixBugInPostgreJdbc.
* [SPARK-12547][SQL] Tighten scala style checker enforcement for UDF registration | Reynold Xin | 2015-12-28 | 2 | -29/+30
| We use scalastyle:off to turn off style checks in certain places where it is not possible to follow the style guide. This is usually ok. However, in udf registration, we disable the checker for a large amount of code simply because some of the lines exceed the 100 char line limit. It is better to just disable the line limit check rather than everything.
|
| In this pull request, I only disabled the line length check, and fixed a problem (lack of explicit types for public methods).
|
| Author: Reynold Xin <rxin@databricks.com>
| Closes #10501 from rxin/SPARK-12547.
* [SPARK-12522][SQL][MINOR] Add the missing document strings for the SQL configuration | gatorsmile | 2015-12-28 | 3 | -8/+11
| Fixing the missing documentation for the configuration. We can see the missing messages "TODO" when issuing the command "SET -V".
| ```
| spark.sql.columnNameOfCorruptRecord
| spark.sql.hive.verifyPartitionPath
| spark.sql.sources.parallelPartitionDiscovery.threshold
| spark.sql.hive.convertMetastoreParquet.mergeSchema
| spark.sql.hive.convertCTAS
| spark.sql.hive.thriftServer.async
| ```
|
| Author: gatorsmile <gatorsmile@gmail.com>
| Closes #10471 from gatorsmile/commandDesc.
* [SPARK-12490] Don't use Javascript for web UI's paginated table controls | Josh Rosen | 2015-12-28 | 5 | -97/+178
| The web UI's paginated table uses Javascript to implement certain navigation controls, such as table sorting and the "go to page" form. This is unnecessary and should be simplified to use plain HTML form controls and links.
|
| /cc zsxwing, who wrote this original code, and yhuai.
|
| Author: Josh Rosen <joshrosen@databricks.com>
| Closes #10441 from JoshRosen/simplify-paginated-table-sorting.
* [SPARK-12489][CORE][SQL][MLIB] Fix minor issues found by FindBugs | Shixiong Zhu | 2015-12-28 | 7 | -32/+51
| Include the following changes:
|
| 1. Close `java.sql.Statement`
| 2. Fix incorrect `asInstanceOf`.
| 3. Remove unnecessary `synchronized` and `ReentrantLock`.
|
| Author: Shixiong Zhu <shixiong@databricks.com>
| Closes #10440 from zsxwing/findbugs.
* [SPARK-12525] Fix fatal compiler warnings in Kinesis ASL due to @transient annotations | Josh Rosen | 2015-12-28 | 2 | -8/+8
| The Scala 2.11 SBT build currently fails for Spark 1.6.0 and master due to warnings about the `transient` annotation:
|
| ```
| [error] [warn] /Users/joshrosen/Documents/spark/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73: no valid targets for annotation on value sc - it is discarded unused. You may specify targets with meta-annotations, e.g. (transient param)
| [error] [warn] transient sc: SparkContext,
| ```
|
| The fix implemented here is the same as what we did in #8433: remove the `transient` annotations when they are not necessary and use `transient private val` in the remaining cases.
|
| Author: Josh Rosen <joshrosen@databricks.com>
| Closes #10479 from JoshRosen/fix-sbt-2.11.
* [SPARK-12222][CORE] Deserialize RoaringBitmap using Kryo serializer throw Buffer underflow exception | Daoyuan Wang | 2015-12-29 | 1 | -6/+1
| Since we only need to implement `def skipBytes(n: Int)`, code in #10213 could be simplified.
|
| davies scwf
|
| Author: Daoyuan Wang <daoyuan.wang@intel.com>
| Closes #10253 from adrian-wang/kryo.
* [SPARK-12441][SQL] Fixing missingInput in Generate/MapPartitions/AppendColumns/MapGroups/CoGroup | gatorsmile | 2015-12-28 | 15 | -18/+63
| When we explain any plan with Generate, we see an exclamation mark in the plan. Normally, this mark means the plan has an error. This PR corrects the `missingInput` in `Generate`.
|
| For example,
| ```scala
| val df = Seq((1, "a b c"), (2, "a b"), (3, "a")).toDF("number", "letters")
| val df2 =
|   df.explode('letters) {
|     case Row(letters: String) => letters.split(" ").map(Tuple1(_)).toSeq
|   }
|
| df2.explain(true)
| ```
| Before the fix, the plan looks like
| ```
| == Parsed Logical Plan ==
| 'Generate UserDefinedGenerator('letters), true, false, None
| +- Project [_1#0 AS number#2,_2#1 AS letters#3]
|    +- LocalRelation [_1#0,_2#1], [[1,a b c],[2,a b],[3,a]]
|
| == Analyzed Logical Plan ==
| number: int, letters: string, _1: string
| Generate UserDefinedGenerator(letters#3), true, false, None, [_1#8]
| +- Project [_1#0 AS number#2,_2#1 AS letters#3]
|    +- LocalRelation [_1#0,_2#1], [[1,a b c],[2,a b],[3,a]]
|
| == Optimized Logical Plan ==
| Generate UserDefinedGenerator(letters#3), true, false, None, [_1#8]
| +- LocalRelation [number#2,letters#3], [[1,a b c],[2,a b],[3,a]]
|
| == Physical Plan ==
| !Generate UserDefinedGenerator(letters#3), true, false, [number#2,letters#3,_1#8]
| +- LocalTableScan [number#2,letters#3], [[1,a b c],[2,a b],[3,a]]
| ```
|
| **Updates**: The same issue is also found in four other Dataset operators: `MapPartitions`/`AppendColumns`/`MapGroups`/`CoGroup`. Fixed all four.
|
| Author: gatorsmile <gatorsmile@gmail.com>
| Author: xiaoli <lixiao1983@gmail.com>
| Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
| Closes #10393 from gatorsmile/generateExplain.
* [SPARK-7727][SQL] Avoid inner classes in RuleExecutor | Stephan Kessler | 2015-12-28 | 3 | -5/+74
| Moved (case) classes Strategy, Once, FixedPoint and Batch to the companion object. This is necessary if we want to have the Optimizer easily extendable in the following sense: usually a user wants to add additional rules and keep the ones that are already there. However, inner classes made that impossible since the code did not compile.
|
| This allows easy extension of existing Optimizers; see the DefaultOptimizerExtendableSuite for a corresponding test case.
|
| Author: Stephan Kessler <stephan.kessler@sap.com>
| Closes #10174 from stephankessler/SPARK-7727.
* [SPARK-12424][ML] The implementation of ParamMap#filter is wrong. | Kousuke Saruta | 2015-12-29 | 2 | -2/+34
| ParamMap#filter uses `mutable.Map#filterKeys`. The return type of `filterKeys` is collection.Map, not mutable.Map, but the result is cast to mutable.Map using `asInstanceOf`, so we get a `ClassCastException`.
| Also, the return type of Map#filterKeys is not Serializable. This is a Scala issue (https://issues.scala-lang.org/browse/SI-6654).
|
| Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
| Closes #10381 from sarutak/SPARK-12424.
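One way to avoid the cast described in SPARK-12424 is to copy the matching entries into a fresh mutable map rather than relying on the type returned by `filterKeys`. The sketch below is illustrative only: `FilterSketch.filterByKey` is a hypothetical helper, and this is not necessarily how the PR fixed `ParamMap#filter`.

```scala
import scala.collection.mutable

object FilterSketch {
  // Builds a new, independent mutable map holding only the entries whose key
  // satisfies the predicate, so no asInstanceOf cast is needed.
  def filterByKey[K, V](m: mutable.Map[K, V])(p: K => Boolean): mutable.Map[K, V] = {
    val out = mutable.Map.empty[K, V]
    m.foreach { case (k, v) => if (p(k)) out.put(k, v) }
    out
  }
}
```

Because the result is a strict copy rather than a view over the source map, later changes to either map do not affect the other.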
* [SPARK-12287][SQL] Support UnsafeRow in MapPartitions/MapGroups/CoGroup | gatorsmile | 2015-12-28 | 1 | -0/+13
| Support UnsafeRow in MapPartitions/MapGroups/CoGroup.
|
| Added a test case for MapPartitions. Since MapGroups and CoGroup are built on AppendColumns, all the related dataset test cases can already verify correctness when MapGroups and CoGroup process unsafe rows.
|
| davies cloud-fan Not sure if my understanding is right, please correct me. Thank you!
|
| Author: gatorsmile <gatorsmile@gmail.com>
| Closes #10398 from gatorsmile/unsafeRowMapGroup.