Commit message | Author | Age | Files | Lines
* [SPARK-7782] fixed sort arrow issue | zuxqoj | 2015-05-27 | 1 | -3/+3
| | | | | | | | | | | | | | | | | | Current behaviour:: In spark UI ![screen shot 2015-05-27 at 3 27 51 pm](https://cloud.githubusercontent.com/assets/3919211/7837541/47d330ba-04a5-11e5-89d1-e5b11da1a513.png) In YARN ![screen shot 2015-05-27 at 3](https://cloud.githubusercontent.com/assets/3919211/7837594/aebd1d36-04a5-11e5-8216-86e03c07d2bd.png) In jira ![screen shot 2015-05-27 at 3_2](https://cloud.githubusercontent.com/assets/3919211/7837616/d3fedce2-04a5-11e5-9e68-960ed54e5d83.png) Author: zuxqoj <sbshekhar@gmail.com> Closes #6437 from zuxqoj/SPARK-7782_PR and squashes the following commits: cd068b9 [zuxqoj] [SPARK-7782] fixed sort arrow issue
* [DOCS] Fix typo in documentation for Java UDF registration | Matt Wise | 2015-05-27 | 1 | -1/+1
| | | | | | | | | | This contribution is my original work and I license the work to the project under the project's open source license Author: Matt Wise <mwise@quixey.com> Closes #6447 from wisematthew/fix-typo-in-java-udf-registration-doc and squashes the following commits: e7ef5f7 [Matt Wise] Fix typo in documentation for Java UDF registration
* [SPARK-7896] Allow ChainedBuffer to store more than 2 GB | Sandy Ryza | 2015-05-27 | 2 | -42/+55
| | | | | | | | | | | | | | | Author: Sandy Ryza <sandy@cloudera.com> Closes #6440 from sryza/sandy-spark-7896 and squashes the following commits: 49d8a0d [Sandy Ryza] Fix bug introduced when reading over record boundaries 6006856 [Sandy Ryza] Fix overflow issues 006b4b2 [Sandy Ryza] Fix scalastyle by removing non ascii characters 8b000ca [Sandy Ryza] Add ascii art to describe layout of data in metaBuffer f2053c0 [Sandy Ryza] Fix negative overflow issue 0368c78 [Sandy Ryza] Initialize size as 0 a5a4820 [Sandy Ryza] Use explicit types for all numbers in ChainedBuffer b7e0213 [Sandy Ryza] SPARK-7896. Allow ChainedBuffer to store more than 2 GB
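Illustrative note, not part of the commit: storing more than 2 GB means byte positions no longer fit in an `Int`, which is presumably where the overflow fixes listed above come from. A minimal sketch of `Long`-based addressing over `Int`-sized chunks follows. The object name, the chunk size, and both helpers are hypothetical and are not taken from `ChainedBuffer`.

```scala
// Hypothetical sketch: address a logical buffer larger than 2 GB using Long
// positions, while each individual chunk stays Int-addressable.
object ChunkAddressing {
  // Assumed chunk size: 4 MiB, expressed as a power of two.
  val chunkSizeLog2: Int = 22
  val chunkSize: Int = 1 << chunkSizeLog2

  /** Index of the chunk that holds the byte at `pos`. */
  def chunkIndex(pos: Long): Int = (pos >>> chunkSizeLog2).toInt

  /** Offset of the byte at `pos` inside its chunk. */
  def offsetInChunk(pos: Long): Int = (pos & (chunkSize - 1L)).toInt
}
```

Keeping the position as a `Long` until the final narrowing step avoids the wrap-around that occurs when a position above `Int.MaxValue` is held in an `Int`.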
* [SPARK-7873] Allow KryoSerializerInstance to create multiple streams at the ↵ | Josh Rosen | 2015-05-27 | 3 | -24/+147
| | | | | | | | | | | | | | | | | | | | | | | same time This is a somewhat obscure bug, but I think that it will seriously impact KryoSerializer users who use custom registrators which disabled auto-reset. When auto-reset is disabled, then this breaks things in some of our shuffle paths which actually end up creating multiple OutputStreams from the same shared SerializerInstance (which is unsafe). This was introduced by a patch (SPARK-3386) which enables serializer re-use in some of the shuffle paths, since constructing new serializer instances is actually pretty costly for KryoSerializer. We had already fixed another corner-case (SPARK-7766) bug related to this, but missed this one. I think that the root problem here is that KryoSerializerInstance can be used in a way which is unsafe even within a single thread, e.g. by creating multiple open OutputStreams from the same instance or by interleaving deserialize and deserializeStream calls. I considered a smaller patch which adds assertions to guard against this type of "misuse" but abandoned that approach after I realized how convoluted the Scaladoc became. This patch fixes this bug by making it legal to create multiple streams from the same KryoSerializerInstance. Internally, KryoSerializerInstance now implements a `borrowKryo()` / `releaseKryo()` API that's backed by a "pool" of capacity 1. Each call to a KryoSerializerInstance method will borrow the Kryo, do its work, then release the serializer instance back to the pool. If the pool is empty and we need an instance, it will allocate a new Kryo on-demand. This makes it safe for multiple OutputStreams to be opened from the same serializer. If we try to release a Kryo back to the pool but the pool already contains a Kryo, then we'll just discard the new Kryo. 
I don't think there's a clear benefit to having a larger pool since our usages tend to fall into two cases, a) where we only create a single OutputStream and b) where we create a huge number of OutputStreams with the same lifecycle, then destroy the KryoSerializerInstance (this is what's happening in the bypassMergeSort code path that my regression test hits). Author: Josh Rosen <joshrosen@databricks.com> Closes #6415 from JoshRosen/SPARK-7873 and squashes the following commits: 00b402e [Josh Rosen] Initialize eagerly to fix a failing test ba55d20 [Josh Rosen] Add explanatory comments 3f1da96 [Josh Rosen] Guard against duplicate close() ab457ca [Josh Rosen] Sketch a loan/release based solution. 9816e8f [Josh Rosen] Add a failing test showing how deserialize() and deserializeStream() can interfere. 7350886 [Josh Rosen] Add failing regression test for SPARK-7873
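Illustrative note, not part of the commit: the borrow/release scheme described above can be sketched as a pool with a single slot. `SingleSlotPool`, `borrow`, and `release` are hypothetical names; the real `borrowKryo()` / `releaseKryo()` methods may differ in detail, and this sketch is not thread-safe.

```scala
// Hypothetical sketch of a capacity-1 pool: at most one instance is cached.
final class SingleSlotPool[T](create: () => T) {
  // The single cached instance, if any.
  private var cached: Option[T] = None

  /** Take the cached instance if present, otherwise allocate one on demand. */
  def borrow(): T = cached match {
    case Some(instance) =>
      cached = None
      instance
    case None =>
      create()
  }

  /** Return an instance; if the slot is already occupied, drop the returned one. */
  def release(instance: T): Unit = {
    if (cached.isEmpty) {
      cached = Some(instance)
    }
  }
}
```

With this shape, two streams opened at the same time each get their own instance, and only one of them is kept for reuse once both are released.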
* [SPARK-7907] [SQL] [UI] Rename tab ThriftServer to SQL. | Yin Huai | 2015-05-27 | 4 | -6/+8
    This PR has three changes:
    1. Renaming the tab from `ThriftServer` to `SQL`;
    2. Renaming the title of the tab from `ThriftServer` to `JDBC/ODBC Server`; and
    3. Renaming the title of the session page from `ThriftServer` to `JDBC/ODBC Session`.

    https://issues.apache.org/jira/browse/SPARK-7907

    Author: Yin Huai <yhuai@databricks.com>

    Closes #6448 from yhuai/JDBCServer and squashes the following commits:

    eadcc3d [Yin Huai] Update test.
    9168005 [Yin Huai] Use SQL as the tab name.
    221831e [Yin Huai] Rename ThriftServer to JDBCServer.
* [SPARK-7897][SQL] Use DecimalType to represent unsigned bigint in JDBCRDD | Liang-Chi Hsieh | 2015-05-27 | 1 | -1/+1
| | | | | | | | | | JIRA: https://issues.apache.org/jira/browse/SPARK-7897 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #6438 from viirya/jdbc_unsigned_bigint and squashes the following commits: ccb3c3f [Liang-Chi Hsieh] Use DecimalType to represent unsigned bigint.
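Illustrative note, not part of the commit: an unsigned 64-bit column can hold values up to 2^64 - 1, which exceeds `Long.MaxValue`, so a decimal representation is needed to hold the full range. The sketch below only demonstrates the range problem with plain Scala types; `UnsignedBigint` and `toUnsigned` are hypothetical names, and it makes no claim about how `JDBCRDD` performs the mapping.

```scala
// Hypothetical sketch: why a signed Long cannot represent an unsigned bigint.
object UnsignedBigint {
  // Largest value an unsigned 64-bit column can hold: 2^64 - 1 (20 digits).
  val maxUnsigned: BigInt = (BigInt(1) << 64) - 1

  /** Reinterpret the 64 bits of a signed Long as the unsigned value they encode. */
  def toUnsigned(bits: Long): BigDecimal = {
    val signed = BigInt(bits)
    BigDecimal(if (signed < 0) signed + (BigInt(1) << 64) else signed)
  }
}
```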
* [SPARK-7853] [SQL] Fixes a class loader issue in Spark SQL | Cheng Hao | 2015-05-27 | 3 | -8/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This PR is based on PR #6396 authored by chenghao-intel. Essentially, Spark SQL should use context classloader to load SerDe classes. yhuai helped updating the test case, and I fixed a bug in the original `CliSuite`: while testing the CLI tool with `runCliWithin`, we don't append `\n` to the last query, thus the last query is never executed. Original PR description is pasted below. ---- ``` bin/spark-sql --jars ./sql/hive/src/test/resources/hive-hcatalog-core-0.13.1.jar CREATE TABLE t1(a string, b string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'; ``` Throws exception like ``` 15/05/26 00:16:33 ERROR SparkSQLDriver: Failed in [CREATE TABLE t1(a string, b string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'] org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:333) at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:310) at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:139) at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:310) at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:300) at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:457) at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57) at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68) at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:922) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:922) at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147) at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131) at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:727) at org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:57) ``` Author: Cheng Hao <hao.cheng@intel.com> Author: Cheng Lian <lian@databricks.com> Author: Yin Huai <yhuai@databricks.com> Closes #6435 from liancheng/classLoader and squashes the following commits: d4c4845 [Cheng Lian] Fixes CliSuite 75e80e2 [Yin Huai] Update the fix. fd26533 [Cheng Hao] scalastyle dd78775 [Cheng Hao] workaround for classloader of IsolatedClientLoader
* [SPARK-7684] [SQL] Refactoring MetastoreDataSourcesSuite to workaround SPARK-7684 | Cheng Lian | 2015-05-27 | 3 | -666/+722
    As stated in SPARK-7684, `TestHive.reset` currently has an execution-order-specific bug, which makes running specific test suites locally pretty frustrating. This PR refactors `MetastoreDataSourcesSuite` (which relies heavily on `TestHive.reset`) using various `withXxx` utility methods in `SQLTestUtils`, so that each test case cleans up after itself and we can avoid calling `TestHive.reset`.

    Author: Cheng Lian <lian@databricks.com>
    Author: Yin Huai <yhuai@databricks.com>

    Closes #6353 from liancheng/workaround-spark-7684 and squashes the following commits:

    26939aa [Yin Huai] Move the initialization of jsonFilePath to beforeAll.
    a423d48 [Cheng Lian] Fixes Scala style issue
    dfe45d0 [Cheng Lian] Refactors MetastoreDataSourcesSuite to workaround SPARK-7684
    92a116d [Cheng Lian] Fixes minor styling issues
* [SPARK-7790] [SQL] date and decimal conversion for dynamic partition key | Daoyuan Wang | 2015-05-27 | 3 | -4/+48
| | | | | | | | | | Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #6318 from adrian-wang/dynpart and squashes the following commits: ad73b61 [Daoyuan Wang] not use sqlTestUtils for try catch because dont have sqlcontext here 6c33b51 [Daoyuan Wang] fix according to liancheng f0f8074 [Daoyuan Wang] some specific types as dynamic partition
* Removed Guava dependency from JavaTypeInference's type signature. | Reynold Xin | 2015-05-27 | 2 | -4/+11
| | | | | | | | | | This should also close #6243. Author: Reynold Xin <rxin@databricks.com> Closes #6431 from rxin/JavaTypeInference-guava and squashes the following commits: e58df3c [Reynold Xin] Removed Gauva dependency from JavaTypeInference's type signature.
* [SPARK-7864] [UI] Fix the logic grabbing the link from table in AllJobPage | Kousuke Saruta | 2015-05-27 | 2 | -2/+2
    This issue is related to #6419. AllJobPage does not currently have a "kill link", but I think we should fix the issue mentioned in #6419 anyway, to avoid accidents in the future. Since it is a minor issue for now, I have not filed it in JIRA.

    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

    Closes #6432 from sarutak/remove-ambiguity-of-link and squashes the following commits:

    cd1a503 [Kousuke Saruta] Fixed ambiguity link issue in AllJobPage
* [SPARK-7847] [SQL] Fixes dynamic partition directory escaping | Cheng Lian | 2015-05-27 | 4 | -60/+152
| | | | | | | | | | | | | | Please refer to [SPARK-7847] [1] for details. [1]: https://issues.apache.org/jira/browse/SPARK-7847 Author: Cheng Lian <lian@databricks.com> Closes #6389 from liancheng/spark-7847 and squashes the following commits: 935c652 [Cheng Lian] Adds test case for writing various data types as dynamic partition value f4fc398 [Cheng Lian] Converts partition columns to Scala type when writing dynamic partitions d0aeca0 [Cheng Lian] Fixes dynamic partition directory escaping
* [SPARK-7878] Rename Stage.jobId to firstJobId | Kay Ousterhout | 2015-05-27 | 4 | -37/+33
| | | | | | | | | | | | | | | | | The previous name was confusing, because each stage can be associated with many jobs, and jobId is just the ID of the first job that was associated with the Stage. This commit also renames some of the method parameters in DAGScheduler.scala to clarify when the jobId refers to the first job ID associated with the stage (as opposed to the jobId associated with a job that's currently being scheduled). cc markhamstra JoshRosen (hopefully this will help prevent future bugs like SPARK-6880) Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #6418 from kayousterhout/SPARK-7878 and squashes the following commits: b71a9b8 [Kay Ousterhout] [SPARK-7878] Rename Stage.jobId to firstJobId
* [CORE] [TEST] HistoryServerSuite failed due to timezone issue | scwf | 2015-05-27 | 1 | -1/+1
| | | | | | | | | | | | follow up for #6377 Change time to the equivalent in GMT /cc squito Author: scwf <wangfei1@huawei.com> Closes #6425 from scwf/fix-HistoryServerSuite and squashes the following commits: 4d37935 [scwf] fix HistoryServerSuite
* [SQL] Rename MathematicalExpression UnaryMathExpression, and specify BinaryMathExpression's output data type as DoubleType. | Reynold Xin | 2015-05-27 | 2 | -32/+23
    Two minor changes.

    cc brkyvz

    Author: Reynold Xin <rxin@databricks.com>

    Closes #6428 from rxin/math-func-cleanup and squashes the following commits:

    5910df5 [Reynold Xin] [SQL] Rename MathematicalExpression UnaryMathExpression, and specify BinaryMathExpression's output data type as DoubleType.
* [SPARK-7887][SQL] Remove EvaluatedType from SQL Expression. | Reynold Xin | 2015-05-27 | 28 | -95/+32
| | | | | | | | | | This type is not really used. Might as well remove it. Author: Reynold Xin <rxin@databricks.com> Closes #6427 from rxin/evalutedType and squashes the following commits: 51a319a [Reynold Xin] [SPARK-7887][SQL] Remove EvaluatedType from SQL Expression.
* [SPARK-7697][SQL] Use LongType for unsigned int in JDBCRDD | Liang-Chi Hsieh | 2015-05-27 | 1 | -3/+8
    JIRA: https://issues.apache.org/jira/browse/SPARK-7697

    The reported problem case is MySQL. The H2 database has no unsigned int type, so a corresponding test could not be added.

    Author: Liang-Chi Hsieh <viirya@gmail.com>

    Closes #6229 from viirya/unsignedint_as_long and squashes the following commits:

    dc4b5d8 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into unsignedint_as_long
    608695b [Liang-Chi Hsieh] Use LongType for unsigned int in JDBCRDD.
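Illustrative note, not part of the commit: an unsigned 32-bit column can hold values up to 4294967295, which does not fit in a signed `Int` but does fit in a `Long`. The helper below is a hypothetical demonstration of that widening, not code from `JDBCRDD`.

```scala
// Hypothetical sketch: widening unsigned 32-bit values into a signed Long.
object UnsignedInt {
  /** Widen the 32 bits of a signed Int to the unsigned value they encode. */
  def toUnsigned(bits: Int): Long = bits & 0xFFFFFFFFL
}
```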
* [SPARK-7850][BUILD] Hive 0.12.0 profile in POM should be removed | Cheolsoo Park | 2015-05-27 | 5 | -574/+1
| | | | | | | | | | | | I grep'ed hive-0.12.0 in the source code and removed all the profiles and doc references. Author: Cheolsoo Park <cheolsoop@netflix.com> Closes #6393 from piaozhexiu/SPARK-7850 and squashes the following commits: fb429ce [Cheolsoo Park] Remove hive-0.13.1 profile 82bf09a [Cheolsoo Park] Remove hive 0.12.0 shim code f3722da [Cheolsoo Park] Remove hive-0.12.0 profile and references from POM and build docs
* [SPARK-7535] [.1] [MLLIB] minor changes to the pipeline API | Xiangrui Meng | 2015-05-26 | 6 | -32/+71
| | | | | | | | | | | | | | | | | | | | | 1. removed `Params.validateParams(extra)` 2. added `Evaluate.evaluate(dataset, paramPairs*)` 3. updated `RegressionEvaluator` doc jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #6392 from mengxr/SPARK-7535.1 and squashes the following commits: 5ff5af8 [Xiangrui Meng] add unit test for CV.validateParams f1f8369 [Xiangrui Meng] update CV.validateParams() to test estimatorParamMaps 607445d [Xiangrui Meng] merge master 8716f5f [Xiangrui Meng] specify default metric name in RegressionEvaluator e4e5631 [Xiangrui Meng] update RegressionEvaluator doc 801e864 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7535.1 fcbd3e2 [Xiangrui Meng] Merge branch 'master' into SPARK-7535.1 2192316 [Xiangrui Meng] remove validateParams(extra); add evaluate(dataset, extra*)
* [SPARK-7868] [SQL] Ignores _temporary directories in HadoopFsRelation | Cheng Lian | 2015-05-26 | 2 | -7/+29
| | | | | | | | | | So that potential partial/corrupted data files left by failed tasks/jobs won't affect normal data scan. Author: Cheng Lian <lian@databricks.com> Closes #6411 from liancheng/spark-7868 and squashes the following commits: 273ea36 [Cheng Lian] Ignores _temporary directories
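Illustrative note, not part of the commit: the idea of skipping anything under a `_temporary` directory can be sketched on plain path strings. `TemporaryDirFilter` and its methods are hypothetical; the actual change works on Hadoop file listings inside `HadoopFsRelation`, which this sketch does not reproduce.

```scala
// Hypothetical sketch: drop paths that have a `_temporary` component.
object TemporaryDirFilter {
  /** True if any component of the path is named `_temporary`. */
  def isUnderTemporary(path: String): Boolean =
    path.split('/').contains("_temporary")

  /** Keep only the paths a normal data scan should see. */
  def visibleFiles(paths: Seq[String]): Seq[String] =
    paths.filterNot(isUnderTemporary)
}
```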
* [SPARK-7858] [SQL] Use output schema, not relation schema, for data source input conversion | Josh Rosen | 2015-05-26 | 5 | -41/+37
    In `DataSourceStrategy.createPhysicalRDD`, we use the relation schema as the target schema for converting incoming rows into Catalyst rows. However, we should be using the output schema instead, since our scan might return a subset of the relation's columns.

    This patch incorporates #6414 by liancheng, which fixes an issue in `SimpleTextRelation` that prevented this bug from being caught by our old tests:

    > In `SimpleTextRelation`, we set `needsConversion` to `true`, indicating that values produced by this testing relation should be of Scala types, and need to be converted to Catalyst types when necessary. However, we also used `Cast` to convert strings to expected data types. And `Cast` always produces values of Catalyst types, thus no conversion is done at all. This PR makes `SimpleTextRelation` produce Scala values so that data conversion code paths can be properly tested.

    Closes #5986.

    Author: Josh Rosen <joshrosen@databricks.com>
    Author: Cheng Lian <lian@databricks.com>
    Author: Cheng Lian <liancheng@users.noreply.github.com>

    Closes #6400 from JoshRosen/SPARK-7858 and squashes the following commits:

    e71c866 [Josh Rosen] Re-fix bug so that the tests pass again
    56b13e5 [Josh Rosen] Add regression test to hadoopFsRelationSuites
    2169a0f [Josh Rosen] Remove use of SpecificMutableRow and BufferedIterator
    6cd7366 [Josh Rosen] Fix SPARK-7858 by using output types for conversion.
    5a00e66 [Josh Rosen] Add assertions in order to reproduce SPARK-7858
    8ba195c [Cheng Lian] Merge 9968fba9979287aaa1f141ba18bfb9d4c116a3b3 into 61664732b25b35f94be35a42cde651cbfd0e02b7
    9968fba [Cheng Lian] Tests the data type conversion code paths
* [SPARK-7637] [SQL] O(N) merge implementation for StructType merge | rowan | 2015-05-26 | 2 | -4/+81
    Contribution is my original work and I license the work to the project under the project's open source license.

    Author: rowan <rowan.chattaway@googlemail.com>

    Closes #6259 from rowan000/SPARK-7637 and squashes the following commits:

    c479df4 [rowan] SPARK-7637: rename mapFields to fieldsMap as per comments on github.
    8d2e419 [rowan] SPARK-7637: fix up whitespace changes
    0e9d662 [rowan] SPARK-7637: O(N) merge implementatio for StructType merge
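Illustrative note, not part of the commit: a merge of two field lists runs in O(N) when field lookup goes through a hash map instead of a nested scan. The sketch below uses a simplified, hypothetical `Field` type with string type names; it is not the `StructType.merge` implementation, and the conflict rule (fail on differing types) is an assumption made for the example.

```scala
// Hypothetical sketch: merge two field lists with hash lookups instead of nested scans.
object FieldMerge {
  final case class Field(name: String, dataType: String)

  def merge(left: Seq[Field], right: Seq[Field]): Seq[Field] = {
    // One pass over each side to build the lookup structures.
    val rightByName: Map[String, Field] = right.map(f => f.name -> f).toMap
    val leftNames: Set[String] = left.map(_.name).toSet

    // Each left field is checked against the right side with a constant-time lookup.
    val mergedLeft = left.map { l =>
      rightByName.get(l.name) match {
        case Some(r) if r.dataType != l.dataType =>
          throw new IllegalArgumentException(
            s"Conflicting types for field ${l.name}: ${l.dataType} vs ${r.dataType}")
        case _ => l
      }
    }

    // Append the right-only fields, preserving their order.
    mergedLeft ++ right.filterNot(f => leftNames.contains(f.name))
  }
}
```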
* [SPARK-7883] [DOCS] [MLLIB] Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering documentation. | Mike Dusenberry | 2015-05-26 | 1 | -1/+2
    Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.

    Author: Mike Dusenberry <dusenberrymw@gmail.com>

    Closes #6422 from dusenberrymw/Fix_MLlib_Collab_Filtering_trainImplicit_Example and squashes the following commits:

    36492f4 [Mike Dusenberry] Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
* [SPARK-7864] [UI] Do not kill innocent stages from visualization | Andrew Or | 2015-05-26 | 3 | -3/+3
| | | | | | | | | | | | | | **Reproduction.** Run a long-running job, go to the job page, expand the DAG visualization, and click into a stage. Your stage is now killed. Why? This is because the visualization code just reaches into the stage table and grabs the first link it finds. In our case, this first link happens to be the kill link instead of the one to the stage page. **Fix.** Use proper CSS selectors to avoid ambiguity. This is an alternative to #6407. Thanks carsonwang for catching this. Author: Andrew Or <andrew@databricks.com> Closes #6419 from andrewor14/fix-ui-viz-kill and squashes the following commits: 25203bd [Andrew Or] Do not kill innocent stages
* [SPARK-7748] [MLLIB] Graduate spark.ml from alpha | Xiangrui Meng | 2015-05-26 | 43 | -201/+267
    With decent coverage of feature transformers, algorithms, and model tuning support, it is time to graduate `spark.ml` from alpha. This PR changes all `AlphaComponent` annotations to either `DeveloperApi` or `Experimental`, depending on whether we expect a class/method to be used by end users (who use the pipeline API to assemble/tune their ML pipelines but not to create new pipeline components). `UnaryTransformer` becomes a `DeveloperApi` in this PR.

    jkbradley harsha2010

    Author: Xiangrui Meng <meng@databricks.com>

    Closes #6417 from mengxr/SPARK-7748 and squashes the following commits:

    effbccd [Xiangrui Meng] organize imports
    c15028e [Xiangrui Meng] added missing docs
    1b2e5f8 [Xiangrui Meng] update package doc
    73ca791 [Xiangrui Meng] alpha -> ex/dev for the rest
    93819db [Xiangrui Meng] alpha -> ex/dev in ml.param
    55ca073 [Xiangrui Meng] alpha -> ex/dev in ml.feature
    83572f1 [Xiangrui Meng] add Experimental and DeveloperApi tags (wip)
* [SPARK-6602] [CORE] Remove some places in core that calling SparkEnv.actorSystem | zsxwing | 2015-05-26 | 7 | -29/+33
| | | | | | | | | | Author: zsxwing <zsxwing@gmail.com> Closes #6333 from zsxwing/remove-actor-system-usage and squashes the following commits: f125aa6 [zsxwing] Fix YarnAllocatorSuite ceadcf6 [zsxwing] Change the "port" parameter type of "AkkaUtils.address" to "int"; update ApplicationMaster and YarnAllocator to get the driverUrl from RpcEnv 3239380 [zsxwing] Remove some places in core that calling SparkEnv.actorSystem
* [SPARK-3674] YARN support in Spark EC2 | Shivaram Venkataraman | 2015-05-26 | 1 | -0/+2
    This corresponds to https://github.com/mesos/spark-ec2/pull/116 in the spark-ec2 repo. The only change required in the spark_ec2.py script is to open the RM port.

    cc andrewor14

    Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

    Closes #6376 from shivaram/spark-ec2-yarn and squashes the following commits:

    961504a [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into spark-ec2-yarn
    152c94c [Shivaram Venkataraman] Open 8088 for YARN in EC2
* [SPARK-7844] [MLLIB] Fix broken tests in KernelDensity | MechCoder | 2015-05-26 | 2 | -5/+7
    The densities in KernelDensity are scaled down by (number of parallel processes x number of points), when they should be scaled by just the number of samples. This results in broken tests in KernelDensitySuite, which had not been tested properly.

    Author: MechCoder <manojkumarsivaraj334@gmail.com>

    Closes #6383 from MechCoder/spark-7844 and squashes the following commits:

    ab81302 [MechCoder] Math->math
    9b8ed50 [MechCoder] Make one pass to update count
    a92fe50 [MechCoder] [SPARK-7844] Fix broken tests in KernelDensity
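Illustrative note, not part of the commit: in a Gaussian kernel density estimate, the sum of kernel values at each evaluation point is divided by the number of samples only, so the result does not depend on how many points are evaluated or how the work is partitioned. The sketch below is a plain local version with hypothetical names; it is not MLlib's `KernelDensity` code.

```scala
// Hypothetical sketch: Gaussian KDE normalized by the number of samples.
object KernelDensitySketch {
  /** Density of a normal distribution with the given mean and standard deviation at x. */
  def gaussianPdf(x: Double, mean: Double, bandwidth: Double): Double = {
    val z = (x - mean) / bandwidth
    math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2.0 * math.Pi))
  }

  /** Estimate the density at each evaluation point, dividing by the sample count only. */
  def estimate(samples: Seq[Double], bandwidth: Double, points: Seq[Double]): Seq[Double] = {
    val n = samples.size.toDouble
    points.map(x => samples.map(s => gaussianPdf(x, s, bandwidth)).sum / n)
  }
}
```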
* Revert "[SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x" | Patrick Wendell | 2015-05-26 | 1 | -4/+2
| | | | This reverts commit 43aa819c041f6e8301ad1b8f82eb68e14254f636.
* [SPARK-7854] [TEST] refine Kryo test suite | Zhang, Liye | 2015-05-26 | 1 | -24/+27
    This modification follows JoshRosen's comments; for details, please refer to [#5934](https://github.com/apache/spark/pull/5934/files#r30949751).

    Author: Zhang, Liye <liye.zhang@intel.com>

    Closes #6395 from liyezhang556520/kryoTest and squashes the following commits:

    da214c8 [Zhang, Liye] refine Kryo test suite accroding to Josh's comments
* [DOCS] [MLLIB] Fixing misformatted links in v1.4 MLlib Naive Bayes documentation by removing space and newline characters. | Mike Dusenberry | 2015-05-26 | 1 | -3/+2
    A couple of links in the MLlib Naive Bayes documentation for v1.4 were broken due to the addition of either space or newline characters between the link title and link URL in the markdown doc. (Interestingly enough, they are rendered correctly in the GitHub viewer, but not when compiled to HTML by Jekyll.)

    Author: Mike Dusenberry <dusenberrymw@gmail.com>

    Closes #6412 from dusenberrymw/Fix_Broken_Links_In_MLlib_Naive_Bayes_Docs and squashes the following commits:

    91a4028 [Mike Dusenberry] Fixing misformatted links by removing space and newline characters.
* [SPARK-7806][EC2] Fixes that allow the spark_ec2.py tool to run with Python3 | meawoppl | 2015-05-26 | 1 | -5/+9
| | | | | | | | | | I have used this script to launch, destroy, start, and stop clusters successfully. Author: meawoppl <meawoppl@gmail.com> Closes #6336 from meawoppl/py3ec2spark and squashes the following commits: 2e87046 [meawoppl] Py3 compat fixes.
* [SPARK-7339] [PYSPARK] PySpark shuffle spill memory sometimes are not correct | linweizhong | 2015-05-26 | 1 | -4/+4
    In PySpark we measure the memory used before and after a spill, then use the difference between these two values as memorySpilled. If the before value is smaller than the after value, the result is negative; in this scenario a value of 0 may be more reasonable.

    Below is the result in HistoryServer we have tested:

    Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Shuffle Spill (Memory) | Shuffle Spill (Disk) | Errors
    0 | 0 | 0 | SUCCESS | NODE_LOCAL | 3 / vm119 | 2015/05/04 17:31:06 | 21 s | 0.1 s | 128.1 MB (hadoop) / 3237 | 70 ms | 10.1 MB / 2529 | 0.0 B | 5.7 MB |
    2 | 2 | 0 | SUCCESS | NODE_LOCAL | 1 / vm118 | 2015/05/04 17:31:06 | 22 s | 89 ms | 128.1 MB (hadoop) / 3205 | 0.1 s | 10.1 MB / 2529 | -1048576.0 B | 5.9 MB |
    1 | 1 | 0 | SUCCESS | NODE_LOCAL | 2 / vm117 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3271 | 68 ms | 10.1 MB / 2529 | -1048576.0 B | 5.6 MB |
    4 | 4 | 0 | SUCCESS | NODE_LOCAL | 2 / vm117 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3192 | 51 ms | 10.1 MB / 2529 | -1048576.0 B | 5.9 MB |
    3 | 3 | 0 | SUCCESS | NODE_LOCAL | 3 / vm119 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3262 | 51 ms | 10.1 MB / 2529 | 1024.0 KB | 5.8 MB |
    5 | 5 | 0 | SUCCESS | NODE_LOCAL | 1 / vm118 | 2015/05/04 17:31:06 | 22 s | 89 ms | 128.1 MB (hadoop) / 3256 | 93 ms | 10.1 MB / 2529 | -1048576.0 B | 5.7 MB |

    /cc davies

    Author: linweizhong <linweizhong@huawei.com>

    Closes #5887 from Sephiroth-Lin/spark-7339 and squashes the following commits:

    9186c81 [linweizhong] Use max function to get a nonnegative value
    d41672b [linweizhong] Update MemoryBytesSpilled when memorySpilled > 0
* [CORE] [TEST] Fix SimpleDateParamTest | scwf | 2015-05-26 | 1 | -3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | ``` sbt.ForkMain$ForkError: 1424424077190 was not equal to 1424474477190 at org.scalatest.MatchersHelper$.newTestFailedException(MatchersHelper.scala:160) at org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6231) at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6265) at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply$mcV$sp(SimpleDateParamTest.scala:25) at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply(SimpleDateParamTest.scala:23) at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply(SimpleDateParamTest.scala:23) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala: ``` Set timezone to fix SimpleDateParamTest Author: scwf <wangfei1@huawei.com> Author: Fei Wang <wangfei1@huawei.com> Closes #6377 from scwf/fix-SimpleDateParamTest and squashes the following commits: b8df1e5 [Fei Wang] Update SimpleDateParamSuite.scala 8bb74f0 [scwf] fix SimpleDateParamSuite
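Illustrative note, not part of the commit: the two timestamps in the failure above differ by 50400000 ms, which is exactly 14 hours, the kind of gap that appears when a date string is parsed in the JVM's default time zone rather than a fixed one. The sketch below shows parsing with an explicit GMT zone; `GmtParsing`, the pattern, and the sample input are assumptions for illustration and are not taken from `SimpleDateParamTest`.

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

// Hypothetical sketch: parse a timestamp with an explicit time zone so the
// result does not depend on the machine's default zone.
object GmtParsing {
  def parseGmtMillis(text: String): Long = {
    val format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS")
    // Pin the zone instead of relying on TimeZone.getDefault.
    format.setTimeZone(TimeZone.getTimeZone("GMT"))
    format.parse(text).getTime
  }
}
```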
* [SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x | Konstantin Shaposhnikov | 2015-05-26 | 1 | -2/+4
| | | | | | | | | | | | | Both akka 2.3.x and hadoop-2.x use protobuf 2.5 so only hadoop-1 build needs custom 2.3.4-spark akka version that shades protobuf-2.5 This partially fixes SPARK-7042 (for hadoop-2.x builds) Author: Konstantin Shaposhnikov <Konstantin.Shaposhnikov@sc.com> Closes #6341 from kostya-sh/SPARK-7042 and squashes the following commits: 7eb8c60 [Konstantin Shaposhnikov] [SPARK-7042][BUILD] use the standard akka artifacts with hadoop-2.x
* [SQL][minor] Removed unused Catalyst logical plan DSL. | Reynold Xin | 2015-05-25 | 1 | -102/+27
| | | | | | | | | | The Catalyst DSL is no longer used as a public facing API. This pull request removes the UDF and writeToFile feature from it since they are not used in unit tests. Author: Reynold Xin <rxin@databricks.com> Closes #6350 from rxin/unused-logical-dsl and squashes the following commits: 90b3de6 [Reynold Xin] [SQL][minor] Removed unused Catalyst logical plan DSL.
* [SPARK-7832] [Build] Always run SQL tests in master build. | Yin Huai | 2015-05-25 | 2 | -17/+26
| | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-7832 Author: Yin Huai <yhuai@databricks.com> Closes #6385 from yhuai/runSQLTests and squashes the following commits: 3d399bc [Yin Huai] Always run SQL tests in master build.
* [SPARK-6391][DOCS] Document Tachyon compatibility. | Calvin Jia | 2015-05-25 | 1 | -2/+4
| | | | | | | | | | | Adds a section in the RDD persistence section of the programming-guide docs detailing Spark-Tachyon version compatibility as discussed in [[SPARK-6391]](https://issues.apache.org/jira/browse/SPARK-6391). Author: Calvin Jia <jia.calvin@gmail.com> Closes #6382 from calvinjia/spark-6391 and squashes the following commits: 113e863 [Calvin Jia] Move compatibility info to the offheap storage level section. 7942dc5 [Calvin Jia] Add a section in the programming-guide docs for Tachyon compatibility.
* [SPARK-7842] [SQL] Makes task committing/aborting in InsertIntoHadoopFsRelation more robust | Cheng Lian | 2015-05-26 | 3 | -10/+87
    When committing/aborting a write task issued in `InsertIntoHadoopFsRelation`, if an exception is thrown from `OutputWriter.close()`, the committing/aborting process is interrupted and leaves messy stuff behind (e.g., the `_temporary` directory created by `FileOutputCommitter`). This PR makes these two processes more robust by catching potential exceptions and falling back to normal task commitment/abort.

    Author: Cheng Lian <lian@databricks.com>

    Closes #6378 from liancheng/spark-7838 and squashes the following commits:

    f18253a [Cheng Lian] Makes task committing/aborting in InsertIntoHadoopFsRelation more robust
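Illustrative note, not part of the commit: one way to keep a failing `close()` from interrupting cleanup is to catch the exception and still run the cleanup step. The sketch below shows that pattern for the abort path only; `RobustAbort`, `abortTask`, and both function parameters are hypothetical stand-ins and are not the code in `InsertIntoHadoopFsRelation`.

```scala
import scala.util.control.NonFatal

// Hypothetical sketch: cleanup still runs when closing the writer fails.
object RobustAbort {
  /**
   * Close the writer, then always run the cleanup step.
   * Returns the close failure, if any, so the caller can log or rethrow it.
   */
  def abortTask(closeWriter: () => Unit, cleanUp: () => Unit): Option[Throwable] = {
    val failure: Option[Throwable] =
      try {
        closeWriter()
        None
      } catch {
        case NonFatal(e) => Some(e)
      }
    // Runs whether or not close() threw, so temporary output is not left behind.
    cleanUp()
    failure
  }
}
```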
* [SPARK-7684] [SQL] Invoking HiveContext.newTemporaryConfiguration() shouldn't create new metastore directory | Cheng Lian | 2015-05-26 | 2 | -2/+4
    The "Database does not exist" error reported in SPARK-7684 was caused by `HiveContext.newTemporaryConfiguration()`, which always creates a new temporary metastore directory and returns a metastore configuration pointing to that directory. This makes `TestHive.reset()` always replace the old temporary metastore with an empty new one.

    Author: Cheng Lian <lian@databricks.com>

    Closes #6359 from liancheng/spark-7684 and squashes the following commits:

    95d2eb8 [Cheng Lian] Addresses @marmbrust's comment
    042769d [Cheng Lian] Don't create new temp directory in HiveContext.newTemporaryConfiguration()
* Add test which shows Kryo buffer size configured in mb is properly supported | tedyu | 2015-05-25 | 1 | -0/+4
| | | | | | | | | | | | | | | | This PR adds test which shows that Kryo buffer size configured in mb is supported properly Author: tedyu <yuzhihong@gmail.com> Closes #6390 from tedyu/master and squashes the following commits: c51ea64 [tedyu] Fix KryoSerializer creation f12ee04 [tedyu] Correct conf variable name in test 642de51 [tedyu] Drop change in KryoSerializer so that the new test runs d2fdbc4 [tedyu] Give bufferSizeKb initial value 9a17277 [tedyu] Rewrite bufferSize checking 4739998 [tedyu] Rewrite bufferSize checking 830d0d0 [tedyu] Kryo buffer size configured in mb should be properly supported
* Close HBaseAdmin at the end of HBaseTest | tedyu | 2015-05-25 | 1 | -0/+1
| | | | | | | | Author: tedyu <yuzhihong@gmail.com> Closes #6381 from ted-yu/master and squashes the following commits: e2f0ea1 [tedyu] Close HBaseAdmin at the end of HBaseTest
* [SPARK-7811] Fix typo on slf4j configuration on metrics.properties.tem… | Judy Nash | 2015-05-24 | 1 | -2/+2
| | | | | | | | | | Fix minor typo on metrics.properties.template where slf4j is incorrectly spelled as sl4j. Author: Judy Nash <judynash@microsoft.com> Closes #6362 from judynash/master and squashes the following commits: c644875 [Judy Nash] SPARK-7811: Fix typo on slf4j configuration on metrics.properties.template
* [SPARK-7833] [ML] Add python wrapper for RegressionEvaluator | Ram Sriharsha | 2015-05-24 | 3 | -4/+69
| | | | | | | | | | Author: Ram Sriharsha <rsriharsha@hw11853.local> Closes #6365 from harsha2010/SPARK-7833 and squashes the following commits: 923f288 [Ram Sriharsha] cleanup 7623b7d [Ram Sriharsha] python style fix 9743f83 [Ram Sriharsha] [SPARK-7833][ml] Add python wrapper for RegressionEvaluator
* [SPARK-7805] [SQL] Move SQLTestUtils.scala and ParquetTest.scala to src/test | Yin Huai | 2015-05-24 | 4 | -1/+11
| | | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-7805 Because `sql/hive`'s tests depend on the test jar of `sql/core`, we do not need to store `SQLTestUtils` and `ParquetTest` in `src/main`. We should only add stuff that will be needed by `sql/console` or Python tests (for Python, we need it in `src/main`, right? davies). Author: Yin Huai <yhuai@databricks.com> Closes #6334 from yhuai/SPARK-7805 and squashes the following commits: af6d0c9 [Yin Huai] mima b86746a [Yin Huai] Move SQLTestUtils.scala and ParquetTest.scala to src/test.
* [SPARK-7845] [BUILD] Bump "Hadoop 1" tests to version 1.2.1 | Yin Huai | 2015-05-24 | 1 | -1/+1
| | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-7845 Author: Yin Huai <yhuai@databricks.com> Closes #6384 from yhuai/hadoop1Test and squashes the following commits: 82fcea8 [Yin Huai] Use hadoop 1.2.1 (a stable version) for hadoop 1 test.
* [SPARK-7287] [HOTFIX] Disable o.a.s.deploy.SparkSubmitSuite --packages | Patrick Wendell | 2015-05-23 | 1 | -1/+2
* [HOTFIX] Copy SparkR lib if it exists in make-distribution | Shivaram Venkataraman | 2015-05-23 | 1 | -2/+5
| | | | | | | | | | | | This is to fix an issue reported in #6373 where the `cp` would fail if `-Psparkr` was not used in the build cc dragos pwendell Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #6379 from shivaram/make-distribution-hotfix and squashes the following commits: 08eb7e4 [Shivaram Venkataraman] Copy SparkR lib if it exists in make-distribution
* [SPARK-7654] [SQL] Move insertInto into reader/writer interface. | Yin Huai | 2015-05-23 | 14 | -89/+116
| | | | | | | | | | | | | | | | This one continues the work of https://github.com/apache/spark/pull/6216. Author: Yin Huai <yhuai@databricks.com> Author: Reynold Xin <rxin@databricks.com> Closes #6366 from yhuai/insert and squashes the following commits: 3d717fb [Yin Huai] Use insertInto to handle the casue when table exists and Append is used for saveAsTable. 56d2540 [Yin Huai] Add PreWriteCheck to HiveContext's analyzer. c636e35 [Yin Huai] Remove unnecessary empty lines. cf83837 [Yin Huai] Move insertInto to write. Also, remove the partition columns from InsertIntoHadoopFsRelation. 0841a54 [Reynold Xin] Removed experimental tag for deprecated methods. 33ed8ef [Reynold Xin] [SPARK-7654][SQL] Move insertInto into reader/writer interface.
* Fix install jira-python | Davies Liu | 2015-05-23 | 3 | -3/+3
    The jira-python package should be installed with `sudo pip install jira`.

    cc pwendell

    Author: Davies Liu <davies@databricks.com>

    Closes #6367 from davies/fix_jira_python2 and squashes the following commits:

    fbb3c8e [Davies Liu] Fix install jira-python