| Commit message (Collapse) | Author | Age | Files | Lines |
| |
With "+" the strings are separate expressions, and format() is called on the last string before concatenation. (So substitution does not happen.) Without "+" the string literals are merged first by the parser, so format() is called on the complete string.
Should I make a JIRA for this?
Author: Daniel Darabos <darabos.daniel@gmail.com>
Closes #7288 from darabos/patch-2 and squashes the following commits:
be0d3b7 [Daniel Darabos] Correctly print hostname in error
(cherry picked from commit 5687f76552369fa20b3a4385eab4810214653aa7)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
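The behavior described above can be reproduced in plain Python (a minimal sketch; the host string and messages are illustrative, not Spark's actual code):

```python
host = "example.com"

# With "+": the two literals are separate expressions, and .format()
# is called only on the last one before concatenation, so the "{0}"
# in the first literal survives unsubstituted.
broken = "Could not reach {0} " + "after 3 tries".format(host)
# broken == "Could not reach {0} after 3 tries"

# Without "+": the parser merges adjacent string literals first, so
# .format() is applied to the complete string.
fixed = "Could not reach {0} " "after 3 tries".format(host)
# fixed == "Could not reach example.com after 3 tries"
```

This is the essence of the fix: removing the `+` lets `format()` see the whole message, so the hostname is actually substituted.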
| |
This fixes a bug introduced in the cherry-pick of #7201 which led to a NullPointerException when cross-tabulating a data set that contains null values.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #7295 from JoshRosen/SPARK-8903 and squashes the following commits:
5489948 [Josh Rosen] [SPARK-8903] Fix bug in cherry-pick of SPARK-8803
| |
…ng-guide#Manually Specifying Options to be in sync with java,python, R version
Author: Alok Singh <singhal@us.ibm.com>
Closes #7299 from aloknsingh/aloknsingh_SPARK-8909 and squashes the following commits:
d3c20ba [Alok Singh] fix the file to .parquet from .json
d476140 [Alok Singh] [SPARK-8909][Documentation] Change the scala example in sql-programming-guide#Manually Specifying Options to be in sync with java,python, R version
(cherry picked from commit 8f3cd93278337dc10b9dd3a344d6f7b51ba9960d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
| |
cc pwendell
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #7293 from shivaram/sparkr-packages-doc and squashes the following commits:
c91471d [Shivaram Venkataraman] Fix sparkPackages in init documentation
(cherry picked from commit 374c8a8a4a8ac4171d312a6c31080a6724e55c60)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
| |
Fail to upload resource to viewfs in spark-1.4
JIRA Link: https://issues.apache.org/jira/browse/SPARK-8657
Author: Tao Li <litao@sogou-inc.com>
Closes #7125 from litao-buptsse/SPARK-8657-for-master and squashes the following commits:
65b13f4 [Tao Li] [SPARK-8657] [YARN] Fail to upload resource to viewfs
(cherry picked from commit 26d9b6b8cae9ac6593f78ab98dd45a25d03cf71c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
| |
Fail to upload resource to viewfs in spark-1.4
JIRA Link: https://issues.apache.org/jira/browse/SPARK-8657
Author: Tao Li <litao@sogou-inc.com>
Closes #7125 from litao-buptsse/SPARK-8657-for-master and squashes the following commits:
65b13f4 [Tao Li] [SPARK-8657] [YARN] Fail to upload resource to viewfs
(cherry picked from commit 26d9b6b8cae9ac6593f78ab98dd45a25d03cf71c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
# Conflicts:
# yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
| |
Author: Sun Rui <rui.sun@intel.com>
Closes #7287 from sun-rui/SPARK-8894 and squashes the following commits:
da63898 [Sun Rui] [SPARK-8894][SPARKR][DOC] Example code errors in SparkR documentation.
(cherry picked from commit bf02e377168f39459d5c216e939097ae5705f573)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
| |
of NullType columns
https://issues.apache.org/jira/browse/SPARK-8868
Author: Yin Huai <yhuai@databricks.com>
Closes #7262 from yhuai/SPARK-8868 and squashes the following commits:
cb58780 [Yin Huai] Andrew's comment.
e456857 [Yin Huai] Josh's comments.
5122e65 [Yin Huai] If types of all columns are NullTypes, do not use serializer2.
(cherry picked from commit 68a4a169714e11d8c537ad9431ae9974f6b7e8d3)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
| |
Otherwise the script will crash with
- Downloading boto...
Traceback (most recent call last):
File "ec2/spark_ec2.py", line 148, in <module>
setup_external_libs(external_libs)
File "ec2/spark_ec2.py", line 128, in setup_external_libs
if hashlib.md5(tar.read()).hexdigest() != lib["md5"]:
File "/usr/lib/python3.4/codecs.py", line 319, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
This happens when the environment uses a UTF-8 locale.
Author: Simon Hafner <hafnersimon@gmail.com>
Closes #7215 from reactormonk/branch-1.4 and squashes the following commits:
e86957a [Simon Hafner] [SPARK-8821] [EC2] Switched to binary mode
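The fix is to open the archive in binary mode, so the raw bytes are hashed instead of being decoded as text. A minimal sketch of the pattern (the file name is illustrative):

```python
import hashlib
import os
import tempfile

# Bytes that are not valid UTF-8: 0x8b appears in gzip headers,
# which is the byte the traceback above tripped on.
payload = b"\x1f\x8b\x08\x00"

path = os.path.join(tempfile.mkdtemp(), "lib.tar.gz")
with open(path, "wb") as f:
    f.write(payload)

# open(path) in text mode would decode with the locale's encoding and
# can raise UnicodeDecodeError on such bytes; "rb" yields raw bytes,
# which is what hashlib.md5 expects.
with open(path, "rb") as tar:
    digest = hashlib.md5(tar.read()).hexdigest()
```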
| |
when publishing releases. We named it 'release-profile' because that is
the Maven convention. However, it turns out this special name causes several
other undesirable things to kick in when we are creating releases.
For instance, it triggers the javadoc plugin to run, which actually fails
in our current build setup.
The fix is simply to rename this to a different profile so its use has no
collateral damage.
| |
This reverts commit 82cf3315e690f4ac15b50edea6a3d673aa5be4c0.
Conflicts:
pom.xml
| |
This is a workaround for MSHADE-148, which leads to an infinite loop when building Spark with maven 3.3.x. This was originally caused by #6441, which added a bunch of test dependencies on the spark-core test module. Recently, it was revealed by #7193.
This patch adds a `-Prelease` profile. If present, it will set `createDependencyReducedPom` to true. The consequences are:
- If you are releasing Spark with this profile, you are fine as long as you use maven 3.2.x or before.
- If you are releasing Spark without this profile, you will run into SPARK-8781.
- If you are not releasing Spark but you are using this profile, you may run into SPARK-8819.
- If you are not releasing Spark and you did not include this profile, you are fine.
This is all documented in `pom.xml` and tested locally with both versions of maven.
Author: Andrew Or <andrew@databricks.com>
Closes #7219 from andrewor14/fix-maven-build and squashes the following commits:
1d37e87 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-maven-build
3574ae4 [Andrew Or] Review comments
f39199c [Andrew Or] Create a -Prelease profile that flags `createDependencyReducedPom`
(cherry picked from commit 9eae5fa642317dd11fc783d832d4cbb7e62db471)
Signed-off-by: Andrew Or <andrew@databricks.com>
| |
JIRA: https://issues.apache.org/jira/browse/SPARK-8463
Currently, on the read path, `DriverRegistry` is used to load the needed JDBC driver on executors. However, the write path also needs `DriverRegistry` to load the JDBC driver.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #6900 from viirya/jdbc_write_driver and squashes the following commits:
16cd04b [Liang-Chi Hsieh] Use DriverRegistry to load jdbc driver at writing path.
(cherry picked from commit d4d6d31db5cc5c69ac369f754b7489f444c9ba2f)
Signed-off-by: Reynold Xin <rxin@databricks.com>
| |
cc rxin
Having backticks or null as elements causes problems.
Since elements become column names, we have to drop backticks from the elements, as backticks are special characters.
Having null throws exceptions; we can replace nulls with empty strings.
Handling of backticks should be improved for 1.5.
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #7201 from brkyvz/weird-ct-elements and squashes the following commits:
e06b840 [Burak Yavuz] fix scalastyle
93a0d3f [Burak Yavuz] added tests for NaN and Infinity
9dba6ce [Burak Yavuz] address cr1
db71dbd [Burak Yavuz] handle special characters in elements in crosstab
(cherry picked from commit 9b23e92c727881ff9038b4fe9643c49b96914159)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
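The sanitization described above can be sketched in Python (an illustrative sketch, not Spark's actual Scala implementation; the helper name is hypothetical):

```python
def clean_element(value):
    """Make a crosstab element safe to use as a column name."""
    # Nulls would otherwise throw when used as a column name,
    # so replace them with the empty string.
    if value is None:
        return ""
    # Backticks are special characters in column references,
    # so drop them from the element.
    return str(value).replace("`", "")

assert clean_element(None) == ""
assert clean_element("a`b") == "ab"
```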
| |
I am increasing the perm gen size to 256m.
https://issues.apache.org/jira/browse/SPARK-8776
Author: Yin Huai <yhuai@databricks.com>
Closes #7196 from yhuai/SPARK-8776 and squashes the following commits:
60901b4 [Yin Huai] Fix test.
d44b713 [Yin Huai] Make sparkShell and hiveConsole use 256m PermGen size.
30aaf8e [Yin Huai] Increase the default PermGen size to 256m.
(cherry picked from commit f743c79abe5a2fb66be32a896ea47e858569b0c7)
Signed-off-by: Yin Huai <yhuai@databricks.com>
| |
This PR backports #7199 to branch-1.4
Author: Cheng Lian <lian@databricks.com>
Closes #7200 from liancheng/spark-8501-for-1.4 and squashes the following commits:
725e9e3 [Cheng Lian] Addresses comments
0fa25af [Cheng Lian] Avoids reading schema from empty ORC files
| |
The issue is summarized in the JIRA and is caused by this commit: 984ad60147c933f2d5a2040c87ae687c14eb1724.
This patch reverts that commit and fixes the maven build in a different way. We limit the dependencies of `KinesisReceiverSuite` to avoid having to deal with the complexities in how maven deals with transitive test dependencies.
Author: Andrew Or <andrew@databricks.com>
Closes #7193 from andrewor14/fix-kinesis-pom and squashes the following commits:
ca3d5d4 [Andrew Or] Limit kinesis test dependencies
f24e09c [Andrew Or] Revert "[BUILD] Fix Maven build for Kinesis"
(cherry picked from commit 82cf3315e690f4ac15b50edea6a3d673aa5be4c0)
Signed-off-by: Andrew Or <andrew@databricks.com>
| |
updated the [Hive 0.13.1](https://archive.apache.org/dist/hive/hive-0.13.1) download link in `sql/README.md`
Author: Christian Kadner <ckadner@us.ibm.com>
Closes #7144 from ckadner/SPARK-8746 and squashes the following commits:
65d80f7 [Christian Kadner] [SPARK-8746][SQL] update download link for Hive 0.13.1
(cherry picked from commit 1bbdf9ead9e912f60dccbb23029b7de4948ebee3)
Signed-off-by: Sean Owen <sowen@cloudera.com>
| |
The parameter order of the deprecated annotation in package object sql is wrong:
`deprecated("1.3.0", "use DataFrame")`
This has to be changed to `deprecated("use DataFrame", "1.3.0")`.
Author: Vinod K C <vinod.kc@huawei.com>
Closes #7183 from vinodkc/fix_deprecated_param_order and squashes the following commits:
1cbdbe8 [Vinod K C] Modified the message
700911c [Vinod K C] Changed order of parameters
(cherry picked from commit c572e25617f993c6b2e7d5f15f0fbf4426f89fab)
Signed-off-by: Sean Owen <sowen@cloudera.com>
| |
It's a really minor issue, but there is an example with wrong lambda-expression usage in `SQLContext.scala`, as follows.
```
sqlContext.udf().register("myUDF",
(Integer arg1, String arg2) -> arg2 + arg1), <- We have an extra `)` here.
DataTypes.StringType);
```
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #7187 from sarutak/fix-minor-wrong-lambda-expression and squashes the following commits:
a13196d [Kousuke Saruta] Fixed minor wrong lambda expression example.
(cherry picked from commit 41588365ad29408ccabd216b411e9c43f0053151)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
| |
in yarn-client
Spark initializes the properties in `CoarseGrainedSchedulerBackend.start`:
```scala
// TODO (prashant) send conf instead of properties
driverEndpoint = rpcEnv.setupEndpoint(
CoarseGrainedSchedulerBackend.ENDPOINT_NAME, new DriverEndpoint(rpcEnv, properties))
```
The YARN logic then sets some configuration values, but they are not reflected in this `properties`, so the `Executor` never receives them.
[Jira](https://issues.apache.org/jira/browse/SPARK-8687)
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes #7066 from SaintBacchus/SPARK-8687 and squashes the following commits:
1de4f48 [huangzhaowei] Ensure all necessary properties have already been set before startup ExecutorLaucher
(cherry picked from commit 1b0c8e61040bf06213f9758f775679dcc41b0cce)
Signed-off-by: Andrew Or <andrew@databricks.com>
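The underlying pitfall is that the endpoint captures a snapshot of the properties at construction time, so later configuration updates are invisible to it. An illustrative Python sketch of that failure mode (the key names are examples, not the exact settings involved):

```python
conf = {"spark.app.name": "demo"}

# The driver endpoint receives the properties at construction time,
# i.e. a copy taken before YARN finishes its setup...
snapshot = dict(conf)

# ...so settings the YARN client adds afterwards never reach the
# executors that only see the snapshot.
conf["spark.yarn.am.extraJavaOptions"] = "-Xmx512m"

assert "spark.yarn.am.extraJavaOptions" not in snapshot
```

The fix is to make sure all necessary properties are set before the endpoint (and hence the snapshot) is created.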
| |
many jobs
Author: Holden Karau <holden@pigscanfly.ca>
Closes #7171 from holdenk/SPARK-8769-toLocalIterator-documentation-improvement and squashes the following commits:
97ddd99 [Holden Karau] Add note
(cherry picked from commit 15d41cc501f5fa7ac82c4a6741e416bb557f610a)
Signed-off-by: Andrew Or <andrew@databricks.com>
| |
failure conditions
In YarnClientSchedulerBackend.stop(), added a check for monitorThread.
Author: Devaraj K <devaraj@apache.org>
Closes #7153 from devaraj-kavali/master and squashes the following commits:
66be9ad [Devaraj K] https://issues.apache.org/jira/browse/SPARK-8754 YarnClientSchedulerBackend doesn't stop gracefully in failure conditions
(cherry picked from commit 792fcd802c99a0aef2b67d54f0e6e58710e65956)
Signed-off-by: Andrew Or <andrew@databricks.com>
| |
Use UTF-8 to encode the name of a column in Python 2, or it may fail to encode with the default encoding ('ascii').
This PR also fixes a bug when there is a Java exception without an error message.
Author: Davies Liu <davies@databricks.com>
Closes #7165 from davies/non_ascii and squashes the following commits:
02cb61a [Davies Liu] fix tests
3b09d31 [Davies Liu] add encoding in header
867754a [Davies Liu] support non-ascii character in column names
(cherry picked from commit f958f27e2056f9e380373c2807d8bb5977ecf269)
Signed-off-by: Davies Liu <davies@databricks.com>
Conflicts:
python/pyspark/sql/utils.py
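The core of the failure can be shown without Spark: the default 'ascii' codec cannot encode a non-ASCII column name, while an explicit UTF-8 encode succeeds. A minimal sketch of the Python 2 behavior, written with explicit codecs so it also runs on Python 3 (the column name is illustrative):

```python
name = u"prénom"  # a non-ASCII column name

# This is effectively what Python 2's implicit str() conversion
# attempted, and why it failed:
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly with UTF-8 always succeeds:
utf8_bytes = name.encode("utf-8")

assert not ascii_ok
assert utf8_bytes == b"pr\xc3\xa9nom"
```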
| |
Otherwise other tests don't log anything useful...
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #7140 from vanzin/SPARK-3444 and squashes the following commits:
de14836 [Marcelo Vanzin] Better fix.
6cff13a [Marcelo Vanzin] [SPARK-3444] [core] Restore INFO level after log4j test.
(cherry picked from commit 1ce6428907b4ddcf52dbf0c86196d82ab7392442)
Signed-off-by: Sean Owen <sowen@cloudera.com>
| |
Author: jerryshao <saisai.shao@intel.com>
Closes #7120 from jerryshao/SPARK-7820 and squashes the following commits:
6902439 [jerryshao] fix Java8-tests suite compile error under sbt
(cherry picked from commit 9f7db3486fcb403cae8da9dfce8978373c3f47b7)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
| |
Improve the empty check in `parseAttributeName` so that an empty string is allowed as a column name.
Close https://github.com/apache/spark/pull/7117
Author: Wenchen Fan <cloud0fan@outlook.com>
Closes #7149 from cloud-fan/8621 and squashes the following commits:
efa9e3e [Wenchen Fan] support empty string
(cherry picked from commit 31b4a3d7f2be9053a041e5ae67418562a93d80d8)
Signed-off-by: Reynold Xin <rxin@databricks.com>
| |
dataframe with no explicit column name
Because the implicit names in `pandas.columns` are Int, but `StructField` JSON expects `String`, `pandas.columns` should be converted to `String`.
### issue
* [SPARK-8535 PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name](https://issues.apache.org/jira/browse/SPARK-8535)
Author: x1- <viva008@gmail.com>
Closes #7124 from x1-/SPARK-8535 and squashes the following commits:
d68fd38 [x1-] modify unit-test using pandas.
ea1897d [x1-] For implicit name of pandas.columns are Int, so should be convert to String.
(cherry picked from commit b6e76edf3005c078b407f63b0a05d3a28c18c742)
Signed-off-by: Davies Liu <davies@databricks.com>
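The conversion itself is simple: pandas assigns integer labels (0, 1, 2, ...) when no column names are given, and they must become strings before being used in a StructField-style JSON schema. A minimal sketch without Spark (the schema shape here is illustrative):

```python
import json

# pandas uses a RangeIndex for unnamed columns, i.e. labels 0..n-1.
pandas_columns = [0, 1, 2]

# StructField's JSON representation expects "name" to be a string,
# so convert each label before building the schema.
fields = [{"name": str(c), "type": "string", "nullable": True}
          for c in pandas_columns]
schema_json = json.dumps({"type": "struct", "fields": fields})

assert json.loads(schema_json)["fields"][0]["name"] == "0"
```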
| |
IndexedRowMatrix.computeSVD().U.numCols = k
I'm sorry that I closed https://github.com/apache/spark/pull/6949 by mistake. I pushed the code again and added a test.
There is a bug where `U.numCols() = self.nCols` in `IndexedRowMatrix.computeSVD()`. It should be `U.numCols() = k = svd.U.numCols()`.
```
self = U * sigma * V.transpose
(m x n) = (m x n) * (k x k) * (k x n) //ASIS
-->
(m x n) = (m x k) * (k x k) * (k x n) //TOBE
```
Author: lee19 <lee19@live.co.kr>
Closes #6953 from lee19/MLlibBugfix and squashes the following commits:
c1812a0 [lee19] [SPARK-8563] [MLlib] Used nRows instead of numRows() to reduce a burden.
4b9803b [lee19] [SPARK-8563] [MLlib] Fixed a build error.
c2ccd89 [lee19] Added a unit test that validates matrix sizes of svd for [SPARK-8563][MLlib]
8373424 [lee19] [SPARK-8563][MLlib] Fixed a bug so that IndexedRowMatrix.computeSVD().U.numCols = k
(cherry picked from commit e72526227fdcf93b7a33375ef954746ac08753f5)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
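The dimension fix can be checked with NumPy, which stands in for MLlib here (an illustrative sketch, not Spark's code): for a rank-k truncated SVD of an m x n matrix, U must be m x k, not m x n.

```python
import numpy as np

m, n, k = 6, 4, 2
A = np.arange(m * n, dtype=float).reshape(m, n)

# Thin SVD, then truncate to the top-k singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# self = U * sigma * V.transpose with the corrected shapes:
# (m x n) = (m x k) * (k x k) * (k x n)
assert U_k.shape == (m, k)          # not (m, n)
assert np.diag(s_k).shape == (k, k)
assert Vt_k.shape == (k, n)
```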
| |
Showing these applications may lead to weird behavior in the History Server. For old logs, if
the app ID is recorded later, you may end up with a duplicate entry. For new logs, the app might
be listed with a ".inprogress" suffix.
So ignore those, but still allow old applications that don't record app IDs at all (1.0 and 1.1) to be shown.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: Carson Wang <carson.wang@intel.com>
Closes #7097 from vanzin/SPARK-8372 and squashes the following commits:
a24eab2 [Marcelo Vanzin] Feedback.
112ae8f [Marcelo Vanzin] Merge branch 'master' into SPARK-8372
7b91b74 [Marcelo Vanzin] Handle logs generated by 1.0 and 1.1.
1eca3fe [Carson Wang] [SPARK-8372] History server shows incorrect information for application not started
Conflicts:
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
| |
Changed GBTRegressor so it does NOT threshold the prediction. Added test which fails with bug but works after fix.
CC: feynmanliang mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #7134 from jkbradley/gbrt-fix and squashes the following commits:
613b90e [Joseph K. Bradley] Changed GBTRegressor so it does NOT threshold the prediction
(cherry picked from commit 3ba23ffd377d12383d923d1550ac8e2b916090fc)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
Made lexical initialization a lazy val.
Author: Vinod K C <vinod.kc@huawei.com>
Closes #7015 from vinodkc/handle_lexical_initialize_schronization and squashes the following commits:
b6d1c74 [Vinod K C] Avoided repeated lexical initialization
5863cf7 [Vinod K C] Removed space
e27c66c [Vinod K C] Avoid reinitialization of lexical in parse method
ef4f60f [Vinod K C] Reverted import order
e9fc49a [Vinod K C] handle synchronization in SqlLexical.initialize
(cherry picked from commit b8e5bb6fc1553256e950fdad9cb5acc6b296816e)
Signed-off-by: Michael Armbrust <michael@databricks.com>
| |
within Streaming checkpoint
[Client.scala](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L786) changes these configurations, so the Streaming recovery logic cannot find the local keytab file (since the configuration was changed):
```scala
sparkConf.set("spark.yarn.keytab", keytabFileName)
sparkConf.set("spark.yarn.principal", args.principal)
```
Problem described at [Jira](https://issues.apache.org/jira/browse/SPARK-8619)
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes #7008 from SaintBacchus/SPARK-8619 and squashes the following commits:
d50dbdf [huangzhaowei] Delect one blank space
9b8e92c [huangzhaowei] Fix code style and add a short comment.
0d8f800 [huangzhaowei] Don't recover keytab and principal configuration within Streaming checkpoint.
(cherry picked from commit d16a9443750eebb7a3d7688d4b98a2ac39cc0da7)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
| |
This PR throws an exception in `QueueInputDStream.writeObject` so that it can fail the application when calling `StreamingContext.start` rather than failing it during recovering QueueInputDStream.
Author: zsxwing <zsxwing@gmail.com>
Closes #7016 from zsxwing/queueStream-checkpoint and squashes the following commits:
89a3d73 [zsxwing] Fix JavaAPISuite.testQueueStream
cc40fd7 [zsxwing] Prevent from checkpointing QueueInputDStream
(cherry picked from commit 57264400ac7d9f9c59c387c252a9ed8d93fed4fa)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
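The same fail-fast idea can be sketched in Python: make the object refuse serialization so the error surfaces when checkpointing is attempted, not when recovering. (This is an analogous pattern with hypothetical names, not Spark's actual code.)

```python
import pickle

class QueueStream:
    """A stream backed by an in-memory queue; not checkpointable."""

    def __init__(self):
        self.queue = []

    def __reduce__(self):
        # Raising here makes pickling fail immediately with a clear
        # message, instead of producing a checkpoint that cannot be
        # restored later.
        raise NotImplementedError(
            "queue streams do not support checkpointing")

stream = QueueStream()
try:
    pickle.dumps(stream)
    raised = False
except NotImplementedError:
    raised = True

assert raised
```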
| |
immutable
It can be dangerous to have a mutable value as a default parameter. (http://stackoverflow.com/a/11416002/1170730)
e.g.
```python
>>> def func(example, f={}):
...     f[example] = 1
...     return f
>>> func(2)
{2: 1}
>>> func(3)
{2: 1, 3: 1}
```
mengxr
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #7058 from MechCoder/pipeline_api_playground and squashes the following commits:
40a5eb2 [MechCoder] copy
95f7ff2 [MechCoder] [SPARK-8679] [PySpark] [MLlib] Default values in Pipeline API should be immutable
(cherry picked from commit 5fa0863626aaf5a9a41756a0b1ec82bddccbf067)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
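The standard fix for the mutable-default pitfall is a `None` sentinel, creating a fresh object inside the function body (a minimal sketch):

```python
def func(example, f=None):
    # A new dict is created on every call, so callers no longer
    # share (and mutate) a single default instance.
    if f is None:
        f = {}
    f[example] = 1
    return f

assert func(2) == {2: 1}
assert func(3) == {3: 1}  # no leftover {2: 1} from the first call
```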
| |
filename slow for large number of files with wholeTextFiles and binaryFiles
Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/' (now fixed scaladoc by using an HTML entity for *).
Author: Sean Owen <sowen@cloudera.com>
Closes #7126 from srowen/SPARK-8437.2 and squashes the following commits:
7bb45da [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/' (now fixed scaladoc by using HTML entity for *)
(cherry picked from commit ada384b785c663392a0b69fad5bfe7a0a0584ee0)
Signed-off-by: Andrew Or <andrew@databricks.com>
| |
Subset the enabled algorithms in an SSLOptions to the elements that are supported by the protocol provider.
Update the list of ciphers in the sample config to include modern algorithms, and specify both Oracle and IBM names. In practice the user would either specify their own chosen cipher suites, or specify none, and delegate the decision to the provider.
Author: Tim Ellison <t.p.ellison@gmail.com>
Closes #7043 from tellison/SSLEnhancements and squashes the following commits:
034efa5 [Tim Ellison] Ensure Java imports are grouped and ordered by package.
3797f8b [Tim Ellison] Remove unnecessary use of Option to improve clarity, and fix import style ordering.
4b5c89f [Tim Ellison] More robust SSL options processing.
(cherry picked from commit 2ed0c0ac4686ea779f98713978e37b97094edc1c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
| |
cc yhuai
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #7100 from brkyvz/ct-flakiness-fix and squashes the following commits:
abc299a [Burak Yavuz] change 'to' to until
7e96d7c [Burak Yavuz] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab
(cherry picked from commit ecacb1e88a135c802e253793e7c863d6ca8d2408)
Signed-off-by: Yin Huai <yhuai@databricks.com>
| |
filename slow for large number of files with wholeTextFiles and binaryFiles"
This reverts commit b2684557fa0d2ec14b7529324443c8154d81c348.
| |
This PR also reorders the repositories used when resolving packages. User-provided repositories will be prioritized.
cc andrewor14
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #7089 from brkyvz/delete-prev-ivy-resolution and squashes the following commits:
a21f95a [Burak Yavuz] remove previous ivy resolution when using spark-submit
(cherry picked from commit d7f796da45d9a7c76ee4c29a9e0661ef76d8028a)
Signed-off-by: Andrew Or <andrew@databricks.com>
| |
for large number of files with wholeTextFiles and binaryFiles
Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/'.
Author: Sean Owen <sowen@cloudera.com>
Closes #7036 from srowen/SPARK-8437 and squashes the following commits:
0e813ae [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/'
(cherry picked from commit 5d30eae56051c563a8427f330b09ef66db0a0d21)
Signed-off-by: Andrew Or <andrew@databricks.com>
|