| Commit message (Collapse) | Author | Age | Files | Lines |
| |
With "+" the strings are separate expressions, and format() is called on the last string before concatenation. (So substitution does not happen.) Without "+" the string literals are merged first by the parser, so format() is called on the complete string.
Should I make a JIRA for this?
Author: Daniel Darabos <darabos.daniel@gmail.com>
Closes #7288 from darabos/patch-2 and squashes the following commits:
be0d3b7 [Daniel Darabos] Correctly print hostname in error
(cherry picked from commit 5687f76552369fa20b3a4385eab4810214653aa7)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
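The behavior described above can be reproduced in plain Python (a minimal sketch; the host string and messages are illustrative, not Spark's actual code):

```python
host = "example.com"

# With "+": the two literals are separate expressions, and .format()
# is called only on the last one before concatenation, so the "{0}"
# in the first literal survives unsubstituted.
broken = "Could not reach {0} " + "after 3 tries".format(host)
# broken == "Could not reach {0} after 3 tries"

# Without "+": the parser merges adjacent string literals first, so
# .format() is applied to the complete string.
fixed = "Could not reach {0} " "after 3 tries".format(host)
# fixed == "Could not reach example.com after 3 tries"
```

This is the essence of the fix: removing the `+` lets `format()` see the whole message, so the hostname is actually substituted.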
| |
This fixes a bug introduced in the cherry-pick of #7201 which led to a NullPointerException when cross-tabulating a data set that contains null values.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #7295 from JoshRosen/SPARK-8903 and squashes the following commits:
5489948 [Josh Rosen] [SPARK-8903] Fix bug in cherry-pick of SPARK-8803
| |
…ng-guide#Manually Specifying Options to be in sync with java,python, R version
Author: Alok Singh <singhal@us.ibm.com>
Closes #7299 from aloknsingh/aloknsingh_SPARK-8909 and squashes the following commits:
d3c20ba [Alok Singh] fix the file to .parquet from .json
d476140 [Alok Singh] [SPARK-8909][Documentation] Change the scala example in sql-programming-guide#Manually Specifying Options to be in sync with java,python, R version
(cherry picked from commit 8f3cd93278337dc10b9dd3a344d6f7b51ba9960d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
| |
cc pwendell
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #7293 from shivaram/sparkr-packages-doc and squashes the following commits:
c91471d [Shivaram Venkataraman] Fix sparkPackages in init documentation
(cherry picked from commit 374c8a8a4a8ac4171d312a6c31080a6724e55c60)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
| |
Fail to upload resource to viewfs in spark-1.4
JIRA Link: https://issues.apache.org/jira/browse/SPARK-8657
Author: Tao Li <litao@sogou-inc.com>
Closes #7125 from litao-buptsse/SPARK-8657-for-master and squashes the following commits:
65b13f4 [Tao Li] [SPARK-8657] [YARN] Fail to upload resource to viewfs
(cherry picked from commit 26d9b6b8cae9ac6593f78ab98dd45a25d03cf71c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
| |
Fail to upload resource to viewfs in spark-1.4
JIRA Link: https://issues.apache.org/jira/browse/SPARK-8657
Author: Tao Li <litao@sogou-inc.com>
Closes #7125 from litao-buptsse/SPARK-8657-for-master and squashes the following commits:
65b13f4 [Tao Li] [SPARK-8657] [YARN] Fail to upload resource to viewfs
(cherry picked from commit 26d9b6b8cae9ac6593f78ab98dd45a25d03cf71c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
# Conflicts:
# yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
| |
Author: Sun Rui <rui.sun@intel.com>
Closes #7287 from sun-rui/SPARK-8894 and squashes the following commits:
da63898 [Sun Rui] [SPARK-8894][SPARKR][DOC] Example code errors in SparkR documentation.
(cherry picked from commit bf02e377168f39459d5c216e939097ae5705f573)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
| |
of NullType columns
https://issues.apache.org/jira/browse/SPARK-8868
Author: Yin Huai <yhuai@databricks.com>
Closes #7262 from yhuai/SPARK-8868 and squashes the following commits:
cb58780 [Yin Huai] Andrew's comment.
e456857 [Yin Huai] Josh's comments.
5122e65 [Yin Huai] If types of all columns are NullTypes, do not use serializer2.
(cherry picked from commit 68a4a169714e11d8c537ad9431ae9974f6b7e8d3)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
| |
Otherwise the script will crash with
- Downloading boto...
Traceback (most recent call last):
File "ec2/spark_ec2.py", line 148, in <module>
setup_external_libs(external_libs)
File "ec2/spark_ec2.py", line 128, in setup_external_libs
if hashlib.md5(tar.read()).hexdigest() != lib["md5"]:
File "/usr/lib/python3.4/codecs.py", line 319, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
This happens when the environment uses a UTF-8 locale.
Author: Simon Hafner <hafnersimon@gmail.com>
Closes #7215 from reactormonk/branch-1.4 and squashes the following commits:
e86957a [Simon Hafner] [SPARK-8821] [EC2] Switched to binary mode
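The fix is to open the archive in binary mode, so the raw bytes are hashed instead of being decoded as text. A minimal sketch of the pattern (the file name is illustrative):

```python
import hashlib
import os
import tempfile

# Bytes that are not valid UTF-8: 0x8b appears in gzip headers,
# which is the byte the traceback above tripped on.
payload = b"\x1f\x8b\x08\x00"

path = os.path.join(tempfile.mkdtemp(), "lib.tar.gz")
with open(path, "wb") as f:
    f.write(payload)

# open(path) in text mode would decode with the locale's encoding and
# can raise UnicodeDecodeError on such bytes; "rb" yields raw bytes,
# which is what hashlib.md5 expects.
with open(path, "rb") as tar:
    digest = hashlib.md5(tar.read()).hexdigest()
```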
| |
when publishing releases. We named it 'release-profile' because that is
the Maven convention. However, it turns out this special name causes several
other undesirable things to kick in when we are creating releases.
For instance, it triggers the javadoc plugin to run, which actually fails
in our current build setup.
The fix is simply to rename this to a different profile so its use has no
collateral damage.
| |
This reverts commit 82cf3315e690f4ac15b50edea6a3d673aa5be4c0.
Conflicts:
pom.xml
| |
This is a workaround for MSHADE-148, which leads to an infinite loop when building Spark with maven 3.3.x. This was originally caused by #6441, which added a bunch of test dependencies on the spark-core test module. Recently, it was revealed by #7193.
This patch adds a `-Prelease` profile. If present, it will set `createDependencyReducedPom` to true. The consequences are:
- If you are releasing Spark with this profile, you are fine as long as you use maven 3.2.x or before.
- If you are releasing Spark without this profile, you will run into SPARK-8781.
- If you are not releasing Spark but you are using this profile, you may run into SPARK-8819.
- If you are not releasing Spark and you did not include this profile, you are fine.
This is all documented in `pom.xml` and tested locally with both versions of maven.
Author: Andrew Or <andrew@databricks.com>
Closes #7219 from andrewor14/fix-maven-build and squashes the following commits:
1d37e87 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-maven-build
3574ae4 [Andrew Or] Review comments
f39199c [Andrew Or] Create a -Prelease profile that flags `createDependencyReducedPom`
(cherry picked from commit 9eae5fa642317dd11fc783d832d4cbb7e62db471)
Signed-off-by: Andrew Or <andrew@databricks.com>
| |
JIRA: https://issues.apache.org/jira/browse/SPARK-8463
Currently, on the read path, `DriverRegistry` is used to load the needed JDBC driver on executors. However, the write path also needs `DriverRegistry` to load the JDBC driver.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #6900 from viirya/jdbc_write_driver and squashes the following commits:
16cd04b [Liang-Chi Hsieh] Use DriverRegistry to load jdbc driver at writing path.
(cherry picked from commit d4d6d31db5cc5c69ac369f754b7489f444c9ba2f)
Signed-off-by: Reynold Xin <rxin@databricks.com>
| |
cc rxin
Having backticks or null as elements causes problems.
Since elements become column names, we have to drop backticks from the elements, as backticks are special characters.
Having null throws exceptions; we can replace nulls with empty strings.
Handling of backticks should be improved for 1.5.
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #7201 from brkyvz/weird-ct-elements and squashes the following commits:
e06b840 [Burak Yavuz] fix scalastyle
93a0d3f [Burak Yavuz] added tests for NaN and Infinity
9dba6ce [Burak Yavuz] address cr1
db71dbd [Burak Yavuz] handle special characters in elements in crosstab
(cherry picked from commit 9b23e92c727881ff9038b4fe9643c49b96914159)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
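The sanitization described above can be sketched in Python (an illustrative sketch, not Spark's actual Scala implementation; the helper name is hypothetical):

```python
def clean_element(value):
    """Make a crosstab element safe to use as a column name."""
    # Nulls would otherwise throw when used as a column name,
    # so replace them with the empty string.
    if value is None:
        return ""
    # Backticks are special characters in column references,
    # so drop them from the element.
    return str(value).replace("`", "")

assert clean_element(None) == ""
assert clean_element("a`b") == "ab"
```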
| |
I am increasing the perm gen size to 256m.
https://issues.apache.org/jira/browse/SPARK-8776
Author: Yin Huai <yhuai@databricks.com>
Closes #7196 from yhuai/SPARK-8776 and squashes the following commits:
60901b4 [Yin Huai] Fix test.
d44b713 [Yin Huai] Make sparkShell and hiveConsole use 256m PermGen size.
30aaf8e [Yin Huai] Increase the default PermGen size to 256m.
(cherry picked from commit f743c79abe5a2fb66be32a896ea47e858569b0c7)
Signed-off-by: Yin Huai <yhuai@databricks.com>
| |
This PR backports #7199 to branch-1.4
Author: Cheng Lian <lian@databricks.com>
Closes #7200 from liancheng/spark-8501-for-1.4 and squashes the following commits:
725e9e3 [Cheng Lian] Addresses comments
0fa25af [Cheng Lian] Avoids reading schema from empty ORC files
| |
The issue is summarized in the JIRA and is caused by this commit: 984ad60147c933f2d5a2040c87ae687c14eb1724.
This patch reverts that commit and fixes the maven build in a different way. We limit the dependencies of `KinesisReceiverSuite` to avoid having to deal with the complexities in how maven deals with transitive test dependencies.
Author: Andrew Or <andrew@databricks.com>
Closes #7193 from andrewor14/fix-kinesis-pom and squashes the following commits:
ca3d5d4 [Andrew Or] Limit kinesis test dependencies
f24e09c [Andrew Or] Revert "[BUILD] Fix Maven build for Kinesis"
(cherry picked from commit 82cf3315e690f4ac15b50edea6a3d673aa5be4c0)
Signed-off-by: Andrew Or <andrew@databricks.com>
| |
updated the [Hive 0.13.1](https://archive.apache.org/dist/hive/hive-0.13.1) download link in `sql/README.md`
Author: Christian Kadner <ckadner@us.ibm.com>
Closes #7144 from ckadner/SPARK-8746 and squashes the following commits:
65d80f7 [Christian Kadner] [SPARK-8746][SQL] update download link for Hive 0.13.1
(cherry picked from commit 1bbdf9ead9e912f60dccbb23029b7de4948ebee3)
Signed-off-by: Sean Owen <sowen@cloudera.com>
| |
The parameter order of the deprecated annotation in package object sql is wrong:
`deprecated("1.3.0", "use DataFrame")`
This has to be changed to `deprecated("use DataFrame", "1.3.0")`.
Author: Vinod K C <vinod.kc@huawei.com>
Closes #7183 from vinodkc/fix_deprecated_param_order and squashes the following commits:
1cbdbe8 [Vinod K C] Modified the message
700911c [Vinod K C] Changed order of parameters
(cherry picked from commit c572e25617f993c6b2e7d5f15f0fbf4426f89fab)
Signed-off-by: Sean Owen <sowen@cloudera.com>
| |
It's a really minor issue, but there is an example with wrong lambda-expression usage in `SQLContext.scala`, as follows.
```
sqlContext.udf().register("myUDF",
(Integer arg1, String arg2) -> arg2 + arg1), <- We have an extra `)` here.
DataTypes.StringType);
```
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #7187 from sarutak/fix-minor-wrong-lambda-expression and squashes the following commits:
a13196d [Kousuke Saruta] Fixed minor wrong lambda expression example.
(cherry picked from commit 41588365ad29408ccabd216b411e9c43f0053151)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
| |
in yarn-client
Spark initializes the properties in `CoarseGrainedSchedulerBackend.start`:
```scala
// TODO (prashant) send conf instead of properties
driverEndpoint = rpcEnv.setupEndpoint(
CoarseGrainedSchedulerBackend.ENDPOINT_NAME, new DriverEndpoint(rpcEnv, properties))
```
The YARN logic then sets some configuration values, but they are not reflected in this `properties`, so the `Executor` never receives them.
[Jira](https://issues.apache.org/jira/browse/SPARK-8687)
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes #7066 from SaintBacchus/SPARK-8687 and squashes the following commits:
1de4f48 [huangzhaowei] Ensure all necessary properties have already been set before startup ExecutorLaucher
(cherry picked from commit 1b0c8e61040bf06213f9758f775679dcc41b0cce)
Signed-off-by: Andrew Or <andrew@databricks.com>
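The underlying pitfall is that the endpoint captures a snapshot of the properties at construction time, so later configuration updates are invisible to it. An illustrative Python sketch of that failure mode (the key names are examples, not the exact settings involved):

```python
conf = {"spark.app.name": "demo"}

# The driver endpoint receives the properties at construction time,
# i.e. a copy taken before YARN finishes its setup...
snapshot = dict(conf)

# ...so settings the YARN client adds afterwards never reach the
# executors that only see the snapshot.
conf["spark.yarn.am.extraJavaOptions"] = "-Xmx512m"

assert "spark.yarn.am.extraJavaOptions" not in snapshot
```

The fix is to make sure all necessary properties are set before the endpoint (and hence the snapshot) is created.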
| |
many jobs
Author: Holden Karau <holden@pigscanfly.ca>
Closes #7171 from holdenk/SPARK-8769-toLocalIterator-documentation-improvement and squashes the following commits:
97ddd99 [Holden Karau] Add note
(cherry picked from commit 15d41cc501f5fa7ac82c4a6741e416bb557f610a)
Signed-off-by: Andrew Or <andrew@databricks.com>
| |
failure conditions
In YarnClientSchedulerBackend.stop(), added a check for monitorThread.
Author: Devaraj K <devaraj@apache.org>
Closes #7153 from devaraj-kavali/master and squashes the following commits:
66be9ad [Devaraj K] https://issues.apache.org/jira/browse/SPARK-8754 YarnClientSchedulerBackend doesn't stop gracefully in failure conditions
(cherry picked from commit 792fcd802c99a0aef2b67d54f0e6e58710e65956)
Signed-off-by: Andrew Or <andrew@databricks.com>
| |
Use UTF-8 to encode the name of a column in Python 2, or it may fail to encode with the default encoding ('ascii').
This PR also fixes a bug when there is a Java exception without an error message.
Author: Davies Liu <davies@databricks.com>
Closes #7165 from davies/non_ascii and squashes the following commits:
02cb61a [Davies Liu] fix tests
3b09d31 [Davies Liu] add encoding in header
867754a [Davies Liu] support non-ascii character in column names
(cherry picked from commit f958f27e2056f9e380373c2807d8bb5977ecf269)
Signed-off-by: Davies Liu <davies@databricks.com>
Conflicts:
python/pyspark/sql/utils.py
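The core of the failure can be shown without Spark: the default 'ascii' codec cannot encode a non-ASCII column name, while an explicit UTF-8 encode succeeds. A minimal sketch of the Python 2 behavior, written with explicit codecs so it also runs on Python 3 (the column name is illustrative):

```python
name = u"prénom"  # a non-ASCII column name

# This is effectively what Python 2's implicit str() conversion
# attempted, and why it failed:
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly with UTF-8 always succeeds:
utf8_bytes = name.encode("utf-8")

assert not ascii_ok
assert utf8_bytes == b"pr\xc3\xa9nom"
```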
| |
Otherwise other tests don't log anything useful...
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #7140 from vanzin/SPARK-3444 and squashes the following commits:
de14836 [Marcelo Vanzin] Better fix.
6cff13a [Marcelo Vanzin] [SPARK-3444] [core] Restore INFO level after log4j test.
(cherry picked from commit 1ce6428907b4ddcf52dbf0c86196d82ab7392442)
Signed-off-by: Sean Owen <sowen@cloudera.com>
| |
Author: jerryshao <saisai.shao@intel.com>
Closes #7120 from jerryshao/SPARK-7820 and squashes the following commits:
6902439 [jerryshao] fix Java8-tests suite compile error under sbt
(cherry picked from commit 9f7db3486fcb403cae8da9dfce8978373c3f47b7)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
| |
Improve the empty check in `parseAttributeName` so that an empty string is allowed as a column name.
Close https://github.com/apache/spark/pull/7117
Author: Wenchen Fan <cloud0fan@outlook.com>
Closes #7149 from cloud-fan/8621 and squashes the following commits:
efa9e3e [Wenchen Fan] support empty string
(cherry picked from commit 31b4a3d7f2be9053a041e5ae67418562a93d80d8)
Signed-off-by: Reynold Xin <rxin@databricks.com>
| |
dataframe with no explicit column name
Because the implicit names in `pandas.columns` are Int, but `StructField` JSON expects `String`, `pandas.columns` should be converted to `String`.
### issue
* [SPARK-8535 PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name](https://issues.apache.org/jira/browse/SPARK-8535)
Author: x1- <viva008@gmail.com>
Closes #7124 from x1-/SPARK-8535 and squashes the following commits:
d68fd38 [x1-] modify unit-test using pandas.
ea1897d [x1-] For implicit name of pandas.columns are Int, so should be convert to String.
(cherry picked from commit b6e76edf3005c078b407f63b0a05d3a28c18c742)
Signed-off-by: Davies Liu <davies@databricks.com>
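The conversion itself is simple: pandas assigns integer labels (0, 1, 2, ...) when no column names are given, and they must become strings before being used in a StructField-style JSON schema. A minimal sketch without Spark (the schema shape here is illustrative):

```python
import json

# pandas uses a RangeIndex for unnamed columns, i.e. labels 0..n-1.
pandas_columns = [0, 1, 2]

# StructField's JSON representation expects "name" to be a string,
# so convert each label before building the schema.
fields = [{"name": str(c), "type": "string", "nullable": True}
          for c in pandas_columns]
schema_json = json.dumps({"type": "struct", "fields": fields})

assert json.loads(schema_json)["fields"][0]["name"] == "0"
```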
| |
IndexedRowMatrix.computeSVD().U.numCols = k
I'm sorry that I closed https://github.com/apache/spark/pull/6949 by mistake. I pushed the code again and added a test.
There is a bug where `U.numCols() = self.nCols` in `IndexedRowMatrix.computeSVD()`. It should be `U.numCols() = k = svd.U.numCols()`.
```
self = U * sigma * V.transpose
(m x n) = (m x n) * (k x k) * (k x n) //ASIS
-->
(m x n) = (m x k) * (k x k) * (k x n) //TOBE
```
Author: lee19 <lee19@live.co.kr>
Closes #6953 from lee19/MLlibBugfix and squashes the following commits:
c1812a0 [lee19] [SPARK-8563] [MLlib] Used nRows instead of numRows() to reduce a burden.
4b9803b [lee19] [SPARK-8563] [MLlib] Fixed a build error.
c2ccd89 [lee19] Added a unit test that validates matrix sizes of svd for [SPARK-8563][MLlib]
8373424 [lee19] [SPARK-8563][MLlib] Fixed a bug so that IndexedRowMatrix.computeSVD().U.numCols = k
(cherry picked from commit e72526227fdcf93b7a33375ef954746ac08753f5)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
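The dimension fix can be checked with NumPy, which stands in for MLlib here (an illustrative sketch, not Spark's code): for a rank-k truncated SVD of an m x n matrix, U must be m x k, not m x n.

```python
import numpy as np

m, n, k = 6, 4, 2
A = np.arange(m * n, dtype=float).reshape(m, n)

# Thin SVD, then truncate to the top-k singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# self = U * sigma * V.transpose with the corrected shapes:
# (m x n) = (m x k) * (k x k) * (k x n)
assert U_k.shape == (m, k)          # not (m, n)
assert np.diag(s_k).shape == (k, k)
assert Vt_k.shape == (k, n)
```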
| |
Showing these applications may lead to weird behavior in the History Server. For old logs, if
the app ID is recorded later, you may end up with a duplicate entry. For new logs, the app might
be listed with a ".inprogress" suffix.
So ignore those, but still allow old applications that don't record app IDs at all (1.0 and 1.1) to be shown.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: Carson Wang <carson.wang@intel.com>
Closes #7097 from vanzin/SPARK-8372 and squashes the following commits:
a24eab2 [Marcelo Vanzin] Feedback.
112ae8f [Marcelo Vanzin] Merge branch 'master' into SPARK-8372
7b91b74 [Marcelo Vanzin] Handle logs generated by 1.0 and 1.1.
1eca3fe [Carson Wang] [SPARK-8372] History server shows incorrect information for application not started
Conflicts:
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
| |
Changed GBTRegressor so it does NOT threshold the prediction. Added test which fails with bug but works after fix.
CC: feynmanliang mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #7134 from jkbradley/gbrt-fix and squashes the following commits:
613b90e [Joseph K. Bradley] Changed GBTRegressor so it does NOT threshold the prediction
(cherry picked from commit 3ba23ffd377d12383d923d1550ac8e2b916090fc)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
| |
Made lexical initialization a lazy val.
Author: Vinod K C <vinod.kc@huawei.com>
Closes #7015 from vinodkc/handle_lexical_initialize_schronization and squashes the following commits:
b6d1c74 [Vinod K C] Avoided repeated lexical initialization
5863cf7 [Vinod K C] Removed space
e27c66c [Vinod K C] Avoid reinitialization of lexical in parse method
ef4f60f [Vinod K C] Reverted import order
e9fc49a [Vinod K C] handle synchronization in SqlLexical.initialize
(cherry picked from commit b8e5bb6fc1553256e950fdad9cb5acc6b296816e)
Signed-off-by: Michael Armbrust <michael@databricks.com>
| |
within Streaming checkpoint
[Client.scala](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L786) changes these configurations, so the Streaming recovery logic cannot find the local keytab file (since the configuration was changed):
```scala
sparkConf.set("spark.yarn.keytab", keytabFileName)
sparkConf.set("spark.yarn.principal", args.principal)
```
Problem described at [Jira](https://issues.apache.org/jira/browse/SPARK-8619)
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes #7008 from SaintBacchus/SPARK-8619 and squashes the following commits:
d50dbdf [huangzhaowei] Delect one blank space
9b8e92c [huangzhaowei] Fix code style and add a short comment.
0d8f800 [huangzhaowei] Don't recover keytab and principal configuration within Streaming checkpoint.
(cherry picked from commit d16a9443750eebb7a3d7688d4b98a2ac39cc0da7)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
| |
This PR throws an exception in `QueueInputDStream.writeObject` so that it can fail the application when calling `StreamingContext.start` rather than failing it during recovering QueueInputDStream.
Author: zsxwing <zsxwing@gmail.com>
Closes #7016 from zsxwing/queueStream-checkpoint and squashes the following commits:
89a3d73 [zsxwing] Fix JavaAPISuite.testQueueStream
cc40fd7 [zsxwing] Prevent from checkpointing QueueInputDStream
(cherry picked from commit 57264400ac7d9f9c59c387c252a9ed8d93fed4fa)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
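The same fail-fast idea can be sketched in Python: make the object refuse serialization so the error surfaces when checkpointing is attempted, not when recovering. (This is an analogous pattern with hypothetical names, not Spark's actual code.)

```python
import pickle

class QueueStream:
    """A stream backed by an in-memory queue; not checkpointable."""

    def __init__(self):
        self.queue = []

    def __reduce__(self):
        # Raising here makes pickling fail immediately with a clear
        # message, instead of producing a checkpoint that cannot be
        # restored later.
        raise NotImplementedError(
            "queue streams do not support checkpointing")

stream = QueueStream()
try:
    pickle.dumps(stream)
    raised = False
except NotImplementedError:
    raised = True

assert raised
```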
| |
immutable
It can be dangerous to have a mutable value as a default parameter. (http://stackoverflow.com/a/11416002/1170730)
e.g.
```python
>>> def func(example, f={}):
...     f[example] = 1
...     return f
>>> func(2)
{2: 1}
>>> func(3)
{2: 1, 3: 1}
```
mengxr
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #7058 from MechCoder/pipeline_api_playground and squashes the following commits:
40a5eb2 [MechCoder] copy
95f7ff2 [MechCoder] [SPARK-8679] [PySpark] [MLlib] Default values in Pipeline API should be immutable
(cherry picked from commit 5fa0863626aaf5a9a41756a0b1ec82bddccbf067)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
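The standard fix for the mutable-default pitfall is a `None` sentinel, creating a fresh object inside the function body (a minimal sketch):

```python
def func(example, f=None):
    # A new dict is created on every call, so callers no longer
    # share (and mutate) a single default instance.
    if f is None:
        f = {}
    f[example] = 1
    return f

assert func(2) == {2: 1}
assert func(3) == {3: 1}  # no leftover {2: 1} from the first call
```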
| |
filename slow for large number of files with wholeTextFiles and binaryFiles
Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/' (now fixed scaladoc by using an HTML entity for *).
Author: Sean Owen <sowen@cloudera.com>
Closes #7126 from srowen/SPARK-8437.2 and squashes the following commits:
7bb45da [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/' (now fixed scaladoc by using HTML entity for *)
(cherry picked from commit ada384b785c663392a0b69fad5bfe7a0a0584ee0)
Signed-off-by: Andrew Or <andrew@databricks.com>
| |
Subset the enabled algorithms in an SSLOptions to the elements that are supported by the protocol provider.
Update the list of ciphers in the sample config to include modern algorithms, and specify both Oracle and IBM names. In practice the user would either specify their own chosen cipher suites, or specify none, and delegate the decision to the provider.
Author: Tim Ellison <t.p.ellison@gmail.com>
Closes #7043 from tellison/SSLEnhancements and squashes the following commits:
034efa5 [Tim Ellison] Ensure Java imports are grouped and ordered by package.
3797f8b [Tim Ellison] Remove unnecessary use of Option to improve clarity, and fix import style ordering.
4b5c89f [Tim Ellison] More robust SSL options processing.
(cherry picked from commit 2ed0c0ac4686ea779f98713978e37b97094edc1c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
| |
cc yhuai
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #7100 from brkyvz/ct-flakiness-fix and squashes the following commits:
abc299a [Burak Yavuz] change 'to' to until
7e96d7c [Burak Yavuz] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab
(cherry picked from commit ecacb1e88a135c802e253793e7c863d6ca8d2408)
Signed-off-by: Yin Huai <yhuai@databricks.com>
| |
filename slow for large number of files with wholeTextFiles and binaryFiles"
This reverts commit b2684557fa0d2ec14b7529324443c8154d81c348.
| |
This PR also reorders the repositories used when resolving packages. User-provided repositories will be prioritized.
cc andrewor14
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #7089 from brkyvz/delete-prev-ivy-resolution and squashes the following commits:
a21f95a [Burak Yavuz] remove previous ivy resolution when using spark-submit
(cherry picked from commit d7f796da45d9a7c76ee4c29a9e0661ef76d8028a)
Signed-off-by: Andrew Or <andrew@databricks.com>
| |
for large number of files with wholeTextFiles and binaryFiles
Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/'.
Author: Sean Owen <sowen@cloudera.com>
Closes #7036 from srowen/SPARK-8437 and squashes the following commits:
0e813ae [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/'
(cherry picked from commit 5d30eae56051c563a8427f330b09ef66db0a0d21)
Signed-off-by: Andrew Or <andrew@databricks.com>
|