spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	Preparing Spark release v1.2.0-rc2 (tag: v1.2.0)	Patrick Wendell	2014-12-10	4	-4/+4
*	Revert "Preparing Spark release v1.2.0-rc2"	Patrick Wendell	2014-12-10	4	-4/+4
	This reverts commit 2b72c569a674cccf79ebbe8d067b8dbaaf78007f.
*	Revert "Preparing development version 1.2.1-SNAPSHOT"	Patrick Wendell	2014-12-10	4	-4/+4
	This reverts commit bc05df8a23ba7ad485f6844f28f96551b13ba461.
*	[SPARK-4785][SQL] Initialize Hive UDFs on the driver and serialize them with a wrapper	Cheng Hao	2014-12-09	5	-50/+173
	Unlike Hive 0.12.0, Hive 0.13.1 expects UDF/UDAF/UDTF (aka Hive function) objects to be initialized only once on the driver side and then serialized to executors. However, not all function objects are serializable (e.g. GenericUDF doesn't implement Serializable). Hive 0.13.1 solves this issue with a Kryo or XML serializer, and class o.a.h.h.q.e.Utilities provides several utility ser/de methods for this purpose. In this PR we chose Kryo for efficiency. The Kryo serializer used here is created in Hive; Spark's Kryo serializer wasn't used because no SparkConf instance is available.
	Author: Cheng Hao <hao.cheng@intel.com>
	Author: Cheng Lian <lian@databricks.com>
	Closes #3640 from chenghao-intel/udf_serde and squashes the following commits:
	8e13756 [Cheng Hao] Update the comment
	74466a3 [Cheng Hao] refactor as feedbacks
	396c0e1 [Cheng Hao] avoid Simple UDF to be serialized
	e9c3212 [Cheng Hao] update the comment
	19cbd46 [Cheng Hao] support udf instance ser/de after initialization
	(cherry picked from commit 383c5555c9f26c080bc9e3a463aab21dd5b3797f)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4769] [SQL] CTAS does not work when reading from temporary tables	Cheng Hao	2014-12-08	4	-16/+49
	This refactors the code and follows up on #2570.
	Author: Cheng Hao <hao.cheng@intel.com>
	Closes #3336 from chenghao-intel/createtbl and squashes the following commits:
	3563142 [Cheng Hao] remove the unused variable
	e215187 [Cheng Hao] eliminate the compiling warning
	4f97f14 [Cheng Hao] fix bug in unittest
	5d58812 [Cheng Hao] revert the API changes
	b85b620 [Cheng Hao] fix the regression of temp tabl not found in CTAS
	(cherry picked from commit 51b1fe1426ffecac6c4644523633ea1562ff9a4e)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4761][SQL] Enables Kryo by default in Spark SQL Thrift server	Cheng Lian	2014-12-05	1	-2/+12
	Enables Kryo and disables reference tracking by default in Spark SQL Thrift server. Configurations explicitly defined by users in `spark-defaults.conf` are respected (the Thrift server is started by `spark-submit`, which handles configuration properties properly).
	Author: Cheng Lian <lian@databricks.com>
	Closes #3621 from liancheng/kryo-by-default and squashes the following commits:
	70c2775 [Cheng Lian] Enables Kryo by default in Spark SQL Thrift server
	(cherry picked from commit 6f61e1f961826a6c9e98a66d10b271b7e3c7dd55)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	[SPARK-4753][SQL] Use catalyst for partition pruning in newParquet.	Michael Armbrust	2014-12-04	1	-30/+28
	Author: Michael Armbrust <michael@databricks.com>
	Closes #3613 from marmbrus/parquetPartitionPruning and squashes the following commits:
	4f138f8 [Michael Armbrust] Use catalyst for partition pruning in newParquet.
	(cherry picked from commit f5801e813f3c2573ebaf1af839341489ddd3ec78)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	Preparing development version 1.2.1-SNAPSHOT	Patrick Wendell	2014-12-04	4	-4/+4
*	Preparing Spark release v1.2.0-rc2	Patrick Wendell	2014-12-04	4	-4/+4
*	Revert "Preparing Spark release v1.2.0-rc1"	Patrick Wendell	2014-12-04	4	-4/+4
	This reverts commit 1056e9ec13203d0c51564265e94d77a054498fdb.
*	Revert "Preparing development version 1.2.1-SNAPSHOT"	Patrick Wendell	2014-12-04	4	-4/+4
	This reverts commit 00316cc87983b844f6603f351a8f0b84fe1f6035.
*	[SQL] Minor: Avoid calling Seq#size in a loop	Aaron Davidson	2014-12-04	1	-3/+3
	Just found this instance while doing some jstack-based profiling of a Spark SQL job. It is very unlikely that this is causing much of a perf issue anywhere, but it is unnecessarily suboptimal.
	Author: Aaron Davidson <aaron@databricks.com>
	Closes #3593 from aarondav/seq-opt and squashes the following commits:
	962cdfc [Aaron Davidson] [SQL] Minor: Avoid calling Seq#size in a loop
	(cherry picked from commit c6c7165e7ecf1690027d6bd4e0620012cd0d2310)
	Signed-off-by: Reynold Xin <rxin@databricks.com>
*	[SPARK-4552][SQL] Avoid exception when reading empty parquet data through Hive	Michael Armbrust	2014-12-03	3	-45/+62
	This is a very small fix that catches one specific exception and returns an empty table. #3441 will address this in a more principled way.
	Author: Michael Armbrust <michael@databricks.com>
	Closes #3586 from marmbrus/fixEmptyParquet and squashes the following commits:
	2781d9f [Michael Armbrust] Handle empty lists for newParquet
	04dd376 [Michael Armbrust] Avoid exception when reading empty parquet data through Hive
	(cherry picked from commit 513ef82e85661552e596d0b483b645ac24e86d4d)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4695][SQL] Get result using executeCollect	wangfei	2014-12-02	1	-1/+3
	Use ```executeCollect``` to collect the result, because executeCollect is Spark SQL's custom implementation of collect, which is better than the RDD's collect.
	Author: wangfei <wangfei1@huawei.com>
	Closes #3547 from scwf/executeCollect and squashes the following commits:
	a5ab68e [wangfei] Revert "adding debug info"
	a60d680 [wangfei] fix test failure
	0db7ce8 [wangfei] adding debug info
	184c594 [wangfei] using executeCollect instead collect
	(cherry picked from commit 3ae0cda83c5106136e90d59c20e61db345a5085f)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4670] [SQL] wrong symbol for bitwise not	Daoyuan Wang	2014-12-02	2	-10/+25
	We should use `~` instead of `-` for bitwise NOT.
	Author: Daoyuan Wang <daoyuan.wang@intel.com>
	Closes #3528 from adrian-wang/symbol and squashes the following commits:
	affd4ad [Daoyuan Wang] fix code gen test case
	56efb79 [Daoyuan Wang] ensure bitwise NOT over byte and short persist data type
	f55fbae [Daoyuan Wang] wrong symbol for bitwise not
	(cherry picked from commit 1f5ddf17e831ad9717f0f4b60a727a3381fad4f9)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
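To illustrate why `-` is the wrong symbol for bitwise NOT (SPARK-4670), here is a minimal Python sketch comparing the two operators on two's-complement integers. The helper `bitwise_not_byte` is hypothetical: it only mimics keeping the result within a signed 8-bit range and is not Spark's implementation.

```python
def bitwise_not_byte(value: int) -> int:
    """Bitwise NOT of a signed 8-bit value, checked to be in byte range."""
    if not -128 <= value <= 127:
        raise ValueError("value does not fit in a signed byte")
    # For two's-complement integers, ~x equals -x - 1, so the result
    # stays inside [-128, 127] whenever the input does.
    return ~value


negated = -5   # unary minus: arithmetic negation
inverted = ~5  # bitwise NOT: flips every bit
print(negated, inverted)       # -5 -6
print(bitwise_not_byte(127))   # -128
print(bitwise_not_byte(-128))  # 127
```

The two operators differ by exactly one for every input, which is why a parser that maps bitwise NOT to `-` silently returns wrong values rather than failing.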
*	[SPARK-4593][SQL] Return null when denominator is 0	Daoyuan Wang	2014-12-02	4	-5/+83
	SELECT max(1/0) FROM src would return a very large number, which is obviously not right. For hive-0.12, hive would return `Infinity` for 1/0, while for hive-0.13.1, it is `NULL` for 1/0. I think it is better to match the behavior of the newer Hive version. This PR ensures that when the divisor is 0, the result of the expression is NULL, the same as in hive-0.13.1.
	Author: Daoyuan Wang <daoyuan.wang@intel.com>
	Closes #3443 from adrian-wang/div and squashes the following commits:
	2e98677 [Daoyuan Wang] fix code gen for divide 0
	85c28ba [Daoyuan Wang] temp
	36236a5 [Daoyuan Wang] add test cases
	6f5716f [Daoyuan Wang] fix comments
	cee92bd [Daoyuan Wang] avoid evaluation 2 times
	22ecd9a [Daoyuan Wang] fix style
	cf28c58 [Daoyuan Wang] divide fix
	2dfe50f [Daoyuan Wang] return null when divider is 0 of Double type
	(cherry picked from commit f6df609dcc4f4a18c0f1c74b1ae0800cf09fa7ae)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
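The NULL-on-zero-divisor behavior described for SPARK-4593 can be sketched in Python, using `None` to stand in for SQL NULL. The functions `sql_divide` and `sql_max` are hypothetical names for illustration only; they model the semantics, not Spark's expression evaluation or code generation.

```python
from typing import Iterable, Optional


def sql_divide(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Divide with NULL (None) semantics: a NULL operand or a zero divisor gives NULL."""
    if numerator is None or denominator is None:
        return None
    if denominator == 0:
        return None
    return numerator / denominator


def sql_max(values: Iterable[Optional[float]]) -> Optional[float]:
    """MAX that skips NULLs and returns NULL when no non-NULL value remains."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


print(sql_divide(1, 0))                               # None
print(sql_max([sql_divide(1, 0)]))                    # None
print(sql_max([sql_divide(1, 0), sql_divide(6, 3)]))  # 2.0
```

With this rule, an aggregate such as `max(1/0)` sees only NULL inputs and yields NULL instead of a very large number.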
*	[SPARK-4676][SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null	YanTangZhai	2014-12-02	5	-0/+59
	val jsc = new org.apache.spark.api.java.JavaSparkContext(sc)
	val jhc = new org.apache.spark.sql.hive.api.java.JavaHiveContext(jsc)
	val nrdd = jhc.hql("select null from spark_test.for_test")
	println(nrdd.schema)
	Then the error is thrown as follows:
	scala.MatchError: NullType (of class org.apache.spark.sql.catalyst.types.NullType$)
	at org.apache.spark.sql.types.util.DataTypeConversions$.asJavaDataType(DataTypeConversions.scala:43)
	Author: YanTangZhai <hakeemzhai@tencent.com>
	Author: yantangzhai <tyz0303@163.com>
	Author: Michael Armbrust <michael@databricks.com>
	Closes #3538 from YanTangZhai/MatchNullType and squashes the following commits:
	e052dff [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null
	4b4bb34 [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null
	896c7b7 [yantangzhai] fix NullType MatchError in JavaSchemaRDD when sql has null
	6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
	e249846 [YanTangZhai] Merge pull request #10 from apache/master
	d26d982 [YanTangZhai] Merge pull request #9 from apache/master
	76d4027 [YanTangZhai] Merge pull request #8 from apache/master
	03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
	8a00106 [YanTangZhai] Merge pull request #6 from apache/master
	cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
	cdef539 [YanTangZhai] Merge pull request #1 from apache/master
	(cherry picked from commit 10664276007beca3843638e558f504cad44b1fb3)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4663][sql]add finally to avoid resource leak	baishuo	2014-12-02	1	-4/+7
	Author: baishuo <vc_java@hotmail.com>
	Closes #3526 from baishuo/master-trycatch and squashes the following commits:
	d446e14 [baishuo] correct the code style
	b36bf96 [baishuo] correct the code style
	ae0e447 [baishuo] add finally to avoid resource leak
	(cherry picked from commit 69b6fed206565ecb0173d3757bcb5110422887c3)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
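The general pattern behind SPARK-4663 (release a resource in a `finally` block so that it is freed even when the work fails) can be sketched in Python as follows. `TrackedResource` and `use_resource` are hypothetical stand-ins; the actual commit changes Scala code that is not shown in this log.

```python
class TrackedResource:
    """Hypothetical resource that records whether it has been closed."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def use_resource(resource: TrackedResource, fail: bool) -> str:
    """Use the resource and release it even if the work raises."""
    try:
        if fail:
            raise RuntimeError("simulated failure while using the resource")
        return "ok"
    finally:
        # Runs on both the normal and the exceptional path.
        resource.close()


resource = TrackedResource()
try:
    use_resource(resource, fail=True)
except RuntimeError:
    pass
print(resource.closed)  # True
```

Without the `finally` clause, the exception would skip `close()` and the resource would stay open.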
*	[SPARK-4536][SQL] Add sqrt and abs to Spark SQL DSL	Kousuke Saruta	2014-12-02	4	-1/+74
	Spark SQL has embedded sqrt and abs, but the DSL doesn't support those functions.
	Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
	Closes #3401 from sarutak/dsl-missing-operator and squashes the following commits:
	07700cf [Kousuke Saruta] Modified Literal(null, NullType) to Literal(null) in DslQuerySuite
	8f366f8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
	1b88e2e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
	0396f89 [Kousuke Saruta] Added sqrt and abs to Spark SQL DSL
	(cherry picked from commit e75e04f980281389b881df76f59ba1adc6338629)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4529] [SQL] support view with column alias	Daoyuan Wang	2014-12-01	2	-3/+3
	Support view definitions like CREATE VIEW view3(valoo) TBLPROPERTIES ("fear" = "factor") AS SELECT upper(value) FROM src WHERE key=86; [valoo as the alias of upper(value)]. This is the missing part of SPARK-4239, needed for full view support.
	Author: Daoyuan Wang <daoyuan.wang@intel.com>
	Closes #3396 from adrian-wang/viewcolumn and squashes the following commits:
	4d001d0 [Daoyuan Wang] support view with column alias
	(cherry picked from commit 4df60a8cbc58f2877787245c2a83b2de85579c82)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SQL] Minor fix for doc and comment	wangfei	2014-12-01	1	-1/+1
	Author: wangfei <wangfei1@huawei.com>
	Closes #3533 from scwf/sql-doc1 and squashes the following commits:
	962910b [wangfei] doc and comment fix
	(cherry picked from commit 7b79957879db4dfcc7c3601cb40ac4fd576259a5)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4658][SQL] Code documentation issue in DDL of datasource API	ravipesala	2014-12-01	2	-3/+3
	Author: ravipesala <ravindra.pesala@huawei.com>
	Closes #3516 from ravipesala/ddl_doc and squashes the following commits:
	d101fdf [ravipesala] Style issues fixed
	d2238cd [ravipesala] Corrected documentation
	(cherry picked from commit bc353819cc86c3b0ad75caf81b47744bfc2aeeb3)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4650][SQL] Supporting multi column support in countDistinct function like count(distinct c1,c2..) in Spark SQL	ravipesala	2014-12-01	2	-1/+9
	Supporting multi column support in countDistinct function like count(distinct c1,c2..) in Spark SQL
	Author: ravipesala <ravindra.pesala@huawei.com>
	Author: Michael Armbrust <michael@databricks.com>
	Closes #3511 from ravipesala/countdistinct and squashes the following commits:
	cc4dbb1 [ravipesala] style
	070e12a [ravipesala] Supporting multi column support in count(distinct c1,c2..) in Spark SQL
	(cherry picked from commit 6a9ff19dc06745144d5b311d4f87073c81d53a8f)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
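As a rough illustration of what a multi-column `count(distinct c1,c2..)` computes, the Python sketch below counts distinct column combinations. The function `count_distinct` is hypothetical, and the rule that a row with NULL (`None`) in any counted column is skipped is an assumption based on common SQL behavior, not something stated in the commit.

```python
from typing import Iterable, Optional, Sequence


def count_distinct(rows: Iterable[Sequence[Optional[object]]]) -> int:
    """Count distinct column combinations, skipping rows that contain a NULL (None)."""
    seen = set()
    for row in rows:
        if any(value is None for value in row):
            continue  # assumption: a NULL in any counted column excludes the row
        seen.add(tuple(row))
    return len(seen)


rows = [(1, "a"), (1, "a"), (1, "b"), (2, None)]
print(count_distinct(rows))  # 2
```

The combinations `(1, "a")` and `(1, "b")` are counted once each; the duplicate and the row containing `None` do not add to the total.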
*	[SPARK-4358][SQL] Let BigDecimal do checking type compatibility	Liang-Chi Hsieh	2014-12-01	1	-8/+3
	Remove the hardcoded max and min values for types. Let BigDecimal do the type compatibility checking.
	Author: Liang-Chi Hsieh <viirya@gmail.com>
	Closes #3208 from viirya/more_numericLit and squashes the following commits:
	e9834b4 [Liang-Chi Hsieh] Remove byte and short types for number literal.
	1bd1825 [Liang-Chi Hsieh] Fix Indentation and make the modification clearer.
	cf1a997 [Liang-Chi Hsieh] Modified for comment to add a rule of analysis that adds a cast.
	91fe489 [Liang-Chi Hsieh] add Byte and Short.
	1bdc69d [Liang-Chi Hsieh] Let BigDecimal do checking type compatibility.
	(cherry picked from commit b57365a1ec89e31470f424ff37d5ebc7c90a39d8)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SQL] add @group tab in limit() and count()	Jacky Li	2014-12-01	1	-0/+4
	group tab is missing for scaladoc
	Author: Jacky Li <jacky.likun@gmail.com>
	Closes #3458 from jackylk/patch-7 and squashes the following commits:
	0121a70 [Jacky Li] add @group tab in limit() and count()
	(cherry picked from commit bafee67ebad01f7aea2cd393a70b57eb8345eeb0)
	Signed-off-by: Michael Armbrust <michael@databricks.com>
*	[SPARK-4661][Core] Minor code and docs cleanup	zsxwing	2014-12-01	1	-1/+1
	Author: zsxwing <zsxwing@gmail.com>
	Closes #3521 from zsxwing/SPARK-4661 and squashes the following commits:
	03cbe3f [zsxwing] Minor code and docs cleanup
	(cherry picked from commit 30a86acdefd5428af6d6264f59a037e0eefd74b4)
	Signed-off-by: Reynold Xin <rxin@databricks.com>
*	Preparing development version 1.2.1-SNAPSHOT	Patrick Wendell	2014-11-28	4	-4/+4
*	Preparing Spark release v1.2.0-rc1	Patrick Wendell	2014-11-28	4	-4/+4
*	Revert "Preparing Spark release v1.2.0-rc1"	Patrick Wendell	2014-11-28	4	-4/+4
	This reverts commit 39c7d1c1f9a7785285cf4c20dfbffd96f72d5634.
*	Revert "Preparing development version 1.2.1-SNAPSHOT"	Patrick Wendell	2014-11-28	4	-4/+4
	This reverts commit fc7bff00ac731d2632213a98cd92dc5e84ce7dcd.
*	Preparing development version 1.2.1-SNAPSHOT	Patrick Wendell	2014-11-28	4	-4/+4
*	Preparing Spark release v1.2.0-rc1	Patrick Wendell	2014-11-28	4	-4/+4
*	[SPARK-4645][SQL] Disables asynchronous execution in Hive 0.13.1 HiveThriftServer2	Cheng Lian	2014-11-28	1	-100/+39
	This PR disables HiveThriftServer2 asynchronous execution by setting `runInBackground` argument in `ExecuteStatementOperation` to `false`, and reverting `SparkExecuteStatementOperation.run` in Hive 13 shim to Hive 12 version. This change makes Simba ODBC driver v1.0.0.1000 work.
	Author: Cheng Lian <lian@databricks.com>
	Closes #3506 from liancheng/disable-async-exec and squashes the following commits:
	593804d [Cheng Lian] Disables asynchronous execution in Hive 0.13.1 HiveThriftServer2
*	[SPARK-4308][SQL] Sets SQL operation state to ERROR when exception is thrown	Cheng Lian	2014-11-28	3	-29/+21
	In `HiveThriftServer2`, when an exception is thrown during a SQL execution, the SQL operation state should be set to `ERROR`, but now it remains `RUNNING`. This affects the result of the `GetOperationStatus` Thrift API.
	Author: Cheng Lian <lian@databricks.com>
	Closes #3175 from liancheng/fix-op-state and squashes the following commits:
	6d4c1fe [Cheng Lian] Sets SQL operation state to ERROR when exception is thrown
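The state handling described for SPARK-4308 can be sketched generically in Python. The `OperationState` enum and `Operation` class below are hypothetical simplifications for illustration; they are not the HiveThriftServer2 classes, which are not shown in this log.

```python
from enum import Enum
from typing import Callable


class OperationState(Enum):
    INITIALIZED = "INITIALIZED"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    ERROR = "ERROR"


class Operation:
    """Hypothetical operation that tracks its state around a unit of work."""

    def __init__(self) -> None:
        self.state = OperationState.INITIALIZED

    def run(self, work: Callable[[], None]) -> None:
        self.state = OperationState.RUNNING
        try:
            work()
        except Exception:
            # Without this assignment the state would stay RUNNING after a failure.
            self.state = OperationState.ERROR
            raise
        self.state = OperationState.FINISHED


def failing_query() -> None:
    raise ValueError("simulated SQL failure")


operation = Operation()
try:
    operation.run(failing_query)
except ValueError:
    pass
print(operation.state.value)  # ERROR
```

A client polling the status then sees `ERROR` instead of waiting indefinitely on an operation that appears to be `RUNNING`.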
*	Revert "Preparing Spark release v1.2.0-rc1"	Patrick Wendell	2014-11-26	4	-4/+4
	This reverts commit cc2c05e4ee81d2f34873a2ebb9a5272867cb65c2.
*	Revert "Preparing development version 1.2.1-SNAPSHOT"	Patrick Wendell	2014-11-26	4	-4/+4
	This reverts commit 380eba5f49eca1dbd4084e6c84e19866fffd4efa.
*	Preparing development version 1.2.1-SNAPSHOT	Patrick Wendell	2014-11-26	4	-4/+4
*	Preparing Spark release v1.2.0-rc1	Patrick Wendell	2014-11-26	4	-4/+4
*	Revert "Preparing Spark release v1.2.0-rc1"	Patrick Wendell	2014-11-26	4	-4/+4
	This reverts commit 5247dd859b95a440baa562b9827bdeb26aa6530e.
*	Revert "Preparing development version 1.2.1-SNAPSHOT"	Patrick Wendell	2014-11-26	4	-4/+4
	This reverts commit 79df6b43ae762263a8120f423ddb4a0811dd4b6f.
*	Preparing development version 1.2.1-SNAPSHOT	Patrick Wendell	2014-11-26	4	-4/+4
*	Preparing Spark release v1.2.0-rc1	Patrick Wendell	2014-11-26	4	-4/+4
*	Revert "Preparing Spark release v1.2.0-rc1"	Patrick Wendell	2014-11-26	4	-4/+4
	This reverts commit db7f4a898af22a02b36428507f8ef2b429d78dc1.
*	Revert "Preparing development version 1.2.1-SNAPSHOT"	Patrick Wendell	2014-11-26	4	-4/+4
	This reverts commit d7b1ecb25676d228deb6fe05efdb4e2ab9c3e30b.
*	Preparing development version 1.2.1-SNAPSHOT	Ubuntu	2014-11-26	4	-4/+4
*	Preparing Spark release v1.2.0-rc1	Ubuntu	2014-11-26	4	-4/+4
*	Revert "Preparing Spark release v1.2.0-snapshot1"	Patrick Wendell	2014-11-26	4	-4/+4
	This reverts commit 38c1fbd9694430cefd962c90bc36b0d108c6124b.
*	Revert "Preparing development version 1.2.1-SNAPSHOT"	Patrick Wendell	2014-11-26	4	-4/+4
	This reverts commit d7ac6013483e83caff8ea54c228f37aeca159db8.
*	[SQL] Compute timeTaken correctly	w00228970	2014-11-24	1	-7/+4
	```timeTaken``` should not include the time spent printing the result.
	Author: w00228970 <wangfei1@huawei.com>
	Closes #3423 from scwf/time-taken-bug and squashes the following commits:
	da7e102 [w00228970] compute time taken correctly
	(cherry picked from commit 723be60e233d0f85944d948efd06845ef546c9f5)
	Signed-off-by: Reynold Xin <rxin@databricks.com>
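The timing fix above comes down to stopping the clock before any output is printed. A minimal Python sketch of that idea follows; `run_and_time` is a hypothetical helper, and the query is simulated by a plain function rather than a Spark SQL call.

```python
import time
from typing import Callable, List, Tuple


def run_and_time(query: Callable[[], List[str]]) -> Tuple[List[str], float]:
    """Run the query and return its rows plus the elapsed seconds, excluding printing."""
    start = time.perf_counter()
    rows = query()
    # Stop the clock before any output is produced.
    elapsed = time.perf_counter() - start
    return rows, elapsed


rows, time_taken = run_and_time(lambda: ["row-1", "row-2"])
for row in rows:
    print(row)
print(f"Time taken: {time_taken:.3f} seconds")
```

Measuring after the print loop instead would fold terminal I/O into the reported query time.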
*	[SPARK-4548] [SPARK-4517] improve performance of python broadcast	Davies Liu	2014-11-24	2	-3/+4
	Re-implement the Python broadcast using files:
	1) Serialize the Python object using cPickle and write it to disk.
	2) Create a wrapper in the JVM (for the dumped file); it reads the data from that file during serialization.
	3) Use TorrentBroadcast or HttpBroadcast to transfer the (compressed) data to executors.
	4) During deserialization, write the data to disk.
	5) Pass the path to the Python worker, which reads the data from disk and unpickles it into a Python object, deferred until the first access.
	This fixes the performance regression introduced in #2659 and performs similarly to 1.1, but supports objects larger than 2G and also improves memory efficiency (only one compressed copy in the driver and executor).
	Testing with a 500M broadcast and 4 tasks (excluding the benefit from reused worker in 1.2):
	name | 1.1 | 1.2 with this patch | improvement
	---------|--------|---------|--------
	python-broadcast-w-bytes | 25.20 | 9.33 | 170.13%
	python-broadcast-w-set | 4.13 | 4.50 | -8.35%
	Testing with 100 tasks (16 CPUs):
	name | 1.1 | 1.2 with this patch | improvement
	---------|--------|---------|--------
	python-broadcast-w-bytes | 38.16 | 8.40 | 353.98%
	python-broadcast-w-set | 23.29 | 9.59 | 142.80%
	Author: Davies Liu <davies@databricks.com>
	Closes #3417 from davies/pybroadcast and squashes the following commits:
	50a58e0 [Davies Liu] address comments
	b98de1d [Davies Liu] disable gc while unpickle
	e5ee6b9 [Davies Liu] support large string
	09303b8 [Davies Liu] read all data into memory
	dde02dd [Davies Liu] improve performance of python broadcast
	(cherry picked from commit 6cf507685efd01df77d663145ae08e48c7f92948)
	Signed-off-by: Josh Rosen <joshrosen@databricks.com>
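The Python-side part of this file-based approach (pickle to disk, hand around a path, unpickle lazily on first access with the garbage collector paused) can be sketched as below. This is an assumption-laden illustration, not PySpark's code: `FileBackedValue` is a hypothetical class, Python 3's `pickle` is swapped in for `cPickle`, and the JVM wrapper and the TorrentBroadcast/HttpBroadcast transfer are left out entirely.

```python
import gc
import os
import pickle
import tempfile
from typing import Any


class FileBackedValue:
    """Hypothetical holder that pickles a value to disk and loads it lazily."""

    def __init__(self, value: Any) -> None:
        fd, self.path = tempfile.mkstemp(suffix=".pkl")
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(value, handle, protocol=pickle.HIGHEST_PROTOCOL)
        self._loaded = False
        self._value = None

    @property
    def value(self) -> Any:
        if not self._loaded:
            # Pause the garbage collector while unpickling, then restore it.
            gc_was_enabled = gc.isenabled()
            gc.disable()
            try:
                with open(self.path, "rb") as handle:
                    self._value = pickle.load(handle)
            finally:
                if gc_was_enabled:
                    gc.enable()
            self._loaded = True
        return self._value

    def unlink(self) -> None:
        """Remove the backing file."""
        os.remove(self.path)


holder = FileBackedValue({"key": list(range(5))})
print(holder.value["key"])  # [0, 1, 2, 3, 4]
holder.unlink()
```

Only the path needs to be passed between processes in this sketch, and the object is materialized in memory at most once, on the first read of `value`.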