author: Cheng Hao <hao.cheng@intel.com> 2014-12-09 10:28:15 -0800
committer: Michael Armbrust <michael@databricks.com> 2014-12-09 10:28:33 -0800
commit: 383c5555c9f26c080bc9e3a463aab21dd5b3797f
tree: deecc2fce5cb6263415a53a0541110f259810265 /docs/running-on-yarn.md
parent: bcb5cdad614d4fce43725dfec3ce88172d2f8c11
[SPARK-4785][SQL] Initialize Hive UDFs on the driver and serialize them with a wrapper

Unlike Hive 0.12.0, Hive 0.13.1 requires UDF/UDAF/UDTF (aka Hive function) objects to be initialized only once, on the driver side, and then serialized to executors. However, not all function objects are serializable (e.g. GenericUDF doesn't implement Serializable). Hive 0.13.1 solves this issue with a Kryo or XML serializer, and provides several utility ser/de methods in class o.a.h.h.q.e.Utilities for this purpose. In this PR we chose Kryo for efficiency. The Kryo serializer used here is the one created in Hive; the Spark Kryo serializer wasn't used because no SparkConf instance is available.

Author: Cheng Hao <hao.cheng@intel.com>
Author: Cheng Lian <lian@databricks.com>

Closes #3640 from chenghao-intel/udf_serde and squashes the following commits:

8e13756 [Cheng Hao] Update the comment
74466a3 [Cheng Hao] refactor as feedbacks
396c0e1 [Cheng Hao] avoid Simple UDF to be serialized
e9c3212 [Cheng Hao] update the comment
19cbd46 [Cheng Hao] support udf instance ser/de after initialization

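
The wrapper idea described in the commit message can be illustrated with a minimal, standalone Java sketch. It is not the code from this patch: every class name below is hypothetical, and to stay within the JDK it swaps Kryo for `java.beans.XMLEncoder`/`XMLDecoder`, in the spirit of the XML option the message mentions. The function object is initialized once (as it would be on the driver), wrapped in an `Externalizable` holder that takes over serialization, and restored on the other side without being initialized again.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class FunctionWrapperSketch {

    /** Hypothetical stand-in for a Hive function object: a JavaBean that is NOT Serializable. */
    public static class UpperCaseFunction {
        private String suffix = "";
        private boolean initialized = false;

        public UpperCaseFunction() {}

        // Bean accessors so that XMLEncoder can capture the post-initialization state.
        public String getSuffix() { return suffix; }
        public void setSuffix(String suffix) { this.suffix = suffix; }
        public boolean isInitialized() { return initialized; }
        public void setInitialized(boolean initialized) { this.initialized = initialized; }

        /** One-time setup, meant to run on the driver side only. */
        public void initialize(String suffix) {
            this.suffix = suffix;
            this.initialized = true;
        }

        public String evaluate(String input) {
            return input.toUpperCase() + suffix;
        }
    }

    /** Serializable holder that encodes the wrapped function with a custom serializer. */
    public static class FunctionWrapper implements Externalizable {
        private UpperCaseFunction function;

        public FunctionWrapper() {} // public no-arg constructor required by Externalizable

        public FunctionWrapper(UpperCaseFunction function) { this.function = function; }

        public UpperCaseFunction get() { return function; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (XMLEncoder encoder = new XMLEncoder(buffer)) {
                encoder.writeObject(function);
            }
            byte[] bytes = buffer.toByteArray();
            out.writeInt(bytes.length); // length prefix, then the encoded payload
            out.write(bytes);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(bytes))) {
                function = (UpperCaseFunction) decoder.readObject();
            }
        }
    }

    /** Simulates shipping the wrapper from driver to executor via Java serialization. */
    public static FunctionWrapper roundTrip(FunctionWrapper wrapper) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(wrapper);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FunctionWrapper) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        UpperCaseFunction function = new UpperCaseFunction();
        function.initialize("!"); // initialized exactly once, before serialization
        FunctionWrapper copy = roundTrip(new FunctionWrapper(function));
        System.out.println(copy.get().evaluate("spark"));
    }
}
```

Because the wrapper owns the encoding, the wrapped class never needs to implement `Serializable`. A Kryo-based variant would presumably follow the same shape, with only the bodies of `writeExternal` and `readExternal` changing.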
Diffstat (limited to 'docs/running-on-yarn.md')
0 files changed, 0 insertions, 0 deletions