author | Dongjoon Hyun <dongjoon@apache.org> | 2016-05-23 14:19:25 -0700 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2016-05-23 14:19:25 -0700 |
commit | 37c617e4f580482b59e1abbe3c0c27c7125cf605 (patch) | |
tree | f6608e06c3732555e9ec3d2ca33464010cf7b7c5 /python/pyspark/sql | |
parent | 2585d2b322f3b6b85a0a12ddf7dcde957453000d (diff) | |
[MINOR][SQL][DOCS] Add notes of the deterministic assumption on UDF functions
## What changes were proposed in this pull request?
Spark assumes that user-defined functions (UDFs) are deterministic. This PR adds explicit notes stating that assumption to the UDF documentation.
## How was this patch tested?
Not tested separately; this is a documentation-only change.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #13087 from dongjoon-hyun/SPARK-15282.
Diffstat (limited to 'python/pyspark/sql')
-rw-r--r-- | python/pyspark/sql/functions.py | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index dac842c0ce..716b16fdc9 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -1756,6 +1756,9 @@ class UserDefinedFunction(object):
 @since(1.3)
 def udf(f, returnType=StringType()):
     """Creates a :class:`Column` expression representing a user defined function (UDF).
+    Note that the user-defined functions must be deterministic. Due to optimization,
+    duplicate invocations may be eliminated or the function may even be invoked more times than
+    it is present in the query.
 
     >>> from pyspark.sql.types import IntegerType
     >>> slen = udf(lambda s: len(s), IntegerType())
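The note added to the docstring can be illustrated with a small sketch. The code below is not Spark code and does not model Spark's optimizer. It is a plain-Python toy, assuming a hypothetical query of the form "select f(x) where f(x) > 2" that is evaluated in two ways: once calling the function separately for the filter and the projection, and once with the duplicate invocation eliminated. The names `run_query`, `run_query_deduplicated`, and `make_counter_udf` are invented for illustration.

```python
import itertools


def slen(s):
    """Deterministic function: the same input always gives the same output."""
    return len(s)


def make_counter_udf():
    """Non-deterministic function: the result depends on how many times
    it has been called so far (hypothetical example)."""
    counter = itertools.count()
    return lambda s: len(s) + next(counter)


def run_query(rows, f):
    """Toy plan: f is invoked once for the filter and again for the
    projection, i.e. more times than it appears to the user."""
    return [f(x) for x in rows if f(x) > 2]


def run_query_deduplicated(rows, f):
    """Toy plan for the same query with the duplicate invocation eliminated."""
    out = []
    for x in rows:
        y = f(x)  # single call, reused by filter and projection
        if y > 2:
            out.append(y)
    return out


rows = ["abc", "de", "fghi"]

# A deterministic function gives the same answer under both plans.
det_a = run_query(rows, slen)
det_b = run_query_deduplicated(rows, slen)

# A non-deterministic function gives plan-dependent answers.
nondet_a = run_query(rows, make_counter_udf())
nondet_b = run_query_deduplicated(rows, make_counter_udf())

print(det_a, det_b)        # both plans agree
print(nondet_a, nondet_b)  # the plans disagree
```

With the deterministic `slen`, both toy plans return the same rows, so the choice of plan is invisible to the user. With the counter-based function, the number of invocations changes the values themselves, which is the situation the docstring note warns about.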