path: root/python/pyspark/rdd.py
author: gatorsmile <gatorsmile@gmail.com> 2016-02-18 21:19:36 -0800
committer: Reynold Xin <rxin@databricks.com> 2016-02-18 21:19:36 -0800
commit: c776fce99b496a789ffcf2cfab78cf51eeea032b
tree: ed84e65ca21ffcb1401383a810dda58af8dd384f /python/pyspark/rdd.py
parent: 95e1ab223e87fc216f3256d404fe3be50d111a9d
[SPARK-13380][SQL][DOCUMENT] Document Rand(seed) and Randn(seed) Return Indeterministic Results When Data Partitions are not fixed.
The `rand` and `randn` functions are commonly called with a `seed` argument. Intuitively, when a `seed` value is provided, the results of `rand` and `randn` should be deterministic. MS SQL Server, for example, also has a `rand` function, and its documentation describes the `seed` parameter as follows:

```
Seed is an integer expression (tinyint, smallint, or int) that gives the seed value. If seed is not specified, the SQL Server Database Engine assigns a seed value at random. For a specified seed value, the result returned is always the same.
```

Update: the current implementation cannot generate deterministic results when the data partitions are not fixed. This PR documents this limitation in the function descriptions.

jkbradley hit this issue and provided an example in the following JIRA: https://issues.apache.org/jira/browse/SPARK-13333

Author: gatorsmile <gatorsmile@gmail.com>

Closes #11232 from gatorsmile/randSeed.
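Why partitioning matters can be illustrated with a small pure-Python model. It assumes that each partition gets its own generator seeded with `seed + partition_index`, which is our understanding of how Spark's seeded random expressions initialize per partition. Treat that detail as an assumption, not as Spark's actual code. The helpers `per_partition_rand` and `split` are hypothetical, and `split` is a simple round-robin stand-in for a real partitioner.

```python
import random
from typing import List, Sequence, Tuple


def per_partition_rand(partitions: Sequence[Sequence[str]], seed: int) -> List[Tuple[str, float]]:
    """Assign a pseudo-random float to every row, using one RNG per partition.

    Hypothetical model (assumption, not Spark's implementation): partition i
    gets a generator seeded with seed + i, and its rows draw values in order.
    """
    out = []
    for idx, part in enumerate(partitions):
        rng = random.Random(seed + idx)  # one generator per partition
        for row in part:
            out.append((row, rng.random()))
    return out


def split(rows: Sequence[str], num_partitions: int) -> List[List[str]]:
    """Round-robin rows into num_partitions buckets (hypothetical partitioner)."""
    parts: List[List[str]] = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


rows = ["a", "b", "c", "d"]
two = dict(per_partition_rand(split(rows, 2), seed=42))
two_again = dict(per_partition_rand(split(rows, 2), seed=42))
three = dict(per_partition_rand(split(rows, 3), seed=42))

# Same seed and same partitioning: every row gets the same value again.
print(two == two_again)  # True
# Same seed but different partitioning: row "c" moves from partition 0
# (second draw of the seed-42 generator) to partition 2 (first draw of the
# seed-44 generator), so its value changes.
print(two["c"] == three["c"])
```

With the seed held fixed, a row's value depends on which partition it lands in and on its position within that partition. Changing the number of partitions or the row order can therefore change the output. In PySpark the corresponding column functions are `pyspark.sql.functions.rand(seed=...)` and `randn(seed=...)`.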
Diffstat (limited to 'python/pyspark/rdd.py')
0 files changed, 0 insertions, 0 deletions