[SPARK-13380][SQL][DOCUMENT] Document Rand(seed) and Randn(seed) Return Indeterministic Results When Data Partitions are not fixed. - spark


author	gatorsmile <gatorsmile@gmail.com>	2016-02-18 21:19:36 -0800
committer	Reynold Xin <rxin@databricks.com>	2016-02-18 21:19:36 -0800
commit	c776fce99b496a789ffcf2cfab78cf51eeea032b (patch)
tree	ed84e65ca21ffcb1401383a810dda58af8dd384f /python/pyspark/rdd.py
parent	95e1ab223e87fc216f3256d404fe3be50d111a9d (diff)


`rand` and `randn` functions with a `seed` argument are commonly used. Common sense suggests that the results of `rand` and `randn` should be deterministic whenever a `seed` value is provided. For example, MS SQL Server also has a `rand` function, and its documentation describes the `seed` parameter as follows:

"Seed is an integer expression (tinyint, smallint, or int) that gives the seed value. If seed is not specified, the SQL Server Database Engine assigns a seed value at random. For a specified seed value, the result returned is always the same."

Update: the current implementation cannot generate deterministic results when the data partitions are not fixed. This PR documents this issue in the function descriptions.

jkbradley hit this issue and provided an example in the following JIRA: https://issues.apache.org/jira/browse/SPARK-13333

Author: gatorsmile <gatorsmile@gmail.com>

Closes #11232 from gatorsmile/randSeed.
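To make the partition dependence concrete, here is a minimal Python sketch. It assumes, as in Spark's `Rand`/`Randn` implementation around this release, that each partition seeds its own generator with `seed + partitionIndex` and draws one value per row in partition order. The function `rand_with_seed` is hypothetical, and Python's `random.Random` stands in for Spark's internal generator, so the numbers will not match Spark's output; only the behavior is illustrated.

```python
import random
from typing import Dict, List


def rand_with_seed(partitions: List[List[str]], seed: int) -> Dict[str, float]:
    """Assign a pseudo-random value to each row using a per-partition RNG.

    Assumption (hypothetical model): each partition seeds its own generator
    with seed + partition_index and draws one value per row, in row order.
    random.Random stands in for Spark's generator; values will differ from Spark.
    """
    values: Dict[str, float] = {}
    for partition_index, rows in enumerate(partitions):
        rng = random.Random(seed + partition_index)
        for row in rows:
            values[row] = rng.random()
    return values


rows = ["a", "b", "c", "d"]

# The same rows and the same seed, laid out in two different partitionings.
two_partitions = [rows[:2], rows[2:]]  # [["a", "b"], ["c", "d"]]
one_partition = [rows]                 # [["a", "b", "c", "d"]]

first = rand_with_seed(two_partitions, seed=42)
again = rand_with_seed(two_partitions, seed=42)
other = rand_with_seed(one_partition, seed=42)

# Fixed partitions reproduce the same values for every row.
print(first == again)
# Row "c" now sits in a different partition and position, so its value changes.
print(first["c"] == other["c"])
```

Because a row's value depends on which partition it lands in and where it sits within that partition, a shuffle or a change in the number of partitions can change the generated values even though the seed is the same.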

Diffstat (limited to 'python/pyspark/rdd.py')

0 files changed, 0 insertions, 0 deletions

