diff options
author | Bryan Cutler <cutlerb@gmail.com> | 2017-03-03 16:43:45 -0800 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2017-03-03 16:43:45 -0800 |
commit | 44281ca81d4eda02b627ba21841108438b7d1c27 (patch) | |
tree | 4125cfa2e8dd98e247ae7240d88f3845ce871734 /sql | |
parent | 2a7921a813ecd847fd933ffef10edc64684e9df7 (diff) | |
download | spark-44281ca81d4eda02b627ba21841108438b7d1c27.tar.gz spark-44281ca81d4eda02b627ba21841108438b7d1c27.tar.bz2 spark-44281ca81d4eda02b627ba21841108438b7d1c27.zip |
[SPARK-19348][PYTHON] PySpark keyword_only decorator is not thread-safe
## What changes were proposed in this pull request?
The `keyword_only` decorator in PySpark is not thread-safe. It writes kwargs to a static class variable in the decorator, which is then retrieved later in the class method as `_input_kwargs`. If multiple threads are constructing the same class with different kwargs, it becomes a race condition to read from the static class variable before it's overwritten. See [SPARK-19348](https://issues.apache.org/jira/browse/SPARK-19348) for reproduction code.
This change will write the kwargs to a member variable so that multiple threads can operate on separate instances without the race condition. It does not protect against multiple threads operating on a single instance, but that is better left to the user to synchronize.
## How was this patch tested?
Added new unit tests for using the keyword_only decorator and a regression test that verifies `_input_kwargs` can be overwritten from different class instances.
Author: Bryan Cutler <cutlerb@gmail.com>
Closes #16782 from BryanCutler/pyspark-keyword_only-threadsafe-SPARK-19348.
Diffstat (limited to 'sql')
0 files changed, 0 insertions, 0 deletions