author: Yuhao Yang <hhbyyh@gmail.com> 2017-03-16 12:49:59 +0200
committer: Nick Pentreath <nickp@za.ibm.com> 2017-03-16 12:49:59 +0200
commit: d647aae278ef31a07fc64715eb07e48294d94bb8
tree: 13570e50f38a430469158ff5305a67edf2d301d1
parent: 1472cac4bb31c1886f82830778d34c4dd9030d7a
[SPARK-13568][ML] Create feature transformer to impute missing values
## What changes were proposed in this pull request?

JIRA: https://issues.apache.org/jira/browse/SPARK-13568

It is quite common to encounter missing values in data sets. It would be useful to implement a Transformer that can impute missing data points, similar to, e.g., Imputer in scikit-learn. Initially, options for imputation could include mean, median and most frequent, but we could add various other approaches, reusing existing DataFrame code where possible (e.g. for approximate quantiles).

Currently, this PR supports imputation for Double and Vector columns (null, and NaN in Vector).

## How was this patch tested?

New unit tests and manual testing.

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: Yuhao Yang <yuhao.yang@intel.com>
Author: Yuhao <yuhao.yang@intel.com>

Closes #11601 from hhbyyh/imputer.
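To make the candidate strategies (mean, median, most frequent) concrete, here is a minimal, framework-free Python sketch of imputing a single numeric column. It is not Spark's Imputer API: the helper names `impute` and `_is_missing` and the strategy string `"most_frequent"` are hypothetical, null is modeled as Python `None`, and the median is computed exactly with `statistics.median` rather than with the approximate quantiles the commit message mentions.

```python
import math
from collections import Counter
from statistics import median
from typing import List, Optional


def _is_missing(x: Optional[float]) -> bool:
    # Hypothetical helper: treat both null (None) and NaN as missing.
    return x is None or (isinstance(x, float) and math.isnan(x))


def impute(column: List[Optional[float]], strategy: str = "mean") -> List[float]:
    """Replace missing entries with a surrogate computed from the observed values only."""
    observed = [x for x in column if not _is_missing(x)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")

    if strategy == "mean":
        surrogate = sum(observed) / len(observed)
    elif strategy == "median":
        # Exact median; averages the two middle values for an even count.
        surrogate = median(observed)
    elif strategy == "most_frequent":
        # Counter.most_common orders equal counts by first occurrence.
        surrogate = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    # Keep observed values unchanged; fill only the gaps.
    return [surrogate if _is_missing(x) else x for x in column]
```

For example, `impute([1.0, None, 3.0, float("nan"), 3.0], "median")` fills both gaps with 3.0, the median of the observed values [1.0, 3.0, 3.0]. Computing the surrogate from observed values only keeps the missing entries from skewing the statistic used to replace them.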
Diffstat (limited to 'scalastyle-config.xml')
0 files changed, 0 insertions, 0 deletions