[SPARK-16356][FOLLOW-UP][ML] Enforce ML test of exception for local/distributed Dataset. - spark

author	Yanbo Liang <ybliang8@gmail.com>	2016-09-29 00:54:26 -0700
committer	Yanbo Liang <ybliang8@gmail.com>	2016-09-29 00:54:26 -0700
commit	a19a1bb59411177caaf99581e89098826b7d0c7b (patch)
tree	649a504d904cce2f0783def6e0114ab68a9e1024 /docs/streaming-kafka-0-10-integration.md
parent	37eb9184f1e9f1c07142c66936671f4711ef407d (diff)

## What changes were proposed in this pull request?

#14035 added ```testImplicits``` to ML unit tests and promoted ```toDF()```, but left one minor issue in ```VectorIndexerSuite```. For one of the test cases, creating the DataFrame with ```Seq(...).toDF()``` throws a different error/exception than creating it with ```sc.parallelize(Seq(...)).toDF()```. After an in-depth study, I found that this is caused by the different behavior of local and distributed Datasets when a UDF fails at an ```assert```. If the data is a local Dataset, the ```AssertionError``` is thrown directly; if the data is a distributed Dataset, a ```SparkException``` that wraps the ```AssertionError``` is thrown instead. I think we should enforce this test to cover both cases.

## How was this patch tested?

Unit test.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #15261 from yanboliang/spark-16356.
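The local/distributed difference in failure shape can be modeled without Spark on the classpath. The following is a minimal plain-Scala sketch under stated assumptions, not the test this commit actually added: `WrappedTaskFailure` is a hypothetical stand-in for `SparkException`, `runLocal` and `runDistributed` are hypothetical models of the two evaluation paths, and `interceptAssertion` is a hypothetical helper (not a ScalaTest API) that accepts either a bare `AssertionError` or a wrapper whose cause is one.

```scala
// Hypothetical stand-in for org.apache.spark.SparkException, which carries
// the task-side error as its cause when a distributed job fails.
final class WrappedTaskFailure(msg: String, cause: Throwable)
  extends Exception(msg, cause)

// Models a UDF that guards its input with `assert` (hypothetical).
def checkedLength(expected: Int)(v: Array[Double]): Int = {
  assert(v.length == expected,
    s"expected vector of length $expected but found length ${v.length}")
  v.length
}

// "Local" path (assumed model): the AssertionError escapes directly.
def runLocal(data: Seq[Array[Double]], f: Array[Double] => Int): Seq[Int] =
  data.map(f)

// "Distributed" path (assumed model): the failure is wrapped before it
// reaches the caller, as a driver would report a failed task.
def runDistributed(data: Seq[Array[Double]], f: Array[Double] => Int): Seq[Int] =
  try data.map(f)
  catch {
    case e: AssertionError =>
      throw new WrappedTaskFailure("Job aborted due to task failure", e)
  }

// Accepts either failure shape and returns the underlying AssertionError,
// so a single message check covers both paths.
def interceptAssertion(body: => Any): AssertionError = {
  val thrown: Option[Throwable] =
    try { body; None }
    catch { case t: Throwable => Some(t) }
  thrown match {
    case Some(e: AssertionError) => e
    case Some(w: WrappedTaskFailure) =>
      w.getCause match {
        case e: AssertionError => e
        case other => throw new AssertionError(s"wrapped cause is not an AssertionError: $other")
      }
    case Some(other) => throw new AssertionError(s"unexpected exception: $other")
    case None => throw new AssertionError("expected an AssertionError, but nothing was thrown")
  }
}

// Usage: the second vector has the wrong length, so both paths fail.
val vectors = Seq(Array(1.0, 2.0, 3.0), Array(1.0, 2.0, 3.0, 4.0))
val lengthIs3: Array[Double] => Int = checkedLength(3)
val localErr = interceptAssertion(runLocal(vectors, lengthIs3))
val distErr = interceptAssertion(runDistributed(vectors, lengthIs3))
// Both messages read "assertion failed: expected vector of length 3 but found length 4".
```

Unwrapping to the root `AssertionError` is one design choice; a test could equally assert on each shape separately, which documents the difference more explicitly at the cost of two expectations per case.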
