Stopwords Corpus

This corpus contains lists of stop words for several languages.  These
are high-frequency grammatical words which are usually ignored in text
retrieval applications.
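
As a rough illustration of how such a list is typically applied, the
sketch below drops stop words from a tokenized sentence.  It assumes a
list is stored as one word per line; the helper names and the sample
words are hypothetical and are not taken from this corpus.

```python
def load_stop_words(lines):
    """Build a stop-word set from lines, one word per line (assumed format)."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not stop words, compared case-insensitively."""
    return [t for t in tokens if t.lower() not in stop_words]


# Tiny illustrative sample, NOT the actual contents of any list in this corpus.
sample_lines = ["the", "is", "a", "of", ""]
stop_words = load_stop_words(sample_lines)

tokens = "The history of a corpus is short".split()
filtered = remove_stop_words(tokens, stop_words)
print(filtered)  # ['history', 'corpus', 'short']
```

In practice the lines would come from one of the per-language files
rather than an in-memory sample.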

They were obtained from:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/snowball/stopwords/

The English list has been augmented; see:
https://github.com/nltk/nltk_data/issues/22