spaCy/spacy/lang/nb
Haakon Meland Eriksen 251119455d
Remove NER words from stop words in Norwegian (#9820)
The default stop words for Norwegian Bokmål (nb) in spaCy contain important entities, e.g. France, Germany, Russia, Sweden and the USA, as well as police district, important units of time such as months and days of the week, and organisations.

Users do not expect to find these words among the default stop words. Anyone who follows the general recommendation to filter out stop words risks removing important entities from their data without realising it.

See the explanation in https://github.com/explosion/spaCy/issues/3052#issuecomment-986756711 and the comment at https://github.com/explosion/spaCy/issues/3052#issuecomment-986951831
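
The effect described above can be illustrated with a minimal, standard-library-only sketch. The two word sets below are small hand-picked examples chosen for illustration; they are assumptions and not spaCy's actual nb stop word list, and `remove_stop_words` is a hypothetical helper, not a spaCy function.

```python
# Illustrative only: tiny hand-picked sets, NOT spaCy's actual nb stop word list.
OLD_STOP_WORDS = {"og", "i", "på", "er", "fra", "frankrike", "mandag"}

# The same set with the entity-like entries (a country, a weekday) removed,
# mimicking the kind of change this commit describes.
NEW_STOP_WORDS = OLD_STOP_WORDS - {"frankrike", "mandag"}


def remove_stop_words(tokens, stop_words):
    """Hypothetical helper: keep tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in stop_words]


# "The prime minister travels to France on Monday"
tokens = "Statsministeren reiser til Frankrike på mandag".split()

filtered_old = remove_stop_words(tokens, OLD_STOP_WORDS)
filtered_new = remove_stop_words(tokens, NEW_STOP_WORDS)

print(filtered_old)  # the country and the weekday are silently dropped
print(filtered_new)  # the country and the weekday survive filtering
```

With an installed spaCy, a comparable check would presumably look at `nlp.Defaults.stop_words` or `nlp.vocab["frankrike"].is_stop` on a pipeline created with `spacy.blank("nb")`; the result depends on the spaCy version in use.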
2021-12-07 09:45:10 +01:00
..
__init__.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
examples.py
punctuation.py Add / to nb infixes (#7991) 2021-05-04 11:00:10 +02:00
stop_words.py Remove NER words from stop words in Norwegian (#9820) 2021-12-07 09:45:10 +01:00
syntax_iterators.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
tokenizer_exceptions.py