spaCy/spacy/lang/nl/__init__.py

# coding: utf8
from __future__ import unicode_literals

from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS
from .tag_map import TAG_MAP
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .punctuation import TOKENIZER_PREFIXES, TOKENIZER_INFIXES
from .punctuation import TOKENIZER_SUFFIXES
from .lemmatizer import DutchLemmatizer
from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
from ...language import Language
from ...lookups import Lookups
from ...attrs import LANG, NORM
from ...util import update_exc, add_lookups


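class DutchDefaults(Language.Defaults):
    # Copy the base getters so the shared Language.Defaults dict is not mutated.
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters.update(LEX_ATTRS)
    lex_attr_getters[LANG] = lambda text: "nl"
    # Consult the BASE_NORMS lookup first, then fall back to the default NORM getter.
    lex_attr_getters[NORM] = add_lookups(
        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
    )
    # Merge the shared base exceptions with the Dutch-specific ones.
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
    tag_map = TAG_MAP
    # Dutch-specific punctuation rules used by the tokenizer.
    prefixes = TOKENIZER_PREFIXES
    infixes = TOKENIZER_INFIXES
    suffixes = TOKENIZER_SUFFIXES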

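    @classmethod
    def create_lemmatizer(cls, nlp=None, lookups=None):
        """Return a DutchLemmatizer backed by `lookups`.

        An empty Lookups container is used when none is supplied.
        `nlp` is accepted but not used here.
        """
        if lookups is None:
            lookups = Lookups()
        return DutchLemmatizer(lookups)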


class Dutch(Language):
    lang = "nl"
    Defaults = DutchDefaults


__all__ = ["Dutch"]