spaCy/spacy
adrianeboyd 3bf111585d
Update Japanese tokenizer config and add serialization (#5562)
* Use `config` dict for tokenizer settings
* Add serialization of split mode setting
* Add tests for tokenizer split modes and serialization of split mode setting

Based on #5561
2020-06-08 16:29:05 +02:00
cli prevent loading a pretrained Tok2Vec layer AND pretrained components 2020-05-29 17:38:33 +02:00
data
displacy Add missing import 2020-04-28 13:48:37 +02:00
lang Update Japanese tokenizer config and add serialization (#5562) 2020-06-08 16:29:05 +02:00
matcher Switch to new add API in PhraseMatcher unpickle 2020-05-25 11:22:47 +02:00
ml
pipeline Preserve _SP when filtering tag map in Tagger 2020-05-31 19:57:54 +02:00
syntax Revert "Remove peeking from Parser.begin_training (#5456)" 2020-05-29 23:21:55 +02:00
tests Update Japanese tokenizer config and add serialization (#5562) 2020-06-08 16:29:05 +02:00
tokens Remove MorphAnalysis __str__ and __repr__ 2020-05-29 14:33:47 +02:00
__init__.pxd
__init__.py Simplify warnings 2020-04-28 13:37:37 +02:00
__main__.py
_ml.py Skip duplicate lexeme rank setting (#5401) 2020-05-14 18:26:12 +02:00
about.py Switch to v2.3.0.dev0 2020-05-25 12:57:20 +02:00
analysis.py Simplify warnings 2020-04-28 13:37:37 +02:00
attrs.pxd Reduce stored lexemes data, move feats to lookups (#5238) 2020-05-19 15:59:14 +02:00
attrs.pyx Reduce stored lexemes data, move feats to lookups (#5238) 2020-05-19 15:59:14 +02:00
compat.py
errors.py Add rudimentary version checks on model load 2020-06-02 17:33:48 +02:00
glossary.py
gold.pxd
gold.pyx Add warning for misaligned character offset spans (#5007) 2020-05-19 16:01:18 +02:00
kb.pxd Tidy up and avoid absolute spacy imports in core 2020-05-21 20:05:03 +02:00
kb.pyx Merge pull request #5264 from lfiedler/issue-5230 2020-05-22 00:31:07 +02:00
language.py Improve vector name loading from model meta 2020-05-27 14:48:54 +02:00
lemmatizer.py Return lowercase form as default except for PROPN 2020-05-20 15:35:08 +02:00
lexeme.pxd Reduce stored lexemes data, move feats to lookups (#5238) 2020-05-19 15:59:14 +02:00
lexeme.pyx Avoid libc.stdint for UINT64_MAX (#5545) 2020-06-04 20:02:05 +02:00
lookups.py Reduce stored lexemes data, move feats to lookups (#5238) 2020-05-19 15:59:14 +02:00
morphology.pxd
morphology.pyx Prefer _SP over SP for default tag map space attrs 2020-05-26 14:57:13 +02:00
parts_of_speech.pxd
parts_of_speech.pyx
scorer.py Fix GoldParse init when token count differs (#5191) 2020-03-26 10:46:23 +01:00
strings.pxd
strings.pyx
structs.pxd Reduce stored lexemes data, move feats to lookups (#5238) 2020-05-19 15:59:14 +02:00
symbols.pxd Reduce stored lexemes data, move feats to lookups (#5238) 2020-05-19 15:59:14 +02:00
symbols.pyx Reduce stored lexemes data, move feats to lookups (#5238) 2020-05-19 15:59:14 +02:00
tokenizer.pxd Rename to url_match 2020-05-22 12:41:03 +02:00
tokenizer.pyx Rename to url_match 2020-05-22 12:41:03 +02:00
typedefs.pxd
typedefs.pyx
util.py Remove unnecessary check 2020-06-02 17:41:25 +02:00
vectors.pyx fix deserialization order 2020-05-30 12:53:32 +02:00
vocab.pxd Reduce stored lexemes data, move feats to lookups (#5238) 2020-05-19 15:59:14 +02:00
vocab.pyx Reduce stored lexemes data, move feats to lookups (#5238) 2020-05-19 15:59:14 +02:00