spaCy/spacy
Sofie 9a478b6db8 Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)
* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue #3002 which now works

* partial fix for issue #2070

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue #2656

* Fix issue #2822 with custom Italian exception

* Fix issue #2926 by allowing numbers right before infix /

* remove duplicate

* remove xfail for Issue #2179 fixed by Matt

* adjust documentation and remove reference to regex lib
2019-02-20 22:10:13 +01:00
cli 💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280) 2019-02-15 10:29:44 +01:00
data
displacy 💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280) 2019-02-15 10:29:44 +01:00
lang Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293) 2019-02-20 22:10:13 +01:00
matcher 💫 Fix bugs in matcher extensions. Closes #1971 (#3301) 2019-02-20 21:30:39 +01:00
pipeline 💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280) 2019-02-15 10:29:44 +01:00
syntax 💫 Prevent parser from predicting unseen classes (#3075) 2018-12-20 16:12:22 +01:00
tests Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293) 2019-02-20 22:10:13 +01:00
tokens Refinements to retokenize.split() function (#3282) 2019-02-15 17:32:31 +01:00
__init__.pxd
__init__.py Tidy up and format remaining files 2018-11-30 17:43:08 +01:00
__main__.py 💫 New JSON helpers, training data internals & CLI rewrite (#2932) 2018-11-30 20:16:14 +01:00
_align.pyx Improve alignment around quotes 2018-08-16 01:04:34 +02:00
_ml.py 💫 Better support for semi-supervised learning (#3035) 2018-12-10 16:25:33 +01:00
about.py Set version to v2.1.0a7 2019-02-16 17:48:34 +01:00
attrs.pxd
attrs.pyx
compat.py 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
errors.py Auto-format 2019-02-17 12:22:07 +01:00
glossary.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
gold.pxd
gold.pyx Add gold.spans_from_biluo_tags helper (#3227) 2019-02-06 21:50:26 +11:00
language.py Improve entry points and allow custom language classes via entry points (#3080) 2018-12-20 23:58:43 +01:00
lemmatizer.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
lexeme.pxd
lexeme.pyx 💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197) 2018-05-21 01:22:38 +02:00
morphology.pxd
morphology.pyx Fix lemmatization 2018-07-05 13:56:02 +02:00
parts_of_speech.pxd
parts_of_speech.pyx
scorer.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
strings.pxd
strings.pyx Add get_string_id helper to spacy.strings 2018-12-10 16:09:26 +01:00
structs.pxd Make NORM a token attribute (#3029) 2018-12-08 10:49:10 +01:00
symbols.pxd
symbols.pyx
tokenizer.pxd
tokenizer.pyx Replacing regex library with re to increase tokenization speed (#3218) 2019-02-01 18:05:22 +11:00
typedefs.pxd
typedefs.pyx
util.py Also raise original error message in util.get_lang_class 2019-02-13 16:52:25 +01:00
vectors.pyx Fix KeyError in Vectors.most_similar. Fixes #2648 2018-12-10 16:19:18 +01:00
vocab.pxd 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
vocab.pyx Prevent exceptions from setting POS but not TAG. Closes #1773 2018-12-30 13:16:05 +01:00