spaCy

Sofie 9a478b6db8 Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)	2019-02-20 22:10:13 +01:00

* splitting up latin unicode interval
* removing hyphen as infix for French
* adding failing test for issue 1235
* test for issue #3002 which now works
* partial fix for issue #2070
* keep the hyphen as infix for French (as it was)
* restore french expressions with hyphen as infix (as it was)
* added succeeding unit test for Issue #2656
* Fix issue #2822 with custom Italian exception
* Fix issue #2926 by allowing numbers right before infix /
* remove duplicate
* remove xfail for Issue #2179 fixed by Matt
* adjust documentation and remove reference to regex lib

cli	💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280)	2019-02-15 10:29:44 +01:00
data	…
displacy	💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280)	2019-02-15 10:29:44 +01:00
lang	Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)	2019-02-20 22:10:13 +01:00
matcher	💫 Fix bugs in matcher extensions. Closes #1971 (#3301)	2019-02-20 21:30:39 +01:00
pipeline	💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280)	2019-02-15 10:29:44 +01:00
syntax	💫 Prevent parser from predicting unseen classes (#3075)	2018-12-20 16:12:22 +01:00
tests	Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)	2019-02-20 22:10:13 +01:00
tokens	Refinements to retokenize.split() function (#3282)	2019-02-15 17:32:31 +01:00
__init__.pxd	…
__init__.py	Tidy up and format remaining files	2018-11-30 17:43:08 +01:00
__main__.py	💫 New JSON helpers, training data internals & CLI rewrite (#2932)	2018-11-30 20:16:14 +01:00
_align.pyx	Improve alignment around quotes	2018-08-16 01:04:34 +02:00
_ml.py	💫 Better support for semi-supervised learning (#3035)	2018-12-10 16:25:33 +01:00
about.py	Set version to v2.1.0a7	2019-02-16 17:48:34 +01:00
attrs.pxd	…
attrs.pyx	…
compat.py	💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003)	2018-12-03 01:28:22 +01:00
errors.py	Auto-format	2019-02-17 12:22:07 +01:00
glossary.py	💫 Tidy up and auto-format .py files (#2983)	2018-11-30 17:03:03 +01:00
gold.pxd	…
gold.pyx	Add gold.spans_from_biluo_tags helper (#3227)	2019-02-06 21:50:26 +11:00
language.py	Improve entry points and allow custom language classes via entry points (#3080)	2018-12-20 23:58:43 +01:00
lemmatizer.py	💫 Tidy up and auto-format .py files (#2983)	2018-11-30 17:03:03 +01:00
lexeme.pxd	…
lexeme.pyx	💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197)	2018-05-21 01:22:38 +02:00
morphology.pxd	…
morphology.pyx	Fix lemmatization	2018-07-05 13:56:02 +02:00
parts_of_speech.pxd	…
parts_of_speech.pyx	…
scorer.py	💫 Tidy up and auto-format .py files (#2983)	2018-11-30 17:03:03 +01:00
strings.pxd	…
strings.pyx	Add get_string_id helper to spacy.strings	2018-12-10 16:09:26 +01:00
structs.pxd	Make NORM a token attribute (#3029)	2018-12-08 10:49:10 +01:00
symbols.pxd	…
symbols.pyx	…
tokenizer.pxd	…
tokenizer.pyx	Replacing regex library with re to increase tokenization speed (#3218)	2019-02-01 18:05:22 +11:00
typedefs.pxd	…
typedefs.pyx	…
util.py	Also raise original error message in util.get_lang_class	2019-02-13 16:52:25 +01:00
vectors.pyx	Fix KeyError in Vectors.most_similar. Fixes #2648	2018-12-10 16:19:18 +01:00
vocab.pxd	💫 Small efficiency fixes to tokenizer (#2587)	2018-07-24 23:35:54 +02:00
vocab.pyx	Prevent exceptions from setting POS but not TAG. Closes #1773	2018-12-30 13:16:05 +01:00