spaCy

Adriane Boyd cd6bd91c3a Switch default train corpus max_length to 0 in quickstart (#8142) The behavior of `spacy.Corpus.v1` is unexpected enough for `max_length != 0` that `0` is a better default for users creating a new config with the quickstart. If not, documents are skipped, sometimes the entire corpus is skipped, and sometimes documents are (quite unexpectedly for your average user) split into sentences.	2021-05-20 14:48:09 +02:00
cli	Switch default train corpus max_length to 0 in quickstart (#8142)	2021-05-20 14:48:09 +02:00
displacy	Also exclude user hooks in displacy conversion (#7419)	2021-03-12 09:41:59 +01:00
lang	Update Vietnamese tokenizer (#8099)	2021-05-17 18:16:20 +10:00
matcher	Support match alignments (#7321)	2021-04-08 18:10:14 +10:00
ml	Set up GPU CI testing (#7293)	2021-04-22 14:58:29 +02:00
pipeline	Replace negative rows with 0 in StaticVectors (#7674)	2021-04-22 18:04:15 +10:00
tests	Update Vietnamese tokenizer (#8099)	2021-05-17 18:16:20 +10:00
tokens	Fix/update extension copying in Span.as_doc and Doc.from_docs (#7574)	2021-03-30 09:49:12 +02:00
training	Add training option to set annotations on update (#7767)	2021-04-26 16:53:53 +02:00
__init__.pxd	…
__init__.py	Add vocab kwarg back to spacy.load	2021-03-11 10:58:59 +01:00
__main__.py	…
about.py	Set version to v3.0.6 (#7854)	2021-04-22 16:33:26 +02:00
attrs.pxd	…
attrs.pyx	…
compat.py	Use Literal type for nr_feature_tokens	2020-09-23 16:00:03 +02:00
default_config.cfg	Add training option to set annotations on update (#7767)	2021-04-26 16:53:53 +02:00
default_config_pretraining.cfg	pretrain architectures (#6451)	2020-12-08 14:41:03 +08:00
errors.py	Add callback to copy vocab/tokenizer from model (#7750)	2021-04-22 12:36:50 +02:00
glossary.py	…
kb.pxd	Replace cpdef variables with cdef (#7834)	2021-04-26 16:54:02 +02:00
kb.pyx	Replace links to nightly docs [ci skip]	2021-01-30 20:09:38 +11:00
language.py	Add training option to set annotations on update (#7767)	2021-04-26 16:53:53 +02:00
lexeme.pxd	Fix Lexeme.from_ptr	2020-08-10 16:43:37 +02:00
lexeme.pyx	reduce memory load when reading all vectors from file (#6945)	2021-02-07 08:05:43 +08:00
lookups.py	Replace links to nightly docs [ci skip]	2021-01-30 20:09:38 +11:00
morphology.pxd	Clean up Morphology imports and definitions (#7441)	2021-04-26 16:54:23 +02:00
morphology.pyx	Clean up Morphology imports and definitions (#7441)	2021-04-26 16:54:23 +02:00
parts_of_speech.pxd	…
parts_of_speech.pyx	…
pipe_analysis.py	Tidy up and auto-format	2020-09-29 21:39:28 +02:00
py.typed	Add py.typed	2021-03-16 09:48:31 +01:00
schemas.py	Add training option to set annotations on update (#7767)	2021-04-26 16:53:53 +02:00
scorer.py	Extend score_spans for overlapping & non-labeled spans (#7209)	2021-04-08 12:19:17 +02:00
strings.pxd	Remove 'cleanup' of strings (#6007)	2020-09-01 16:12:15 +02:00
strings.pyx	Make vocab update in get_docs deterministic (#7603)	2021-04-09 11:53:13 +02:00
structs.pxd	Add SpanGroup and Graph container types to represent arbitrary annotations (#6696)	2021-01-14 17:30:41 +11:00
symbols.pxd	introduce token.has_head and refer to MISSING_DEP_ (WIP)	2021-01-12 17:17:06 +01:00
symbols.pyx	introduce token.has_head and refer to MISSING_DEP_ (WIP)	2021-01-12 17:17:06 +01:00
tokenizer.pxd	Replace cpdef variables with cdef (#7834)	2021-04-26 16:54:02 +02:00
tokenizer.pyx	Fix tokenizer cache flushing (#7836)	2021-04-22 18:14:57 +10:00
typedefs.pxd	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master	2020-11-25 11:49:34 +01:00
typedefs.pyx	…
util.py	Fix scoring normalization (#7629)	2021-04-26 16:53:38 +02:00
vectors.pyx	Fix vectors data on GPU (#7626)	2021-04-19 18:30:03 +10:00
vocab.pxd	Replace cpdef variables with cdef (#7834)	2021-04-26 16:54:02 +02:00
vocab.pyx	Fix vectors data on GPU (#7626)	2021-04-19 18:30:03 +10:00