spaCy

adrianeboyd aec755d3a3 Modify retokenizer to use span root attributes (#4219)	2019-09-08 13:04:49 +02:00

* Modify retokenizer to use span root attributes
* tag/pos/morph are set to root tag/pos/morph
* lemma and norm are reset and end up as orth (not ideal, but better than orth of first token)
* Also handle individual merge case
* Add test
* Attempt to handle ent_iob and ent_type in merges
* Fix check for whether B-ENT should become I-ENT
* Move IOB consistency check to after attrs
  Move all IOB consistency checks after attrs are set and simplify to check entire document, modifying I to B at the beginning of the document or if the entity type of the previous token isn't the same.
* Move IOB consistency check for single merge
  Move IOB consistency check after the token array is compressed for the single merge case.
* Update spacy/tokens/_retokenize.pyx
  Co-Authored-By: Matthew Honnibal <honnibal+gh@gmail.com>
* Remove single vs. multiple merge distinction
  Remove original single-instance `_merge()` and use `_bulk_merge()` (now renamed `_merge()`) for all merges.
* Add out-of-bound check in previous entity check
cli	Tidy up and auto-format [ci skip]	2019-08-31 13:39:06 +02:00
data	…
displacy	Improve token pattern checking without validation (#4105)	2019-08-21 14:00:37 +02:00
lang	Fix ValueError exception on empty Korean text. (#4245)	2019-09-06 10:29:40 +02:00
matcher	Tidy up and auto-format [ci skip]	2019-08-31 13:39:06 +02:00
pipeline	Fix #3830: 'subtok' label being added even if learn_tokens=False (#4188)	2019-08-23 17:54:00 +02:00
syntax	Fix handling of preset entities in NER	2019-09-04 13:42:42 +02:00
tests	Modify retokenizer to use span root attributes (#4219)	2019-09-08 13:04:49 +02:00
tokens	Modify retokenizer to use span root attributes (#4219)	2019-09-08 13:04:49 +02:00
__init__.pxd	…
__init__.py	Fix formatting (hopefully also restarts build properly)	2019-03-20 09:55:45 +01:00
__main__.py	Update __main__.py	2019-03-20 09:43:26 +01:00
_align.pyx	Improve alignment around quotes	2018-08-16 01:04:34 +02:00
_ml.py	Fix absolute imports and avoid importing from cli	2019-08-20 15:08:59 +02:00
about.py	Set version to v2.1.8	2019-08-07 13:53:58 +02:00
attrs.pxd	Fix attrs alignment	2019-07-12 17:59:47 +02:00
attrs.pyx	ensure Span.as_doc keeps the entity links + unit test	2019-06-25 15:28:51 +02:00
compat.py	Fix symlink creation to show error message on failure (#3589) (resolves #3307))	2019-04-16 11:58:31 +02:00
errors.py	Check for is_tagged/is_parsed for Matcher attrs (#4163)	2019-08-21 20:52:36 +02:00
glossary.py	Update glossary.py to match information found in documentation (#3704) (closes ##3679)	2019-05-10 14:23:20 +02:00
gold.pxd	fixes in kb and gold	2019-07-17 17:18:26 +02:00
gold.pyx	WIP: Extending debug-data (#4114)	2019-08-16 10:52:46 +02:00
kb.pxd	rename entity frequency	2019-07-19 17:40:28 +02:00
kb.pyx	CLI scripts for entity linking (wikipedia & generic) (#4091)	2019-08-13 15:38:59 +02:00
language.py	Tidy up and auto-format [ci skip]	2019-08-31 13:39:06 +02:00
lemmatizer.py	Fix inconsistant lemmatizer issue #3484 (#3646)	2019-05-04 18:16:03 +02:00
lexeme.pxd	💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325)	2019-02-24 21:13:51 +01:00
lexeme.pyx	Tidy up property code style (#3391)	2019-03-11 15:59:09 +01:00
lookups.py	💫 WIP: Basic lookup class scaffolding and JSON for all lemmati… (#4167)	2019-08-22 14:21:32 +02:00
morphology.pxd	annotate kb_id through ents in doc	2019-03-22 11:36:44 +01:00
morphology.pyx	Fix issue #3551: Upper case lemmas	2019-04-16 12:27:15 +02:00
parts_of_speech.pxd	…
parts_of_speech.pyx	…
scorer.py	Tidy up and auto-format	2019-08-18 15:09:16 +02:00
strings.pxd	…
strings.pyx	💫 Make serialization methods consistent (#3385)	2019-03-10 19:16:45 +01:00
structs.pxd	rename entity frequency	2019-07-19 17:40:28 +02:00
symbols.pxd	Fix symbol alignment	2019-07-12 17:48:38 +02:00
symbols.pyx	ensure Span.as_doc keeps the entity links + unit test	2019-06-25 15:28:51 +02:00
tokenizer.pxd	…
tokenizer.pyx	fix loading custom tokenizer rules/exceptions from file	2019-08-28 14:17:44 +02:00
typedefs.pxd	…
typedefs.pyx	…
util.py	💫 WIP: Basic lookup class scaffolding and JSON for all lemmati… (#4167)	2019-08-22 14:21:32 +02:00
vectors.pyx	Update Vectors.find docs [ci skip]	2019-03-16 17:10:57 +01:00
vocab.pxd	💫 WIP: Basic lookup class scaffolding and JSON for all lemmati… (#4167)	2019-08-22 14:21:32 +02:00
vocab.pyx	💫 WIP: Basic lookup class scaffolding and JSON for all lemmati… (#4167)	2019-08-22 14:21:32 +02:00