mirror of https://github.com/explosion/spaCy.git
aec755d3a3
* Modify retokenizer to use span root attributes
  * tag/pos/morph are set to root tag/pos/morph
  * lemma and norm are reset and end up as orth (not ideal, but better than orth of first token)
* Also handle individual merge case
* Add test
* Attempt to handle ent_iob and ent_type in merges
* Fix check for whether B-ENT should become I-ENT
* Move IOB consistency check to after attrs

  Move all IOB consistency checks after attrs are set and simplify to check entire document, modifying I to B at the beginning of the document or if the entity type of the previous token isn't the same.
* Move IOB consistency check for single merge

  Move IOB consistency check after the token array is compressed for the single merge case.
* Update spacy/tokens/_retokenize.pyx

  Co-Authored-By: Matthew Honnibal <honnibal+gh@gmail.com>
* Remove single vs. multiple merge distinction

  Remove original single-instance `_merge()` and use `_bulk_merge()` (now renamed `_merge()`) for all merges.
* Add out-of-bound check in previous entity check

__init__.pxd
__init__.py
_retokenize.pyx
_serialize.py
doc.pxd
doc.pyx
span.pxd
span.pyx
token.pxd
token.pyx
underscore.py
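
The IOB consistency rule described in the commit message (an `I` tag becomes `B` at the beginning of the document, or when the previous token's entity type differs) can be illustrated with a minimal pure-Python sketch. This is not the code in `_retokenize.pyx`: the function name `fix_iob` and the `(iob, ent_type)` tuple representation are hypothetical, chosen only to show the rule on plain data.

```python
from typing import List, Tuple

# Hypothetical representation: one (iob, ent_type) pair per token,
# where iob is "B", "I", or "O" and ent_type is "" outside an entity.
Tag = Tuple[str, str]


def fix_iob(tags: List[Tag]) -> List[Tag]:
    """Return a copy of `tags` in which every inconsistent I tag becomes B.

    An I tag is treated as inconsistent when it is the first token of the
    document or when the previous token has a different entity type.
    """
    fixed: List[Tag] = []
    for i, (iob, ent_type) in enumerate(tags):
        if iob == "I":
            # Bounds check first: token 0 has no previous token to inspect.
            starts_doc = i == 0
            if starts_doc or fixed[i - 1][1] != ent_type:
                iob = "B"
        fixed.append((iob, ent_type))
    return fixed


if __name__ == "__main__":
    tags = [("I", "ORG"), ("I", "ORG"), ("O", ""), ("I", "PER")]
    print(fix_iob(tags))
```

The pass runs over the whole sequence after all attributes are in place, which mirrors the commit's choice to check the entire document once rather than patching tags during each merge.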