spaCy


Adriane Boyd c62fd878a3 Allow Doc.char_span to snap to token boundaries (#5849)	2020-08-04 13:36:32 +02:00

* Allow Doc.char_span to snap to token boundaries

  Add a `mode` option to allow `Doc.char_span` to snap to token boundaries.

  The `mode` options:
  * `strict`: character offsets must match token boundaries (default, same as before)
  * `inside`: all tokens completely within the character span
  * `outside`: all tokens at least partially covered by the character span

  Add a new helper function `token_by_char` that returns the token corresponding to a character position in the text. Update `token_by_start` and `token_by_end` to use `token_by_char` for more efficient searching.

* Remove unused import

* Rename mode to alignment_mode

  Rename `mode` to `alignment_mode` with the options `strict`/`contract`/`expand`. Any unrecognized modes are silently converted to `strict`.
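The three alignment modes described in the commit message can be illustrated with a small sketch in plain Python. This is not spaCy's implementation: `token_offsets` and `align_char_span` are hypothetical helpers written for this example, tokens are assumed to be whitespace-separated, and the result is a pair of token indices rather than a `Span`. Only the mode names and their described behavior come from the commit message.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical representation: each token is (start_char, end_char), end exclusive.
Token = Tuple[int, int]


def token_offsets(text: str) -> List[Token]:
    """Split on whitespace and record each token's character offsets (an assumption)."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def align_char_span(
    tokens: List[Token], start: int, end: int, alignment_mode: str = "strict"
) -> Optional[Tuple[int, int]]:
    """Return (first_token, last_token + 1) for a character span, or None."""
    if alignment_mode == "contract":
        # Keep only tokens that lie completely inside the character span.
        covered = [i for i, (s, e) in enumerate(tokens) if s >= start and e <= end]
    elif alignment_mode == "expand":
        # Keep every token that overlaps the character span at all.
        covered = [i for i, (s, e) in enumerate(tokens) if s < end and e > start]
    else:
        # "strict", and any unrecognized mode: offsets must sit on token boundaries.
        starts = {s: i for i, (s, _) in enumerate(tokens)}
        ends = {e: i for i, (_, e) in enumerate(tokens)}
        if start not in starts or end not in ends:
            return None
        first, last = starts[start], ends[end]
        return (first, last + 1) if first <= last else None
    if not covered:
        return None
    return (covered[0], covered[-1] + 1)


tokens = token_offsets("I like New York")  # I(0,1) like(2,6) New(7,10) York(11,15)
print(align_char_span(tokens, 7, 15))              # exact boundaries: tokens 2..3
print(align_char_span(tokens, 8, 15))              # misaligned start under strict: None
print(align_char_span(tokens, 5, 15, "contract"))  # shrinks to "New York"
print(align_char_span(tokens, 8, 13, "expand"))    # grows to "New York"
```

With `contract`, a span that cuts into a token drops that token; with `expand`, it pulls the whole token in. Both can yield the same token range from different misaligned offsets, as the last two calls show.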
__init__.py	Revert #4334	2019-09-29 17:32:12 +02:00
test_add_entities.py	Fix test imports	2019-09-29 17:34:56 +02:00
test_array.py	Tidy up and auto-format	2020-03-25 12:28:12 +01:00
test_creation.py	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
test_doc_api.py	Add strings and ENT_KB_ID to Doc serialization (#5691)	2020-07-02 17:11:57 +02:00
test_morphanalysis.py	Revert #4334	2019-09-29 17:32:12 +02:00
test_pickle_doc.py	Revert #4334	2019-09-29 17:32:12 +02:00
test_retokenize_merge.py	Disallow merging 0-length spans	2020-05-22 10:14:34 +02:00
test_retokenize_split.py	Fix realloc in retokenizer.split() (#4606)	2019-11-11 16:26:46 +01:00
test_span.py	Allow Doc.char_span to snap to token boundaries (#5849)	2020-08-04 13:36:32 +02:00
test_to_json.py	Revert #4334	2019-09-29 17:32:12 +02:00
test_token_api.py	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
test_underscore.py	use clean_underscore fixture	2020-02-23 15:49:20 +01:00