
7 Commits

Author SHA1 Message Date
github-actions[bot] d637b34e2f
Auto-format code with black (#10377)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-02-25 10:00:21 +01:00
Adriane Boyd f32ee2e533
Fix NER check in CoNLL-U converter (#10302)
* Fix NER check in CoNLL-U converter

Leave ents unset if no NER annotation is found in the MISC column.

* Revert to global rather than per-sentence NER check

* Update spacy/training/converters/conllu_to_docs.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-02-21 10:24:52 +01:00
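The difference between a global and a per-sentence NER check in #10302 can be shown with a short Python sketch. This is not the code in `spacy/training/converters/conllu_to_docs.py`. The `NE=` key in the MISC column and the helper names `extract_iob` and `corpus_iob_tags` are hypothetical and used only for illustration. The idea: scan the whole corpus once. If no token anywhere carries an NER annotation, leave entities unset. Otherwise, unannotated tokens in any sentence default to `O`.

```python
import re
from typing import List, Optional

# Hypothetical MISC key for NER tags (assumption for illustration only).
NER_PATTERN = re.compile(r"(?:^|\|)NE=([^|]+)")


def extract_iob(misc: str) -> Optional[str]:
    """Return the IOB tag stored in a MISC field, or None if there is none."""
    match = NER_PATTERN.search(misc)
    return match.group(1) if match else None


def corpus_iob_tags(sentences: List[List[str]]) -> Optional[List[List[str]]]:
    """Global NER check over all sentences' MISC fields.

    Returns None (entities left unset) when no token in the corpus has an
    NER annotation. Otherwise returns IOB tags, with "O" for any token that
    lacks one, including tokens in sentences that have no annotation at all.
    """
    has_ner = any(
        extract_iob(misc) is not None for sent in sentences for misc in sent
    )
    if not has_ner:
        return None
    return [[extract_iob(misc) or "O" for misc in sent] for sent in sentences]
```

With a per-sentence check, a sentence without annotations in an otherwise annotated corpus would be left unset. With the global check, its tokens are treated as explicitly outside any entity.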
Andrew Janco 3cfeb518ee
Handle "_" value for token pos in conllu data (#9903)
* change '_' to '' to allow Token.pos, when no value for token pos in conllu data

* Minor code style

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-12-21 15:46:33 +01:00
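In CoNLL-U, `_` marks an unspecified field. The change in #9903 maps a `_` UPOS value to `''` so that `Token.pos` stays unset instead of receiving a literal `_`. Below is a minimal Python sketch of that mapping for a single token line. The function name `parse_pos` is hypothetical, and the sketch is not spaCy's implementation. It relies only on the standard 10-column CoNLL-U layout, in which UPOS is the fourth column.

```python
from typing import Dict


def parse_pos(line: str) -> Dict[str, str]:
    """Read FORM and UPOS from one CoNLL-U token line (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
    form, upos = cols[1], cols[3]
    # "_" means "no value" for UPOS; use "" so the POS is treated as unset.
    # FORM is left untouched, since "_" can be a genuine token there.
    return {"form": form, "pos": "" if upos == "_" else upos}
```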
explosion-bot ee37288a1f Auto-format code with black 2021-07-02 07:48:26 +00:00
Adriane Boyd 1ddf2f39c7
Switch converters to generator functions (#6547)
* Switch converters to generator functions

To reduce the memory usage when converting large corpora, refactor the
convert methods to be generator functions.

* Update tests
2020-12-15 16:47:16 +08:00
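The memory argument in #6547 comes from generators producing output lazily. A converter built as a generator yields each document as soon as it is complete, instead of building the full list of documents for a large corpus first. The Python sketch below illustrates the pattern. The names `read_sentences` and `convert` are hypothetical. The `n_sents` grouping parameter is modeled loosely on the converters and should not be read as their actual signature.

```python
from typing import Iterable, Iterator, List


def read_sentences(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one sentence (a list of token lines) at a time from CoNLL-U text."""
    sent: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sent:
                yield sent
                sent = []
        elif not line.startswith("#"):
            # Skip comment lines; keep token lines.
            sent.append(line)
    if sent:
        yield sent


def convert(lines: Iterable[str], n_sents: int = 10) -> Iterator[List[List[str]]]:
    """Group sentences into "documents" of n_sents, yielding each when full."""
    batch: List[List[str]] = []
    for sent in read_sentences(lines):
        batch.append(sent)
        if len(batch) == n_sents:
            yield batch
            batch = []
    if batch:
        yield batch
```

When the caller passes an open file object and consumes `convert(...)` in a loop, at most one batch of sentences is held in memory at a time.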
Adriane Boyd 27cbffff1b
Minor edit to CoNLL-U converter (#6172)
This doesn't make a difference given how the `merged_morph` values
override the `morph` values for all the final docs, but could have led
to unexpected bugs in the future if the converter is modified.
2020-10-01 16:23:42 +02:00
svlandeg b556a10808 rename converts in_to_out 2020-09-22 11:50:19 +02:00