Handle "_" value for token pos in conllu data (#9903)

* Change '_' to '' so that Token.pos can be set when the CoNLL-U data has no value for the token's POS

* Minor code style

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Andrew Janco 2021-12-21 09:46:33 -05:00 committed by GitHub
parent 7847839003
commit 3cfeb518ee
1 changed file with 1 addition and 0 deletions

@@ -188,6 +188,7 @@ def conllu_sentence_to_doc(
 id_ = int(id_) - 1
 head = (int(head) - 1) if head not in ("0", "_") else id_
 tag = pos if tag == "_" else tag
+pos = pos if pos != "_" else ""
 morph = morph if morph != "_" else ""
 dep = "ROOT" if dep == "root" else dep
 lemmas.append(lemma)
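
The effect of the added line can be sketched in isolation. The helper below is hypothetical (the name `normalize_token_fields` and the returned dict are assumptions for illustration, not part of the converter); it only reproduces the per-token normalization steps visible in the hunk above, without depending on spaCy.

```python
def normalize_token_fields(id_, head, pos, tag, morph, dep):
    """Hypothetical helper mirroring the normalization steps in the hunk.

    All arguments are raw CoNLL-U column strings for a single token.
    """
    # CoNLL-U token ids are 1-based; convert to a 0-based index.
    id_ = int(id_) - 1
    # A head of "0" (root) or "_" (unspecified) points the token at itself.
    head = (int(head) - 1) if head not in ("0", "_") else id_
    # A missing fine-grained tag falls back to the raw pos value.
    # This runs before pos is normalized, so if both are "_", tag stays "_".
    tag = pos if tag == "_" else tag
    # The line added by this commit: a missing pos becomes an empty string.
    pos = pos if pos != "_" else ""
    # A missing morphology column becomes an empty string as well.
    morph = morph if morph != "_" else ""
    # Normalize the lowercase root label.
    dep = "ROOT" if dep == "root" else dep
    return {
        "id": id_,
        "head": head,
        "pos": pos,
        "tag": tag,
        "morph": morph,
        "dep": dep,
    }


# Token with no POS, tag, or morphology annotation.
unannotated = normalize_token_fields("1", "0", "_", "_", "_", "root")

# Fully annotated token whose fine-grained tag is missing.
annotated = normalize_token_fields("2", "1", "NOUN", "_", "Number=Sing", "nsubj")
```

Because the `tag` fallback is evaluated before `pos` is rewritten, a token with `_` in both columns ends up with an empty `pos` but a `tag` of `"_"`; only the `pos` handling changes in this commit.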