Prefer _SP over SP for default tag map space attrs

If `_SP` is already in the tag map, use the mapping from `_SP` instead
of `SP` so that `SP` can be a valid non-space tag. (Chinese has a
non-space tag `SP` which was overriding the mapping of `_SP` to
`SPACE`.)
Adriane Boyd 2020-05-26 14:50:53 +02:00
parent 1eed101be9
commit b6b5908f5e
1 changed file with 4 additions and 1 deletion

@@ -152,7 +152,10 @@ cdef class Morphology:
         self.tags = PreshMap()
         # Add special space symbol. We prefix with underscore, to make sure it
         # always sorts to the end.
-        space_attrs = tag_map.get('SP', {POS: SPACE})
+        if '_SP' in tag_map:
+            space_attrs = tag_map.get('_SP')
+        else:
+            space_attrs = tag_map.get('SP', {POS: SPACE})
         if '_SP' not in tag_map:
             self.strings.add('_SP')
             tag_map = dict(tag_map)
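
The lookup order introduced by this change can be sketched in plain Python. This is only an illustration, not the Cython code itself: `POS`, `SPACE`, and `PART` are string stand-ins for the library's symbols, `resolve_space_attrs` is a hypothetical helper name, and the example tag maps (including the `PART` mapping for the Chinese-style `SP` tag) are made up for the demonstration.

```python
# Stand-ins for the real POS/SPACE/PART symbols (assumption: plain strings).
POS = "POS"
SPACE = "SPACE"
PART = "PART"


def resolve_space_attrs(tag_map):
    """Pick the attrs for the special space tag, preferring `_SP` over `SP`.

    Hypothetical helper that mirrors the branch added in the diff.
    """
    if "_SP" in tag_map:
        # An explicit `_SP` entry always wins, so `SP` is free to mean
        # something else in the tag set.
        return tag_map["_SP"]
    # Otherwise fall back to `SP`, and finally to the default space attrs.
    return tag_map.get("SP", {POS: SPACE})


# Illustrative tag maps (not taken from any real language data).
sp_is_regular_tag = {"SP": {POS: PART}, "_SP": {POS: SPACE}}
sp_is_space_tag = {"SP": {POS: SPACE}}
no_space_tag = {}

print(resolve_space_attrs(sp_is_regular_tag))
print(resolve_space_attrs(sp_is_space_tag))
print(resolve_space_attrs(no_space_tag))
```

In the first map the old lookup, `tag_map.get('SP', {POS: SPACE})`, would have returned the attrs of the non-space `SP` tag; with the new branch the `_SP` entry is used instead. Note that a tag map defining a non-space `SP` without any `_SP` entry still falls through to `SP`.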