Update Tokenizer documentation to reflect token_match and url_match signatures (#9859)

antonpibm 2021-12-15 10:34:33 +02:00 committed by GitHub
parent ba0fa7a64e
commit ac45ae3779
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 5 additions and 3 deletions


@@ -45,10 +45,12 @@ cdef class Tokenizer:
         `re.compile(string).search` to match suffixes.
     `infix_finditer` (callable): A function matching the signature of
         `re.compile(string).finditer` to find infixes.
-    token_match (callable): A boolean function matching strings to be
+    token_match (callable): A function matching the signature of
+        `re.compile(string).match`, for matching strings to be
         recognized as tokens.
-    url_match (callable): A boolean function matching strings to be
-        recognized as tokens after considering prefixes and suffixes.
+    url_match (callable): A function matching the signature of
+        `re.compile(string).match`, for matching strings to be
+        recognized as urls.
 EXAMPLE:
     >>> tokenizer = Tokenizer(nlp.vocab)
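The updated docstring says `token_match` and `url_match` should follow the signature of `re.compile(string).match`: a callable that takes a string and returns a truthy match object (or `None`). A minimal sketch of such callables, using only the standard `re` module; the two regex patterns below are illustrative assumptions, not part of the commit:

```python
import re

# token_match: keep hyphenated words as single tokens (illustrative pattern).
# Bound methods of compiled patterns already have the required signature.
token_match = re.compile(r"^\w+(?:-\w+)+$").match

# url_match: recognize bare http(s) URLs as single tokens (illustrative pattern).
url_match = re.compile(r"^https?://\S+$").match

print(bool(token_match("mother-in-law")))   # True
print(bool(token_match("hello")))           # False (no hyphen)
print(bool(url_match("https://spacy.io")))  # True
```

Callables like these could then be passed to the tokenizer as keyword arguments, e.g. `Tokenizer(nlp.vocab, token_match=token_match, url_match=url_match)`, assuming the constructor shown in the docstring above.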