From ac45ae3779f9b021ebc2227ae6b3104811132979 Mon Sep 17 00:00:00 2001
From: antonpibm <51074867+antonpibm@users.noreply.github.com>
Date: Wed, 15 Dec 2021 10:34:33 +0200
Subject: [PATCH] Update Tokenizer documentation to reflect token_match and
 url_match signatures (#9859)

---
 spacy/tokenizer.pyx | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/spacy/tokenizer.pyx b/spacy/tokenizer.pyx
index f8df13610..4a148b356 100644
--- a/spacy/tokenizer.pyx
+++ b/spacy/tokenizer.pyx
@@ -45,10 +45,12 @@ cdef class Tokenizer:
             `re.compile(string).search` to match suffixes.
         `infix_finditer` (callable): A function matching the signature of
             `re.compile(string).finditer` to find infixes.
-        token_match (callable): A boolean function matching strings to be
+        token_match (callable): A function matching the signature of
+            `re.compile(string).match`, for matching strings to be
             recognized as tokens.
-        url_match (callable): A boolean function matching strings to be
-            recognized as tokens after considering prefixes and suffixes.
+        url_match (callable): A function matching the signature of
+            `re.compile(string).match`, for matching strings to be
+            recognized as urls.

         EXAMPLE:
             >>> tokenizer = Tokenizer(nlp.vocab)
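
A quick sketch of what the updated wording means: `token_match` and `url_match` are callables with the signature of `re.compile(string).match`, i.e. they take a string and return a match object (truthy) or `None`, rather than a plain boolean. The patterns below are illustrative placeholders, not spaCy's actual defaults:

```python
import re

# Callables matching the signature of re.compile(string).match:
# str -> match object (truthy) or None.
# Hypothetical example patterns; spaCy's real defaults differ.
token_match = re.compile(r"[-+]?\d+(\.\d+)?").match  # e.g. bare numbers
url_match = re.compile(r"https?://\S+").match        # e.g. simple URLs

# These could then be passed to the tokenizer, e.g.:
#   Tokenizer(nlp.vocab, token_match=token_match, url_match=url_match)
```

Any existing boolean predicate still works in practice, since the tokenizer only checks truthiness of the return value, but the documented contract is the `re.match`-style signature.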