diff --git a/website/docs/usage/index.md b/website/docs/usage/index.md
index d7cfdf5ac..f80e5c778 100644
--- a/website/docs/usage/index.md
+++ b/website/docs/usage/index.md
@@ -286,6 +286,29 @@ version of pip. To see which version you have installed, run `pip --version`.
+
+
+```
+sre_constants.error: bad character range
+```
+
+In [v2.1](/usage/v2-1), spaCy changed its implementation of regular expressions
+for tokenization to make it up to 2-3 times faster. As a consequence, you now
+need to run spaCy with a wide unicode build of Python. A wide build supports
+code points up to 1114111 (the value of `sys.maxunicode`), whereas a narrow
+unicode build only goes up to 65535. You can check which build you have by
+running the following command:
+
+```bash
+python -c "import sys; print(sys.maxunicode)"
+```
+
+If you're running a narrow unicode build, reinstall Python using a wide unicode
+build instead. Alternatively, rebuild Python from source and pass the
+`--enable-unicode=ucs4` flag to `configure`.
+
+
+
 ```