Update Japanese docs and pin for sudachipy

Adriane Boyd 2020-10-02 10:12:44 +02:00
parent 7670df04dd
commit 351f352cdc
2 changed files with 5 additions and 6 deletions

@@ -84,7 +84,7 @@ cuda102 =
     cupy-cuda102>=5.0.0b4,<9.0.0
 # Language tokenizers with external dependencies
 ja =
-    sudachipy>=0.4.5
+    sudachipy>=0.4.9
     sudachidict_core>=20200330
 ko =
     natto-py==0.9.0

@@ -199,20 +199,19 @@ nlp.tokenizer.initialize(pkuseg_model="/path/to/pkuseg_model")
 >
 > # Load SudachiPy with split mode B
 > cfg = {"split_mode": "B"}
-> nlp = Japanese(meta={"tokenizer": {"config": cfg}})
+> nlp = Japanese.from_config({"nlp": {"tokenizer": cfg}})
 > ```

 The Japanese language class uses
 [SudachiPy](https://github.com/WorksApplications/SudachiPy) for word
 segmentation and part-of-speech tagging. The default Japanese language class and
-the provided Japanese pipelines use SudachiPy split mode `A`. The `meta`
-argument of the `Japanese` language class can be used to configure the split
-mode to `A`, `B` or `C`.
+the provided Japanese pipelines use SudachiPy split mode `A`. The tokenizer
+config can be used to configure the split mode to `A`, `B` or `C`.

 <Infobox variant="warning">

 If you run into errors related to `sudachipy`, which is currently under active
-development, we suggest downgrading to `sudachipy==0.4.5`, which is the version
+development, we suggest downgrading to `sudachipy==0.4.9`, which is the version
 used for training the current [Japanese pipelines](/models/ja).

 </Infobox>
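
To illustrate the updated lower bound, here is a minimal sketch of how a local environment could be checked against `sudachipy>=0.4.9`. It is not part of this commit. The helper names (`parse_version`, `satisfies_pin`, `installed_sudachipy`) are hypothetical, and the version parsing is deliberately crude: it assumes plain dotted numeric releases and ignores any suffix.

```python
from importlib import metadata

# Lower bound taken from the updated pin in this commit.
MIN_SUDACHIPY = (0, 4, 9)


def parse_version(text):
    """Turn a dotted release such as '0.4.9' into a tuple of ints.

    Assumption: only the leading purely numeric components matter;
    parsing stops at the first component that is not all digits.
    """
    parts = []
    for part in text.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def satisfies_pin(installed, minimum=MIN_SUDACHIPY):
    """Return True if the installed release is at or above the minimum."""
    return parse_version(installed) >= minimum


def installed_sudachipy():
    """Return the installed sudachipy version string, or None if absent."""
    try:
        return metadata.version("sudachipy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_sudachipy()
    if found is None:
        print("sudachipy is not installed")
    else:
        print(found, "satisfies pin:", satisfies_pin(found))
```

For real dependency resolution, a proper version parser would be a safer choice than this tuple comparison, which only serves to show what the pin change means in practice.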