mirror of https://github.com/explosion/spaCy.git
Update Japanese docs and pin for sudachipy
parent 7670df04dd
commit 351f352cdc

@@ -84,7 +84,7 @@ cuda102 =
     cupy-cuda102>=5.0.0b4,<9.0.0
 # Language tokenizers with external dependencies
 ja =
-    sudachipy>=0.4.5
+    sudachipy>=0.4.9
     sudachidict_core>=20200330
 ko =
     natto-py==0.9.0

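The first hunk raises the lower bound for `sudachipy` in the `ja` extra. The block uses INI-style syntax: each extra name is followed by indented continuation lines, one requirement per line. A minimal sketch of how such a block reads, using the standard-library `configparser` (which accepts the same indented-continuation syntax). The excerpt and its `[options.extras_require]` section header are assumptions, since this view shows neither the file name nor the section.

```python
from __future__ import annotations

import configparser

# Hypothetical excerpt mirroring the "+" side of the first hunk. The section
# header [options.extras_require] is an assumption; the diff does not show it.
SETUP_CFG = """
[options.extras_require]
ja =
    sudachipy>=0.4.9
    sudachidict_core>=20200330
ko =
    natto-py==0.9.0
"""


def read_extras(text: str) -> dict[str, list[str]]:
    """Map each extra name to its list of requirement strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["options.extras_require"]
    # A multi-line value comes back as one string joined with newlines,
    # starting with the empty text after "ja ="; blank pieces are dropped.
    return {
        name: [line.strip() for line in value.splitlines() if line.strip()]
        for name, value in section.items()
    }


extras = read_extras(SETUP_CFG)
print(extras["ja"])  # ['sudachipy>=0.4.9', 'sudachidict_core>=20200330']
```

Installing the extra (for spaCy, `pip install spacy[ja]`) resolves both entries, so the new `>=0.4.9` floor applies without any separate pin.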
@@ -199,20 +199,19 @@ nlp.tokenizer.initialize(pkuseg_model="/path/to/pkuseg_model")
 >
 > # Load SudachiPy with split mode B
 > cfg = {"split_mode": "B"}
-> nlp = Japanese(meta={"tokenizer": {"config": cfg}})
+> nlp = Japanese.from_config({"nlp": {"tokenizer": cfg}})
 > ```
 
 The Japanese language class uses
 [SudachiPy](https://github.com/WorksApplications/SudachiPy) for word
 segmentation and part-of-speech tagging. The default Japanese language class and
-the provided Japanese pipelines use SudachiPy split mode `A`. The `meta`
-argument of the `Japanese` language class can be used to configure the split
-mode to `A`, `B` or `C`.
+the provided Japanese pipelines use SudachiPy split mode `A`. The tokenizer
+config can be used to configure the split mode to `A`, `B` or `C`.
 
 <Infobox variant="warning">
 
 If you run into errors related to `sudachipy`, which is currently under active
-development, we suggest downgrading to `sudachipy==0.4.5`, which is the version
+development, we suggest downgrading to `sudachipy==0.4.9`, which is the version
 used for training the current [Japanese pipelines](/models/ja).
 
 </Infobox>
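The docs hunk replaces the `meta` argument with `Japanese.from_config`, passing only `split_mode` under `nlp` → `tokenizer`. A minimal sketch of why a partial dict is enough, assuming `from_config` fills every unspecified key from the Japanese language defaults: here that fill is modelled as a plain-dict deep merge. `JA_DEFAULTS`, `deep_merge` and `with_split_mode` are hypothetical names. The default values shown (including the `@tokenizers` entry) are assumptions. spaCy performs the real merge with its own config system, not with this code.

```python
from __future__ import annotations

import copy

# Assumed shape of the Japanese defaults; in spaCy v3 these come from the
# language class's own config, not from this literal.
JA_DEFAULTS = {
    "nlp": {
        "lang": "ja",
        "tokenizer": {
            "@tokenizers": "spacy.ja.JapaneseTokenizer",
            "split_mode": None,  # assumed: None means the default mode A
        },
    }
}

VALID_SPLIT_MODES = {"A", "B", "C"}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override's keys merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def with_split_mode(split_mode: str) -> dict:
    """Hypothetical helper: validate the mode, then merge the docs' override."""
    if split_mode not in VALID_SPLIT_MODES:
        raise ValueError(f"split_mode must be A, B or C, got {split_mode!r}")
    cfg = {"split_mode": split_mode}
    return deep_merge(JA_DEFAULTS, {"nlp": {"tokenizer": cfg}})


merged = with_split_mode("B")
print(merged["nlp"]["tokenizer"])
# {'@tokenizers': 'spacy.ja.JapaneseTokenizer', 'split_mode': 'B'}
```

Only `split_mode` changes, while the tokenizer reference and the other `nlp` keys survive from the defaults. With spaCy v3 and SudachiPy installed, you would pass the docs' own override to `Japanese.from_config` rather than use this sketch.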