51 Commits

Author SHA1 Message Date
Zhangrp 9f986af120
Add example sentence for Chinese in website meta (#11879) 2022-11-28 14:50:30 +09:00
Adriane Boyd 8740e4341f
Update languages and version in README and website (#11694) 2022-10-25 14:54:54 +02:00
Adriane Boyd 81874265e9 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5-1 2022-08-24 12:47:42 +02:00
Tobius Saul c09d2fa25b
Luganda language extension (#10847)
* Luganda language extension

* __init__.py changes

* New enhancements

* Lexical attribute changed

* Punctuation and sentence additions

* Remove comment header

* Fix typos, reformat

* Reformatted version

* Add tokenizer test

* Remove contractions from stop words

* Format

* Add Luganda to website

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-23 13:09:36 +02:00
Adriane Boyd 5fa8f4faca
Switch ru and uk lemmatizers to pymorphy3 (#11345)
* Switch ru and uk lemmatizers to pymorphy3

* Switch to pymorphy3 in tests
2022-08-22 11:27:14 +02:00
Adriane Boyd 09b3118b26
Add uk pipelines to website (#11332) 2022-08-18 14:04:57 +02:00
Adriane Boyd 11f859c132
Docs for v3.4 (#11057)
* Add draft of v3.4 usage

* Add Croatian models

* Add Matcher min/max

* Update release notes

* Minor edits

* Add updates, tables

* Update pydantic/mypy versions

* Update version in README

* Fix sidebar
2022-07-11 15:36:31 +02:00
Adriane Boyd 497a708c71
Docs for v3.3 (#10628)
* Temporarily disable CI tests

* Start v3.3 website updates

* Add trainable lemmatizer to pipeline design

* Fix Vectors.most_similar

* Add floret vector info to pipeline design

* Add Lower and Upper Sorbian

* Add span to sidebar

* Work on release notes

* Copy from release notes

* Update pipeline design graphic

* Upgrading note about Doc.from_docs

* Add tables and details

* Update website/docs/models/index.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix da lemma acc

* Add minimal intro, various updates

* Round lemma acc

* Add section on floret / word lists

* Add new pipelines table, minor edits

* Fix displacy spans example title

* Clarify adding non-trainable lemmatizer

* Update adding-languages URLs

* Revert "Temporarily disable CI tests"

This reverts commit 1dee505920.

* Spell out words/sec

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-28 14:09:35 +02:00
Adriane Boyd b2bbefd0b5
Add Finnish, Korean, and Swedish models and Korean support notes (#10355)
* Add Finnish, Korean, and Swedish models to website

* Add Korean language support notes
2022-03-07 17:03:45 +01:00
Adriane Boyd 3f181b73d0
Add ja_core_news_trf to website (#9515) 2021-10-20 10:18:02 +02:00
Adriane Boyd 1ee5bee29d
Add Macedonian models to website (#8637) 2021-07-08 09:32:14 +02:00
Adriane Boyd 63d748f80e
Add Catalan and Danish trf to website models (#8378) 2021-06-14 09:50:13 +02:00
vincent d warmerdam 1b0d413e45
Removed Languages that were listed twice on Docs (#7272)
* removed languages that were listed twice

* sorted

* d0h

* the d0h strikes back when you don't hit save
2021-03-05 14:31:15 +01:00
Ines Montani 06e66d4ced Update languages.json [ci skip] 2021-02-13 12:33:17 +11:00
Ines Montani 230e651ad6 Merge branch 'develop' into master-tmp 2021-01-27 13:26:29 +11:00
muratjumashev 7d0154a36e Added language meta data 2021-01-25 00:42:19 +06:00
Adriane Boyd 7cd5c9e098 Add xx_sent_ud_sm model to website 2021-01-19 09:02:35 +01:00
Adriane Boyd e8f6400923 Update languages for website
* Add Macedonian
* Add Russian dependencies
* Switch Chinese dependency to spacy-pkuseg
2021-01-18 14:09:34 +01:00
Adriane Boyd 724831b066 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master
* Update Macedonian for v3
* Update Turkish for v3
2020-11-25 11:49:34 +01:00
Adriane Boyd 8cc5ed6771 Add Macedonian to website languages 2020-10-29 08:49:56 +01:00
Adriane Boyd 4dd86306e9
Add Nepali to supported languages on website (#6315) 2020-10-28 16:32:07 +01:00
Adriane Boyd e896803792 Add and update website license links 2020-10-16 17:01:52 +02:00
Ines Montani 050aa1e0e2 Update languages.json [ci skip] 2020-10-14 20:51:50 +02:00
Ines Montani a966c271f7 Update models docs [ci skip] 2020-10-14 20:50:23 +02:00
Ines Montani 741796e500 Update docs [ci skip] 2020-10-08 14:31:34 +02:00
Ines Montani e06ff8b71d Update docs [ci skip] 2020-09-26 13:18:08 +02:00
Ines Montani d8f661c910 Update docs [ci skip] 2020-09-23 09:30:26 +02:00
Ines Montani 19b9ea0436 Fix languages.json 2020-06-16 18:34:11 +02:00
Adriane Boyd d5110ffbf2
Documentation updates for v2.3.0 (#5593)
* Update website models for v2.3.0

* Add docs for Chinese word segmentation

* Tighten up Chinese docs section

* Merge branch 'master' into docs/v2.3.0 [ci skip]

* Merge branch 'master' into docs/v2.3.0 [ci skip]

* Auto-format and update version

* Update matcher.md

* Update languages and sorting

* Typo in landing page

* Infobox about token_match behavior

* Add meta and basic docs for Japanese

* POS -> TAG in models table

* Add info about lookups for normalization

* Updates to API docs for v2.3

* Update adding norm exceptions for adding languages

* Add --omit-extra-lookups to CLI API docs

* Add initial draft of "What's New in v2.3"

* Add new in v2.3 tags to Chinese and Japanese sections

* Add tokenizer to migration section

* Add new in v2.3 flags to init-model

* Typo

* More what's new in v2.3

Co-authored-by: Ines Montani <ines@ines.io>
2020-06-16 15:37:35 +02:00
Baciccin 3b53617a69 Add Ligurian language 2020-03-19 21:37:01 -07:00
Ines Montani 1d6aec805d Fix formatting and update docs for v2.2.4 2020-03-09 11:17:20 +01:00
Ines Montani 1b838d1313 Divide models into core and starters [ci skip] 2019-12-21 14:10:22 +01:00
Paul O'Leary McCann f0e3e606a6 Replace python-mecab3 with fugashi for Japanese (#4621)
* Switch from mecab-python3 to fugashi

mecab-python3 has been the best MeCab binding for a long time but it's
not very actively maintained, and since it's based on old SWIG code
distributed with MeCab there's a limit to how effectively it can be
maintained.

Fugashi is a new Cython-based MeCab wrapper I wrote. Since it's not
based on the old SWIG code it's easier to keep it current and make small
deviations from the MeCab C/C++ API where that makes sense.

* Change mecab-python3 to fugashi in setup.cfg

* Change "mecab tags" to "unidic tags"

The tags come from MeCab, but the tag schema is specified by Unidic, so
it's more proper to refer to it that way.

* Update conftest

* Add fugashi link to external deps list for Japanese
2019-11-23 14:31:04 +01:00
Ines Montani 1180304449 Update languages.json [ci skip] 2019-10-26 13:51:42 +02:00
Ines Montani 8f76d6c9ef Update transformer model details [ci skip] 2019-10-08 15:39:38 +02:00
Ines Montani 3624153591 Update languages.json [ci skip] 2019-09-27 15:15:41 +02:00
Ines Montani 23e28e2844 Merge branch 'master' into develop 2019-09-15 17:57:09 +02:00
Ines Montani c7e4ea7154 Update examples and languages.json [ci skip] 2019-09-15 17:56:40 +02:00
Ines Montani fe87ccc8d1 Update languages.json [ci skip] 2019-09-14 16:23:50 +02:00
Ines Montani 2f31f96fce Update languages.json [ci skip] 2019-09-04 18:15:42 +02:00
Ines Montani 2245e95e2d Update languages.json [ci skip] 2019-09-04 17:11:40 +02:00
Ines Montani 48385552c6 Update languages.json [ci skip] 2019-08-27 11:52:51 +02:00
Pavle Vidanović 4fe9329bfb Serbian language code update "rs" -> "sr" (#4159)
* Serbian stopwords added. (cyrillic alphabet)

* spaCy Contribution agreement included.

* Test initialize updated

* Serbian language code update. --bugfix
2019-08-21 19:57:37 +02:00
Ines Montani 3e60afacf9 Add Serbian to languages [ci skip] 2019-08-07 13:38:25 +02:00
Ines Montani 7f3212e2f5
💫 Sync branches (#4084) [ci skip]
* Update from master

* Re-added Universe readme (#3688) (closes #3680)

* Fix typo

* Add version tag to `--base-model` argument (closes #3720)

* fixing regex matcher examples (#3708) (#3719)

* Improve Token.prob and Lexeme.prob docs (resolves #3701)

* Fix DependencyParser.predict docs (resolves #3561)

* Update languages.json

Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
Co-authored-by: Aaron Kub <aaronkub@gmail.com>
2019-08-05 14:32:54 +02:00
Ines Montani 4ebb4865fe Update languages.json 2019-07-10 11:19:48 +02:00
cedar101 58f06e6180 Korean support (#3901)
* start lang/ko

* add test codes

* using natto-py

* add test_ko_tokenizer_full_tags()

* spaCy contributor agreement

* external dependency for ko

* collections.namedtuple for Python version < 3.5

* case fix

* tuple unpacking

* add jongseong (final consonant)

* apply mecab option

* Remove Pipfile for now

Co-authored-by: Ines Montani <ines@ines.io>
2019-07-09 22:23:16 +02:00
Ines Montani 4f1dae1c6b Update languages and examples (see #1107) 2019-06-26 16:19:17 +02:00
Ines Montani 9e14b2b69f Add Estonian to docs [ci skip] (closes #3482) 2019-03-25 18:01:54 +01:00
Ines Montani c5476bd75b Update languages.json 2019-02-18 10:03:35 +01:00