Zhangrp
9f986af120
Add example sentence for Chinese in website meta (#11879)
2022-11-28 14:50:30 +09:00
Adriane Boyd
8740e4341f
Update languages and version in README and website (#11694)
2022-10-25 14:54:54 +02:00
Adriane Boyd
81874265e9
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5-1
2022-08-24 12:47:42 +02:00
Tobius Saul
c09d2fa25b
luganda language extension (#10847)
* luganda language extension
* __init__.py changes
* New enhancements
* Lexical attribute changed
* punctuation and sentence additions
* Remove comment header
* Fix typos, reformat
* reformatted version
* Add tokenizer test
* Remove contractions from stop words
* Format
* Add Luganda to website
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-23 13:09:36 +02:00
Adriane Boyd
5fa8f4faca
Switch ru and uk lemmatizers to pymorphy3 (#11345)
* Switch ru and uk lemmatizers to pymorphy3
* Switch to pymorphy3 in tests
2022-08-22 11:27:14 +02:00
Adriane Boyd
09b3118b26
Add uk pipelines to website (#11332)
2022-08-18 14:04:57 +02:00
Adriane Boyd
11f859c132
Docs for v3.4 (#11057)
* Add draft of v3.4 usage
* Add Croatian models
* Add Matcher min/max
* Update release notes
* Minor edits
* Add updates, tables
* Update pydantic/mypy versions
* Update version in README
* Fix sidebar
2022-07-11 15:36:31 +02:00
Adriane Boyd
497a708c71
Docs for v3.3 (#10628)
* Temporarily disable CI tests
* Start v3.3 website updates
* Add trainable lemmatizer to pipeline design
* Fix Vectors.most_similar
* Add floret vector info to pipeline design
* Add Lower and Upper Sorbian
* Add span to sidebar
* Work on release notes
* Copy from release notes
* Update pipeline design graphic
* Upgrading note about Doc.from_docs
* Add tables and details
* Update website/docs/models/index.md
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix da lemma acc
* Add minimal intro, various updates
* Round lemma acc
* Add section on floret / word lists
* Add new pipelines table, minor edits
* Fix displacy spans example title
* Clarify adding non-trainable lemmatizer
* Update adding-languages URLs
* Revert "Temporarily disable CI tests"
This reverts commit 1dee505920.
* Spell out words/sec
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-28 14:09:35 +02:00
Adriane Boyd
b2bbefd0b5
Add Finnish, Korean, and Swedish models and Korean support notes (#10355)
* Add Finnish, Korean, and Swedish models to website
* Add Korean language support notes
2022-03-07 17:03:45 +01:00
Adriane Boyd
3f181b73d0
Add ja_core_news_trf to website (#9515)
2021-10-20 10:18:02 +02:00
Adriane Boyd
1ee5bee29d
Add Macedonian models to website (#8637)
2021-07-08 09:32:14 +02:00
Adriane Boyd
63d748f80e
Add Catalan and Danish trf to website models (#8378)
2021-06-14 09:50:13 +02:00
vincent d warmerdam
1b0d413e45
Removed Languages that were listed twice on Docs (#7272)
* removed languages that were listed twice
* sorted
* d0h
* the d0h strikes back when you dont hit save
2021-03-05 14:31:15 +01:00
Ines Montani
06e66d4ced
Update languages.json [ci skip]
2021-02-13 12:33:17 +11:00
Ines Montani
230e651ad6
Merge branch 'develop' into master-tmp
2021-01-27 13:26:29 +11:00
muratjumashev
7d0154a36e
Added language meta data
2021-01-25 00:42:19 +06:00
Adriane Boyd
7cd5c9e098
Add xx_sent_ud_sm model to website
2021-01-19 09:02:35 +01:00
Adriane Boyd
e8f6400923
Update languages for website
* Add Macedonian
* Add Russian dependencies
* Switch Chinese dependency to spacy-pkuseg
2021-01-18 14:09:34 +01:00
Adriane Boyd
724831b066
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master
* Update Macedonian for v3
* Update Turkish for v3
2020-11-25 11:49:34 +01:00
Adriane Boyd
8cc5ed6771
Add Macedonian to website languages
2020-10-29 08:49:56 +01:00
Adriane Boyd
4dd86306e9
Add Nepali to supported languages on website (#6315)
2020-10-28 16:32:07 +01:00
Adriane Boyd
e896803792
Add and update website license links
2020-10-16 17:01:52 +02:00
Ines Montani
050aa1e0e2
Update languages.json [ci skip]
2020-10-14 20:51:50 +02:00
Ines Montani
a966c271f7
Update models docs [ci skip]
2020-10-14 20:50:23 +02:00
Ines Montani
741796e500
Update docs [ci skip]
2020-10-08 14:31:34 +02:00
Ines Montani
e06ff8b71d
Update docs [ci skip]
2020-09-26 13:18:08 +02:00
Ines Montani
d8f661c910
Update docs [ci skip]
2020-09-23 09:30:26 +02:00
Ines Montani
19b9ea0436
Fix languages.json
2020-06-16 18:34:11 +02:00
Adriane Boyd
d5110ffbf2
Documentation updates for v2.3.0 (#5593)
* Update website models for v2.3.0
* Add docs for Chinese word segmentation
* Tighten up Chinese docs section
* Merge branch 'master' into docs/v2.3.0 [ci skip]
* Merge branch 'master' into docs/v2.3.0 [ci skip]
* Auto-format and update version
* Update matcher.md
* Update languages and sorting
* Typo in landing page
* Infobox about token_match behavior
* Add meta and basic docs for Japanese
* POS -> TAG in models table
* Add info about lookups for normalization
* Updates to API docs for v2.3
* Update adding norm exceptions for adding languages
* Add --omit-extra-lookups to CLI API docs
* Add initial draft of "What's New in v2.3"
* Add new in v2.3 tags to Chinese and Japanese sections
* Add tokenizer to migration section
* Add new in v2.3 flags to init-model
* Typo
* More what's new in v2.3
Co-authored-by: Ines Montani <ines@ines.io>
2020-06-16 15:37:35 +02:00
Baciccin
3b53617a69
Add Ligurian language
2020-03-19 21:37:01 -07:00
Ines Montani
1d6aec805d
Fix formatting and update docs for v2.2.4
2020-03-09 11:17:20 +01:00
Ines Montani
1b838d1313
Divide models into core and starters [ci skip]
2019-12-21 14:10:22 +01:00
Paul O'Leary McCann
f0e3e606a6
Replace python-mecab3 with fugashi for Japanese (#4621)
* Switch from mecab-python3 to fugashi
mecab-python3 has been the best MeCab binding for a long time but it's
not very actively maintained, and since it's based on old SWIG code
distributed with MeCab there's a limit to how effectively it can be
maintained.
Fugashi is a new Cython-based MeCab wrapper I wrote. Since it's not
based on the old SWIG code it's easier to keep it current and make small
deviations from the MeCab C/C++ API where that makes sense.
* Change mecab-python3 to fugashi in setup.cfg
* Change "mecab tags" to "unidic tags"
The tags come from MeCab, but the tag schema is specified by Unidic, so
it's more proper to refer to it that way.
* Update conftest
* Add fugashi link to external deps list for Japanese
2019-11-23 14:31:04 +01:00
Ines Montani
1180304449
Update languages.json [ci skip]
2019-10-26 13:51:42 +02:00
Ines Montani
8f76d6c9ef
Update transformer model details [ci skip]
2019-10-08 15:39:38 +02:00
Ines Montani
3624153591
Update languages.json [ci skip]
2019-09-27 15:15:41 +02:00
Ines Montani
23e28e2844
Merge branch 'master' into develop
2019-09-15 17:57:09 +02:00
Ines Montani
c7e4ea7154
Update examples and languages.json [ci skip]
2019-09-15 17:56:40 +02:00
Ines Montani
fe87ccc8d1
Update languages.json [ci skip]
2019-09-14 16:23:50 +02:00
Ines Montani
2f31f96fce
Update languages.json [ci skip]
2019-09-04 18:15:42 +02:00
Ines Montani
2245e95e2d
Update languages.json [ci skip]
2019-09-04 17:11:40 +02:00
Ines Montani
48385552c6
Update languages.json [ci skip]
2019-08-27 11:52:51 +02:00
Pavle Vidanović
4fe9329bfb
Serbian language code update "rs" -> "sr" (#4159)
* Serbian stopwords added. (cyrillic alphabet)
* spaCy Contribution agreement included.
* Test initialize updated
* Serbian language code update. --bugfix
2019-08-21 19:57:37 +02:00
Ines Montani
3e60afacf9
Add Serbian to languages [ci skip]
2019-08-07 13:38:25 +02:00
Ines Montani
7f3212e2f5
💫 Sync branches (#4084) [ci skip]
* Update from master
* Re-added Universe readme (#3688) (closes #3680)
* Fix typo
* Add version tag to `--base-model` argument (closes #3720)
* fixing regex matcher examples (#3708) (#3719)
* Improve Token.prob and Lexeme.prob docs (resolves #3701)
* Fix DependencyParser.predict docs (resolves #3561)
* Update languages.json
Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
Co-authored-by: Aaron Kub <aaronkub@gmail.com>
2019-08-05 14:32:54 +02:00
Ines Montani
4ebb4865fe
Update languages.json
2019-07-10 11:19:48 +02:00
cedar101
58f06e6180
Korean support (#3901)
* start lang/ko
* add test codes
* using natto-py
* add test_ko_tokenizer_full_tags()
* spaCy contributor agreement
* external dependency for ko
* collections.namedtuple for python version < 3.5
* case fix
* tuple unpacking
* add jongseong(final consonant)
* apply mecab option
* Remove Pipfile for now
Co-authored-by: Ines Montani <ines@ines.io>
2019-07-09 22:23:16 +02:00
Ines Montani
4f1dae1c6b
Update languages and examples (see #1107)
2019-06-26 16:19:17 +02:00
Ines Montani
9e14b2b69f
Add Estonian to docs [ci skip] (closes #3482)
2019-03-25 18:01:54 +01:00
Ines Montani
c5476bd75b
Update languages.json
2019-02-18 10:03:35 +01:00