Commit Graph

11272 Commits

Author SHA1 Message Date
adrianeboyd b71dd44dbc
Improved Romanian tokenization for UD RRT (#5206)
Modifications to Romanian tokenization to improve tokenization for
UD_Romanian-RRT.
2020-03-25 11:28:19 +01:00
adrianeboyd 86c43e55fa
Improve Lithuanian tokenization (#5205)
* Improve Lithuanian tokenization

Modify Lithuanian tokenization to improve performance for
UD_Lithuanian-ALKSNIS.

* Update Lithuanian tokenizer tests
2020-03-25 11:28:12 +01:00
adrianeboyd 1a944e5976
Improve Italian tokenization (#5204)
Improve Italian tokenization for UD_Italian-ISDT.
2020-03-25 11:28:02 +01:00
adrianeboyd 923a453449
Modifications/updates to Portuguese tokenization (#5203)
Modifications to Portuguese tokenization for UD_Portuguese-Bosque.
Contractions are kept as merged tokens instead of being split as
exceptions.
2020-03-25 11:27:53 +01:00
adrianeboyd 4117a5c705
Improve French tokenization (#5202)
Improve French tokenization for UD_French-Sequoia.
2020-03-25 11:27:42 +01:00
Ines Montani a3d09ffe61
Merge pull request #5201 from adrianeboyd/feature/ud-tokenization-nb-v2
Improved tokenization for UD_Norwegian-Bokmaal
2020-03-25 11:27:31 +01:00
Ines Montani 0e8dfdf77e
Merge pull request #5065 from adrianeboyd/feature/ud-tokenization-da
Add a few more Danish tokenizer exceptions
2020-03-25 11:27:19 +01:00
Adriane Boyd 09d442f5ad Merge remote-tracking branch 'upstream/master' into feature/ud-tokenization-da 2020-03-25 09:41:52 +01:00
Adriane Boyd cba2d1d972 Disable failing abbreviation test
UD_Danish-DDT has (as far as I can tell) hallucinated periods after
abbreviations, so the changes are an artifact of the corpus and not due
to anything meaningful about Danish tokenization.
2020-03-25 09:39:26 +01:00
Adriane Boyd 79737adb90 Improved tokenization for UD_Norwegian-Bokmaal 2020-03-25 08:54:02 +01:00
Ines Montani 5f2afa0479
Merge pull request #5185 from adrianeboyd/bugfix/de-punctuation-style
Improve German tokenizer settings style
2020-03-24 16:38:32 +01:00
Ines Montani 3fc2309c48
Merge pull request #5174 from Baciccin/master
Add Ligurian language
2020-03-24 16:33:59 +01:00
Ines Montani f434d6aaa9
Merge pull request #5190 from guerda/patch-1
Remove max_length parameter in PhraseMatcher example
2020-03-24 16:32:12 +01:00
Philip Gillißen 128acb9ee1
Update guerda.md 2020-03-24 10:42:30 +01:00
Philip Gillißen 5d067bcc5e
Add SCA for guerda 2020-03-24 10:42:10 +01:00
Philip Gillißen f8b4407a29
Remove max_length parameter
The parameter max_length is deprecated in PhraseMatcher, as stated here: https://spacy.io/api/phrasematcher#init
2020-03-24 10:22:12 +01:00
Ines Montani 494ec23adb
Merge pull request #5187 from adrianeboyd/update/azure-images
Update from macOS-10.13 to macOS-10.14
2020-03-23 20:47:49 +01:00
Adriane Boyd 30d862d4d8 Update from macOS-10.13 to macOS-10.14 2020-03-23 19:52:57 +01:00
Adriane Boyd 2897a73559 Improve German tokenizer settings style 2020-03-23 19:23:47 +01:00
Baciccin 3b53617a69 Add Ligurian language 2020-03-19 21:37:01 -07:00
Ines Montani 80e7e1347e Update universe.json [ci skip] 2020-03-17 22:21:34 +01:00
Ines Montani eda6eff8b1 Update universe.json [ci skip] 2020-03-17 22:19:29 +01:00
Ines Montani 16e7301d34
Merge pull request #5161 from pmbaumgartner/master
add gobbli to spacy-universe 🥳
2020-03-17 22:18:30 +01:00
Peter B b04057c204 add mentions of spaCy use 2020-03-17 15:03:43 -04:00
Ines Montani b2b01a5c8b Update universe.json [ci skip] 2020-03-17 19:53:31 +01:00
Peter B d2ffb406ad add gobbli to spacy-universe 🥳 2020-03-17 08:30:29 -04:00
Ines Montani 17bd9ed84f
Merge pull request #5153 from pinealan/fix/website-docs
Fix website typos and weird sentences
2020-03-16 15:03:01 +01:00
Ines Montani 2044216bd5
Merge pull request #5150 from sloev/master
add spacy_syllables to universe
2020-03-16 15:02:12 +01:00
Ines Montani fb74679559
Merge pull request #5147 from mabraham/master
Fix broken link in docs
2020-03-16 14:59:52 +01:00
Ines Montani c68f20b398
Merge pull request #5146 from adrianeboyd/bugfix/assert-docs-equal-sents
Fix sents comparison in test util
2020-03-16 14:59:32 +01:00
Alan Chan 1ae01684cf Fill in contributor agreement 2020-03-15 03:45:20 +08:00
Alan Chan 2124be100d Tweak run-on sentence 2020-03-15 03:45:20 +08:00
Alan Chan 7c3a4ce933 Missing word in api/cli doc 2020-03-15 03:45:20 +08:00
Alan Chan 36e3532475 Remove unfinished sentence 2020-03-15 03:45:17 +08:00
nihil 9cde7eb08c add spacy_syllables to universe + sign contributor agreement 2020-03-13 18:09:42 +01:00
Mark Abraham a0ffa346c0 Fix broken link in docs 2020-03-13 14:07:26 +01:00
Adriane Boyd 423849f94a Fix sents comparison in test util
Due to changes to `Span` (#5005), spans from different documents are now
never equal. Check `Token.is_sent_start` values instead.
2020-03-13 09:25:23 +01:00
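
The check described in the commit above can be sketched without spaCy. The `Token` dataclass and the `same_sentence_boundaries` helper are hypothetical stand-ins for illustration, not spaCy's actual test utility. Only the idea of comparing `is_sent_start` values token by token, rather than comparing spans from different documents, comes from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    # Hypothetical stand-in for a spaCy token; only the fields needed here.
    text: str
    is_sent_start: Optional[bool] = None


def same_sentence_boundaries(doc1: List[Token], doc2: List[Token]) -> bool:
    """Compare sentence segmentation through per-token is_sent_start flags.

    This avoids comparing span objects that belong to different documents.
    """
    if len(doc1) != len(doc2):
        return False
    return all(a.is_sent_start == b.is_sent_start for a, b in zip(doc1, doc2))
```

Two token sequences with identical flags compare as equal even though they are separate objects, which is the property the test utility needs.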
Matthew Honnibal 26a90f011b Set version to v2.2.4 2020-03-12 11:30:41 +01:00
Ines Montani c669435c62
Merge pull request #5125 from renaud/patch-1
small typo in code sample
2020-03-12 11:19:12 +01:00
Ines Montani 4130fef4ec
Merge pull request #5127 from svlandeg/docs/empty-doc
is_XXX is True if doc is empty
2020-03-12 11:18:10 +01:00
Ines Montani 3497b2973d
Merge pull request #5130 from merrcury/patch-1
DOC : Update LICENSE Year
2020-03-12 11:17:38 +01:00
Himanshu Garg 27d1300bdb
Create merrcury.md 2020-03-10 15:11:07 +05:30
Himanshu Garg ba47d5a5cb
Update LICENSE Year 2020-03-10 15:03:29 +05:30
svlandeg c4d030dbf6 remove accidental commit 2020-03-09 18:10:54 +01:00
svlandeg 1724a4f75b additional information if doc is empty 2020-03-09 18:08:18 +01:00
Renaud Richardet eccf6b1686
small typo in code sample 2020-03-09 14:49:11 +01:00
Ines Montani 1d6aec805d Fix formatting and update docs for v2.2.4 2020-03-09 11:17:20 +01:00
Ines Montani 5f68004264 Port over gitignore changes from develop
Prevents stale files when switching branches
2020-03-09 11:05:00 +01:00
Mark Abraham 0345135167
Tokenizer to_disk and from_disk now ensure paths (#5116)
* Tokenizer to_disk and from_disk now ensure strings are converted to paths

Fixes #5115

* Sign contributor agreement
2020-03-08 13:25:56 +01:00
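
The behavior described in the commit above, converting string arguments to paths before any file access, might look roughly like the following. This is a minimal standard-library sketch, not the code from the commit: `ensure_path`, `save_text`, and `load_text` are written here for illustration, and the real `Tokenizer.to_disk` and `Tokenizer.from_disk` serialize more than plain text.

```python
from pathlib import Path
from typing import Union


def ensure_path(path: Union[str, Path]) -> Path:
    """Return a Path object, converting from str when necessary."""
    return Path(path) if isinstance(path, str) else path


def save_text(path: Union[str, Path], data: str) -> None:
    # Illustrative to_disk-style helper: normalize the argument first,
    # so that both str and Path inputs support .open().
    path = ensure_path(path)
    with path.open("w", encoding="utf8") as file_:
        file_.write(data)


def load_text(path: Union[str, Path]) -> str:
    # Illustrative from_disk-style helper with the same normalization.
    path = ensure_path(path)
    with path.open("r", encoding="utf8") as file_:
        return file_.read()
```

Without the conversion step, calling `.open()` on a plain string would raise an `AttributeError`, which is the kind of failure such a normalization guards against.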
Yohei Tamura 31755630a7
fix typo (#5106) 2020-03-08 13:24:38 +01:00