Ines Montani
4fe2299586
xfail hanging test
2020-03-26 20:58:13 +01:00
Ines Montani
f12a46472c
Remove unicode declarations
2020-03-26 15:18:32 +01:00
Ines Montani
7453df79d1
Fix argument
2020-03-26 14:09:02 +01:00
Ines Montani
e7341db5dc
Add sent_start to pattern schema
2020-03-26 14:05:40 +01:00
Ines Montani
70ee4ef4fd
Fix small errors
2020-03-26 13:47:31 +01:00
Ines Montani
46568f40a7
Merge branch 'master' into tmp/sync
2020-03-26 13:38:14 +01:00
Tiljander
e53232533b
Describing priority rules for overlapping matches (#5197)
...
* Describing priority rules for overlapping matches
* Create Tiljander.md
* Describing priority rules for overlapping matches
* Update website/docs/api/entityruler.md
Co-Authored-By: Ines Montani <ines@ines.io>
Co-authored-by: Ines Montani <ines@ines.io>
2020-03-26 13:13:22 +01:00
adrianeboyd
8d3563f1c4
Minor bugfixes for train CLI (#5186)
...
* Omit per_type scores from model-best calculations
The addition of per_type scores to the included metrics (#4911) causes
errors when they're compared while determining the best model, so omit
them for this `max()` comparison.
* Add default speed data for interrupted train CLI
Add better speed meta defaults so that an interrupted iteration still
produces a best model.
Co-authored-by: Ines Montani <ines@ines.io>
2020-03-26 10:46:50 +01:00
adrianeboyd
a04f802099
Fix GoldParse init when token count differs (#5191)
...
Fix the `GoldParse` initialization when the number of tokens has changed
(due to merging subtokens with the parser).
2020-03-26 10:46:23 +01:00
adrianeboyd
d88a377bed
Remove Vectors.from_glove (#5209)
2020-03-26 10:45:47 +01:00
Ines Montani
828acffc12
Tidy up and auto-format
2020-03-25 12:28:12 +01:00
adrianeboyd
b71dd44dbc
Improved Romanian tokenization for UD RRT (#5206)
...
Modifications to Romanian tokenization to improve tokenization for
UD_Romanian-RRT.
2020-03-25 11:28:19 +01:00
adrianeboyd
86c43e55fa
Improve Lithuanian tokenization (#5205)
...
* Improve Lithuanian tokenization
Modify Lithuanian tokenization to improve performance for
UD_Lithuanian-ALKSNIS.
* Update Lithuanian tokenizer tests
2020-03-25 11:28:12 +01:00
adrianeboyd
1a944e5976
Improve Italian tokenization (#5204)
...
Improve Italian tokenization for UD_Italian-ISDT.
2020-03-25 11:28:02 +01:00
adrianeboyd
923a453449
Modifications/updates to Portuguese tokenization (#5203)
...
Modifications to Portuguese tokenization for UD_Portuguese-Bosque.
Instead of splitting contractions as exceptions, they are kept as merged
tokens.
2020-03-25 11:27:53 +01:00
adrianeboyd
4117a5c705
Improve French tokenization (#5202)
...
Improve French tokenization for UD_French-Sequoia.
2020-03-25 11:27:42 +01:00
Ines Montani
a3d09ffe61
Merge pull request #5201 from adrianeboyd/feature/ud-tokenization-nb-v2
...
Improved tokenization for UD_Norwegian-Bokmaal
2020-03-25 11:27:31 +01:00
Ines Montani
0e8dfdf77e
Merge pull request #5065 from adrianeboyd/feature/ud-tokenization-da
...
Add a few more Danish tokenizer exceptions
2020-03-25 11:27:19 +01:00
Sofie Van Landeghem
218e1706ac
Bugfix linking vectors (#5196)
...
* restore call to _load_vectors
* bump to thinc 8.0.0a3
* bump to 3.0.0.dev4
2020-03-25 10:20:11 +01:00
Adriane Boyd
09d442f5ad
Merge remote-tracking branch 'upstream/master' into feature/ud-tokenization-da
2020-03-25 09:41:52 +01:00
Adriane Boyd
cba2d1d972
Disable failing abbreviation test
...
UD_Danish-DDT has (as far as I can tell) hallucinated periods after
abbreviations, so the changes are an artifact of the corpus and not due
to anything meaningful about Danish tokenization.
2020-03-25 09:39:26 +01:00
Adriane Boyd
79737adb90
Improved tokenization for UD_Norwegian-Bokmaal
2020-03-25 08:54:02 +01:00
Ines Montani
5f2afa0479
Merge pull request #5185 from adrianeboyd/bugfix/de-punctuation-style
...
Improve German tokenizer settings style
2020-03-24 16:38:32 +01:00
Ines Montani
3fc2309c48
Merge pull request #5174 from Baciccin/master
...
Add Ligurian language
2020-03-24 16:33:59 +01:00
Ines Montani
f434d6aaa9
Merge pull request #5190 from guerda/patch-1
...
Remove max_length parameter in PhraseMatcher example
2020-03-24 16:32:12 +01:00
Philip Gillißen
128acb9ee1
Update guerda.md
2020-03-24 10:42:30 +01:00
Philip Gillißen
5d067bcc5e
Add SCA for guerda
2020-03-24 10:42:10 +01:00
Philip Gillißen
f8b4407a29
Remove max_length parameter
...
The parameter max_length is deprecated in PhraseMatcher, as stated here: https://spacy.io/api/phrasematcher#init
2020-03-24 10:22:12 +01:00
Ines Montani
fcac1ace78
Update macOS image on Azure Pipelines
2020-03-23 22:55:47 +01:00
Ines Montani
494ec23adb
Merge pull request #5187 from adrianeboyd/update/azure-images
...
Update from macOS-10.13 to macOS-10.14
2020-03-23 20:47:49 +01:00
Adriane Boyd
30d862d4d8
Update from macOS-10.13 to macOS-10.14
2020-03-23 19:52:57 +01:00
Adriane Boyd
2897a73559
Improve German tokenizer settings style
2020-03-23 19:23:47 +01:00
Ines Montani
341e8687f7
Merge pull request #5168 from svlandeg/fix/streamlit
...
fix showing dep arcs in streamlit script
2020-03-22 11:35:31 +01:00
Baciccin
3b53617a69
Add Ligurian language
2020-03-19 21:37:01 -07:00
svlandeg
02d87a8b2b
fix showing dep arcs in streamlit script
2020-03-19 10:30:20 +01:00
Ines Montani
80e7e1347e
Update universe.json [ci skip]
2020-03-17 22:21:34 +01:00
Ines Montani
eda6eff8b1
Update universe.json [ci skip]
2020-03-17 22:19:29 +01:00
Ines Montani
16e7301d34
Merge pull request #5161 from pmbaumgartner/master
...
add gobbli to spacy-universe 🥳
2020-03-17 22:18:30 +01:00
Peter B
b04057c204
add mentions of spaCy use
2020-03-17 15:03:43 -04:00
Ines Montani
b2b01a5c8b
Update universe.json [ci skip]
2020-03-17 19:53:31 +01:00
Peter B
d2ffb406ad
add gobbli to spacy-universe 🥳
2020-03-17 08:30:29 -04:00
Ines Montani
558032017e
Merge pull request #5157 from svlandeg/bugfix/language
...
remove unnecessary itertools call
2020-03-16 15:04:25 +01:00
Ines Montani
3944c1a65d
Merge pull request #5148 from svlandeg/fix/empty-docbin
...
Fix serialization of empty doc
2020-03-16 15:03:54 +01:00
Ines Montani
17bd9ed84f
Merge pull request #5153 from pinealan/fix/website-docs
...
Fix website typos and weird sentences
2020-03-16 15:03:01 +01:00
Ines Montani
2044216bd5
Merge pull request #5150 from sloev/master
...
add spacy_syllables to universe
2020-03-16 15:02:12 +01:00
Ines Montani
fb74679559
Merge pull request #5147 from mabraham/master
...
Fix broken link in docs
2020-03-16 14:59:52 +01:00
Ines Montani
c68f20b398
Merge pull request #5146 from adrianeboyd/bugfix/assert-docs-equal-sents
...
Fix sents comparison in test util
2020-03-16 14:59:32 +01:00
svlandeg
fba219f737
remove unnecessary itertools call
2020-03-16 08:31:36 +01:00
Alan Chan
1ae01684cf
Fill in contributor agreement
2020-03-15 03:45:20 +08:00
Alan Chan
2124be100d
Tweak run-on sentence
2020-03-15 03:45:20 +08:00