Commit Graph

11385 Commits

Author SHA1 Message Date
adrianeboyd ce0e538068
Check whether doc is instantiated in Example.get_gold_parses() (#5167)
* Check whether doc is instantiated

When creating docs to pair with gold parses, modify test to check
whether a doc is unset rather than whether it contains tokens.

* Restore test of evaluate on an empty doc

* Set a minimal gold.orig for the scorer

Without a minimal gold.orig the scorer can't evaluate empty docs. This
is the v3 equivalent of #4925.
2020-03-29 13:57:00 +02:00
Sofie Van Landeghem d6d95674c1
bugfix in span similarity (#5155)
* bugfix in span similarity

* also rewrite doc.pyx for clarity

* formatting
2020-03-29 13:56:07 +02:00
Sofie Van Landeghem 1f9852abc3
Fix parser @ GPU (#5210)
* ensure self.bias is numpy array in parser model

* 2 more little bug fixes for parser on GPU

* removing testing GPU statement

* remove commented code
2020-03-28 23:09:35 +01:00
Sofie Van Landeghem 9b412516e7
Fixing pickling of the parser (#5218)
* fix __reduce__ for pickling parser

* setting the move object as 'state' during pickling

* unskip test_issue4725 - works again
2020-03-27 19:35:26 +01:00
Ines Montani a0858ae761
Merge pull request #5213 from explosion/tmp/sync
Try master -> develop sync again (part 2)
2020-03-27 11:39:46 +01:00
Ines Montani 92b9b631ef xfail -> skip 2020-03-27 10:51:32 +01:00
Ines Montani ee4bb0e3b6 Fix import 2020-03-26 21:44:18 +01:00
Ines Montani 4fe2299586 xfail hanging test 2020-03-26 20:58:13 +01:00
Ines Montani f12a46472c Remove unicode declarations 2020-03-26 15:18:32 +01:00
Ines Montani 7453df79d1 Fix argument 2020-03-26 14:09:02 +01:00
Ines Montani e7341db5dc Add sent_start to pattern schema 2020-03-26 14:05:40 +01:00
Ines Montani 70ee4ef4fd Fix small errors 2020-03-26 13:47:31 +01:00
Ines Montani 46568f40a7 Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
Tiljander e53232533b
Describing priority rules for overlapping matches (#5197)
* Describing priority rules for overlapping matches

* Create Tiljander.md

* Describing priority rules for overlapping matches

* Update website/docs/api/entityruler.md

Co-Authored-By: Ines Montani <ines@ines.io>

Co-authored-by: Ines Montani <ines@ines.io>
2020-03-26 13:13:22 +01:00
adrianeboyd 8d3563f1c4
Minor bugfixes for train CLI (#5186)
* Omit per_type scores from model-best calculations

The addition of per_type scores to the included metrics (#4911) causes
errors when they're compared while determining the best model, so omit
them for this `max()` comparison.

* Add default speed data for interrupted train CLI

Add better speed meta defaults so that an interrupted iteration still
produces a best model.

Co-authored-by: Ines Montani <ines@ines.io>
2020-03-26 10:46:50 +01:00
adrianeboyd a04f802099
Fix GoldParse init when token count differs (#5191)
Fix the `GoldParse` initialization when the number of tokens has changed
(due to merging subtokens with the parser).
2020-03-26 10:46:23 +01:00
adrianeboyd d88a377bed
Remove Vectors.from_glove (#5209) 2020-03-26 10:45:47 +01:00
Ines Montani 828acffc12 Tidy up and auto-format 2020-03-25 12:28:12 +01:00
adrianeboyd b71dd44dbc
Improved Romanian tokenization for UD RRT (#5206)
Modifications to Romanian tokenization to improve tokenization for
UD_Romanian-RRT.
2020-03-25 11:28:19 +01:00
adrianeboyd 86c43e55fa
Improve Lithuanian tokenization (#5205)
* Improve Lithuanian tokenization

Modify Lithuanian tokenization to improve performance for
UD_Lithuanian-ALKSNIS.

* Update Lithuanian tokenizer tests
2020-03-25 11:28:12 +01:00
adrianeboyd 1a944e5976
Improve Italian tokenization (#5204)
Improve Italian tokenization for UD_Italian-ISDT.
2020-03-25 11:28:02 +01:00
adrianeboyd 923a453449
Modifications/updates to Portuguese tokenization (#5203)
Modifications to Portuguese tokenization for UD_Portuguese-Bosque.
Instead of splitting contractions as exceptions, they are kept as merged
tokens.
2020-03-25 11:27:53 +01:00
adrianeboyd 4117a5c705
Improve French tokenization (#5202)
Improve French tokenization for UD_French-Sequoia.
2020-03-25 11:27:42 +01:00
Ines Montani a3d09ffe61
Merge pull request #5201 from adrianeboyd/feature/ud-tokenization-nb-v2
Improved tokenization for UD_Norwegian-Bokmaal
2020-03-25 11:27:31 +01:00
Ines Montani 0e8dfdf77e
Merge pull request #5065 from adrianeboyd/feature/ud-tokenization-da
Add a few more Danish tokenizer exceptions
2020-03-25 11:27:19 +01:00
Sofie Van Landeghem 218e1706ac
Bugfix linking vectors (#5196)
* restore call to _load_vectors

* bump to thinc 8.0.0a3

* bump to 3.0.0.dev4
2020-03-25 10:20:11 +01:00
Adriane Boyd 09d442f5ad Merge remote-tracking branch 'upstream/master' into feature/ud-tokenization-da 2020-03-25 09:41:52 +01:00
Adriane Boyd cba2d1d972 Disable failing abbreviation test
UD_Danish-DDT has (as far as I can tell) hallucinated periods after
abbreviations, so the changes are an artifact of the corpus and not due
to anything meaningful about Danish tokenization.
2020-03-25 09:39:26 +01:00
Adriane Boyd 79737adb90 Improved tokenization for UD_Norwegian-Bokmaal 2020-03-25 08:54:02 +01:00
Ines Montani 5f2afa0479
Merge pull request #5185 from adrianeboyd/bugfix/de-punctuation-style
Improve German tokenizer settings style
2020-03-24 16:38:32 +01:00
Ines Montani 3fc2309c48
Merge pull request #5174 from Baciccin/master
Add Ligurian language
2020-03-24 16:33:59 +01:00
Ines Montani f434d6aaa9
Merge pull request #5190 from guerda/patch-1
Remove max_length parameter in PhraseMatcher example
2020-03-24 16:32:12 +01:00
Philip Gillißen 128acb9ee1
Update guerda.md 2020-03-24 10:42:30 +01:00
Philip Gillißen 5d067bcc5e
Add SCA for guerda 2020-03-24 10:42:10 +01:00
Philip Gillißen f8b4407a29
Remove max_length parameter
The parameter max_length is deprecated in PhraseMatcher, as stated here: https://spacy.io/api/phrasematcher#init
2020-03-24 10:22:12 +01:00
Ines Montani fcac1ace78 Update macOS image on Azure Pipelines 2020-03-23 22:55:47 +01:00
Ines Montani 494ec23adb
Merge pull request #5187 from adrianeboyd/update/azure-images
Update from macOS-10.13 to macOS-10.14
2020-03-23 20:47:49 +01:00
Adriane Boyd 30d862d4d8 Update from macOS-10.13 to macOS-10.14 2020-03-23 19:52:57 +01:00
Adriane Boyd 2897a73559 Improve German tokenizer settings style 2020-03-23 19:23:47 +01:00
Ines Montani 341e8687f7
Merge pull request #5168 from svlandeg/fix/streamlit
fix showing dep arcs in streamlit script
2020-03-22 11:35:31 +01:00
Baciccin 3b53617a69 Add Ligurian language 2020-03-19 21:37:01 -07:00
svlandeg 02d87a8b2b fix showing dep arcs in streamlit script 2020-03-19 10:30:20 +01:00
Ines Montani 80e7e1347e Update universe.json [ci skip] 2020-03-17 22:21:34 +01:00
Ines Montani eda6eff8b1 Update universe.json [ci skip] 2020-03-17 22:19:29 +01:00
Ines Montani 16e7301d34
Merge pull request #5161 from pmbaumgartner/master
add gobbli to spacy-universe 🥳
2020-03-17 22:18:30 +01:00
Peter B b04057c204 add mentions of spaCy use 2020-03-17 15:03:43 -04:00
Ines Montani b2b01a5c8b Update universe.json [ci skip] 2020-03-17 19:53:31 +01:00
Peter B d2ffb406ad add gobbli to spacy-universe 🥳 2020-03-17 08:30:29 -04:00
Ines Montani 558032017e
Merge pull request #5157 from svlandeg/bugfix/language
remove unnecessary itertools call
2020-03-16 15:04:25 +01:00
Ines Montani 3944c1a65d
Merge pull request #5148 from svlandeg/fix/empty-docbin
Fix serialization of empty doc
2020-03-16 15:03:54 +01:00