Commit Graph

8228 Commits

Author SHA1 Message Date
Adriane Boyd 084fc575aa Set version to v3.0.0rc3 2020-11-03 17:29:57 +01:00
Adriane Boyd 1c4df8fd09
Replace pytokenizations with internal alignment (#6293)
* Replace pytokenizations with internal alignment

Replace pytokenizations with internal alignment algorithm that is
restricted to only allow differences in whitespace and capitalization.

* Rename `spacy.training.align` to `spacy.training.alignment` to contain
the `Alignment` dataclass
* Implement `get_alignments` in `spacy.training.align`

* Refactor trailing whitespace handling

* Remove unnecessary exception for empty docs

Allow a non-empty whitespace-only doc to be aligned with an empty doc

* Remove empty docs exceptions completely
2020-11-03 16:24:38 +01:00
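For readers of #6293, a minimal sketch of what an alignment "restricted to only allow differences in whitespace and capitalization" can look like. This is not the code from the commit: `char_to_token` and `align_tokens` are hypothetical names, and the character-ownership approach is an assumption chosen for illustration.

```python
from typing import List, Tuple


def char_to_token(tokens: List[str]) -> Tuple[str, List[int]]:
    """Join tokens while ignoring whitespace and case, recording which
    token each remaining character came from (hypothetical helper)."""
    chars: List[str] = []
    owners: List[int] = []
    for i, token in enumerate(tokens):
        for ch in token.lower():
            if not ch.isspace():
                chars.append(ch)
                owners.append(i)
    return "".join(chars), owners


def align_tokens(
    tokens_a: List[str], tokens_b: List[str]
) -> Tuple[List[List[int]], List[List[int]]]:
    """Map each token in A to the tokens in B it overlaps, and vice versa.

    Raises ValueError if the two tokenizations differ in anything other
    than whitespace and capitalization.
    """
    text_a, owners_a = char_to_token(tokens_a)
    text_b, owners_b = char_to_token(tokens_b)
    if text_a != text_b:
        raise ValueError(
            "Tokenizations differ in more than whitespace and capitalization"
        )
    a2b: List[List[int]] = [[] for _ in tokens_a]
    b2a: List[List[int]] = [[] for _ in tokens_b]
    # Walk both tokenizations character by character; owners are
    # non-decreasing, so checking the last entry avoids duplicates.
    for i, j in zip(owners_a, owners_b):
        if not a2b[i] or a2b[i][-1] != j:
            a2b[i].append(j)
        if not b2a[j] or b2a[j][-1] != i:
            b2a[j].append(i)
    return a2b, b2a
```

Under this sketch, `align_tokens(["New", "York"], ["new", "yo", "rk"])` aligns the second token of the first list with both `"yo"` and `"rk"`, and a whitespace-only token list such as `[" "]` aligns with an empty list without raising, which mirrors the empty-doc behaviour described in the commit message.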
Adriane Boyd a4b32b9552
Handle missing reference values in scorer (#6286)
* Handle missing reference values in scorer

Handle missing values in reference doc during scoring where it is
possible to detect an unset state for the attribute. If no reference
docs contain annotation, `None` is returned instead of a score. `spacy
evaluate` displays `-` for missing scores and the missing scores are
saved as `None`/`null` in the metrics.

Attributes without unset states:

* `token.head`: relies on `token.dep` to recognize unset values
* `doc.cats`: unable to handle missing annotation

Additional changes:

* add optional `has_annotation` check to `score_spans` to replace
`doc.sents` hack
* update `score_token_attr_per_feat` to handle missing and empty morph
representations
* fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START`
vs. `SENT_START`

* Fix import

* Update return types
2020-11-03 15:47:18 +01:00
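The scoring behaviour described in #6286 can be illustrated with a small sketch. It is not the spaCy scorer: `token_accuracy` and `format_score` are hypothetical helpers, and using `None` as the unset marker for reference values is an assumption made for the example.

```python
import json
from typing import Iterable, Optional, Sequence, Tuple


def token_accuracy(
    pairs: Iterable[Tuple[Sequence[object], Sequence[object]]],
) -> Optional[float]:
    """Accuracy over (predicted, reference) value sequences.

    Reference values of None are treated as unset and skipped. If no
    reference value is annotated at all, None is returned instead of a score.
    """
    correct = 0
    total = 0
    for predicted, reference in pairs:
        for pred, gold in zip(predicted, reference):
            if gold is None:  # unset in the reference: not scored
                continue
            total += 1
            if pred == gold:
                correct += 1
    if total == 0:
        return None
    return correct / total


def format_score(score: Optional[float]) -> str:
    """Render a score for display, using '-' for a missing score."""
    return "-" if score is None else f"{score:.2f}"


def save_metrics(metrics: dict) -> str:
    """Serialize metrics to JSON; Python None becomes JSON null."""
    return json.dumps(metrics)
```

With this sketch, a reference that annotates only some tokens is scored on those tokens alone, while a fully unannotated reference yields `None`, which `format_score` shows as `-` and `save_metrics` writes as `null`.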
Adriane Boyd 5d2cb86c34
Fix on_match callback for DependencyMatcher (#6313)
Fix `DependencyMatcher` so that the callback is called only once per
match.
2020-10-31 12:20:27 +01:00
Sofie Van Landeghem 2918923541
fix resolving of dot notation (#6326) 2020-10-31 12:17:06 +01:00
Ines Montani 2c9804038d Fix success message [ci skip] 2020-10-23 16:11:54 +02:00
Adriane Boyd 563a21834e Save raw scores in evaluate output 2020-10-19 15:49:09 +02:00
Adriane Boyd dd207ca6d0 Add dep_las_per_type and more generic PRF printer 2020-10-19 15:49:02 +02:00
Adriane Boyd 4300858ecb Include per-type/feat scores in evaluate output 2020-10-19 15:48:55 +02:00
Sofie Van Landeghem 75a202ce65
TextCat updates and fixes (#6263)
* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos
2020-10-18 14:50:41 +02:00
Ines Montani 5a6ed01ce0
Merge pull request #6262 from adrianeboyd/bugfix/template-en-vectors 2020-10-16 15:38:08 +02:00
Adriane Boyd c8d04b79e2 Sort and add vectors for langs without transformers 2020-10-16 08:25:16 +02:00
Adriane Boyd 2fbd43c603 Use core lg models as vectors models in quickstart 2020-10-16 08:17:53 +02:00
Jan Margeta 1ad2213349 Fix TokenPatternSchema pattern field validation
Empty pattern field should be considered invalid

This is fixed by replacing minItems with min_items
as described in Pydantic docs:
https://pydantic-docs.helpmanual.io/usage/schema/
2020-10-16 00:41:21 +02:00
Ines Montani ff4267d181 Fix success message [ci skip] 2020-10-15 14:42:08 +02:00
Ines Montani 10611bf56a Increment version [ci skip] 2020-10-15 13:30:11 +02:00
Ines Montani 4e17ddf75e
Merge pull request #6256 from adrianeboyd/bugfix/docs-to-json-raw 2020-10-15 10:35:01 +02:00
Ines Montani b1d568a4df Tidy up tests 2020-10-15 10:20:21 +02:00
Ines Montani d165af26be Auto-format [ci skip] 2020-10-15 10:08:53 +02:00
Adriane Boyd a93d42861d Use null raw for has_unknown_spaces in docs_to_json 2020-10-15 09:57:54 +02:00
Ines Montani 5665a21517 Tidy up 2020-10-15 09:30:32 +02:00
Ines Montani 5d62499266 Fix tests 2020-10-15 09:29:15 +02:00
Ines Montani 178760855f Merge branch 'develop' into master-tmp 2020-10-15 09:06:03 +02:00
Ines Montani bc85b12e6d
Merge pull request #6249 from svlandeg/feature/batch-tests 2020-10-15 08:57:56 +02:00
svlandeg 0796401c19 call NumpyOps instead of get_current_ops() 2020-10-14 16:55:00 +02:00
svlandeg 44e14ccae8 one more losses fix 2020-10-14 15:11:34 +02:00
svlandeg 0aa8851878 always return losses 2020-10-14 15:00:49 +02:00
svlandeg e94a21638e adding tests for trained models to ensure predict reproducibility 2020-10-13 21:07:13 +02:00
svlandeg ede979d42f formatting 2020-10-13 18:53:17 +02:00
svlandeg ff83bfae3f naming 2020-10-13 18:52:37 +02:00
svlandeg 6ccacff54e add tests for individual spacy layers 2020-10-13 18:50:07 +02:00
svlandeg c23041ae60 component tests single or multiple prediction 2020-10-13 16:26:53 +02:00
Ines Montani 1f49300862 Update transformer recommendations [ci skip] 2020-10-13 15:41:17 +02:00
Sofie Van Landeghem f8a1c1afd6
avoid dropout at runtime (#6247) 2020-10-13 14:39:59 +02:00
Ines Montani 86d648740f Fix morph representation in Doc.to_json 2020-10-13 11:39:03 +02:00
Ines Montani 7f92a5ee6a
Update spacy/lang/ta/examples.py 2020-10-13 11:03:35 +02:00
Ines Montani a0e12c136b Increment version [ci skip] 2020-10-13 10:00:53 +02:00
Ines Montani f090f39f17
Merge pull request #6245 from svlandeg/bugfix/else
bugfix in _pipe
2020-10-13 09:59:06 +02:00
svlandeg 1f465bea18 if-else 2020-10-13 09:27:19 +02:00
svlandeg 40276fd3be update NEL docs after latest refactor 2020-10-12 11:41:27 +02:00
Ines Montani 4fa967ea84 Increment version [ci skip] 2020-10-11 13:10:58 +02:00
Ines Montani ab890a35f9 Make console logger table more compact 2020-10-11 12:55:46 +02:00
Ines Montani 99606e46fe Relax meta.json schema [ci skip] 2020-10-11 12:30:57 +02:00
svlandeg 3a505e7e14 small edit to ensure the new word was indeed new 2020-10-10 21:05:28 +02:00
svlandeg 68d79796c6 add test for vocab after serializing KB 2020-10-10 20:59:48 +02:00
Ines Montani 539b0c10da Tidy up and auto-format 2020-10-10 19:14:48 +02:00
Ines Montani bfa3931c9d
Revert added_strings change (#6236) 2020-10-10 18:55:07 +02:00
Ines Montani 796f8b9424 Increment version 2020-10-09 18:00:27 +02:00
Ines Montani 525f798841 Fix typo in test 2020-10-09 18:00:21 +02:00
Ines Montani 8ac5f22253 Adjust error message 2020-10-09 18:00:16 +02:00