Commit Graph

13781 Commits

Author SHA1 Message Date
svlandeg 331ec83493 edits and updates to implementing REL component docs 2020-11-20 21:41:52 +01:00
svlandeg 4a3e611abc small fixes and formatting 2020-11-20 15:55:05 +01:00
svlandeg 124f49feb6 update REL model code 2020-11-20 15:25:20 +01:00
svlandeg 99d0412b6e add link to REL project 2020-11-15 18:35:56 +01:00
Adriane Boyd a7e7d6c6c9
Ignore misaligned in Morphologizer.get_loss (#6363)
Fix bug where `Morphologizer.get_loss` treated misaligned annotation as
`EMPTY_MORPH` rather than ignoring it. Remove unneeded default `EMPTY_MORPH`
mappings.
2020-11-10 20:15:09 +08:00
Sofie Van Landeghem a0c899a0ff
Fix textcat + transformer architecture (#6371)
* add pooling to textcat TransformerListener

* maybe_get_dim in case it's null
2020-11-10 20:14:47 +08:00
Ines Montani 3ca5c7082d Use pip install . in quickstart [ci skip] 2020-11-10 17:27:49 +08:00
Ines Montani de6453940e
Merge pull request #6305 from svlandeg/feature/score-docs [ci skip] 2020-11-10 02:52:11 +01:00
Ines Montani d7950c5ada
Merge pull request #6297 from adrianeboyd/docs/nightly-conda-install [ci skip] 2020-11-10 02:45:52 +01:00
Ines Montani 448bfbdc30 Remove conda from nightly install widget [ci skip] 2020-11-10 09:44:52 +08:00
Ines Montani 363ac73c72 Update docs [ci skip] 2020-11-09 12:43:26 +08:00
Sofie Van Landeghem 8ef056cf98
fix embed_size in Entity Linker architecture (#6343) 2020-11-04 22:20:13 +01:00
Ines Montani 019a1dd5e8 Fix v3 overview [ci skip] 2020-11-03 18:10:06 +01:00
Adriane Boyd 1c4df8fd09
Replace pytokenizations with internal alignment (#6293)
* Replace pytokenizations with internal alignment

Replace pytokenizations with internal alignment algorithm that is
restricted to only allow differences in whitespace and capitalization.

* Rename `spacy.training.align` to `spacy.training.alignment` to contain
the `Alignment` dataclass
* Implement `get_alignments` in `spacy.training.align`

* Refactor trailing whitespace handling

* Remove unnecessary exception for empty docs

Allow a non-empty whitespace-only doc to be aligned with an empty doc

* Remove empty docs exceptions completely
2020-11-03 16:24:38 +01:00
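The restriction described in this commit (tokenizations may differ only in whitespace and capitalization) can be illustrated with a minimal pure-Python sketch. This is not the code from the commit: `normalize`, `char_owners` and `align_sketch` are hypothetical names, and the approach shown (comparing lowercased, whitespace-stripped character streams) is an assumption about one simple way to meet that restriction.

```python
from typing import List, Tuple


def normalize(token: str) -> str:
    # Hypothetical helper: drop all whitespace and lowercase, so that only
    # whitespace and capitalization differences are tolerated.
    return "".join(token.split()).lower()


def char_owners(tokens: List[str]) -> List[int]:
    # For each normalized character, record the index of the token it came from.
    owners: List[int] = []
    for i, token in enumerate(tokens):
        owners.extend([i] * len(normalize(token)))
    return owners


def align_sketch(
    tokens_a: List[str], tokens_b: List[str]
) -> Tuple[List[List[int]], List[List[int]]]:
    # Both tokenizations must spell the same text after normalization.
    if "".join(map(normalize, tokens_a)) != "".join(map(normalize, tokens_b)):
        raise ValueError("tokenizations differ beyond whitespace/capitalization")
    a2b: List[List[int]] = [[] for _ in tokens_a]
    b2a: List[List[int]] = [[] for _ in tokens_b]
    # Walk both character streams in lockstep and link the owning tokens.
    for i, j in zip(char_owners(tokens_a), char_owners(tokens_b)):
        if not a2b[i] or a2b[i][-1] != j:
            a2b[i].append(j)
        if not b2a[j] or b2a[j][-1] != i:
            b2a[j].append(i)
    return a2b, b2a
```

In this sketch a whitespace-only token maps to nothing, so `align_sketch([" "], [])` succeeds rather than raising, in the spirit of the empty-doc handling mentioned above.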
Adriane Boyd a4b32b9552
Handle missing reference values in scorer (#6286)
* Handle missing reference values in scorer

Handle missing values in reference doc during scoring where it is
possible to detect an unset state for the attribute. If no reference
docs contain annotation, `None` is returned instead of a score. `spacy
evaluate` displays `-` for missing scores and the missing scores are
saved as `None`/`null` in the metrics.

Attributes without unset states:

* `token.head`: relies on `token.dep` to recognize unset values
* `doc.cats`: unable to handle missing annotation

Additional changes:

* add optional `has_annotation` check to `score_spans` to replace
`doc.sents` hack
* update `score_token_attr_per_feat` to handle missing and empty morph
representations
* fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START`
vs. `SENT_START`

* Fix import

* Update return types
2020-11-03 15:47:18 +01:00
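The behaviour described in this commit (skip unset reference values, return `None` when no reference annotation exists, display `-`, store `null`) can be sketched in plain Python. This is an illustration under assumptions, not the scorer's actual code: `token_accuracy` and `format_score` are hypothetical names, `None` stands in for an unset attribute, and the two-decimal display format is invented for the example.

```python
import json
from typing import Optional, Sequence


def token_accuracy(
    reference: Sequence[Optional[str]], predicted: Sequence[str]
) -> Optional[float]:
    # Count only tokens whose reference value is set; None means "unset".
    correct = 0
    total = 0
    for ref, pred in zip(reference, predicted):
        if ref is None:
            continue
        total += 1
        if ref == pred:
            correct += 1
    # No reference annotation at all: report a missing score, not 0.0.
    if total == 0:
        return None
    return correct / total


def format_score(score: Optional[float]) -> str:
    # Hypothetical display helper: missing scores are shown as "-".
    return "-" if score is None else f"{score * 100:.2f}"


metrics = {
    "tag_acc": token_accuracy(["NOUN", None, "VERB"], ["NOUN", "ADJ", "ADV"]),
    "pos_acc": token_accuracy([None, None], ["NOUN", "VERB"]),
}
# json.dumps serializes None as null, matching the saved-metrics description.
serialized = json.dumps(metrics)
```

Returning `None` rather than `0.0` keeps "no annotation to score against" distinguishable from "everything predicted wrong" in the saved metrics.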
Adriane Boyd 5d2cb86c34
Fix on_match callback for DependencyMatcher (#6313)
Fix `DependencyMatcher` so that the callback is called only once per
match.
2020-10-31 12:20:27 +01:00
Sofie Van Landeghem 2918923541
fix resolving of dot notation (#6326) 2020-10-31 12:17:06 +01:00
Adriane Boyd dc816bba9d
Fix node name typo in dependency matcher example (#6311) 2020-10-28 16:32:46 +01:00
Sofie Van Landeghem ace6ae435b
set pydantic upper pin to 1.7 for now (#6308) 2020-10-26 23:31:08 +01:00
svlandeg 77688b0072 fix config 2020-10-26 11:14:34 +01:00
svlandeg 5878ff6bcd cleanup 2020-10-26 11:13:02 +01:00
svlandeg e95d9caa87 small edits 2020-10-26 11:09:25 +01:00
svlandeg a664994a81 adding score method to explanation of new component 2020-10-26 10:52:47 +01:00
svlandeg 080066ae74 remove TODO note 2020-10-26 10:37:25 +01:00
Ines Montani 2c9804038d Fix success message [ci skip] 2020-10-23 16:11:54 +02:00
Adriane Boyd 253480353c Remove zh from quickstart extras 2020-10-23 11:39:25 +02:00
Adriane Boyd af26886fff Fix formatting 2020-10-23 11:38:14 +02:00
Adriane Boyd c0b76f4c19 Add install step to "Compile from source" 2020-10-23 11:36:36 +02:00
Adriane Boyd 8fe7ede667 Add install step to source install quickstart 2020-10-23 11:34:43 +02:00
Adriane Boyd 4299a7f654 Setup / install / quickstart updates
* Add `cuda110` to setup.cfg and quickstart dropdown
* Switch to `pip` for pip-only packages in conda quickstart instructions
* Update zh pkuseg install message with version range and conda
* Remove `zh` from `extras_require` because the default doesn't require
additional packages
2020-10-23 11:27:54 +02:00
Ines Montani 270c836bd6
Merge pull request #6276 from adrianeboyd/chore/add-jinja2 2020-10-20 10:05:53 +02:00
Ines Montani 6523f2daac
Merge pull request #6273 from adrianeboyd/bugfix/detailed-scores-in-evaluate2 2020-10-20 10:03:09 +02:00
Adriane Boyd 3629296757 Fix requirements, remove version pins 2020-10-19 19:04:42 +02:00
Adriane Boyd 56077e7e64 Add dependency for jinja2 2020-10-19 18:58:15 +02:00
Adriane Boyd fbe65b257b Convert accuracy numbers on website models page 2020-10-19 18:55:55 +02:00
Ines Montani b6b1c1e23c
Merge pull request #6271 from walterhenry/develop-proof [ci skip] 2020-10-19 16:31:43 +02:00
Adriane Boyd 563a21834e Save raw scores in evaluate output 2020-10-19 15:49:09 +02:00
Adriane Boyd dd207ca6d0 Add dep_las_per_type and more generic PRF printer 2020-10-19 15:49:02 +02:00
Adriane Boyd 4300858ecb Include per-type/feat scores in evaluate output 2020-10-19 15:48:55 +02:00
walterhenry db24dc5614 Proofread remarks
I think these may be the last remarks for the nightly docs. Only two minor things actually.
2020-10-19 11:11:32 +02:00
Sofie Van Landeghem 75a202ce65
TextCat updates and fixes (#6263)
* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos
2020-10-18 14:50:41 +02:00
Ines Montani e2f3c4e12d Fix robots [ci skip] 2020-10-16 17:44:13 +02:00
Ines Montani a9d2293661
Merge pull request #6264 from adrianeboyd/docs/license-links [ci skip] 2020-10-16 17:05:11 +02:00
Adriane Boyd e896803792 Add and update website license links 2020-10-16 17:01:52 +02:00
Ines Montani c655742b8b Remove docs references to starters for now (see #6262) [ci skip] 2020-10-16 15:46:34 +02:00
Ines Montani 5a6ed01ce0
Merge pull request #6262 from adrianeboyd/bugfix/template-en-vectors 2020-10-16 15:38:08 +02:00
Ines Montani 7904285991
Merge pull request #6259 from jmargeta/fix-empty-list-validation 2020-10-16 15:35:32 +02:00
Ines Montani c968d1560f Fix docs example [ci skip] 2020-10-16 11:33:20 +02:00
Adriane Boyd c8d04b79e2 Sort and add vectors for langs without transformers 2020-10-16 08:25:16 +02:00
Adriane Boyd 2fbd43c603 Use core lg models as vectors models in quickstart 2020-10-16 08:17:53 +02:00