Sofie Van Landeghem
cfc72c2995
Bugfix multi-label textcat reproducibility (#6481)
* add test for multi-label textcat reproducibility
* remove positive_label
* fix lengths dtype
* fix comments
* remove comment that we should not have forgotten :-)
2020-12-09 06:29:15 +08:00
Sofie Van Landeghem
de108ed3e8
Add specific error when StaticVectors can't read the vectors data (#6450)
2020-12-09 06:16:07 +08:00
Ines Montani
8921364579
Merge pull request #6521 from explosion/feature/config-stdin
Allow reading config from stdin in spacy train
2020-12-08 22:07:43 +11:00
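Reading a config from stdin usually means treating a path of `-` as "read from standard input". The sketch below shows that convention in plain Python. `read_config_text` and its `stdin` parameter are hypothetical names, not spaCy's API. The standard library `configparser` is used only to show that the text can be parsed; spaCy uses its own config system.

```python
import configparser
import io
import sys
from pathlib import Path
from typing import Optional, TextIO


def read_config_text(config_path: str, stdin: Optional[TextIO] = None) -> str:
    """Return config text from a file path, or from stdin when the path is "-"."""
    if config_path == "-":
        # "-" is the common CLI convention for "read from standard input".
        stream = stdin if stdin is not None else sys.stdin
        return stream.read()
    return Path(config_path).read_text(encoding="utf8")


# Simulate piping a config into the process by passing a fake stdin stream.
fake_stdin = io.StringIO("[training]\nseed = 0\n")
text = read_config_text("-", stdin=fake_stdin)

# Parse with the standard library purely for illustration.
parser = configparser.ConfigParser()
parser.read_string(text)
seed = parser["training"]["seed"]
```

Injecting the stream as a parameter keeps the helper testable without touching the real `sys.stdin`.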
Ines Montani
6c7a930ee8
Fix variable
2020-12-08 20:44:59 +11:00
Ines Montani
94a5a9814f
Update argument handling and documentation
2020-12-08 20:41:18 +11:00
Ines Montani
ef59ce783b
Adjust install instructions [ci skip]
2020-12-08 18:06:50 +11:00
Ines Montani
d25b1606d6
Allow reading config from stdin in spacy train
2020-12-08 18:01:40 +11:00
Ines Montani
6cfa66ed1c
Make training.loop return nlp object and path (#6520)
2020-12-08 14:55:55 +08:00
Sofie Van Landeghem
2c27093c5f
require_cpu functionality (#6336)
* add require_cpu from Thinc 8.0.0rc2
* add docs
* fix test if cupy is not installed
2020-12-08 14:42:40 +08:00
Ines Montani
d8e01ca931
Merge pull request #6391 from adrianeboyd/docs/install-guide
2020-12-08 07:42:16 +01:00
Sofie Van Landeghem
f98a04434a
pretrain architectures (#6451)
* define new architectures for the pretraining objective
* add loss function as attr of the model
* cleanup
* cleanup
* shorten name
* fix typo
* remove unused error
2020-12-08 14:41:03 +08:00
Adriane Boyd
29b058ebdc
Fix spacy when retokenizing cases with affixes (#6475)
Preserve `token.spacy` corresponding to the span end token in the
original doc rather than adjusting for the current offset.
* If not modifying in place, this checks in the original document
(`doc.c` rather than `tokens`).
* If modifying in place, the document has not been modified past the
current span start position so the value at the current span end
position is valid.
2020-12-08 14:25:56 +08:00
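The rule that a merged token takes its trailing-whitespace flag (`spacy`) from the span end token in the original document can be sketched in plain Python. This is a simplified model, not spaCy's retokenizer: tokens are hypothetical `(text, spacy)` tuples, and `merge_span` is an illustrative helper that never mutates its input, which corresponds to the "not modifying in place" case above.

```python
from typing import List, Tuple

# Hypothetical token model: (text, has_trailing_space)
Token = Tuple[str, bool]


def merge_span(tokens: List[Token], start: int, end: int) -> List[Token]:
    """Merge tokens[start:end] into one token without modifying the input list."""
    parts = []
    for i in range(start, end):
        text, spacy = tokens[i]
        parts.append(text)
        # Keep whitespace between tokens inside the span, but not after the last one.
        if spacy and i < end - 1:
            parts.append(" ")
    # The merged token's `spacy` comes from the span end token in the original list.
    merged = ("".join(parts), tokens[end - 1][1])
    return tokens[:start] + [merged] + tokens[end:]


# "(New York) is": the prefix "(" and the suffix ")" are separate tokens.
doc = [("(", False), ("New", True), ("York", False), (")", True), ("is", False)]
merged = merge_span(doc, 1, 3)
```

Here `merged` is `[("(", False), ("New York", False), (")", True), ("is", False)]`: the merged token keeps `spacy=False` from "York", so no space is inserted before the suffix ")".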
Adriane Boyd
4448680750
Fix alignment for 1-to-1 tokens and lowercasing (#6476)
* When checking for token alignments, check not only that the tokens are
identical but that the character positions are both at the start of a
token.
It's possible for the tokens to be identical even though the two
tokens aren't aligned one-to-one in a case like `["a'", "''"]` vs.
`["a", "''", "'"]`, where the middle tokens are identical but should not
be aligned on the token level at character position 2 since it's the
start of one token but the middle of another.
* Use the lowercased version of the token texts to create the
character-to-token alignment because lowercasing can change the string
length (e.g., for `İ`, see the not-a-bug bug report:
https://bugs.python.org/issue34723 )
2020-12-08 14:25:16 +08:00
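Both checks can be illustrated with a simplified sketch. It assumes the two token lists concatenate to the same string with no whitespace between tokens. `token_starts` and `one_to_one` are hypothetical helpers, not spaCy's alignment code. Offsets are computed on the lowercased texts, so a character like `İ`, whose lowercase form is two code points long in Python, cannot make the offsets disagree with the strings actually being compared.

```python
from typing import List, Tuple


def token_starts(tokens: List[str]) -> List[int]:
    """Character offset at which each token starts (no whitespace between tokens)."""
    starts, pos = [], 0
    for tok in tokens:
        starts.append(pos)
        pos += len(tok)
    return starts


def one_to_one(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Pairs (i, j) where a[i] and b[j] are identical AND start at the same character."""
    # Lowercase before computing offsets, since lowercasing can change string length.
    a_low = [t.lower() for t in a]
    b_low = [t.lower() for t in b]
    b_start_to_index = {s: j for j, s in enumerate(token_starts(b_low))}
    pairs = []
    for i, start in enumerate(token_starts(a_low)):
        j = b_start_to_index.get(start)
        if j is not None and a_low[i] == b_low[j]:
            pairs.append((i, j))
    return pairs


no_pairs = one_to_one(["a'", "''"], ["a", "''", "'"])
dotted_i_length = len("İ".lower())
```

For the `["a'", "''"]` vs. `["a", "''", "'"]` case from the commit message, `no_pairs` is empty: the second tokens are identical, but one starts at character 2 and the other at character 1.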
Ines Montani
ee2ec52f48
Merge pull request #6409 from svlandeg/feature/trf-docs
2020-12-08 06:32:10 +01:00
Ines Montani
c2b196c2c1
Merge pull request #6419 from svlandeg/feature/rel-docs
2020-12-08 06:30:41 +01:00
Ines Montani
82e88f0e3b
Merge pull request #6379 from svlandeg/fix/labels-constructor
2020-12-08 06:29:56 +01:00
Adriane Boyd
78085fab1f
Check for spacy-nightly package in download (#6502)
Also check for spacy-nightly in download so that `--no-deps` isn't set
for normal nightly installs.
2020-12-04 09:40:03 +01:00
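The condition can be sketched as follows. `is_installed` and `download_pip_args` are hypothetical names, and the `installed` parameter exists only so the check can be exercised without changing the environment. The reason `--no-deps` is added when neither package is found (for example, so that pip does not pull spaCy from PyPI on top of a source checkout) is an assumption and is not stated in the commit.

```python
from importlib import metadata
from typing import Callable, Tuple


def is_installed(name: str) -> bool:
    """True if a distribution with this name is installed."""
    try:
        metadata.version(name)
        return True
    except metadata.PackageNotFoundError:
        return False


def download_pip_args(
    pip_args: Tuple[str, ...] = (),
    installed: Callable[[str], bool] = is_installed,
) -> Tuple[str, ...]:
    """Add --no-deps only if neither spacy nor spacy-nightly is installed as a package."""
    if not (installed("spacy") or installed("spacy-nightly")):
        # e.g. a source checkout: don't let pip resolve spaCy as a dependency
        pip_args = pip_args + ("--no-deps",)
    return pip_args


# A nightly install no longer gets --no-deps.
nightly_args = download_pip_args((), installed=lambda name: name == "spacy-nightly")
```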
Ines Montani
63f83e7034
Merge pull request #6470 from adrianeboyd/feature/license-in-package
2020-12-04 03:55:54 +01:00
Sofie Van Landeghem
d6c616a125
Fixes in test suite (#6457)
* fix slow test for textcat readers
* cleanup test_issue5551
* add explicit score weight
* cleanup
2020-12-02 12:57:08 +01:00
Adriane Boyd
31ec9a906e
Clean up 3rd party license info (#6478)
Move scikit-learn license from `Scorer` to
`licenses/3rd_party_licenses.txt`.
2020-12-02 10:15:23 +01:00
Adriane Boyd
591cd48aa8
Remove config.cfg from MANIFEST
2020-12-01 12:58:02 +01:00
Adriane Boyd
b0dd13e0ba
Support LICENSE in spacy package
If present, include the file `input_dir/LICENSE` at the top level of the
packaged model.
2020-11-30 13:43:58 +01:00
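Including an optional `LICENSE` file can be sketched with `pathlib` and `shutil`. `copy_license` is a hypothetical helper, not the actual `spacy package` code, and `package_dir` simply stands in for the top level of the packaged model; the real package layout may differ.

```python
import shutil
import tempfile
from pathlib import Path


def copy_license(input_dir: Path, package_dir: Path) -> bool:
    """Copy input_dir/LICENSE to the top level of package_dir, if it exists."""
    license_path = input_dir / "LICENSE"
    if not license_path.exists():
        # No LICENSE file: nothing to include.
        return False
    shutil.copy(license_path, package_dir / "LICENSE")
    return True


with tempfile.TemporaryDirectory() as tmp:
    input_dir = Path(tmp, "input_dir")
    package_dir = Path(tmp, "package")
    input_dir.mkdir()
    package_dir.mkdir()
    (input_dir / "LICENSE").write_text("MIT License\n", encoding="utf8")
    copied = copy_license(input_dir, package_dir)
    copied_text = (package_dir / "LICENSE").read_text(encoding="utf8")
```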
Adriane Boyd
1442d2f213
Improve simple training example in v3 migration (#6438)
* Create the examples once
* Use the examples in the initialization
* Provide the batch size
* Fix `begin_training` migration example
2020-11-30 09:39:45 +08:00
Sofie Van Landeghem
079f6ea474
avoid resolving the full config (#6465)
2020-11-30 09:34:29 +08:00
Ines Montani
9beba7164f
Make jinja2 top-level import
This is no longer a problem, since jinja2 is now an official dependency.
2020-11-27 15:17:14 +08:00
Ines Montani
d21d2c2e59
Don't multiply accuracy by 100
2020-11-27 15:15:51 +08:00
Adriane Boyd
26296ab223
Add error message if DocBin zlib decompress fails (#6394)
Add a better error message if DocBin zlib decompress fails, indicating
that the data is not in `DocBin` format.
2020-11-27 14:39:49 +08:00
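Catching the decompression failure and re-raising it with a more specific message can be sketched with the standard library `zlib` module. `decompress_docbin` and the message text are illustrative only and do not reproduce spaCy's actual wording.

```python
import zlib


def decompress_docbin(data: bytes) -> bytes:
    """Decompress serialized bytes, raising a descriptive error if they aren't zlib data."""
    try:
        return zlib.decompress(data)
    except zlib.error as err:
        # Illustrative message, not spaCy's actual error text.
        raise ValueError(
            "Can't decompress data: the input does not appear to be in DocBin format."
        ) from err


payload = zlib.compress(b"serialized docs")
restored = decompress_docbin(payload)

try:
    decompress_docbin(b"plain text, not a DocBin")
except ValueError as e:
    error_message = str(e)
```

Chaining with `from err` keeps the original `zlib.error` available in the traceback while the top-level message points at the likely cause.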
Adriane Boyd
6f133877aa
Update source install instructions
* Don't recommend an editable install in the default source
instructions.
* Use `pip install --no-build-isolation` for editable installs.
* Remove reference to `virtualenv`.
2020-11-24 14:44:13 +01:00
svlandeg
218abaa69a
typo
2020-11-20 22:36:49 +01:00
svlandeg
e861e928df
more small corrections
2020-11-20 22:29:58 +01:00
svlandeg
5ac0867427
final fixes
2020-11-20 22:18:53 +01:00
svlandeg
331ec83493
edits and updates to implementing REL component docs
2020-11-20 21:41:52 +01:00
svlandeg
4a3e611abc
small fixes and formatting
2020-11-20 15:55:05 +01:00
svlandeg
124f49feb6
update REL model code
2020-11-20 15:25:20 +01:00
svlandeg
636be3c791
Merge remote-tracking branch 'upstream/develop' into feature/trf-docs
2020-11-19 14:15:35 +01:00
Sofie Van Landeghem
165993d8e5
fix typo in transformer docs (#6404)
2020-11-19 14:11:38 +01:00
Adriane Boyd
96726ec1f6
Fix DocBin init in training example (#6396)
2020-11-17 14:36:44 +01:00
Adriane Boyd
ed32fa80cd
Update source install instructions
* Use `pip install` instead of `python setup.py install`
* For developers, recommend:
  * `python setup.py build_ext --inplace -j N`
  * `python setup.py develop`
2020-11-16 10:13:51 +01:00
svlandeg
99d0412b6e
add link to REL project
2020-11-15 18:35:56 +01:00
svlandeg
73fc1ed963
remove labels from morphologizer constructor
2020-11-11 21:48:50 +01:00
svlandeg
d5a920325f
remove labels from constructor
2020-11-11 21:34:12 +01:00
svlandeg
fcd79e0655
remove set_morphology from docs
2020-11-11 21:32:34 +01:00
Adriane Boyd
a7e7d6c6c9
Ignore misaligned in Morphologizer.get_loss (#6363)
Fix bug where `Morphologizer.get_loss` treated misaligned annotation as
`EMPTY_MORPH` rather than ignoring it. Remove unneeded default `EMPTY_MORPH`
mappings.
2020-11-10 20:15:09 +08:00
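The difference between ignoring a misaligned token and treating it as `EMPTY_MORPH` shows up in the gradient. This plain-Python sketch is not spaCy's implementation: it assumes misaligned tokens are marked with `None`, and it computes a softmax cross-entropy loss over hypothetical predicted probabilities.

```python
import math
from typing import List, Optional, Sequence, Tuple


def get_loss(
    probs: Sequence[Sequence[float]],
    truths: Sequence[Optional[str]],
    labels: Sequence[str],
) -> Tuple[float, List[List[float]]]:
    """Cross-entropy loss and gradient; a truth of None marks a misaligned token."""
    loss = 0.0
    d_scores: List[List[float]] = []
    for row, truth in zip(probs, truths):
        if truth is None:
            # Misaligned token: no loss and a zero gradient, instead of
            # training it towards an "empty morphology" label.
            d_scores.append([0.0] * len(row))
            continue
        gold = labels.index(truth)
        loss -= math.log(row[gold])
        # Gradient of softmax + cross-entropy w.r.t. the logits: probs - one_hot(gold)
        d_scores.append([p - (1.0 if k == gold else 0.0) for k, p in enumerate(row)])
    return loss, d_scores


labels = ["Number=Sing", "Number=Plur"]
probs = [[0.8, 0.2], [0.5, 0.5]]  # predicted distributions for two tokens
loss, grads = get_loss(probs, ["Number=Sing", None], labels)
```

The second token contributes nothing to `loss` and gets an all-zero gradient row, so the model is not pushed towards any label for it.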
Sofie Van Landeghem
a0c899a0ff
Fix textcat + transformer architecture (#6371)
* add pooling to textcat TransformerListener
* maybe_get_dim in case it's null
2020-11-10 20:14:47 +08:00
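In spacy-transformers, a listener's pooling layer computes each spaCy token's vector from the zero or more wordpiece vectors aligned to it. The sketch below shows mean pooling in plain Python, assuming that this wordpiece-to-token reduction is the pooling the commit adds. `pool_wordpieces`, the wordpiece split, and the zero-vector fallback for tokens without wordpieces are all assumptions of the sketch.

```python
from typing import List, Sequence


def pool_wordpieces(
    wp_vectors: Sequence[Sequence[float]],
    alignment: Sequence[Sequence[int]],
    width: int,
) -> List[List[float]]:
    """Mean-pool wordpiece vectors into one vector per token."""
    token_vectors: List[List[float]] = []
    for wp_indices in alignment:
        if not wp_indices:
            # Token aligned to no wordpieces: use a zero vector (an assumption here).
            token_vectors.append([0.0] * width)
            continue
        n = len(wp_indices)
        token_vectors.append(
            [sum(wp_vectors[i][d] for i in wp_indices) / n for d in range(width)]
        )
    return token_vectors


# Hypothetical split: "unhappy day" -> wordpieces ["un", "##happy", "day"]
wp_vectors = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
alignment = [[0, 1], [2]]  # token 0 <- wordpieces 0 and 1; token 1 <- wordpiece 2
token_vectors = pool_wordpieces(wp_vectors, alignment, width=2)
```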
Ines Montani
3ca5c7082d
Use pip install . in quickstart [ci skip]
2020-11-10 17:27:49 +08:00
Ines Montani
de6453940e
Merge pull request #6305 from svlandeg/feature/score-docs [ci skip]
2020-11-10 02:52:11 +01:00
Ines Montani
d7950c5ada
Merge pull request #6297 from adrianeboyd/docs/nightly-conda-install [ci skip]
2020-11-10 02:45:52 +01:00
Ines Montani
448bfbdc30
Remove conda from nightly install widget [ci skip]
2020-11-10 09:44:52 +08:00
svlandeg
789fb3d124
add docs for upstream argument of TransformerListener
2020-11-09 21:42:58 +01:00
Ines Montani
363ac73c72
Update docs [ci skip]
2020-11-09 12:43:26 +08:00