Commit Graph

14060 Commits

Author SHA1 Message Date
Adriane Boyd 7b277661f6 Set version to v2.3.5 2020-12-10 13:32:10 +01:00
Ines Montani 2a6043fabb
Merge pull request #6530 from explosion/feature/init-config-cpu-gpu 2020-12-10 09:38:46 +11:00
Ines Montani dfe148935e
Merge pull request #6532 from adrianeboyd/feature/nlp-batch-size-setting 2020-12-10 09:01:58 +11:00
Ines Montani 9d32e839d3 Merge branch 'develop' into feature/init-config-cpu-gpu 2020-12-10 08:50:53 +11:00
Adriane Boyd 972820e2b3 Add batch_size to data formats docs 2020-12-09 12:44:04 +01:00
Adriane Boyd 80ac8af1bf Format 2020-12-09 12:44:01 +01:00
Adriane Boyd 795b5bd049
Update website/docs/api/language.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-12-09 12:23:32 +01:00
Adriane Boyd 6ee6e41234 Update docstring for Language.evaluate 2020-12-09 10:21:39 +01:00
Adriane Boyd fa8fa474a3 Add nlp.batch_size setting
Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`.
2020-12-09 09:13:26 +01:00
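The commit above describes a pipeline-wide default batch size that `Language.pipe` and `Language.evaluate` use. A minimal sketch of that fallback pattern follows. It assumes a hypothetical `MiniLanguage` class rather than spaCy's actual `Language`, and the default value of 1000 is only a placeholder.

```python
from typing import Iterable, Iterator, List, Optional


class MiniLanguage:
    """Hypothetical stand-in for a pipeline object; not spaCy's Language."""

    def __init__(self, batch_size: int = 1000) -> None:
        # Pipeline-wide default, analogous to the `nlp.batch_size` setting.
        # The value 1000 is an assumption made for this sketch.
        self.batch_size = batch_size

    def pipe(
        self, texts: Iterable[str], batch_size: Optional[int] = None
    ) -> Iterator[List[str]]:
        # Fall back to the pipeline default when no explicit value is passed.
        size = self.batch_size if batch_size is None else batch_size
        batch: List[str] = []
        for text in texts:
            batch.append(text)
            if len(batch) == size:
                yield batch
                batch = []
        if batch:
            # Emit the final, possibly shorter, batch.
            yield batch


nlp = MiniLanguage(batch_size=2)
default_batches = list(nlp.pipe(["a", "b", "c"]))
override_batches = list(nlp.pipe(["a", "b", "c"], batch_size=3))
```

In this sketch an explicit `batch_size` argument always takes precedence, and the attribute is consulted only when the argument is left as `None`.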
Ines Montani e09588e6ca Update README.md [ci skip] 2020-12-09 13:10:49 +11:00
Ines Montani f2571b5ec4
Merge pull request #6444 from adrianeboyd/chore/update-develop-from-master 2020-12-09 13:09:58 +11:00
Ines Montani d1a0e2f116 Don't build 3.9 for now 2020-12-09 12:10:48 +11:00
Ines Montani 90171f2031
Merge pull request #6528 from svlandeg/feature/pipe_fill_config 2020-12-09 12:01:22 +11:00
Ines Montani dfaef27f90
Merge pull request #6503 from adrianeboyd/feature/lemmatizer-rule-warning-pos
Warn on empty POS for the rule-based lemmatizer
2020-12-09 11:34:16 +11:00
Ines Montani 271923eaea Fix retokenizer 2020-12-09 11:29:55 +11:00
Ines Montani b85bd63eca Fix test 2020-12-09 11:24:01 +11:00
Ines Montani febf71af28 Fix test 2020-12-09 11:23:07 +11:00
Ines Montani 04b3068747 Revert landing [ci skip] 2020-12-09 11:20:45 +11:00
Ines Montani 1da1568110 Remove tag map 2020-12-09 11:13:49 +11:00
Ines Montani 34449b66fd Update matcher.md 2020-12-09 11:09:45 +11:00
Ines Montani 1980203229 Merge branch 'master' into pr/6444 2020-12-09 11:09:40 +11:00
Ines Montani 05a2812ae0 Merge branch 'develop' into pr/6444 2020-12-09 11:04:03 +11:00
Ines Montani 758ad6c3cd Make CPU the default for init config 2020-12-09 11:00:51 +11:00
Ines Montani 5d605d539d Remove output_file from init_config helper 2020-12-09 10:57:55 +11:00
Sofie Van Landeghem cfc72c2995
Bugfix multi-label textcat reproducibility (#6481)
* add test for multi-label textcat reproducibility

* remove positive_label

* fix lengths dtype

* fix comments

* remove comment that we should not have forgotten :-)
2020-12-09 06:29:15 +08:00
Sofie Van Landeghem de108ed3e8
Add specific error when StaticVectors can't read the vectors data (#6450) 2020-12-09 06:16:07 +08:00
Koichi Yasuoka 0afb54ac93
JapaneseTokenizer.pipe added (#6515)
* JapaneseTokenizer.pipe added

For [spacymoji](https://spacy.io/universe/project/spacymoji) with `Japanese()`.

* DummyTokenizer.pipe added instead
2020-12-08 20:02:23 +01:00
svlandeg 8f8a7f1733 returning config in init_config 2020-12-08 17:37:20 +01:00
Adriane Boyd df4891bed1
Remove blis python version constraints (#6522)
* Remove blis version constraints

After updating the blis sdist in v0.7.4, remove python version
constraints for blis build and install dependencies.

* Install sdist with --prefer-binary for python 3.5

* Fix duplicate sdist install steps

* Fix sdist install step types

* Fix blis pins in requirements.txt

* Remove wheel hack for python 3.5 from CI
2020-12-08 15:25:19 +01:00
Ines Montani 4e77349106
Merge pull request #6524 from adrianeboyd/bugfix/entity-ruler-subsequent
Fix subsequent pipe detection in EntityRuler
2020-12-08 22:17:28 +11:00
Ines Montani 8921364579
Merge pull request #6521 from explosion/feature/config-stdin
Allow reading config from stdin in spacy train
2020-12-08 22:07:43 +11:00
Ines Montani 6c7a930ee8 Fix variable 2020-12-08 20:44:59 +11:00
Ines Montani 94a5a9814f Update argument handling and documentation 2020-12-08 20:41:18 +11:00
Adriane Boyd 6c221d4841 Fix subsequent pipe detection in EntityRuler
Fix subsequent pipe detection to detect the position of the current
object by comparing the component itself rather than from the factory
name.
2020-12-08 10:01:30 +01:00
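The commit above locates the current component by comparing the object itself instead of its factory name. The sketch below shows one way a name-based lookup could go wrong, assuming two components share a factory name. `Component`, `subsequent_by_factory`, and `subsequent_by_identity` are hypothetical names for illustration and are not taken from spaCy's code.

```python
from typing import List, Tuple


class Component:
    """Hypothetical pipeline component; several may share one factory name."""

    def __init__(self, factory: str) -> None:
        self.factory = factory


Pipeline = List[Tuple[str, Component]]


def subsequent_by_factory(pipeline: Pipeline, current: Component) -> List[str]:
    # Finds the FIRST component with a matching factory name, which may be
    # a different instance than `current`.
    factories = [component.factory for _, component in pipeline]
    index = factories.index(current.factory)
    return [name for name, _ in pipeline[index + 1:]]


def subsequent_by_identity(pipeline: Pipeline, current: Component) -> List[str]:
    # Finds the position of this exact object via an identity comparison.
    for index, (_, component) in enumerate(pipeline):
        if component is current:
            return [name for name, _ in pipeline[index + 1:]]
    return []


first_ruler = Component("entity_ruler")
ner = Component("ner")
second_ruler = Component("entity_ruler")
pipeline = [("ruler_a", first_ruler), ("ner", ner), ("ruler_b", second_ruler)]

by_factory = subsequent_by_factory(pipeline, second_ruler)
by_identity = subsequent_by_identity(pipeline, second_ruler)
```

Here the name-based lookup reports `ner` and `ruler_b` as coming after the second ruler, while the identity-based lookup correctly reports that nothing follows it.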
Ines Montani b87793a89a
Merge pull request #6523 from adrianeboyd/bugfix/remove-use-chars
Remove non-working --use-chars from train CLI
2020-12-08 09:30:48 +01:00
Adriane Boyd 5ceac425ee Remove non-working --use-chars from train CLI
Remove the non-working `--use-chars` option from the train CLI. The
implementation of the option across component types and the CLI settings
could be fixed, but the `CharacterEmbed` model does not work on GPU in
v2 so it's better to remove it.
2020-12-08 08:30:00 +01:00
Ines Montani ef59ce783b Adjust install instructions [ci skip] 2020-12-08 18:06:50 +11:00
Ines Montani d25b1606d6 Allow reading config from stdin in spacy train 2020-12-08 18:01:40 +11:00
Ines Montani 6cfa66ed1c
Make training.loop return nlp object and path (#6520) 2020-12-08 14:55:55 +08:00
Sofie Van Landeghem 2c27093c5f
require_cpu functionality (#6336)
* add require_cpu from Thinc 8.0.0rc2

* add docs

* fix test if cupy is not installed
2020-12-08 14:42:40 +08:00
Ines Montani d8e01ca931
Merge pull request #6391 from adrianeboyd/docs/install-guide 2020-12-08 07:42:16 +01:00
Sofie Van Landeghem f98a04434a
pretrain architectures (#6451)
* define new architectures for the pretraining objective

* add loss function as attr of the model

* cleanup

* cleanup

* shorten name

* fix typo

* remove unused error
2020-12-08 14:41:03 +08:00
Adriane Boyd dcecc75270
Improve blis and numpy build dependencies (#6455)
* Fix blis build dependencies

* Add blis with python_version constraints to pyproject.toml
* Add blis to setup_requires

* Remove --only-binary from CI

* Reduce number of builds to speed up CI

* Add hack to install wheel for python 3.5 in linux

* Remove os spec from CI

* Remove detailed numpy build constraints

* Remove detailed numpy build constraints from `pyproject.toml` because
  it is too difficult to maintain for many architectures
  * These constraints are more a reflection of what is available on
    pypi as binary wheels rather than any real build requirements that
    it is necessary for users to follow when building from source
  * Users building their own binary packages will need to enforce the
    constraints that make sense in their environments, e.g., the `conda`
    compatible numpy pins

* Keep the build constraints in `build-constraints.txt` for use with our
  builds
  * Our builds with wheelwright are built against the earliest
    compatible binary versions of numpy on pypi
  * These constraints are documented within the distribution

* Revert "Remove os spec from CI"

This reverts commit 7489476688.
2020-12-08 14:29:34 +08:00
Adriane Boyd 29b058ebdc
Fix spacy when retokenizing cases with affixes (#6475)
Preserve `token.spacy` corresponding to the span end token in the
original doc rather than adjusting for the current offset.

* If not modifying in place, this checks in the original document
(`doc.c` rather than `tokens`).
* If modifying in place, the document has not been modified past the
current span start position so the value at the current span end
position is valid.
2020-12-08 14:25:56 +08:00
Adriane Boyd 4448680750
Fix alignment for 1-to-1 tokens and lowercasing (#6476)
* When checking for token alignments, check not only that the tokens are
identical but that the character positions are both at the start of a
token.

  It's possible for the tokens to be identical even though the two
tokens aren't aligned one-to-one in a case like `["a'", "''"]` vs.
`["a", "''", "'"]`, where the middle tokens are identical but should not
be aligned on the token level at character position 2 since it's the
start of one token but the middle of another.

* Use the lowercased version of the token texts to create the
character-to-token alignment because lowercasing can change the string
length (e.g., for `İ`, see the not-a-bug bug report:
https://bugs.python.org/issue34723)
2020-12-08 14:25:16 +08:00
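The two points in the commit above can be checked with plain Python. This is a sketch of the idea only, not spaCy's alignment code, and `token_starts` is a hypothetical helper. It computes the character offsets at which tokens start in each tokenization from the commit message, then shows that lowercasing `İ` changes the string length.

```python
from typing import List, Set


def token_starts(tokens: List[str]) -> Set[int]:
    """Return the character offsets at which each token begins."""
    starts = set()
    offset = 0
    for token in tokens:
        starts.add(offset)
        offset += len(token)
    return starts


# Both tokenizations cover the same four-character text: a'''
tokens_a = ["a'", "''"]
tokens_b = ["a", "''", "'"]

starts_a = token_starts(tokens_a)
starts_b = token_starts(tokens_b)

# The identical middle tokens "''" begin at offset 2 in A but offset 1 in B,
# so only offsets that are token starts in BOTH can be aligned one-to-one.
shared_starts = starts_a & starts_b

# Lowercasing LATIN CAPITAL LETTER I WITH DOT ABOVE yields two code points.
dotted_i = "\u0130"
length_before = len(dotted_i)
length_after = len(dotted_i.lower())
```

Offset 2 is a token start in the first tokenization but falls in the middle of a token in the second, which is why matching token text alone is not enough to align them.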
Adriane Boyd e931d3f72b
Move max_length to nlp.make_doc() (#6512)
Move max_length check to `nlp.make_doc()` so that it's also checked
for `nlp.pipe()`.
2020-12-08 14:24:02 +08:00
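The effect of moving the length check into the shared document-creation step can be sketched as follows. `MiniPipeline` is a hypothetical class used only for illustration, its whitespace tokenization is a stand-in, and the limit and error message are assumptions rather than spaCy's actual values.

```python
from typing import Iterable, Iterator, List


class MiniPipeline:
    """Hypothetical pipeline; tokenization here is a plain whitespace split."""

    def __init__(self, max_length: int = 1_000_000) -> None:
        # Placeholder limit chosen for this sketch.
        self.max_length = max_length

    def make_doc(self, text: str) -> List[str]:
        # The check lives in the one method that every entry point calls.
        if len(text) > self.max_length:
            raise ValueError(
                f"Text of length {len(text)} exceeds maximum of {self.max_length}"
            )
        return text.split()

    def __call__(self, text: str) -> List[str]:
        return self.make_doc(text)

    def pipe(self, texts: Iterable[str]) -> Iterator[List[str]]:
        # Because pipe() goes through make_doc(), it gets the check as well.
        for text in texts:
            yield self.make_doc(text)


pipeline = MiniPipeline(max_length=5)
short_docs = list(pipeline.pipe(["ab cd"]))
```

With the check in `make_doc()`, both the single-text call and the streaming `pipe()` path reject over-long input without duplicating the condition.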
Ines Montani ee2ec52f48
Merge pull request #6409 from svlandeg/feature/trf-docs 2020-12-08 06:32:10 +01:00
Ines Montani c2b196c2c1
Merge pull request #6419 from svlandeg/feature/rel-docs 2020-12-08 06:30:41 +01:00
Ines Montani 82e88f0e3b
Merge pull request #6379 from svlandeg/fix/labels-constructor 2020-12-08 06:29:56 +01:00
Sofie Van Landeghem 52fa46dd58
tested EL scripts with 2.3.4 (#6517) 2020-12-07 20:46:38 +01:00