13944 Commits

Author SHA1 Message Date
svlandeg 712a78b74a add simple unit test 2020-12-30 12:35:26 +01:00
svlandeg 4347e6d39b fixes for CLI info command 2020-12-30 12:05:58 +01:00
Adriane Boyd 5ca57d8221
Add logger warning when serializing user hooks (#6595)
Add a warning that user hooks are lost on serialization.

Add a `user_hooks` exclude to skip the warning with pickle.
2020-12-29 11:54:32 +01:00
Adriane Boyd cabd4ae5b1
Use logger.warning instead of logger.warn (#6596)
Use `logger.warning` instead of deprecated `logger.warn`.
2020-12-21 08:25:10 +08:00
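For context on the rename above: in Python's standard `logging` module, `Logger.warn` is a deprecated alias and `Logger.warning` is the supported method. The following is a minimal sketch of the preferred call, not the code from the commit; the logger name `demo`, the message text, and the in-memory stream are assumptions made only for illustration.

```python
import io
import logging

# Hypothetical logger name and in-memory stream, used only for illustration.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.setLevel(logging.WARNING)
logger.propagate = False  # keep this example's output away from the root logger

# Preferred spelling; logger.warn(...) is the deprecated alias.
logger.warning("something looks off")

output = stream.getvalue()
```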
Sofie Van Landeghem 282a3b49ea
Fix parser resizing when there is no upper layer (#6460)
* allow resizing of the parser model even when upper=False

* update from spacy.TransitionBasedParser.v1 to v2

* bugfix
2020-12-18 18:56:57 +08:00
Sofie Van Landeghem 0a923a7915
Tagger robustness (#6580)
* require labels in taggers

* ensure tagger works with incomplete data
2020-12-18 18:51:47 +08:00
Adriane Boyd e10295c9fd
Fix memory leak when adding empty morph (#6581)
Fix lookup of empty morph in the morphology table, which fixes a memory
leak where a new morphology tag was allocated each time the empty morph
tag was added.
2020-12-18 18:51:01 +08:00
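The kind of leak described above can be illustrated with a small interning table. This is only a sketch under stated assumptions: `MorphTable`, its `add` method, and the `allocations` counter are hypothetical names, not spaCy's actual morphology code. The point is that the lookup has to succeed for the empty value too, so that repeated adds reuse one entry instead of allocating a new one each time.

```python
class MorphTable:
    """Hypothetical interning table; not spaCy's Morphology class."""

    def __init__(self) -> None:
        self._ids = {}          # maps a feature string to its table index
        self.allocations = 0    # counts how many entries were ever created

    def add(self, features: str) -> int:
        # Look up first, including for the empty string, so that adding
        # the same value repeatedly never allocates a second entry.
        if features in self._ids:
            return self._ids[features]
        self.allocations += 1
        new_id = len(self._ids)
        self._ids[features] = new_id
        return new_id


table = MorphTable()
ids = [table.add("") for _ in range(3)]
```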
Ines Montani fd640afcd8 Add comment on CI strategy [ci skip] 2020-12-17 22:13:05 +11:00
Ines Montani e9b0963827
Merge pull request #6333 from adrianeboyd/chore/python39 2020-12-17 22:11:57 +11:00
Adriane Boyd 51820180ba Reduce CI builds 2020-12-17 08:55:05 +01:00
Adriane Boyd 2df1ab8a1f Remove detailed numpy constraints from pyproject.toml 2020-12-17 08:54:20 +01:00
Ines Montani e99cd82367 Update version pins 2020-12-17 10:21:08 +11:00
Ines Montani 47c1ec678b Merge branch 'develop' into pr/6333 2020-12-17 10:19:28 +11:00
Ines Montani 3f90bffa27
Merge pull request #6571 from adrianeboyd/bugfix/debug-data-missing-vectors
Fix alignment and vector checks in debug data
2020-12-17 10:10:47 +11:00
Ines Montani 546af3966a
Merge pull request #6577 from LeapBeyond/bug/root_logger
Prevent root logger from initialising
2020-12-16 16:42:54 +11:00
Thomas Bird cbb8c66da3 prevent the root logger from initialising 2020-12-15 19:50:34 +00:00
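One common way for a library to avoid configuring the root logger is to attach its handler to a named logger rather than calling `logging.basicConfig()`. The sketch below shows that pattern with the standard library only; the logger name `mylib` and the format string are assumptions, and this is not presented as the actual change in the commit.

```python
import logging

root = logging.getLogger()
root_handlers_before = list(root.handlers)

# Attach the handler to a named logger (hypothetical name "mylib") instead of
# calling logging.basicConfig(), which would add a handler to the root logger.
lib_logger = logging.getLogger("mylib")
lib_handler = logging.StreamHandler()
lib_handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
lib_logger.addHandler(lib_handler)

root_handlers_after = list(root.handlers)
```

With this setup, applications that import the library keep full control over how the root logger is configured.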
Adriane Boyd 1ddf2f39c7
Switch converters to generator functions (#6547)
* Switch converters to generator functions

To reduce the memory usage when converting large corpora, refactor the
convert methods to be generator functions.

* Update tests
2020-12-15 16:47:16 +08:00
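The memory argument above can be sketched in general terms: a function that returns a list holds every converted item at once, while a generator function yields one item at a time. The function names and the toy conversion below are hypothetical and are not the converters touched by the commit.

```python
import types
from typing import Iterable, Iterator, List


def convert_all(lines: Iterable[str]) -> List[str]:
    """Eager version: builds the full result list in memory."""
    return [line.strip().upper() for line in lines]


def convert_lazy(lines: Iterable[str]) -> Iterator[str]:
    """Generator version: yields one converted item at a time."""
    for line in lines:
        yield line.strip().upper()


sample = ["a \n", "b \n"]
eager = convert_all(sample)
lazy_iter = convert_lazy(sample)
is_generator = isinstance(lazy_iter, types.GeneratorType)
lazy = list(lazy_iter)
```

Callers that only iterate over the results once can consume the generator directly and never materialise the whole corpus.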
Adriane Boyd 20e18cc246 Fix alignment and vector checks in debug data
* Update token alignment check to use Example alignment
* Update missing vector check further related to changes in v3
2020-12-15 09:43:14 +01:00
Matthew Honnibal 8656a08777
Add beam_parser and beam_ner components for v3 (#6369)
* Get basic beam tests working

* Get basic beam tests working

* Compile _beam_utils

* Remove prints

* Test beam density

* Beam parser seems to train

* Draft beam NER

* Upd beam

* Add hypothesis as dev dependency

* Implement missing is-gold-parse method

* Implement early update

* Fix state hashing

* Fix test

* Fix test

* Default to non-beam in parser constructor

* Improve oracle for beam

* Start refactoring beam

* Update test

* Refactor beam

* Update nn

* Refactor beam and weight by cost

* Update ner beam settings

* Update test

* Add __init__.pxd

* Upd test

* Fix test

* Upd test

* Fix test

* Remove ring buffer history from StateC

* WIP change arc-eager transitions

* Add state tests

* Support ternary sent start values

* Fix arc eager

* Fix NER

* Pass oracle cut size for beam

* Fix ner test

* Fix beam

* Improve StateC.clone

* Improve StateClass.borrow

* Work directly with StateC, not StateClass

* Remove print statements

* Fix state copy

* Improve state class

* Refactor parser oracles

* Fix arc eager oracle

* Fix arc eager oracle

* Use a vector to implement the stack

* Refactor state data structure

* Fix alignment of sent start

* Add get_aligned_sent_starts method

* Add test for ae oracle when bad sentence starts

* Fix sentence segment handling

* Avoid Reduce that inserts illegal sentence

* Update preset SBD test

* Fix test

* Remove prints

* Fix sent starts in Example

* Improve python API of StateClass

* Tweak comments and debug output of arc eager

* Upd test

* Fix state test

* Fix state test
2020-12-13 09:08:32 +08:00
Ines Montani 85ca8c2bdd Merge branch 'master' into develop 2020-12-11 13:44:41 +11:00
Ines Montani 1d4b1dea25 Update contributing guide and issue template [ci skip] 2020-12-11 13:39:26 +11:00
Ines Montani 37c5d7e826
Merge pull request #6542 from adrianeboyd/chore/prepare-v2.3.5
Set version to v2.3.5
2020-12-11 10:33:18 +11:00
Ines Montani fb43a30a71
Merge pull request #6545 from svlandeg/feature/discussions [ci skip] 2020-12-11 10:20:35 +11:00
Ines Montani 76cfd89dea Update site.json 2020-12-11 10:19:42 +11:00
Ines Montani c9b67b02f8 Update issue templates 2020-12-11 10:05:47 +11:00
Ines Montani 43a69eecb7 Update site.json 2020-12-11 10:05:21 +11:00
Ines Montani 73896fcbc8 Update README.md 2020-12-11 10:05:19 +11:00
Ines Montani 25186fa431
Merge pull request #6543 from adrianeboyd/docs/install-v2
Docs and extras updates for v2.3.5
2020-12-11 09:53:53 +11:00
svlandeg 4afcd9567e refer to GH discussions 2020-12-10 20:56:12 +01:00
svlandeg d156b423ae remove gitter and reddit links 2020-12-10 20:41:02 +01:00
svlandeg 5afa567767 replace gitter with discussions in 101 2020-12-10 20:17:36 +01:00
svlandeg ae1ccf2b04 update link to discussion forum 2020-12-10 20:02:49 +01:00
svlandeg 52cdb12d26 add GH discussions to readme 2020-12-10 19:58:43 +01:00
Adriane Boyd 27bb75e2a0 Docs and extras updates for v2.3.5
* Update install instructions for updated packages

* Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only
compatible with `thinc>=7.4.4`)
2020-12-10 15:34:34 +01:00
Ines Montani 513c4e332a
Include custom code via spacy package command (#6531) 2020-12-10 20:36:46 +08:00
Adriane Boyd 7b277661f6 Set version to v2.3.5 2020-12-10 13:32:10 +01:00
Ines Montani 2a6043fabb
Merge pull request #6530 from explosion/feature/init-config-cpu-gpu 2020-12-10 09:38:46 +11:00
Ines Montani dfe148935e
Merge pull request #6532 from adrianeboyd/feature/nlp-batch-size-setting 2020-12-10 09:01:58 +11:00
Ines Montani 9d32e839d3 Merge branch 'develop' into feature/init-config-cpu-gpu 2020-12-10 08:50:53 +11:00
Adriane Boyd 972820e2b3 Add batch_size to data formats docs 2020-12-09 12:44:04 +01:00
Adriane Boyd 80ac8af1bf Format 2020-12-09 12:44:01 +01:00
Adriane Boyd 795b5bd049
Update website/docs/api/language.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-12-09 12:23:32 +01:00
Adriane Boyd 6ee6e41234 Update docstring for Language.evaluate 2020-12-09 10:21:39 +01:00
Adriane Boyd fa8fa474a3 Add nlp.batch_size setting
Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`.
2020-12-09 09:13:26 +01:00
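The idea of an object-level default that a per-call argument can override might look roughly like this. It is a sketch under stated assumptions: the `Pipeline` class is a hypothetical stand-in for illustration, not spaCy's `Language` class, and the default value of 1000 is arbitrary.

```python
from typing import Iterable, Iterator, List, Optional


class Pipeline:
    """Hypothetical stand-in; not spaCy's Language class."""

    def __init__(self, batch_size: int = 1000) -> None:
        # Object-level default, analogous in spirit to `nlp.batch_size`.
        self.batch_size = batch_size

    def pipe(
        self, texts: Iterable[str], batch_size: Optional[int] = None
    ) -> Iterator[List[str]]:
        # Fall back to the object-level default when no size is passed.
        size = self.batch_size if batch_size is None else batch_size
        batch: List[str] = []
        for text in texts:
            batch.append(text)
            if len(batch) == size:
                yield batch
                batch = []
        if batch:
            yield batch


nlp = Pipeline(batch_size=2)
default_batches = list(nlp.pipe(["a", "b", "c"]))
override_batches = list(nlp.pipe(["a", "b", "c"], batch_size=3))
```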
Ines Montani e09588e6ca Update README.md [ci skip] 2020-12-09 13:10:49 +11:00
Ines Montani f2571b5ec4
Merge pull request #6444 from adrianeboyd/chore/update-develop-from-master 2020-12-09 13:09:58 +11:00
Ines Montani d1a0e2f116 Don't build 3.9 for now 2020-12-09 12:10:48 +11:00
Ines Montani 90171f2031
Merge pull request #6528 from svlandeg/feature/pipe_fill_config 2020-12-09 12:01:22 +11:00
Ines Montani dfaef27f90
Merge pull request #6503 from adrianeboyd/feature/lemmatizer-rule-warning-pos
Warn on empty POS for the rule-based lemmatizer
2020-12-09 11:34:16 +11:00
Ines Montani 271923eaea Fix retokenizer 2020-12-09 11:29:55 +11:00