Commit Graph

85 Commits

Author SHA1 Message Date
Sofie Van Landeghem d093d6343b
TrainablePipe (#6213)
* rename Pipe to TrainablePipe

* split functionality between Pipe and TrainablePipe

* remove unnecessary methods from certain components

* cleanup

* hasattr(component, "pipe") should be sufficient again

* remove serialization and vocab/cfg from Pipe

* unify _ensure_examples and validate_examples

* small fixes

* hasattr checks for self.cfg and self.vocab

* make is_resizable and is_trainable properties

* serialize strings.json instead of vocab

* fix KB IO + tests

* fix typos

* more typos

* _added_strings as a set

* few more tests specifically for _added_strings field

* bump to 3.0.0a36
2020-10-08 21:33:49 +02:00
Ines Montani 568e12215d
Merge pull request #6206 from svlandeg/fix/patterns-init 2020-10-06 10:27:23 +02:00
Ines Montani 126268ce50 Auto-format [ci skip] 2020-10-05 21:58:18 +02:00
Ines Montani be99f1e4de
Remove output dirs before training (#6204)
* Remove output dirs before training

* Re-raise error if cleaning fails
2020-10-05 20:11:16 +02:00
svlandeg 9eb813a35d Merge remote-tracking branch 'upstream/develop' into fix/patterns-init 2020-10-05 17:49:44 +02:00
svlandeg 4e3ace4b8c is_trainable method 2020-10-05 17:43:42 +02:00
Matthew Honnibal 3ee3649b52 Fix augment 2020-10-05 16:59:49 +02:00
Matthew Honnibal 8deed614e9 Fix augment 2020-10-05 16:41:45 +02:00
Matthew Honnibal 4ed3e037df Fix augment 2020-10-05 16:40:55 +02:00
svlandeg dc06912c76 prevent loss keyerror for non-trainable components 2020-10-05 16:33:28 +02:00
Ines Montani 8171e28b20 Remove logging [ci skip]
This would be fired on each example, which is wrong
2020-10-05 15:09:52 +02:00
svlandeg 251b3eb4e5 add initialize method for entity_ruler 2020-10-05 14:59:13 +02:00
Matthew Honnibal 6a9d14e35a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-05 14:17:41 +02:00
Matthew Honnibal d2b9aafb8c Fix augmenter 2020-10-05 14:14:49 +02:00
svlandeg fd2d48556c fix E902 and E903 numbering 2020-10-05 13:43:32 +02:00
Ines Montani 3c36a57e84
Update data augmenters (#6196)
* Draft lower-case augmenter

* Make warning a debug log

* Update lowercase augmenter, docs and tests

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-10-04 17:46:29 +02:00
Matthew Honnibal 84ae197dd6 Fix logger 2020-10-04 14:16:53 +02:00
Ines Montani bcd52e5486 Tidy up errors and warnings 2020-10-04 11:16:31 +02:00
Ines Montani ff914f4e6f Lazy-load xx 2020-10-04 11:10:26 +02:00
Matthew Honnibal 85ede32680 Format 2020-10-03 19:26:23 +02:00
Matthew Honnibal b305f2ff5a Fix loggers 2020-10-03 19:26:10 +02:00
Ines Montani 3bc3c05fcc Tidy up and auto-format 2020-10-03 17:20:18 +02:00
Ines Montani dd542ec6a4
Fix label initialization of textcat component (#6190) 2020-10-03 17:07:38 +02:00
Ines Montani 989a96308f Tidy up, auto-format, types 2020-10-03 16:31:58 +02:00
Matthew Honnibal db419f6b2f
Improve control of training progress and logging (#6184)
* Make logging and progress easier to control

* Update docs

* Cleanup errors

* Fix ConfigValidationError

* Pass stdout/stderr, not wasabi.Printer

* Fix type

* Upd logging example

* Fix logger example

* Fix type
2020-10-03 14:57:46 +02:00
Ines Montani 01c1538c72 Integrate file readers 2020-10-02 01:36:06 +02:00
Adriane Boyd 86c3ec9c2b
Refactor Token morph setting (#6175)
* Refactor Token morph setting

* Remove `Token.morph_`
* Add `Token.set_morph()`
  * `0` resets `token.c.morph` to unset
  * Any other values are passed to `Morphology.add`

* Add token.morph setter to set from MorphAnalysis
2020-10-01 22:21:46 +02:00
Ines Montani f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
Adriane Boyd 27cbffff1b
Minor edit to CoNLL-U converter (#6172)
This doesn't make a difference given how the `merged_morph` values
override the `morph` values for all the final docs, but could have led
to unexpected bugs in the future if the converter is modified.
2020-10-01 16:23:42 +02:00
Adriane Boyd df98d3ef9f
Update import from collections.abc (#6174) 2020-10-01 16:21:49 +02:00
Ines Montani 44160cd52f Tidy up [ci skip] 2020-10-01 10:41:19 +02:00
Ines Montani a103ab5f1a Update augmenter lookups and docs 2020-09-30 23:03:47 +02:00
Matthew Honnibal c379a4274a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-30 16:52:42 +02:00
Matthew Honnibal e58dca3028 Add read_labels 2020-09-30 16:52:27 +02:00
Ines Montani fe3f111c37
Merge pull request #6168 from explosion/fix/default-corpus-values 2020-09-30 00:24:02 +02:00
Matthew Honnibal f52249fe2e Fix data augmentation 2020-09-29 23:40:54 +02:00
Matthew Honnibal 14c4da547f Try to fix augmentation 2020-09-29 23:08:56 +02:00
Ines Montani df8dd91b6f Merge branch 'develop' into fix/default-corpus-values 2020-09-29 22:55:39 +02:00
Ines Montani ad6d40d028 Add logging 2020-09-29 22:53:14 +02:00
Ines Montani 1aeef3bfbb Make corpus paths default to None and improve errors 2020-09-29 22:33:46 +02:00
Ines Montani fa47f87924 Tidy up and auto-format 2020-09-29 21:39:28 +02:00
Ines Montani d3c63b7965 Merge branch 'develop' into feature/prepare 2020-09-29 20:53:05 +02:00
Ines Montani 2be80379ec Fix small issues, resolve_dot_names and debug model 2020-09-29 20:38:35 +02:00
Ines Montani fd594cfb9b Tighten up format 2020-09-29 16:47:55 +02:00
Ines Montani 978ab54a84 Fix logging 2020-09-29 16:22:41 +02:00
Ines Montani aa2a6882d0 Fix logging 2020-09-29 16:08:39 +02:00
Ines Montani 63d1598137 Simplify config use in Language.initialize 2020-09-29 16:05:48 +02:00
Ines Montani 612bbf85ab Update initialize.py 2020-09-29 12:14:47 +02:00
Ines Montani 42f0e4c946 Clean up 2020-09-29 12:14:08 +02:00
Ines Montani 78396d137f Integrate initialize settings 2020-09-29 11:57:08 +02:00