Commit Graph

12173 Commits

Author SHA1 Message Date
Ines Montani 7bcf9f7cfb Document new features 2020-07-09 21:10:36 +02:00
Ines Montani 797ca6f3dd Merge branch 'develop' into nightly.spacy.io 2020-07-09 20:48:24 +02:00
Ines Montani 018319a640 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-09 19:44:41 +02:00
Ines Montani 05e182e421 Update CLI args and docstrings 2020-07-09 19:44:28 +02:00
Sofie Van Landeghem dd207a28be
cleanup components API (#5726)
* add keyword separator for update functions and drop unused "state"

* few more Example tests and various small fixes

* consistently return losses after update call

* eliminate unused tensors field across pipe components

* fix name

* fix arg name
2020-07-09 19:43:39 +02:00
Ines Montani ea01831f6a Update projects docs etc. 2020-07-09 19:43:25 +02:00
Adriane Boyd ac4297ee39
Minor refactor to conversion of output docs (#5718)
Minor refactor of conversion of docs to output format to avoid
duplicate conversion steps.
2020-07-09 19:42:32 +02:00
Sofie Van Landeghem c1ea55307b
Fixing reproducible training (#5735)
* Add initial reproducibility tests

* failing test for default_text_classifier (WIP)

* track trouble to underlying tok2vec layer

* add regression test for Issue 5551

* tests go green with https://github.com/explosion/thinc/pull/359

* update test

* adding fixed seeds to HashEmbed layers, seems to fix the reproducibility issue

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-07-09 19:39:31 +02:00
Matthew Honnibal 1827f22f56 Set version to v3.0.0a3 2020-07-09 19:38:04 +02:00
Matthw Honnibal 7010f1a2be Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-09 19:34:11 +02:00
Matthw Honnibal 0becc5954b Update NER config 2020-07-09 19:33:54 +02:00
Matthw Honnibal 77af0a6bb4 Offer option of padding-sensitive batching 2020-07-09 14:50:20 +02:00
Matthw Honnibal 3a7f275c02 Add extra batch util 2020-07-09 14:38:41 +02:00
Matthw Honnibal eb0798c421 Add __len__ method for Example 2020-07-09 14:38:26 +02:00
Ines Montani 175d34d8f9 Update sidebar menu 2020-07-09 11:44:09 +02:00
Ines Montani 9ee5b71412 Update cli.md 2020-07-09 11:44:00 +02:00
Ines Montani 028f8210e8 Merge branch 'develop' into nightly.spacy.io 2020-07-09 11:43:57 +02:00
Ines Montani 8f9552d9e7
Refactor project CLI (#5732)
* Make project command a submodule

* Update with WIP

* Add helper for joining commands

* Update docstrings, formatting and types

* Update assets and add support for copying local files

* Fix type

* Update success messages
2020-07-09 01:42:51 +02:00
Adriane Boyd ad15499b3b
Fix get_loss for values outside of labels in senter (#5730)
* Fix get_loss for None alignments in senter

When converting the `sent_start` values back to `SentenceRecognizer`
labels, handle `None` alignments.

* Handle SENT_START as -1

Handle SENT_START as -1 (or -1 converted to uint64) by treating any
values other than 1 the same as 0 in `SentenceRecognizer.get_loss`.
2020-07-09 01:41:58 +02:00
Matthw Honnibal 9b49787f35 Update NER config. Getting 84.8 2020-07-08 21:38:01 +02:00
Matthw Honnibal 1b20ffac38 batch_by_words by default 2020-07-08 21:37:06 +02:00
Matthw Honnibal 93e50da46a Remove auto 'set_annotation' in training to address GPU memory 2020-07-08 21:36:51 +02:00
Matthw Honnibal fb8a5967c1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-08 15:27:50 +02:00
Ines Montani 0a3d41bb1d
Deprecate model shortcuts and simplify download (#5722) 2020-07-08 14:00:07 +02:00
Adriane Boyd c9f0f75778
Update get_loss for senter and morphologizer (#5724)
* Update get_loss for senter

Update `SentenceRecognizer.get_loss` to keep it similar to `Tagger`.

* Update get_loss for morphologizer

Update `Morphologizer.get_loss` to keep it similar to `Tagger`.
2020-07-08 13:59:28 +02:00
Ines Montani 9ae4040183 Update API docs 2020-07-08 13:34:35 +02:00
svlandeg c94279ac1b remove tensors, fix predict, get_loss and set_annotations 2020-07-08 13:11:54 +02:00
svlandeg 90b100c39f remove component.Model, update constructor, losses is return value of update 2020-07-08 12:14:30 +02:00
Matthw Honnibal ca989f4cc4 Improve cutting logic in parser 2020-07-08 11:27:54 +02:00
Matthw Honnibal 42e1109def Support option to not batch by number of words 2020-07-08 11:26:54 +02:00
Ines Montani 8cb7f9ccff
Improve assets and DVC handling (#5719)
* Improve assets and DVC handling

* Remove outdated comment [ci skip]
2020-07-07 20:51:50 +02:00
Ines Montani 2298e129e6 Update example and training docs 2020-07-07 20:30:12 +02:00
svlandeg 2b60e894cb fix component constructors, update, begin_training, reference to GoldParse 2020-07-07 19:17:19 +02:00
Sofie Van Landeghem a39a110c4e
Few more Example unit tests (#5720)
* small fixes in Example, UX

* add gold tests for aligned_spans and get_aligned_parse

* sentencizer unnecessary
2020-07-07 18:46:00 +02:00
Matthw Honnibal 433dc3c9c9 Simplify PrecomputableAffine slightly 2020-07-07 17:22:47 +02:00
Matthw Honnibal a4164f67ca Don't normalize gradients 2020-07-07 17:21:58 +02:00
Matthw Honnibal 8177f25b6c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-07 17:21:10 +02:00
svlandeg 14a796e3f9 add Example API with examples of Example usage 2020-07-07 14:46:41 +02:00
Ines Montani fa00a85828
Merge pull request #5715 from explosion/chore/tidy-regression-tests 2020-07-07 11:22:07 +02:00
Matthw Honnibal d1fd3438c3 Add dropout to parser hidden layer 2020-07-07 01:38:15 +02:00
Ines Montani bb3ee38cf9 Update WIP 2020-07-06 22:22:37 +02:00
Ines Montani 44da24ddd0 Update doc.md 2020-07-06 18:17:00 +02:00
Ines Montani 44790c1c32 Update docs and add keyword-only tag 2020-07-06 18:14:57 +02:00
Matthw Honnibal 1eb1654941 Update configs 2020-07-06 17:51:37 +02:00
Matthw Honnibal f25761e513 Don't randomize cuts in parser 2020-07-06 17:51:25 +02:00
Matthw Honnibal 709fc5e4ad Clarify dropout and seed in Tok2Vec 2020-07-06 17:50:21 +02:00
Matthew Honnibal 19d42f42de Set version to v3.0.0a2 2020-07-06 17:43:12 +02:00
Matthew Honnibal cc477be952
Improve gold-standard alignment (#5711)
* Remove previous alignment

* Implement better alignment, using ragged data structure

* Use pytokenizations for alignment

* Fixes

* Fixes

* Fix overlapping entities in alignment

* Fix align split_sents

* Update test

* Commit align.py

* Try to appease setuptools

* Fix flake8

* use realistic entities for testing

* Update tests for better alignment

* Improve alignment heuristic

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2020-07-06 17:39:31 +02:00
Ines Montani b6deef80f8 Fix class so pickling works as expected 2020-07-06 16:43:45 +02:00
Ines Montani a35236e5f0 Update v3 docs WIP [ci skip] 2020-07-06 15:57:44 +02:00