Commit Graph

6 Commits

Author SHA1 Message Date
Matthew Honnibal 61e435610e
💫 Feature/improve pretraining (#2971)
* Improve spacy pretrain script

* Implement BERT-style 'masked language model' objective. Much better
results.

* Improve logging.

* Add length cap for documents, to avoid memory errors.

* Require thinc 7.0.0.dev1

* Require thinc 7.0.0.dev1

* Add argument for using pretrained vectors

* Fix defaults

* Fix syntax error

* Improve spacy pretrain script

* Implement BERT-style 'masked language model' objective. Much better
results.

* Improve logging.

* Add length cap for documents, to avoid memory errors.

* Require thinc 7.0.0.dev1

* Require thinc 7.0.0.dev1

* Add argument for using pretrained vectors

* Fix defaults

* Fix syntax error

* Tweak pretraining script

* Fix data limits in spacy.gold

* Fix pretrain script
2018-11-28 18:04:58 +01:00
Matthew Honnibal 2ddd428834 Fix pretrain script 2018-11-15 23:34:35 +00:00
Matthew Honnibal f8afaa0c1c Fix pretrain 2018-11-15 22:46:53 +00:00
Matthew Honnibal 6af6950e46 Fix pretrain 2018-11-15 22:45:36 +00:00
Matthew Honnibal 3e7b214e57 Make pretrain script work with stream from stdin 2018-11-15 22:44:07 +00:00
Matthew Honnibal 8fdb9bc278
💫 Add experimental ULMFit/BERT/Elmo-like pretraining (#2931)
* Add 'spacy pretrain' command

* Fix pretrain command for Python 2

* Fix pretrain command

* Fix pretrain command
2018-11-15 22:17:16 +01:00