68 Commits

Author SHA1 Message Date
Peter Baumgartner 7ce3460b23
add floret to static vectors docs (#10833) 2022-05-23 09:16:31 +02:00
Adriane Boyd e8357923ec
Various install docs updates (#10487)
* Simplify quickstart source install to use only editable pip install

* Update pytorch install instructions to more recent versions
2022-03-15 11:12:50 +01:00
Paul O'Leary McCann f3981bd0c8
Clarify how to fill in init_tok2vec after pretraining (#9639)
* Clarify how to fill in init_tok2vec after pretraining

* Ignore init_tok2vec arg in pretraining

* Update docs, config setting

* Remove obsolete note about not filling init_tok2vec early

This seems to have also caught some lines that needed cleanup.
2021-11-18 15:38:30 +01:00
Elia Robyn Lake (Robyn Speer) fa70837f28
clarify how to connect pretraining to training (#9450)
* clarify how to connect pretraining to training

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* Update website/docs/usage/embeddings-transformers.md

* Update website/docs/usage/embeddings-transformers.md

* Update website/docs/usage/embeddings-transformers.md

* Update website/docs/usage/embeddings-transformers.md

Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-22 13:15:47 +02:00
Paul O'Leary McCann 222cf9b6d2
Clarify how to change base Transformer model (#9498)
* Add note about how the model name is used

* Add link to TransformersModel docs, separate paragraph

* Local link

* Revise docs

* Update website/docs/usage/embeddings-transformers.md

* Update website/docs/usage/embeddings-transformers.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-19 23:28:20 +02:00
Sofie Van Landeghem 3fd3531e12
Docs for new spacy-trf architectures (#8954)
* use TransformerModel.v2 in quickstart

* update docs for new transformer architectures

* bump spacy_transformers to 1.1.0

* Add new arguments spacy-transformers.TransformerModel.v3

* Mention that mixed-precision support is experimental

* Describe delta transformers.Tok2VecTransformer versions

* add dot

* add dot, again

* Update some more TransformerModel references v2 -> v3

* Add mixed-precision options to the training quickstart

Disable mixed-precision training/prediction by default.

* Update setup.cfg

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/usage/embeddings-transformers.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-10-18 14:15:06 +02:00
Paul O'Leary McCann 1d57d78758 Make docs consistent (fix #9126) 2021-09-16 15:54:12 +09:00
Paul O'Leary McCann 66bfabd839
Fix pretraining objectives fragment (#8005)
* Fix pretraining objectives fragment

The fragment here is reused from a heading higher up, so you couldn't
link to this section.

* Fix section link to new fragment
2021-05-06 08:27:36 +02:00
Adriane Boyd d2bdaa7823
Replace negative rows with 0 in StaticVectors (#7674)
* Replace negative rows with 0 in StaticVectors

Replace negative row indices with 0-vectors in `StaticVectors`.

* Increase versions related to StaticVectors

* Increase versions of all architectures and layers related to
`StaticVectors`
* Improve efficiency of 0-vector operations

Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5

* Update config defaults to new versions

* Update docs
2021-04-22 18:04:15 +10:00
Adriane Boyd 8b76cb8095 Rephrase transformers PyTorch instructions 2021-01-29 13:36:56 +01:00
Adriane Boyd e3e87e7275 Update transformers install docs
* Recommend installing PyTorch separately
* Add instructions for `sentencepiece`
2021-01-29 13:27:43 +01:00
Adriane Boyd 61c9f8bf24
Remove transformers model max length section (#6807) 2021-01-25 19:59:34 +08:00
Adriane Boyd bf0cdae8d4
Add token_splitter component (#6726)
* Add long_token_splitter component

Add a `long_token_splitter` component for use with transformer
pipelines. This component splits up long tokens like URLs into smaller
tokens. This is particularly relevant for pretrained pipelines with
`strided_spans`, since the user can't change the length of the span
`window` and may not wish to preprocess the input texts.

The `long_token_splitter` splits tokens that are at least
`long_token_length` tokens long into smaller tokens of `split_length`
size.

Notes:

* Since this is intended for use as the first component in a pipeline,
the token splitter does not try to preserve any token annotation.
* API docs to come when the API is stable.

* Adjust API, add test

* Fix name in factory
2021-01-17 19:54:41 +08:00
Sofie Van Landeghem 75d9019343
Fix types of Tok2Vec encoding architectures (#6442)
* fix TorchBiLSTMEncoder documentation

* ensure the types of the encoding Tok2vec layers are correct

* update references from v1 to v2 for the new architectures
2021-01-07 16:39:27 +11:00
Sofie Van Landeghem 82ae95267a
Docs for pretrain architectures (#6605)
* document pretraining architectures

* formatting

* bit more info

* small fixes
2021-01-06 16:12:30 +11:00
Ines Montani c968d1560f Fix docs example [ci skip] 2020-10-16 11:33:20 +02:00
Ines Montani ba1e004049 Fix typo [ci skip] 2020-10-15 23:39:04 +02:00
svlandeg 08cb085f6c Merge remote-tracking branch 'upstream/develop' into fix/various 2020-10-09 17:01:27 +02:00
Ines Montani 9fb3244672
Merge pull request #6231 from adrianeboyd/feature/include-static-vectors 2020-10-09 15:54:52 +02:00
Adriane Boyd 2dd79454af Update docs 2020-10-09 14:42:07 +02:00
svlandeg 853edace37 fix MultiHashEmbed example in documentation 2020-10-09 14:11:06 +02:00
Ines Montani e50dc2c1c9 Update docs [ci skip] 2020-10-09 12:04:52 +02:00
Ines Montani d1602e1ece Update docs [ci skip] 2020-10-08 11:56:50 +02:00
Ines Montani 43e59bb22a Update docs and install extras [ci skip] 2020-10-08 10:58:50 +02:00
Ines Montani 01c1538c72 Integrate file readers 2020-10-02 01:36:06 +02:00
Sofie Van Landeghem a22215f427
Add FeatureExtractor from Thinc (#6170)
* move featureextractor from Thinc

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

Co-authored-by: Ines Montani <ines@ines.io>
2020-10-01 16:22:48 +02:00
Ines Montani 0a8a124a6e Update docs [ci skip] 2020-10-01 12:15:53 +02:00
Ines Montani 361f91e286
Merge pull request #6135 from walterhenry/develop-proof 2020-09-29 20:49:06 +02:00
walterhenry 1d80b3dc1b Proofreading
Finished with the API docs and started on the Usage, but Embedding & Transformers
2020-09-29 12:39:10 +02:00
Sofie Van Landeghem 009ba14aaf
Fix pretraining in train script (#6143)
* update pretraining API in train CLI

* bump thinc to 8.0.0a35

* bump to 3.0.0a26

* doc fixes

* small doc fix
2020-09-25 15:47:10 +02:00
Ines Montani 6836b66433 Update docs and resolve todos [ci skip] 2020-09-24 13:41:25 +02:00
svlandeg 6c85fab316 state_type and extra_state_tokens instead of nr_feature_tokens 2020-09-23 13:35:09 +02:00
Ines Montani 012b3a7096 Update docs [ci skip] 2020-09-20 17:44:58 +02:00
Ines Montani 554c9a2497 Update docs [ci skip] 2020-09-20 12:30:53 +02:00
Ines Montani a0b4389a38 Update docs [ci skip] 2020-09-17 19:24:48 +02:00
Matthew Honnibal 6efb7688a6 Draft pretrain usage 2020-09-17 18:17:03 +02:00
Ines Montani a2c8cda26f Update docs [ci skip] 2020-09-17 17:12:51 +02:00
Matthew Honnibal ec751068f3 Draft text for static vectors intro 2020-09-17 16:42:53 +02:00
Ines Montani 8b0dabe987 Update docs [ci skip] 2020-09-12 17:05:10 +02:00
Sofie Van Landeghem 8e7557656f
Renaming gold & annotation_setter (#6042)
* version bump to 3.0.0a16

* rename "gold" folder to "training"

* rename 'annotation_setter' to 'set_extra_annotations'

* formatting
2020-09-09 10:31:03 +02:00
Ines Montani 23b7d9cfa3 Prefix span getters 2020-09-03 17:37:06 +02:00
Ines Montani 690bd77669 Add todos [ci skip] 2020-09-01 14:04:36 +02:00
svlandeg e47ea88aeb revert annotations refactor 2020-08-31 14:40:55 +02:00
svlandeg c18eb63483 Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs
# Conflicts:
#	website/docs/usage/embeddings-transformers.md
2020-08-31 13:21:36 +02:00
Sofie Van Landeghem ec14744ee4
Rename Transformer listener (#6001)
* rename to spacy-transformers.TransformerListener

* add some more tok2vec tests

* use select_pipes

* fix docs - annotation setter was not changed in the end
2020-08-31 12:41:39 +02:00
Ines Montani bc0730be3f Update docs [ci skip] 2020-08-29 12:53:14 +02:00
svlandeg 9f00a20ce4 proofreading and custom examples 2020-08-28 21:50:42 +02:00
svlandeg 556e975a30 various fixes 2020-08-27 19:24:44 +02:00
svlandeg 329e490560 small import fixes 2020-08-27 14:50:43 +02:00
svlandeg 28e4ba7270 fix references to TransformerListener 2020-08-27 14:33:28 +02:00