911 Commits

Author SHA1 Message Date
Ines Montani 329b61ee7b Update docs [ci skip] 2020-10-09 10:36:06 +02:00
delzac 668507be1b Reflect on usage doc that IS_SENT_START attribute exists (#6114)
* Reflect on usage doc that IS_SENT_START attribute exists

* Create delzac.md
2020-10-09 10:14:40 +02:00
Sofie Van Landeghem d093d6343b TrainablePipe (#6213)
* rename Pipe to TrainablePipe

* split functionality between Pipe and TrainablePipe

* remove unnecessary methods from certain components

* cleanup

* hasattr(component, "pipe") should be sufficient again

* remove serialization and vocab/cfg from Pipe

* unify _ensure_examples and validate_examples

* small fixes

* hasattr checks for self.cfg and self.vocab

* make is_resizable and is_trainable properties

* serialize strings.json instead of vocab

* fix KB IO + tests

* fix typos

* more typos

* _added_strings as a set

* few more tests specifically for _added_strings field

* bump to 3.0.0a36
2020-10-08 21:33:49 +02:00
Ines Montani 5ebd1fc2cf Update docs [ci skip] 2020-10-08 16:23:12 +02:00
Ines Montani d1602e1ece Update docs [ci skip] 2020-10-08 11:56:50 +02:00
Ines Montani 064575d79d Merge pull request #6216 from svlandeg/feature/nel-initialize 2020-10-08 11:14:12 +02:00
Ines Montani 43e59bb22a Update docs and install extras [ci skip] 2020-10-08 10:58:50 +02:00
svlandeg bcaad28eda fix typos 2020-10-07 13:05:37 +02:00
delzac 15ea401b39 Reflect on usage doc that IS_SENT_START attribute exists (#6114)
* Reflect on usage doc that IS_SENT_START attribute exists

* Create delzac.md
2020-10-06 15:11:01 +02:00
Ines Montani ce14520789 Update docs [ci skip] 2020-10-06 14:35:17 +02:00
Ines Montani 2a17566da3 Update docs [ci skip] 2020-10-06 14:15:08 +02:00
Ines Montani 967377287a Merge pull request #6210 from adrianeboyd/docs/various-v3-3 [ci skip] 2020-10-06 11:28:45 +02:00
Adriane Boyd aa9c9f3bf0 Update Chinese usage for spacy-pkuseg 2020-10-06 11:21:17 +02:00
Ines Montani 2e961817cb Update docs [ci skip] 2020-10-06 10:23:01 +02:00
svlandeg fd0f60e2bc updates to data format for training and pretraining 2020-10-06 09:28:53 +02:00
Ines Montani 706b7f6973 Update docs 2020-10-05 20:51:22 +02:00
Ines Montani e3acad6264 Update docs [ci skip] 2020-10-05 13:06:20 +02:00
Ines Montani 0f64556c04 Merge pull request #6197 from svlandeg/feature/pipe-docs [ci skip] 2020-10-05 11:55:40 +02:00
svlandeg 9a6c9b133b various small fixes 2020-10-05 01:05:37 +02:00
svlandeg 52b660e9dc initialize and update explanation 2020-10-05 00:39:36 +02:00
svlandeg b0463fbf75 set_annotations explanation 2020-10-04 14:56:48 +02:00
Ines Montani 43d7652635 Merge pull request #6192 from explosion/feature/init-attr-ruler 2020-10-04 14:46:37 +02:00
Ines Montani 9b3a934361 Update docs [ci skip] 2020-10-04 14:14:55 +02:00
svlandeg 9f40d963fd highlight the two steps: the model and the pipeline component 2020-10-04 14:11:53 +02:00
Ines Montani 11347f34da Tidy up, tests and docs 2020-10-04 13:54:05 +02:00
svlandeg 452b8309f9 slight rewrite to hide some thinc implementation details 2020-10-04 13:26:46 +02:00
svlandeg 08ad349a18 tok2vec layer 2020-10-04 00:08:02 +02:00
svlandeg 2c4b2ee5e9 REL intro and get_candidates function 2020-10-03 23:27:05 +02:00
Ines Montani 3b8f352eda Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-03 16:08:27 +02:00
Ines Montani 35d695a031 Update docs 2020-10-03 16:08:24 +02:00
Matthew Honnibal db419f6b2f Improve control of training progress and logging (#6184)
* Make logging and progress easier to control

* Update docs

* Cleanup errors

* Fix ConfigValidationError

* Pass stdout/stderr, not wasabi.Printer

* Fix type

* Upd logging example

* Fix logger example

* Fix type
2020-10-03 14:57:46 +02:00
Ines Montani 5fb776556a Update docs [ci skip] 2020-10-03 14:47:02 +02:00
Ines Montani eb9b3ff9c5 Update install docs and quickstarts [ci skip] 2020-10-03 11:35:42 +02:00
Ines Montani df06f7a792 Update docs [ci skip] 2020-10-02 13:24:33 +02:00
Ines Montani d2aa662ab2 Merge pull request #6179 from adrianeboyd/feature/token-morph-refactor-2 [ci skip] 2020-10-02 12:10:27 +02:00
Ines Montani 0f11c2150d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-02 11:38:05 +02:00
Ines Montani 32cdc1c4f4 Update docs [ci skip] 2020-10-02 11:38:03 +02:00
Ines Montani 6d8df081bd Merge pull request #6180 from adrianeboyd/docs/minor-v3-2 [ci skip] 2020-10-02 11:37:25 +02:00
Adriane Boyd 351f352cdc Update Japanese docs and pin for sudachipy 2020-10-02 10:12:44 +02:00
Adriane Boyd 7670df04dd Update Chinese usage docs 2020-10-02 10:09:03 +02:00
Adriane Boyd 3908fff899 Remove tag map sidebar 2020-10-02 09:07:55 +02:00
Adriane Boyd fd09e6b140 Update docs for Token.morph / Token.set_morph 2020-10-02 09:05:15 +02:00
Ines Montani 01c1538c72 Integrate file readers 2020-10-02 01:36:06 +02:00
Ines Montani b6b73a3ca8 Update docs [ci skip] 2020-10-01 17:45:29 +02:00
Ines Montani f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
Sofie Van Landeghem a22215f427 Add FeatureExtractor from Thinc (#6170)
* move featureextractor from Thinc

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

Co-authored-by: Ines Montani <ines@ines.io>
2020-10-01 16:22:48 +02:00
Ines Montani 0a8a124a6e Update docs [ci skip] 2020-10-01 12:15:53 +02:00
Ines Montani a103ab5f1a Update augmenter lookups and docs 2020-09-30 23:03:47 +02:00
Ines Montani 115481aca7 Update docs [ci skip] 2020-09-30 15:16:00 +02:00
walterhenry 1c65b3b2c0 Proofreading
A few more small things in Usage.
2020-09-30 11:33:40 +02:00
Ines Montani 469f0e539c Fix docs [ci skip] 2020-09-30 10:24:06 +02:00
Ines Montani d3c63b7965 Merge branch 'develop' into feature/prepare 2020-09-29 20:53:05 +02:00
Ines Montani 361f91e286 Merge pull request #6135 from walterhenry/develop-proof 2020-09-29 20:49:06 +02:00
Sofie Van Landeghem 6a04e5adea encoding UTF8 (#6161) 2020-09-29 14:49:55 +02:00
walterhenry 1d80b3dc1b Proofreading
Finished with the API docs and started on the Usage, but Embedding & Transformers
2020-09-29 12:39:10 +02:00
Ines Montani ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
Matthew Honnibal a976da168c Support data augmentation in Corpus (#6155)
* Support data augmentation in Corpus

* Note initial docs for data augmentation

* Add augmenter to quickstart

* Fix flake8

* Format

* Fix test

* Update spacy/tests/training/test_training.py

* Improve data augmentation arguments

* Update templates

* Move randomization out into caller

* Refactor

* Update spacy/training/augment.py

* Update spacy/tests/training/test_training.py

* Fix augment

* Fix test
2020-09-28 03:03:27 +02:00
Ines Montani e06ff8b71d Update docs [ci skip] 2020-09-26 13:18:08 +02:00
Sofie Van Landeghem 009ba14aaf Fix pretraining in train script (#6143)
* update pretraining API in train CLI

* bump thinc to 8.0.0a35

* bump to 3.0.0a26

* doc fixes

* small doc fix
2020-09-25 15:47:10 +02:00
Adriane Boyd 3c062b3911 Add MORPH handling to Matcher (#6107)
* Add MORPH handling to Matcher

* Add `MORPH` to `Matcher` schema
* Rename `_SetMemberPredicate` to `_SetPredicate`
* Add `ISSUBSET` and `ISSUPERSET` operators to `_SetPredicate`
  * Add special handling for normalization and conversion of morph
    values into sets
  * For other attrs, `ISSUBSET` acts like `IN` and `ISSUPERSET` only
    matches for 0 or 1 values

* Update test

* Rename to IS_SUBSET and IS_SUPERSET
2020-09-24 16:55:09 +02:00
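Note on 3c062b3911: the set semantics behind `IS_SUBSET` and `IS_SUPERSET` can be sketched in plain Python. This is an illustration only, not the `Matcher` implementation. The helper names (`morph_to_set`, `is_subset`, `is_superset`) are hypothetical, and the pipe-separated `Feature=Value` morph string is an assumption about how morph values are written.

```python
from typing import FrozenSet, Iterable


def morph_to_set(morph: str) -> FrozenSet[str]:
    """Split a pipe-separated morph string (assumed format) into a set of features."""
    return frozenset(feat for feat in morph.split("|") if feat)


def is_subset(token_morph: str, pattern_values: Iterable[str]) -> bool:
    """IS_SUBSET: every feature on the token appears in the pattern's list."""
    return morph_to_set(token_morph) <= frozenset(pattern_values)


def is_superset(token_morph: str, pattern_values: Iterable[str]) -> bool:
    """IS_SUPERSET: the token carries every feature in the pattern's list."""
    return morph_to_set(token_morph) >= frozenset(pattern_values)


token_morph = "Case=Nom|Number=Sing"  # illustrative value
print(is_subset(token_morph, ["Case=Nom", "Number=Sing", "Person=3"]))  # True
print(is_superset(token_morph, ["Number=Sing"]))  # True
print(is_superset(token_morph, ["Number=Plur"]))  # False
```

Under this reading, a single-valued attribute behaves like a one-element set. That would explain why the commit says `ISSUBSET` acts like `IN` for other attributes, and why `ISSUPERSET` can only match a pattern list of 0 or 1 values.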
Ines Montani 3b58a8be2b Update docs 2020-09-24 14:32:42 +02:00
Ines Montani 6836b66433 Update docs and resolve todos [ci skip] 2020-09-24 13:41:25 +02:00
Ines Montani d7ab6a2ffe Update docs [ci skip] 2020-09-24 12:37:21 +02:00
Ines Montani ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
Ines Montani e2ffe51fb5 Update docs [ci skip] 2020-09-24 10:13:41 +02:00
Ines Montani 02008e9a55 Update docs [ci skip] 2020-09-23 22:02:31 +02:00
Ines Montani c8bda92243 Update benchmarks [ci skip] 2020-09-23 20:05:02 +02:00
svlandeg 35dbc63578 Merge remote-tracking branch 'upstream/develop' into fix/nr_features
# Conflicts:
#	spacy/ml/models/parser.py
#	spacy/tests/serialize/test_serialize_config.py
#	website/docs/api/architectures.md
2020-09-23 17:01:13 +02:00
Ines Montani e4e7f5b00d Update docs [ci skip] 2020-09-23 15:44:40 +02:00
svlandeg 6c85fab316 state_type and extra_state_tokens instead of nr_feature_tokens 2020-09-23 13:35:09 +02:00
Ines Montani 6ca06cb62c Update docs and formatting [ci skip] 2020-09-23 10:14:27 +02:00
Ines Montani 60a317520a Merge pull request #6109 from svlandeg/feature/2rename 2020-09-23 09:47:12 +02:00
Ines Montani 930b116f00 Update docs [ci skip] 2020-09-23 09:35:21 +02:00
svlandeg b556a10808 rename converts in_to_out 2020-09-22 11:50:19 +02:00
Ines Montani f9af7d365c Update docs [ci skip] 2020-09-22 09:45:41 +02:00
Ines Montani 49e80dbcac Merge pull request #6103 from explosion/chore/tidy-up-tests-docs-get-doc 2020-09-22 09:45:04 +02:00
Adriane Boyd 844db6ff12 Update architecture overview 2020-09-22 09:31:47 +02:00
Adriane Boyd 5fbb8dfcbc Merge remote-tracking branch 'upstream/develop' into docs/various-v3-2 2020-09-22 09:22:58 +02:00
Ines Montani 67fbcb3da5 Tidy up tests and docs 2020-09-21 20:43:54 +02:00
Ines Montani e548654aca Update docs [ci skip] 2020-09-21 14:46:55 +02:00
Ines Montani 9d32cac736 Update docs [ci skip] 2020-09-21 10:55:36 +02:00
Adriane Boyd cc71ec901f Fix typo in saving and loading usage docs 2020-09-21 09:08:55 +02:00
Ines Montani 012b3a7096 Update docs [ci skip] 2020-09-20 17:44:58 +02:00
Ines Montani 554c9a2497 Update docs [ci skip] 2020-09-20 12:30:53 +02:00
Sofie Van Landeghem 39872de1f6 Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'

* --code instead of --code-path

* update documentation

* avoid querying the "system" section directly

* add explanation of gpu_allocator to TF/PyTorch section in docs

* fix typo

* fix typo 2

* use set_gpu_allocator from thinc 8.0.0a34

* default null instead of empty string
2020-09-19 01:17:02 +02:00
Ines Montani a127fa475e Merge pull request #6078 from svlandeg/fix/corpus 2020-09-18 14:44:21 +02:00
Ines Montani a0b4389a38 Update docs [ci skip] 2020-09-17 19:24:48 +02:00
Matthew Honnibal 6efb7688a6 Draft pretrain usage 2020-09-17 18:17:03 +02:00
Ines Montani a2c8cda26f Update docs [ci skip] 2020-09-17 17:12:51 +02:00
Matthew Honnibal ec751068f3 Draft text for static vectors intro 2020-09-17 16:42:53 +02:00
svlandeg c8c84f1ccd Merge remote-tracking branch 'upstream/develop' into fix/corpus 2020-09-17 15:43:04 +02:00
Ines Montani c8fa2247e3 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-17 12:34:15 +02:00
Ines Montani 6761028c6f Update docs [ci skip] 2020-09-17 12:34:11 +02:00
svlandeg 0c35885751 generalize corpora, dot notation for dev and train corpus 2020-09-17 11:38:59 +02:00
svlandeg 781fae678b Merge remote-tracking branch 'upstream/develop' into fix/corpus 2020-09-17 09:24:36 +02:00
Adriane Boyd 7e4cd7575c Refactor Docs.is_ flags (#6044)
* Refactor Docs.is_ flags

* Add derived `Doc.has_annotation` method

  * `Doc.has_annotation(attr)` returns `True` for partial annotation

  * `Doc.has_annotation(attr, require_complete=True)` returns `True` for
    complete annotation

* Add deprecation warnings to `is_tagged`, `is_parsed`, `is_sentenced`
and `is_nered`

* Add `Doc._get_array_attrs()`, which returns a full list of `Doc` attrs
for use with `Doc.to_array`, `Doc.to_bytes` and `Doc.from_docs`. The
list is the `DocBin` attributes list plus `SPACY` and `LENGTH`.

Notes on `Doc.has_annotation`:

* `HEAD` is converted to `DEP` because heads don't have an unset state

* Accept `IS_SENT_START` as a synonym of `SENT_START`

Additional changes:

* Add `NORM`, `ENT_ID` and `SENT_START` to default attributes for
`DocBin`

* In `Doc.from_array()` the presence of `DEP` causes `HEAD` to override
`SENT_START`

* In `Doc.from_array()` using `attrs` other than
`Doc._get_array_attrs()` (i.e., a user's custom list rather than our
default internal list) with both `HEAD` and `SENT_START` shows a warning
that `HEAD` will override `SENT_START`

* `set_children_from_heads` does not require dependency labels to set
sentence boundaries and sets `sent_start` for all non-sentence starts to
`-1`

* Fix call to set_children_from_heads

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-17 00:14:01 +02:00
svlandeg 51fa929f47 rewrite train_corpus to corpus.train in config 2020-09-15 21:58:04 +02:00
Ines Montani b7faa38960 Update docs [ci skip] 2020-09-15 12:44:03 +02:00
Ines Montani 154752f9c2 Update docs and consistency [ci skip] 2020-09-15 00:32:49 +02:00
Ines Montani 85e5910102 Update docs [ci skip] 2020-09-13 23:09:19 +02:00
Ines Montani 5ebb2a2ac8 Update docs [ci skip] 2020-09-13 22:36:20 +02:00
Ines Montani 47acb45850 Update docs [ci skip] 2020-09-13 22:30:33 +02:00
Ines Montani 2e3d067a7b Update docs [ci skip] 2020-09-13 19:29:06 +02:00
Ines Montani 99b26fe492 Update docs [ci skip] 2020-09-13 17:59:38 +02:00
Ines Montani 1316071086 Update docs [ci skip] 2020-09-13 11:31:50 +02:00
Ines Montani 368ecf705a Update docs [ci skip] 2020-09-12 17:40:50 +02:00
Ines Montani 8b0dabe987 Update docs [ci skip] 2020-09-12 17:05:10 +02:00
Ines Montani 4fec8c39a3 Update project teaser [ci skip] 2020-09-10 13:23:03 +02:00
Ines Montani 763e302dcc Update project widgets and examples [ci skip] 2020-09-10 13:04:16 +02:00
Ines Montani 908f3a4494 Update default projects repo [ci skip] 2020-09-10 11:42:14 +02:00
Ines Montani 2e567a47c2 Update docs and formatting 2020-09-09 21:26:10 +02:00
svlandeg aa27e3f1f2 PyTorch spelling 2020-09-09 16:27:21 +02:00
svlandeg a8aa9a8068 document Pipe API details, crossreferences etc 2020-09-09 15:56:27 +02:00
svlandeg 9a7c6cc61a references to usage page on layers and architectures 2020-09-09 14:47:32 +02:00
svlandeg e80898092b Merge branch 'feature/more-layers-docs' of https://github.com/svlandeg/spaCy into feature/more-layers-docs 2020-09-09 14:44:28 +02:00
svlandeg 4c080b3a98 details on Thinc shape inference 2020-09-09 13:57:05 +02:00
svlandeg 39aa740777 Merge remote-tracking branch 'upstream/develop' into feature/more-layers-docs 2020-09-09 11:59:34 +02:00
svlandeg e39242c4e6 formatting 2020-09-09 11:25:35 +02:00
Ines Montani 24053d83ec Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-09 11:20:14 +02:00
Ines Montani 406aed78ee Update docs [ci skip] 2020-09-09 11:20:07 +02:00
Sofie Van Landeghem 8e7557656f Renaming gold & annotation_setter (#6042)
* version bump to 3.0.0a16

* rename "gold" folder to "training"

* rename 'annotation_setter' to 'set_extra_annotations'

* formatting
2020-09-09 10:31:03 +02:00
svlandeg a16afb79e3 add section on Thinc implementation details 2020-09-08 20:43:09 +02:00
svlandeg 1c476b4b41 how to register and use custom function 2020-09-08 20:22:20 +02:00
svlandeg b35a26ea5d example wrapped Torch model and chaining with Thinc 2020-09-08 18:32:58 +02:00
svlandeg bd8f9b188b small fixes 2020-09-08 17:24:36 +02:00
Ines Montani d98ae9d918 Update docs [ci skip] 2020-09-08 10:33:48 +02:00
Ines Montani c443c82722 Update docs [ci skip] 2020-09-05 13:41:10 +02:00
Ines Montani b3e338d65e Update docs [ci skip] 2020-09-04 20:58:36 +02:00
Ines Montani 157caf4dfa WIP: update docs [ci skip] 2020-09-04 16:30:31 +02:00
Ines Montani f174c7b1f3 Merge branch 'develop' into pr/6018 2020-09-04 15:54:49 +02:00
Ines Montani 864a697e63 Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00
Adriane Boyd b927893309 Merge branch 'develop' into feature/dependency-matcher-v3 2020-09-04 13:03:30 +02:00
Ines Montani 2189046869 Merge pull request #6024 from explosion/chore/registry-renaming 2020-09-04 10:54:10 +02:00
Ines Montani b1eb98b15c Remove todos [ci skip] 2020-09-03 17:43:58 +02:00
Ines Montani 23b7d9cfa3 Prefix span getters 2020-09-03 17:37:06 +02:00
Ines Montani 5afe6447cd registry.assets -> registry.misc 2020-09-03 17:31:14 +02:00
Ines Montani 121809dd1e Fix anchor [ci skip] 2020-09-03 16:49:56 +02:00
Ines Montani 25a595dc10 Fix typos and wording [ci skip] 2020-09-03 16:37:45 +02:00
Ines Montani b5a0657fd6 "model" terminology consistency in docs 2020-09-03 13:13:03 +02:00
Ines Montani b02ad8045b Update docs [ci skip] 2020-09-03 10:10:13 +02:00
Ines Montani 1815c613c9 Update docs [ci skip] 2020-09-03 10:07:45 +02:00
Adriane Boyd 960d9cfadc Officially support DependencyMatcher
Add official support for the `DependencyMatcher`. Redesign the pattern
specification. Fix and extend operator implementations. Update API docs
and add usage docs.

Patterns
--------

Refactor pattern structure to:

```
{
  "LEFT_ID": str,
  "REL_OP": str,
  "RIGHT_ID": str,
  "RIGHT_ATTRS": dict,
}
```

The first node contains only `RIGHT_ID` and `RIGHT_ATTRS` and all
subsequent nodes contain all four keys.

New operators
-------------

Because of the way patterns are constructed from left to right, it's
helpful to have `follows` operators along with `precedes` operators. Add
operators for simple precedes / follows alongside immediate precedes /
follows.

* `.*`: precedes
* `;`: immediately follows
* `;*`: follows

Operator fixes
--------------

* `<` and `<<` do not include the node itself
* Fix reversed order for all operators involving linear precedence (`.`,
  all sibling operators)
* Linear precedence operators do not match nodes outside the same parse

Additional fixes
----------------

* Use v3 Matcher API
* Support `get` and `remove`
* Support pickling
2020-09-02 17:45:29 +02:00
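Note on 960d9cfadc: the key rule stated in the commit message (the first node has only `RIGHT_ID` and `RIGHT_ATTRS`, every later node has all four keys) can be sketched as a small standalone check. This is a plain-Python illustration, not the library's own validation. The function `check_pattern`, the node names, and the attribute values are hypothetical.

```python
from typing import Any, Dict, List

# Key sets taken from the commit message's description of the pattern structure.
FIRST_NODE_KEYS = {"RIGHT_ID", "RIGHT_ATTRS"}
LATER_NODE_KEYS = {"LEFT_ID", "REL_OP", "RIGHT_ID", "RIGHT_ATTRS"}


def check_pattern(pattern: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the key layout is as described."""
    problems = []
    for i, node in enumerate(pattern):
        expected = FIRST_NODE_KEYS if i == 0 else LATER_NODE_KEYS
        found = set(node)
        if found != expected:
            problems.append(
                f"node {i}: expected keys {sorted(expected)}, found {sorted(found)}"
            )
    return problems


# Illustrative pattern: an anchor node, then one node linked to it with `<`.
pattern = [
    {"RIGHT_ID": "subject", "RIGHT_ATTRS": {"DEP": "nsubj"}},
    {
        "LEFT_ID": "subject",
        "REL_OP": "<",
        "RIGHT_ID": "verb",
        "RIGHT_ATTRS": {"POS": "VERB"},
    },
]
print(check_pattern(pattern))  # []
```

The check covers only the key layout. It does not verify that `REL_OP` is a supported operator or that `LEFT_ID` names an earlier node.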
svlandeg 19298de352 small fix 2020-09-02 17:43:11 +02:00
svlandeg bbaea530f6 sublayers paragraph 2020-09-02 17:36:22 +02:00
svlandeg 1be7ff02a6 swapping section 2020-09-02 15:26:07 +02:00
svlandeg 57e432ba2a editor tip as Accordion instead of Infobox 2020-09-02 14:26:57 +02:00
svlandeg d19ec6c67b small rewrites in types paragraph 2020-09-02 14:25:18 +02:00
svlandeg 821b2d4e63 update examples 2020-09-02 14:15:50 +02:00
svlandeg e29a33449d rewrite intro, simple Model example 2020-09-02 13:41:18 +02:00
svlandeg 422df9c2e2 Merge remote-tracking branch 'upstream/develop' into feature/docs-layers
# Conflicts:
#	website/docs/usage/layers-architectures.md
2020-09-02 13:17:11 +02:00