Commit Graph

56 Commits

Author SHA1 Message Date
Ines Montani ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
Ines Montani e06ff8b71d Update docs [ci skip] 2020-09-26 13:18:08 +02:00
Ines Montani 6ca06cb62c Update docs and formatting [ci skip] 2020-09-23 10:14:27 +02:00
Ines Montani 60a317520a Merge pull request #6109 from svlandeg/feature/2rename 2020-09-23 09:47:12 +02:00
Ines Montani 930b116f00 Update docs [ci skip] 2020-09-23 09:35:21 +02:00
svlandeg b556a10808 rename converts in_to_out 2020-09-22 11:50:19 +02:00
Ines Montani 67fbcb3da5 Tidy up tests and docs 2020-09-21 20:43:54 +02:00
Ines Montani 012b3a7096 Update docs [ci skip] 2020-09-20 17:44:58 +02:00
Ines Montani c8fa2247e3 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-17 12:34:15 +02:00
Ines Montani 6761028c6f Update docs [ci skip] 2020-09-17 12:34:11 +02:00
Adriane Boyd 7e4cd7575c Refactor Docs.is_ flags (#6044)
* Refactor Docs.is_ flags

* Add derived `Doc.has_annotation` method

  * `Doc.has_annotation(attr)` returns `True` for partial annotation

  * `Doc.has_annotation(attr, require_complete=True)` returns `True` for
    complete annotation

* Add deprecation warnings to `is_tagged`, `is_parsed`, `is_sentenced`
  and `is_nered`

* Add `Doc._get_array_attrs()`, which returns a full list of `Doc` attrs
  for use with `Doc.to_array`, `Doc.to_bytes` and `Doc.from_docs`. The
  list is the `DocBin` attributes list plus `SPACY` and `LENGTH`.

Notes on `Doc.has_annotation`:

* `HEAD` is converted to `DEP` because heads don't have an unset state

* Accept `IS_SENT_START` as a synonym of `SENT_START`
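A minimal sketch of the partial vs. complete semantics of `Doc.has_annotation`,
assuming a simplified model where each attribute maps to a list of per-token
integer values and `0` means "unset". The `has_annotation` function and the
`ALIASES` table below are hypothetical stand-ins, not spaCy's implementation.

```python
# Hypothetical model of Doc.has_annotation (not spaCy's code): each attribute
# maps to a list of per-token values, and 0 means "unset" for that token.
from typing import Dict, List

# Aliases from the notes: HEAD is checked via DEP (heads have no unset state)
# and IS_SENT_START is accepted as a synonym of SENT_START.
ALIASES = {"HEAD": "DEP", "IS_SENT_START": "SENT_START"}


def has_annotation(
    token_values: Dict[str, List[int]], attr: str, require_complete: bool = False
) -> bool:
    attr = ALIASES.get(attr, attr)
    values = token_values.get(attr, [])
    if not values:
        return False
    if require_complete:
        # Complete annotation: every token has a value set.
        return all(v != 0 for v in values)
    # Partial annotation: at least one token has a value set.
    return any(v != 0 for v in values)


annots = {"DEP": [5, 0, 7], "SENT_START": [1, -1, -1]}
print(has_annotation(annots, "HEAD"))                         # partial DEP: True
print(has_annotation(annots, "HEAD", require_complete=True))  # token 1 unset: False
```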

Additional changes:

* Add `NORM`, `ENT_ID` and `SENT_START` to default attributes for
  `DocBin`

* In `Doc.from_array()` the presence of `DEP` causes `HEAD` to override
  `SENT_START`

* In `Doc.from_array()` using `attrs` other than
  `Doc._get_array_attrs()` (i.e., a user's custom list rather than our
  default internal list) with both `HEAD` and `SENT_START` shows a warning
  that `HEAD` will override `SENT_START`

* `set_children_from_heads` does not require dependency labels to set
  sentence boundaries and sets `sent_start` for all non-sentence starts to
  `-1`

* Fix call to set_children_from_heads
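A rough sketch of deriving sentence starts from heads alone (no dependency
labels), in the spirit of the `set_children_from_heads` change. It assumes
absolute head indices where a root is its own head, and that each sentence is
a contiguous span with exactly one root. `find_root` and
`sent_starts_from_heads` are hypothetical helpers, not spaCy functions.

```python
# Hypothetical sketch (not spaCy's code): derive sent_start values from heads.
# Assumes absolute head indices, a root is its own head, and each sentence is
# a contiguous span containing exactly one root.
from typing import List


def find_root(heads: List[int], i: int) -> int:
    # Follow head pointers until reaching a token that heads itself.
    seen = set()
    while heads[i] != i:
        if i in seen:
            raise ValueError("cycle in heads")
        seen.add(i)
        i = heads[i]
    return i


def sent_starts_from_heads(heads: List[int]) -> List[int]:
    # 1 marks a sentence start; every other token gets -1 (never 0/unset).
    starts = []
    prev_root = None
    for i in range(len(heads)):
        root = find_root(heads, i)
        starts.append(1 if root != prev_root else -1)
        prev_root = root
    return starts


# "She left . He stayed ." with roots "left" (1) and "stayed" (4)
print(sent_starts_from_heads([1, 1, 1, 4, 4, 4]))
```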

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-17 00:14:01 +02:00
Ines Montani b7faa38960 Update docs [ci skip] 2020-09-15 12:44:03 +02:00
Ines Montani 154752f9c2 Update docs and consistency [ci skip] 2020-09-15 00:32:49 +02:00
Ines Montani 5ebb2a2ac8 Update docs [ci skip] 2020-09-13 22:36:20 +02:00
Ines Montani 47acb45850 Update docs [ci skip] 2020-09-13 22:30:33 +02:00
Ines Montani 8b0dabe987 Update docs [ci skip] 2020-09-12 17:05:10 +02:00
Ines Montani c443c82722 Update docs [ci skip] 2020-09-05 13:41:10 +02:00
Ines Montani b3e338d65e Update docs [ci skip] 2020-09-04 20:58:36 +02:00
Ines Montani 157caf4dfa WIP: update docs [ci skip] 2020-09-04 16:30:31 +02:00
Adriane Boyd b927893309 Merge branch 'develop' into feature/dependency-matcher-v3 2020-09-04 13:03:30 +02:00
Ines Montani 121809dd1e Fix anchor [ci skip] 2020-09-03 16:49:56 +02:00
Ines Montani b5a0657fd6 "model" terminology consistency in docs 2020-09-03 13:13:03 +02:00
Adriane Boyd 960d9cfadc Officially support DependencyMatcher
Add official support for the `DependencyMatcher`. Redesign the pattern
specification. Fix and extend operator implementations. Update API docs
and add usage docs.

Patterns
--------

Refactor pattern structure to:

```
{
  "LEFT_ID": str,
  "REL_OP": str,
  "RIGHT_ID": str,
  "RIGHT_ATTRS": dict,
}
```

The first node contains only `RIGHT_ID` and `RIGHT_ATTRS` and all
subsequent nodes contain all four keys.
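A small, hypothetical validator for the pattern shape described above, to make
the "first node vs. subsequent nodes" rule concrete. `validate_pattern` is not
part of spaCy; requiring each `LEFT_ID` to name an earlier `RIGHT_ID` is an
assumption based on patterns being built from left to right.

```python
# Hypothetical validator (not spaCy's own) for the DependencyMatcher pattern
# shape: first node has only RIGHT_ID and RIGHT_ATTRS, later nodes have all
# four keys. The "LEFT_ID must already be defined" check is an assumption.
from typing import Any, Dict, List

FIRST_KEYS = {"RIGHT_ID", "RIGHT_ATTRS"}
NODE_KEYS = {"LEFT_ID", "REL_OP", "RIGHT_ID", "RIGHT_ATTRS"}


def validate_pattern(pattern: List[Dict[str, Any]]) -> None:
    if not pattern:
        raise ValueError("pattern must contain at least one node")
    if set(pattern[0]) != FIRST_KEYS:
        raise ValueError("first node needs exactly RIGHT_ID and RIGHT_ATTRS")
    seen = {pattern[0]["RIGHT_ID"]}
    for node in pattern[1:]:
        if set(node) != NODE_KEYS:
            raise ValueError(f"node {node!r} needs all four keys")
        if node["LEFT_ID"] not in seen:
            raise ValueError(f"LEFT_ID {node['LEFT_ID']!r} is not defined yet")
        seen.add(node["RIGHT_ID"])


# A verb node, then its immediate nominal subject ("verb > subject").
pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
]
validate_pattern(pattern)  # passes silently
```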

New operators
-------------

Because of the way patterns are constructed from left to right, it's
helpful to have `follows` operators along with `precedes` operators. Add
operators for simple precedes / follows alongside immediate precedes /
follows.

* `.*`: precedes
* `;`: immediately follows
* `;*`: follows
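A hypothetical sketch of how the linear-precedence operators relate two nodes
in `A op B`, where `A` is the `LEFT_ID` node and `B` the `RIGHT_ID` node, both
given as token indices. The `sent` list (one sentence id per token) is an
assumed way to model the same-parse restriction listed under the operator
fixes; `linear_op` is illustrative, not spaCy's implementation.

```python
# Hypothetical sketch (not spaCy's code) of the linear-precedence operators
# for "A op B". `sent` assigns a sentence id to each token index, modelling
# the rule that these operators never match across separate parses.
from typing import List


def linear_op(op: str, a: int, b: int, sent: List[int]) -> bool:
    if sent[a] != sent[b]:
        return False
    if op == ".":    # A immediately precedes B
        return a == b - 1
    if op == ".*":   # A precedes B
        return a < b
    if op == ";":    # A immediately follows B
        return a == b + 1
    if op == ";*":   # A follows B
        return a > b
    raise ValueError(f"unsupported operator: {op}")


sent = [0, 0, 0, 1, 1]  # two sentences: tokens 0-2 and 3-4
print(linear_op(".", 2, 3, sent))  # adjacent, but crosses a sentence: False
```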

Operator fixes
--------------

* `<` and `<<` do not include the node itself
* Fix reversed order for all operators involving linear precedence (`.`,
  all sibling operators)
* Linear precedence operators do not match nodes outside the same parse

Additional fixes
----------------

* Use v3 Matcher API
* Support `get` and `remove`
* Support pickling
2020-09-02 17:45:29 +02:00
Ines Montani add9de5487 Deprecate (Phrase)Matcher.pipe 2020-08-31 17:01:24 +02:00
Sofie Van Landeghem ec14744ee4 Rename Transformer listener (#6001)
* rename to spacy-transformers.TransformerListener

* add some more tok2vec tests

* use select_pipes

* fix docs - annotation setter was not changed in the end
2020-08-31 12:41:39 +02:00
Adriane Boyd 216efaf5f5 Restrict tokenizer exceptions to ORTH and NORM 2020-08-31 09:55:01 +02:00
Ines Montani 9b86312bab Update docs [ci skip] 2020-08-29 18:43:19 +02:00
Adriane Boyd 870774f475 Merge branch 'develop' into docs/morph-usage-v3 2020-08-29 16:00:50 +02:00
Adriane Boyd f9ed31a757 Update usage docs for lemmatization and morphology 2020-08-29 15:56:50 +02:00
Ines Montani 66d76f5126 Update docs 2020-08-29 12:36:05 +02:00
Ines Montani 8ac5ef1284 Update docs 2020-08-25 11:54:37 +02:00
Matthew Honnibal e559867605 Allow spacy project to push and pull to/from remote storage (#5949)
* Add utils for working with remote storage

* WIP add remote_cache for project

* WIP add push and pull commands

* Use pathy in remote_cache

* Update util

* Update remote_cache

* Update util

* Update project assets

* Update pull script

* Update push script

* Fix type annotation in util

* Work on remote storage

* Remove site and env hash

* Fix imports

* Fix type annotation

* Require pathy

* Require pathy

* Fix import

* Add a util to handle project variable substitution

* Import push and pull commands

* Fix pull command

* Fix push command

* Fix tarfile in remote_storage

* Improve printing

* Fiddle with status messages

* Set version to v3.0.0a9

* Draft docs for spacy project remote storages

* Update docs [ci skip]

* Use Thinc config to simplify and unify template variables

* Auto-format

* Don't import Pathy globally for now

Causes slow and annoying Google Cloud warning

* Tidy up test

* Tidy up and update tests

* Update to latest Thinc

* Update docs

* variables -> vars

* Update docs [ci skip]

* Update docs [ci skip]
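A rough sketch of the project variable substitution mentioned above, where
template variables live under `vars` and are referenced as `${vars.name}`.
Per the commit, spaCy delegates this to Thinc's config interpolation; the
regex-based `substitute_vars` helper below is a hypothetical stand-in that only
handles flat string values.

```python
# Hypothetical stand-in (spaCy itself uses Thinc's config interpolation):
# replace ${vars.name} references in a command string with values from a
# flat mapping, failing loudly on undefined names.
import re
from typing import Dict

VAR_REF = re.compile(r"\$\{vars\.([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_vars(text: str, variables: Dict[str, str]) -> str:
    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return str(variables[name])

    return VAR_REF.sub(replace, text)


print(substitute_vars("python train.py --lang ${vars.lang}", {"lang": "en"}))
```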

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-23 18:32:09 +02:00
svlandeg 1b7cfa7347 Merge remote-tracking branch 'upstream/develop' into feature/docs-docs-docs 2020-08-21 18:36:18 +02:00
svlandeg 942adf0f4d comma 2020-08-21 18:36:02 +02:00
svlandeg 262552010d context manager with space (for consistency) 2020-08-21 18:34:02 +02:00
Ines Montani 2cc4640385 Update docs [ci skip] 2020-08-21 16:21:55 +02:00
Ines Montani aa6a7cd6e7 Update docs and consistency [ci skip] 2020-08-21 13:49:18 +02:00
Ines Montani 52bd3a8b48 Update docs [ci skip] 2020-08-21 13:22:59 +02:00
svlandeg 7a2e6a96f5 fix typo 2020-08-19 16:54:16 +02:00
Ines Montani 9c25656ccc Update docs [ci skip] 2020-08-19 12:14:41 +02:00
Ines Montani 13291e97ba Update docs [ci skip] 2020-08-19 00:28:37 +02:00
Ines Montani 82f0e20318 Update docs and consistency [ci skip] 2020-08-18 14:39:40 +02:00
Ines Montani ef6cf3b276 Update docs [ci skip] 2020-08-18 01:29:34 +02:00
Ines Montani 728fec0194 Update docs [ci skip] 2020-08-18 00:49:19 +02:00
Ines Montani 3ae5e02f4f Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
Ines Montani a570c304df Update quickstart, template and docs 2020-08-15 14:50:29 +02:00
Ines Montani b7ec06e331 Update docs [ci skip] 2020-08-11 20:57:23 +02:00
Ines Montani 12052bd8f6 Update docs [ci skip] 2020-08-10 01:20:10 +02:00
Ines Montani c044460823 Update docs [ci skip] 2020-08-10 00:01:38 +02:00
Ines Montani e5995904d6 Update docs 2020-08-06 19:30:43 +02:00