Commit Graph

7606 Commits

Author SHA1 Message Date
Sofie Van Landeghem 6bfb1b3a29
Fix sparse checkout for 'spacy project' (#6008)
* exit if cloning fails

* UX

* rewrite http link to git protocol, don't use stdin

* fixes to sparse checkout

* formatting
2020-09-01 19:49:01 +02:00
Matthew Honnibal 4cce32f090 Fix tagger initialization 2020-09-01 16:38:34 +02:00
Matthew Honnibal 046c38bd26
Remove 'cleanup' of strings (#6007)
A long time ago we went to some trouble to try to clean up "unused"
strings, to avoid the `StringStore` growing in long-running processes.

This never really worked reliably, and I think it was a really wrong
approach. It's much better to let the user reload the `nlp` object as
necessary, now that the string encoding is stable (in v1, the string IDs
were sequential integers, making reloading the NLP object really
annoying.)

The extra book-keeping does make some performance difference, and the
feature is unused, so it's past time we killed it.
2020-09-01 16:12:15 +02:00
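
The point about stable string IDs in the message above can be illustrated with a small sketch. This is not spaCy's `StringStore`: the `SequentialStore` and `HashedStore` classes are hypothetical, and `blake2b` is only a stand-in for whichever hash function the library actually uses. It shows why sequential integer IDs make reloading awkward (the IDs depend on insertion order) while hash-derived IDs do not.

```python
import hashlib


class SequentialStore:
    """Hypothetical v1-style store: IDs depend on insertion order."""

    def __init__(self):
        self._ids = {}

    def add(self, string):
        # A new string gets the next free integer, so its ID
        # changes whenever strings are seen in a different order.
        return self._ids.setdefault(string, len(self._ids))


class HashedStore:
    """Hypothetical store whose IDs are derived from the string itself."""

    def __init__(self):
        self._strings = {}

    def add(self, string):
        # Assumption: blake2b stands in for the real hash function.
        digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
        key = int.from_bytes(digest, "little")
        self._strings[key] = string
        return key


def ids_for(store_cls, words):
    """Add all words to a fresh store and return the word -> ID mapping."""
    store = store_cls()
    return {word: store.add(word) for word in words}


# Two "processes" that encounter the same strings in a different order.
first = ["apple", "banana", "cherry"]
second = ["cherry", "apple", "banana"]

sequential_stable = ids_for(SequentialStore, first) == ids_for(SequentialStore, second)
hashed_stable = ids_for(HashedStore, first) == ids_for(HashedStore, second)
print(sequential_stable, hashed_stable)
```

With hash-derived IDs, a freshly reloaded object assigns the same ID to the same string, which is what makes "just reload" a workable alternative to cleaning up unused strings.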
Ines Montani 70b226f69d Support ignore marker in project document [ci skip] 2020-09-01 12:49:04 +02:00
Ines Montani a4c51f0f18 Add v3 info to project docs [ci skip] 2020-09-01 12:36:21 +02:00
Ines Montani ef9005273b Update fill-config command and add silent mode [ci skip] 2020-09-01 12:07:04 +02:00
Matthew Honnibal ec660e3131 Fix use_pytorch_for_gpu_memory 2020-09-01 00:41:38 +02:00
Adriane Boyd 9130094199
Prevent Tagger model init with 0 labels (#5984)
* Prevent Tagger model init with 0 labels

Raise an error before trying to initialize a tagger model with 0 labels.

* Add dummy tagger label for test

* Remove tagless tagger model initialization

* Fix error number after merge

* Add dummy tagger label to test

* Fix formatting

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-31 21:24:33 +02:00
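
The guard described in this commit (raise before trying to initialize a model with 0 labels) might look roughly like the sketch below. The names `NoLabelsError` and `init_tagger_output` are hypothetical, and the zero-filled weight rows are placeholders; the log does not show the real error code or the actual initialization logic.

```python
class NoLabelsError(ValueError):
    """Hypothetical error type; the real error code is not shown in the log."""


def init_tagger_output(labels, hidden_width=4):
    """Allocate one row of placeholder weights per label."""
    # Fail early, before any weights are allocated.
    if len(labels) == 0:
        raise NoLabelsError("Cannot initialize a tagger model with 0 labels.")
    return [[0.0] * hidden_width for _ in labels]


# A test that only needs an initialized model can add a dummy label first.
weights = init_tagger_output(["DUMMY"])
print(len(weights), len(weights[0]))
```

Raising at this point gives a clear message instead of a shape error later, which is presumably why the tests in this commit add a dummy label.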
Matthew Honnibal c38298b8fa Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-31 19:55:55 +02:00
Matthew Honnibal fe298fa50a Shuffle on first epoch of train 2020-08-31 19:55:22 +02:00
Ines Montani 9af82f3f11
Merge pull request #6003 from explosion/feature/matcher-as-spans 2020-08-31 17:50:56 +02:00
Ines Montani add9de5487 Deprecate (Phrase)Matcher.pipe 2020-08-31 17:01:24 +02:00
Ines Montani 83aff38c59
Make argument keyword-only
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-31 15:39:03 +02:00
Ines Montani 6340d1c63d Add as_spans to Matcher/PhraseMatcher 2020-08-31 14:53:22 +02:00
svlandeg 13ee742fb4 example of custom logger 2020-08-31 14:24:41 +02:00
svlandeg c18eb63483 Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs
# Conflicts:
#	website/docs/usage/embeddings-transformers.md
2020-08-31 13:21:36 +02:00
Sofie Van Landeghem ec14744ee4
Rename Transformer listener (#6001)
* rename to spacy-transformers.TransformerListener

* add some more tok2vec tests

* use select_pipes

* fix docs - annotation setter was not changed in the end
2020-08-31 12:41:39 +02:00
Adriane Boyd 216efaf5f5 Restrict tokenizer exceptions to ORTH and NORM 2020-08-31 09:55:01 +02:00
Matthew Honnibal 9341cbc013 Set version to v3.0.0a13 2020-08-30 23:10:43 +02:00
Ines Montani 45f46a5c85
Merge pull request #5993 from explosion/feature/disabled-components 2020-08-29 15:58:41 +02:00
Ines Montani 34146750d4 Use frozen list with custom errors
We don't want to break backwards compatibility too much, but we also want to provide the best possible UX.
2020-08-29 15:20:11 +02:00
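
One way to read "frozen list with custom errors" is a list subclass that still behaves like a list for reading but raises a descriptive error on any mutation. The sketch below assumes that reading; `FrozenList` and `FrozenListError` are hypothetical names, and the error message is made up for illustration.

```python
class FrozenListError(Exception):
    """Hypothetical error raised when a frozen list is modified."""


class FrozenList(list):
    """A list that can be read like any other list but not modified."""

    def __init__(self, items=(), error="This list is frozen."):
        super().__init__(items)
        self._error = error

    def _refuse(self, *args, **kwargs):
        # Every mutating method is routed here so the caller gets a
        # custom, explanatory message instead of silent modification.
        raise FrozenListError(self._error)

    append = extend = insert = pop = remove = clear = sort = reverse = _refuse
    __setitem__ = __delitem__ = __iadd__ = __imul__ = _refuse


disabled = FrozenList(
    ["ner"],
    error="Can't modify this list directly; use the dedicated API instead.",
)
print(disabled == ["ner"], len(disabled))
```

Because the object is still a `list`, existing code that iterates over it or compares it to a plain list keeps working, while accidental writes fail loudly.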
Ines Montani 744f432420
Merge pull request #5994 from explosion/feature/idempotent-component-decorator 2020-08-29 13:17:13 +02:00
Ines Montani 5de3f8604d
Update spacy/util.py
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-29 13:17:06 +02:00
Ines Montani 091a9b522a Remove unused variable [ci skip] 2020-08-29 13:11:26 +02:00
Ines Montani 2bc31e15c9 Tidy up and auto-format [ci skip] 2020-08-29 13:01:10 +02:00
Ines Montani 6520d1a1df Work around set order in Language.disabled 2020-08-29 12:58:22 +02:00
Ines Montani f45095a666
Merge pull request #5995 from adrianeboyd/bugfix/attribute-ruler-bugfixes 2020-08-29 12:38:30 +02:00
Ines Montani e0b4984aa4 Make deprecated disable_pipes call into select_pipes 2020-08-29 12:08:46 +02:00
Ines Montani 15d73f4dc3 Make user-facing Language.disabled return list
More consistent with all the other properties
2020-08-29 12:08:33 +02:00
Matthew Honnibal 58f19421b1 Return empty batch from tok2vec listener if no doc.tensor 2020-08-29 03:46:50 +02:00
svlandeg 5230529de2 add loggers registry & logger docs sections 2020-08-28 21:44:04 +02:00
Ines Montani 0687d7148e Rename user-facing API 2020-08-28 21:04:02 +02:00
Adriane Boyd 0104bd1600 Sort the AttributeRuler matches by rule order
Sort the returned matches by rule order (the `match_id`) so that the
rules are applied in the order they were added. This is necessary, for
instance, if the `AttributeRuler` is used for the tag map and later
rules require POS tags.
2020-08-28 21:01:06 +02:00
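
The ordering idea in this commit can be sketched in plain Python. This is an illustration only: it assumes matches are `(match_id, start, end)` tuples where `match_id` is the index of the rule in the order it was added, and `apply_in_rule_order` is a hypothetical helper, not the `AttributeRuler` implementation.

```python
def apply_in_rule_order(matches, rules, tokens):
    """Apply rule attributes to tokens, earliest-added rule first."""
    # sorted() is stable, so matches of the same rule keep their order.
    for match_id, start, end in sorted(matches, key=lambda m: m[0]):
        for i in range(start, end):
            tokens[i].update(rules[match_id])
    return tokens


# Rule 0 was added first, rule 1 later; rule 1 refines the POS set by rule 0.
rules = [{"TAG": "NN", "POS": "X"}, {"POS": "NOUN"}]
# Matches may come back in arbitrary order.
matches = [(1, 0, 1), (0, 0, 1)]
tokens = apply_in_rule_order(matches, rules, [{}])
print(tokens)
```

Without the sort, rule 0 would run last here and overwrite the `POS` value set by the later rule.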
Ines Montani 6a999c9303 Remove outdated component attr check 2020-08-28 20:59:19 +02:00
Adriane Boyd 8674b17651 Serialize AttributeRuler.patterns
Serialize `AttributeRuler.patterns` instead of the individual lists to
simplify the serialized format and so that patterns are reloaded exactly
as they were originally provided (preserving `_attrs_unnormed`).
2020-08-28 20:44:45 +02:00
Ines Montani 10da74382f Raise if disabled components are removed before DisabledPipes.restore 2020-08-28 20:35:26 +02:00
Ines Montani 1e0363290e Remove todos and update docstrings 2020-08-28 20:34:46 +02:00
Ines Montani cad988da7f Allow component decorators to re-run with same function 2020-08-28 16:27:22 +02:00
Ines Montani 3ce5be4b76 Allow loaded but disabled components 2020-08-28 15:20:14 +02:00
Ines Montani 89f692bc8a
Merge pull request #5992 from svlandeg/feature/wandb-restrict-config 2020-08-28 15:05:29 +02:00
Ines Montani 9c4049b57f
Merge pull request #5986 from explosion/fix/language-config-interpolate-disk-bytes 2020-08-28 15:03:52 +02:00
Ines Montani adc050cdc5 Fix code style in test [ci skip] 2020-08-28 15:03:21 +02:00
svlandeg 05a1bafa15 fix type 2020-08-28 14:08:33 +02:00
svlandeg 33883aa764 rename field 2020-08-28 14:06:23 +02:00
svlandeg 1d8c4070aa add disable_fields to wandb_logger 2020-08-28 13:55:32 +02:00
Ines Montani a51b4f3a19 Merge branch 'develop' into fix/language-config-interpolate-disk-bytes 2020-08-28 13:21:17 +02:00
Ines Montani 03dde511b4
Merge pull request #5987 from explosion/feature/debug-config [ci skip] 2020-08-28 11:30:18 +02:00
Ines Montani 62e9967228 Merge branch 'develop' into fix/language-config-interpolate-disk-bytes 2020-08-28 11:19:36 +02:00
Ines Montani 4ca2698f85 Merge branch 'develop' into feature/debug-config 2020-08-28 11:19:17 +02:00
svlandeg 9a8255ffd5 two tests because of different exit type 2020-08-28 10:50:26 +02:00