spaCy

Commit Graph

Author	SHA1	Message	Date
Matthew Honnibal	8ce9f44433	Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare	2020-09-29 16:57:38 +02:00
Matthew Honnibal	ca72608059	Fix language	2020-09-29 16:48:33 +02:00
Ines Montani	fd594cfb9b	Tighten up format	2020-09-29 16:47:55 +02:00
Ines Montani	63d1598137	Simplify config use in Language.initialize	2020-09-29 16:05:48 +02:00
Ines Montani	adca08a12f	Pass nlp forward	2020-09-29 12:21:52 +02:00
Ines Montani	42f0e4c946	Clean up	2020-09-29 12:14:08 +02:00
Matthew Honnibal	9c8b2524fe	Upd initialize args	2020-09-29 12:08:37 +02:00
Matthew Honnibal	f2d1b7feb5	Clean up sgd	2020-09-29 12:00:08 +02:00
Ines Montani	78396d137f	Integrate initialize settings	2020-09-29 11:57:08 +02:00
Ines Montani	dec984a9c1	Update Language.initialize and support components/tokenizer settings	2020-09-29 11:52:45 +02:00
Matthew Honnibal	5276db6f3f	Remove 'device' argument from Language, clean up 'sgd' arg	2020-09-29 11:42:19 +02:00
Ines Montani	ff9a63bfbd	begin_training -> initialize	2020-09-28 21:35:09 +02:00
Ines Montani	658fad428a	Fix base schema integration	2020-09-27 22:50:36 +02:00
Ines Montani	7e938ed63e	Update config resolution to use new Thinc	2020-09-27 22:21:31 +02:00
Ines Montani	4bbe41f017	Fix combined scores and update test	2020-09-24 10:42:47 +02:00
Ines Montani	ae51f580c1	Fix handling of score_weights	2020-09-24 10:27:33 +02:00
Ines Montani	1114219ae3	Tidy up and auto-format	2020-09-21 10:59:07 +02:00
Adriane Boyd	47080fba98	Minor renaming / refactoring * Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message * Make `Vocab.lookups` a property	2020-09-18 19:43:19 +02:00
Adriane Boyd	eed4b785f5	Load vocab lookups tables at beginning of training Similar to how vectors are handled, move the vocab lookups to be loaded at the start of training rather than when the vocab is initialized, since the vocab doesn't have access to the full config when it's created. The option moves from `nlp.load_vocab_data` to `training.lookups`. Typically these tables will come from `spacy-lookups-data`, but any `Lookups` object can be provided. The loading from `spacy-lookups-data` is now strict, so configs for each language should specify the exact tables required. This also makes it easier to control whether the larger clusters and probs tables are included. To load `lexeme_norm` from `spacy-lookups-data`: ``` [training.lookups] @misc = "spacy.LoadLookupsData.v1" lang = ${nlp.lang} tables = ["lexeme_norm"] ```	2020-09-18 15:59:16 +02:00
Matthew Honnibal	c776594ab1	Fix	2020-09-16 18:15:14 +02:00
Matthew Honnibal	4a573d18b3	Add comment	2020-09-16 17:51:29 +02:00
Matthew Honnibal	d31afc8334	Fix Language.link_components when model is None	2020-09-16 17:49:48 +02:00
Ines Montani	aaf01689a1	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-15 14:24:42 +02:00
Ines Montani	91a6637f74	Remove extra pipe config values before merging	2020-09-15 14:24:17 +02:00
Ines Montani	d3d7f92f05	Fix lang check and error handling in Language.from_config	2020-09-15 14:24:06 +02:00
Ines Montani	253ba5ef14	Raise for bad Vocab values	2020-09-15 13:25:34 +02:00
Ines Montani	7dfc4bc062	Allow overriding meta from spacy.blank	2020-09-15 11:12:12 +02:00
Matthew Honnibal	b693d2d224	Fix speed report in table	2020-09-13 17:39:31 +02:00
Ines Montani	febb99916d	Tidy up and auto-format [ci skip]	2020-09-13 10:55:36 +02:00
Sofie Van Landeghem	e92e850c72	Raise if empty examples (#6052) * raise error if no valid Example objects were found during initialization * fix max_length parameter * remove commit from other branch Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-09-12 21:01:53 +02:00
svlandeg	711166a75a	prevent overwriting score_weights	2020-09-11 15:12:05 +02:00
Sofie Van Landeghem	8e7557656f	Renaming gold & annotation_setter (#6042) * version bump to 3.0.0a16 * rename "gold" folder to "training" * rename 'annotation_setter' to 'set_extra_annotations' * formatting	2020-09-09 10:31:03 +02:00
Sofie Van Landeghem	60f22e1800	Pipe API (#6034) * ensure Language passes on valid examples for initialization * fix tagger model initialization * check for valid get_examples across components * assume labels were added before begin_training * fix senter initialization * fix morphologizer initialization * use methods to check arguments * test textcat init, requires thinc>=8.0.0a31 * fix tok2vec init * fix entity linker init * use islice * fix simple NER * cleanup debug model * fix assert statements * fix tests * throw error when adding a label if the output layer can't be resized anymore * fix test * add failing test for simple_ner * UX improvements * morphologizer UX * assume begin_training gets a representative set and processes the labels * remove assumptions for output of untrained NER model * restore test for original purpose	2020-09-08 22:44:25 +02:00
Ines Montani	f06eed800e	Merge pull request #6029 from explosion/master-tmp	2020-09-04 15:11:55 +02:00
Ines Montani	f9550b4493	Fix components in meta.json and website [ci skip]	2020-09-04 14:42:12 +02:00
Ines Montani	90043a6f9b	Tidy up and auto-format	2020-09-04 13:42:33 +02:00
Ines Montani	ba600f91c5	Tidy up imports	2020-09-04 13:15:44 +02:00
Ines Montani	ab1bb421ed	Update docs links in codebase	2020-09-04 12:58:50 +02:00
Ines Montani	896caf45e3	Merge pull request #6023 from explosion/ux/model-terminology-consistency [ci skip]	2020-09-03 17:13:44 +02:00
Ines Montani	b5a0657fd6	"model" terminology consistency in docs	2020-09-03 13:13:03 +02:00
Matthew Honnibal	ef0d0630a4	Let Language.use_params work with falsey inputs The Language.use_params method was failing if you passed in None, which meant we had to use awkward conditionals for the parameter averaging. This solves the problem.	2020-09-03 12:51:04 +02:00
Matthew Honnibal	046c38bd26	Remove 'cleanup' of strings (#6007) A long time ago we went to some trouble to try to clean up "unused" strings, to avoid the `StringStore` growing in long-running processes. This never really worked reliably, and I think it was a really wrong approach. It's much better to let the user reload the `nlp` object as necessary, now that the string encoding is stable (in v1, the string IDs were sequential integers, making reloading the NLP object really annoying.) The extra book-keeping does make some performance difference, and the feature is unused, so it's past time we killed it.	2020-09-01 16:12:15 +02:00
Ines Montani	45f46a5c85	Merge pull request #5993 from explosion/feature/disabled-components	2020-08-29 15:58:41 +02:00
Ines Montani	34146750d4	Use frozen list with custom errors We don't want to break backwards compatibility too much but we also want to provide the best possible UX	2020-08-29 15:20:11 +02:00
Ines Montani	6520d1a1df	Work around set order in Language.disabled	2020-08-29 12:58:22 +02:00
Ines Montani	e0b4984aa4	Make deprecated disable_pipes call into select_pipes	2020-08-29 12:08:46 +02:00
Ines Montani	15d73f4dc3	Make user-facing Language.disabled return list More consistent with all the other properties	2020-08-29 12:08:33 +02:00
Ines Montani	0687d7148e	Rename user-facing API	2020-08-28 21:04:02 +02:00
Ines Montani	6a999c9303	Remove outdated component attr check	2020-08-28 20:59:19 +02:00
Ines Montani	10da74382f	Raise if disabled components are removed before DisabledPipes.restore	2020-08-28 20:35:26 +02:00

530 Commits