Commit Graph

11457 Commits

Author SHA1 Message Date
svlandeg 5b350a6c99 bugfix of the bugfix 2020-06-02 17:49:33 +02:00
svlandeg fdfd822936 rewrite minibatch_by_words function 2020-06-02 15:22:54 +02:00
svlandeg ec52e7f886 add oversize examples before StopIteration returns 2020-06-02 13:21:55 +02:00
Ines Montani b5ae2edcba Merge pull request #5516 from explosion/feature/improve-model-version-deps 2020-05-31 12:54:01 +02:00
Matthw Honnibal cd5f748e09 Add onto-joint experiment file 2020-05-30 20:27:47 +02:00
Matthw Honnibal d1c2e88d0f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-30 19:23:12 +02:00
Ines Montani dc186afdc5 Add warning 2020-05-30 15:34:54 +02:00
Ines Montani 2bdf787417 Merge branch 'develop' into feature/improve-model-version-deps 2020-05-30 15:20:20 +02:00
Ines Montani 368182776e Tidy up dependencies 2020-05-30 15:19:53 +02:00
Ines Montani b7aff6020c Make functions more general purpose and update docstrings and tests 2020-05-30 15:18:53 +02:00
Ines Montani a7e370bcbf Don't override spaCy version 2020-05-30 15:03:18 +02:00
Ines Montani e47e5a4b10 Use more sophisticated version parsing logic 2020-05-30 15:01:58 +02:00
Ines Montani bed62991ad Tidy up requirements 2020-05-30 14:59:55 +02:00
Ines Montani 4fd087572a WIP: improve model version deps 2020-05-28 12:51:37 +02:00
Matthw Honnibal 58750b06f8 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-27 22:18:36 +02:00
Matthew Honnibal a44d51a3d8 Merge pull request #5496 from explosion/docs/unicode-str
unicode -> str consistency
2020-05-26 10:30:37 +02:00
Ines Montani 1a15896ba9 unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00
Ines Montani 262d306eaa unicode -> str consistency 2020-05-24 17:23:00 +02:00
Ines Montani 5d3806e059 unicode -> str consistency 2020-05-24 17:20:58 +02:00
Ines Montani cf156ed2f4 Merge pull request #5495 from explosion/fix/simplify-is-package 2020-05-24 15:42:55 +02:00
Ines Montani 387c7aba15 Update test 2020-05-24 14:55:16 +02:00
Ines Montani f9786d765e Simplify is_package check 2020-05-24 14:48:56 +02:00
Ines Montani 15d3a0ac3a Merge pull request #5491 from explosion/chore/rename-pipe-analysis 2020-05-23 12:41:54 +02:00
Matthw Honnibal 2d9de8684d Support use_pytorch_for_gpu_memory config 2020-05-22 23:10:40 +02:00
Ines Montani 4465cad6c5 Rename spacy.analysis to spacy.pipe_analysis 2020-05-22 17:42:06 +02:00
Ines Montani 25d6ed3fb8 Merge pull request #5489 from explosion/feature/connected-components 2020-05-22 17:40:11 +02:00
Ines Montani 841c05b47b Merge pull request #5490 from explosion/fix/remove-jsonschema 2020-05-22 17:39:54 +02:00
Ines Montani 569a65b60e Auto-format 2020-05-22 16:55:42 +02:00
Ines Montani d844528c5f Add test for is_compatible_model 2020-05-22 16:55:15 +02:00
Ines Montani 12b7be1d98 Remove jsonschema from dependencies 2020-05-22 16:49:26 +02:00
Matthew Honnibal 7a73a9dcf6 Merge pull request #5488 from explosion/feature/better-model-compat
Better model compatibility and validation
2020-05-22 16:44:29 +02:00
Matthew Honnibal f7f6df7275 Move to spacy.analysis 2020-05-22 16:43:18 +02:00
Matthew Honnibal 78d79d94ce Guess set_annotations=True in nlp.update
During `nlp.update`, components can be passed a boolean set_annotations
to indicate whether they should assign annotations to the `Doc`. This
needs to be set if downstream components expect to use the
annotations during training, e.g. if we wanted to use tagger features in
the parser.

Components can specify their assignments and requirements, so we can
figure out which components have these inter-dependencies. After
figuring this out, we can guess whether to pass set_annotations=True.

We could also always pass set_annotations=True, or even just have this
as the only behaviour. The downside of this is that it would require the
`Doc` objects to be created afresh to avoid problematic modifications.
One approach would be to make a fresh copy of the `Doc` objects within
`nlp.update()`, so that we can write to the objects without any
problems. If we do that, we can drop this logic and also drop the
`set_annotations` mechanism. I would be fine with that approach,
although it runs the risk of introducing some performance overhead, and
we'll have to take care to copy all extension attributes etc.
2020-05-22 15:55:45 +02:00
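The dependency check described in commit 78d79d94ce can be illustrated with a minimal sketch. It is not the spaCy implementation: the `Component` dataclass, the `guess_set_annotations` helper, and the attribute strings such as `"token.tag"` are hypothetical names chosen for illustration. The sketch assumes each component declares what it assigns and what it requires, and flags a component when a later component requires something it assigns.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """Hypothetical stand-in for a pipeline component's metadata."""
    name: str
    assigns: List[str] = field(default_factory=list)
    requires: List[str] = field(default_factory=list)


def guess_set_annotations(pipeline: List[Component]) -> Dict[str, bool]:
    """Map each component name to True if a later component needs its output."""
    flags: Dict[str, bool] = {}
    for i, comp in enumerate(pipeline):
        # Collect everything required by components that run after this one.
        needed_later = set()
        for later in pipeline[i + 1:]:
            needed_later.update(later.requires)
        # Annotations must be written only if something downstream uses them.
        flags[comp.name] = bool(needed_later.intersection(comp.assigns))
    return flags


pipeline = [
    Component("tagger", assigns=["token.tag"]),
    Component("parser", assigns=["token.dep", "token.head"], requires=["token.tag"]),
    Component("ner", assigns=["doc.ents"]),
]
flags = guess_set_annotations(pipeline)
print(flags)
```

In this example only the tagger is flagged, because the parser is the only component that requires an attribute assigned earlier in the pipeline.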
Ines Montani 6e6db6afb6 Better model compatibility and validation 2020-05-22 15:42:46 +02:00
Matthw Honnibal 25b51f4fc8 Set version to v3.0.0.dev9 2020-05-21 20:47:52 +02:00
Matthw Honnibal bc94fdabd0 Fix begin_training 2020-05-21 20:46:21 +02:00
Matthw Honnibal d507ac28d8 Fix shape inference 2020-05-21 20:46:10 +02:00
Matthw Honnibal df87c32a40 Pass smaller doc sample into model initialize 2020-05-21 20:17:24 +02:00
Matthw Honnibal 3b5cfec1fc Tweak memory management in train_from_config 2020-05-21 19:32:04 +02:00
Matthw Honnibal f075655deb Fix shape inference in begin_training 2020-05-21 19:26:29 +02:00
Matthw Honnibal 1729165e90 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-21 19:11:08 +02:00
Matthew Honnibal e6c4c1a507 Merge pull request #5468 from adrianeboyd/feature/cli-conllu-misc-ner
Improve handling of NER in CoNLL-U MISC
2020-05-21 16:39:46 +02:00
Adriane Boyd 4b229bfc22 Improve handling of NER in CoNLL-U MISC 2020-05-20 18:48:51 +02:00
Matthew Honnibal 609c0ba557 Fix accidentally quadratic runtime in Example.split_sents (#5464)
* Tidy up train-from-config a bit

* Fix accidentally quadratic perf in TokenAnnotation.brackets

When we're reading in the gold data, we had a nested loop where
we looped over the brackets for each token, looking for brackets
that start on that word. This is accidentally quadratic, because
we have one bracket per word (for the POS tags). So we had
an O(N**2) behaviour here that ended up being pretty slow.

To solve this I'm indexing the brackets by their starting word
on the TokenAnnotations object, and having a property to provide
the previous view.

* Fixes
2020-05-20 18:48:18 +02:00
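The fix described in commit 609c0ba557 follows a common pattern: replace a per-token scan over all brackets with a dictionary keyed by start position. The sketch below is illustrative only and does not reproduce spaCy's `TokenAnnotation` code; the `Bracket` tuple layout and all function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed layout for illustration: (start_token, end_token, label).
Bracket = Tuple[int, int, str]


def brackets_per_token_quadratic(n_tokens: int, brackets: List[Bracket]) -> List[List[Bracket]]:
    """Scan every bracket for every token: O(N * B) work."""
    return [[b for b in brackets if b[0] == i] for i in range(n_tokens)]


def index_by_start(brackets: List[Bracket]) -> Dict[int, List[Bracket]]:
    """Group brackets by their starting token in one pass: O(B) work."""
    by_start: Dict[int, List[Bracket]] = defaultdict(list)
    for bracket in brackets:
        by_start[bracket[0]].append(bracket)
    return by_start


def brackets_per_token_indexed(n_tokens: int, brackets: List[Bracket]) -> List[List[Bracket]]:
    """Look up each token's brackets in the index: O(N + B) work overall."""
    by_start = index_by_start(brackets)
    return [by_start.get(i, []) for i in range(n_tokens)]


def flat_view(by_start: Dict[int, List[Bracket]]) -> List[Bracket]:
    """Rebuild a flat list from the index, ordered by start token."""
    return [b for start in sorted(by_start) for b in by_start[start]]


brackets = [(0, 0, "DT"), (1, 1, "NN"), (0, 1, "NP"), (2, 2, "VBZ")]
slow = brackets_per_token_quadratic(3, brackets)
fast = brackets_per_token_indexed(3, brackets)
print(fast)
```

Both functions return the same per-token grouping. `flat_view` plays the role of the property that exposes the earlier flat representation, with the caveat that it orders brackets by start token, which can differ from the original list order.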
Matthw Honnibal 60e8da4813 Tidy up train-from-config a bit 2020-05-20 12:56:27 +02:00
Matthw Honnibal fda7355508 Fix train-from-config 2020-05-20 12:30:21 +02:00
Matthw Honnibal 24efd54a42 Merge from develop 2020-05-20 12:27:31 +02:00
Sofie Van Landeghem 7f5715a081 Various fixes to NEL functionality, Example class etc (#5460)
* setting KB in the EL constructor, similar to how the model is passed on

* removing wikipedia example files - moved to projects

* throw an error when nlp.update is called with 2 positional arguments

* rewriting the config logic in create pipe to accommodate other objects (e.g. KB) in the config

* update config files with new parameters

* avoid training pipeline components that don't have a model (like sentencizer)

* various small fixes + UX improvements

* small fixes

* set thinc to 8.0.0a9 everywhere

* remove outdated comment
2020-05-20 11:41:12 +02:00
Matthew Honnibal 664a3603b0 Set version to v3.0.0.dev8 2020-05-19 17:15:39 +02:00
Matthew Honnibal a2830c3ef5 Use thinc 8.0.0a9 2020-05-19 16:23:11 +02:00