Commit Graph

11471 Commits

Author SHA1 Message Date
svlandeg eac12cbb77 make dropout in embed layers configurable 2020-06-03 11:50:16 +02:00
svlandeg e91485dfc4 add discard_oversize parameter, move optimizer to training subsection 2020-06-03 10:04:16 +02:00
svlandeg 03c58b488c prevent infinite loop, custom warning 2020-06-03 10:00:21 +02:00
svlandeg 6504b7f161 Merge remote-tracking branch 'upstream/develop' into feature/pretrain-config 2020-06-03 08:30:16 +02:00
Matthew Honnibal f74784575c Merge pull request #5533 from svlandeg/bugfix/minibatch-oversize
add oversize examples before StopIteration returns
2020-06-02 22:54:38 +02:00
svlandeg c5ac382f0a fix name clash 2020-06-02 22:24:57 +02:00
svlandeg 2bf5111ecf additional test with discard_oversize=False 2020-06-02 22:09:37 +02:00
svlandeg aa6271b16c extending algorithm to deal better with edge cases 2020-06-02 22:05:08 +02:00
svlandeg f2e162fc60 it's only oversized if the tolerance level is also exceeded 2020-06-02 19:59:04 +02:00
svlandeg ef834b4cd7 fix comments 2020-06-02 19:50:44 +02:00
svlandeg 6208d322d3 slightly more challenging unit test 2020-06-02 19:47:30 +02:00
svlandeg 6651fafd5c using overflow buffer for examples within the tolerance margin 2020-06-02 19:43:39 +02:00
svlandeg 85b0597ed5 add test for minibatch util 2020-06-02 18:26:21 +02:00
svlandeg 5b350a6c99 bugfix of the bugfix 2020-06-02 17:49:33 +02:00
svlandeg fdfd822936 rewrite minibatch_by_words function 2020-06-02 15:22:54 +02:00
svlandeg ec52e7f886 add oversize examples before StopIteration returns 2020-06-02 13:21:55 +02:00
svlandeg e0f9f448f1 remove Tensorizer 2020-06-01 23:38:48 +02:00
Ines Montani b5ae2edcba Merge pull request #5516 from explosion/feature/improve-model-version-deps 2020-05-31 12:54:01 +02:00
Matthw Honnibal cd5f748e09 Add onto-joint experiment file 2020-05-30 20:27:47 +02:00
Matthw Honnibal d1c2e88d0f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-30 19:23:12 +02:00
Ines Montani dc186afdc5 Add warning 2020-05-30 15:34:54 +02:00
Ines Montani 2bdf787417 Merge branch 'develop' into feature/improve-model-version-deps 2020-05-30 15:20:20 +02:00
Ines Montani 368182776e Tidy up dependencies 2020-05-30 15:19:53 +02:00
Ines Montani b7aff6020c Make functions more general purpose and update docstrings and tests 2020-05-30 15:18:53 +02:00
Ines Montani a7e370bcbf Don't override spaCy version 2020-05-30 15:03:18 +02:00
Ines Montani e47e5a4b10 Use more sophisticated version parsing logic 2020-05-30 15:01:58 +02:00
Ines Montani bed62991ad Tidy up requirements 2020-05-30 14:59:55 +02:00
Ines Montani 4fd087572a WIP: improve model version deps 2020-05-28 12:51:37 +02:00
Matthw Honnibal 58750b06f8 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-27 22:18:36 +02:00
Matthew Honnibal a44d51a3d8 Merge pull request #5496 from explosion/docs/unicode-str
unicode -> str consistency
2020-05-26 10:30:37 +02:00
Ines Montani 1a15896ba9 unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00
Ines Montani 262d306eaa unicode -> str consistency 2020-05-24 17:23:00 +02:00
Ines Montani 5d3806e059 unicode -> str consistency 2020-05-24 17:20:58 +02:00
Ines Montani cf156ed2f4 Merge pull request #5495 from explosion/fix/simplify-is-package 2020-05-24 15:42:55 +02:00
Ines Montani 387c7aba15 Update test 2020-05-24 14:55:16 +02:00
Ines Montani f9786d765e Simplify is_package check 2020-05-24 14:48:56 +02:00
Ines Montani 15d3a0ac3a Merge pull request #5491 from explosion/chore/rename-pipe-analysis 2020-05-23 12:41:54 +02:00
Matthw Honnibal 2d9de8684d Support use_pytorch_for_gpu_memory config 2020-05-22 23:10:40 +02:00
Ines Montani 4465cad6c5 Rename spacy.analysis to spacy.pipe_analysis 2020-05-22 17:42:06 +02:00
Ines Montani 25d6ed3fb8 Merge pull request #5489 from explosion/feature/connected-components 2020-05-22 17:40:11 +02:00
Ines Montani 841c05b47b Merge pull request #5490 from explosion/fix/remove-jsonschema 2020-05-22 17:39:54 +02:00
Ines Montani 569a65b60e Auto-format 2020-05-22 16:55:42 +02:00
Ines Montani d844528c5f Add test for is_compatible_model 2020-05-22 16:55:15 +02:00
Ines Montani 12b7be1d98 Remove jsonschema from dependencies 2020-05-22 16:49:26 +02:00
Matthew Honnibal 7a73a9dcf6 Merge pull request #5488 from explosion/feature/better-model-compat
Better model compatibility and validation
2020-05-22 16:44:29 +02:00
Matthew Honnibal f7f6df7275 Move to spacy.analysis 2020-05-22 16:43:18 +02:00
Matthew Honnibal 78d79d94ce Guess set_annotations=True in nlp.update
During `nlp.update`, components can be passed a boolean set_annotations
to indicate whether they should assign annotations to the `Doc`. This
needs to be called if downstream components expect to use the
annotations during training, e.g. if we wanted to use tagger features in
the parser.

Components can specify their assignments and requirements, so we can
figure out which components have these inter-dependencies. After
figuring this out, we can guess whether to pass set_annotations=True.

We could also call set_annotations=True always, or even just have this
as the only behaviour. The downside of this is that it would require the
`Doc` objects to be created afresh to avoid problematic modifications.
One approach would be to make a fresh copy of the `Doc` objects within
`nlp.update()`, so that we can write to the objects without any
problems. If we do that, we can drop this logic and also drop the
`set_annotations` mechanism. I would be fine with that approach,
although it runs the risk of introducing some performance overhead, and
we'll have to take care to copy all extension attributes etc.
2020-05-22 15:55:45 +02:00
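
The guessing logic described in commit 78d79d94ce can be illustrated with a minimal sketch. This is not the spaCy implementation: the `ComponentMeta` class, the `guess_set_annotations` function, and the attribute strings such as "token.tag" are hypothetical names chosen for illustration. The sketch assumes each component declares what it assigns and what it requires, and marks a component as needing set_annotations=True when any later component in the pipeline requires something it assigns.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ComponentMeta:
    """Hypothetical record of what a pipeline component writes and reads."""
    name: str
    assigns: List[str] = field(default_factory=list)
    requires: List[str] = field(default_factory=list)


def guess_set_annotations(pipeline: List[ComponentMeta]) -> Dict[str, bool]:
    """Return, per component, whether a later component needs its annotations."""
    flags: Dict[str, bool] = {}
    for i, comp in enumerate(pipeline):
        # Collect everything required by components that run after this one.
        downstream_needs = set()
        for later in pipeline[i + 1:]:
            downstream_needs.update(later.requires)
        # The component must set annotations if it assigns anything they need.
        flags[comp.name] = bool(downstream_needs & set(comp.assigns))
    return flags


# Example from the commit message: the parser uses tagger features.
pipeline = [
    ComponentMeta("tagger", assigns=["token.tag"]),
    ComponentMeta("parser", assigns=["token.dep"], requires=["token.tag"]),
]
flags = guess_set_annotations(pipeline)
print(flags)  # {'tagger': True, 'parser': False}
```

In this sketch only the tagger is flagged, because nothing downstream of the parser reads its output. The alternative the commit message discusses, always setting annotations on fresh copies of the `Doc` objects, would make this dependency check unnecessary at the cost of the copying overhead.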
Ines Montani 6e6db6afb6 Better model compatibility and validation 2020-05-22 15:42:46 +02:00
Matthw Honnibal 25b51f4fc8 Set version to v3.0.0.dev9 2020-05-21 20:47:52 +02:00
Matthw Honnibal bc94fdabd0 Fix begin_training 2020-05-21 20:46:21 +02:00