Commit Graph

363 Commits

Author SHA1 Message Date
ines 4046823699 Only check component in factories if string (see #1911) 2018-01-30 16:29:07 +01:00
ines ce10d320c4 Fix component check in self.factories (see #1911) 2018-01-30 16:09:37 +01:00
ines 8901814248 Improve error handling if pipeline component is not callable (resolves #1911)
Also add help message if user accidentally calls nlp.add_pipe() with a string of a built-in component name.
2018-01-30 15:43:03 +01:00
ines a31506e060 Fix off-by-one error in nlp.add_pipe(after=name) (fixes #1654) 2017-11-28 20:37:55 +01:00
Ines Montani 6362024cf8
Merge pull request #1645 from GreenRiverRUS/fix_default_meta
Fixed spaCy version string in default meta
2017-11-27 11:58:02 +00:00
Vadim Mazaev 59f03ab1d7 Fixed spacy version string in default meta 2017-11-26 23:02:07 +03:00
Matthew Honnibal 8fec7268eb Move string cleanup under a setting flag 2017-11-23 12:19:18 +00:00
Matthew Honnibal 5949777b12 Fix misleading multi-threading docstring 2017-11-23 12:18:59 +00:00
Roman Domrachev 61d28d03e4 Try again to do selective remove cache 2017-11-15 19:11:12 +03:00
Roman Domrachev 505c6a2f2f Completely cleanup tokenizer cache
Tokenizer cache can have be different keys than string

That modification can slow down tokenizer and need to be measured
2017-11-15 17:55:48 +03:00
Roman Domrachev a33d5a068d Try to hold origin data instead of restore it 2017-11-14 22:40:03 +03:00
Roman Domrachev 91e2fa6561 Clean all caches 2017-11-14 21:15:04 +03:00
Roman Domrachev 86ca434c93 Merge github.com:explosion/spaCy 2017-11-14 17:46:22 +03:00
Roman Domrachev a2745b0e84 StringStore now actually cleaned
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Matthew Honnibal dd1678eab3
Edit comment 2017-11-11 18:37:08 +01:00
Roman Domrachev ee60a52ee7 Fix test imports and last batch cleanup 2017-11-11 11:32:16 +03:00
Roman Domrachev 4a6b094e09 Remove unused import 2017-11-11 03:13:05 +03:00
Roman Domrachev 3c600adf23 Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
Matthew Honnibal 45e0617e61 Allow Language.update to take unicode text and dict objects 2017-11-06 22:07:38 +01:00
Matthew Honnibal 5c85bf3791 Fix missing import 2017-11-06 15:06:27 +01:00
Matthew Honnibal 465adfee94 Remove unused resume_training method, and pass optimizer through 2017-11-06 14:26:00 +01:00
Matthew Honnibal 38109a0e4a Register SentenceSegmenter in Language.factories 2017-11-05 18:45:57 +01:00
Matthew Honnibal d185927998 Undo harmful pickling hacks on Language class 2017-11-04 23:07:03 +01:00
Matthew Honnibal 2bf21cbe29 Update model after optimising it instead of waiting 2017-11-03 20:20:01 +01:00
ines 5f661a1b3a Remove tensorizer from pre-set pipe_names 2017-11-01 19:48:33 +01:00
ines bfe17b7df1 Fix begin_training if get_gold_tuples is None 2017-11-01 13:14:31 +01:00
ines 37e62ab0e2 Update vector meta in meta.json 2017-11-01 01:25:09 +01:00
ines 8e02294241 Add vectors to Language.meta 2017-10-30 18:39:48 +01:00
ines d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
ines 91899d337b Tidy up language, lemmatizer and scorer 2017-10-27 14:40:14 +02:00
Ines Montani 4033e70c71 Merge pull request #1461 from explosion/feature/disable-pipes
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
ines 2d6ec99884 Set 'model' as default model name to prevent meta.json errors 2017-10-26 16:12:23 +02:00
Matthew Honnibal 90d1d9b230 Remove obsolete parser code 2017-10-26 13:22:45 +02:00
Matthew Honnibal b0f3ea2200 Fix names of pipeline components
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder     --> Tensorizer
NeuralLabeller         --> MultitaskObjective
2017-10-26 12:38:23 +02:00
ines 1a722dac31 Merge branch 'develop' into feature/disable-pipes 2017-10-25 15:18:18 +02:00
ines 6a00de4f77 Fix check of unexpected pipe names in restore() 2017-10-25 14:56:35 +02:00
ines 7f03932477 Return self on __enter__ 2017-10-25 14:56:16 +02:00
Matthew Honnibal e70f80f29e Add Language.disable_pipes() 2017-10-25 13:46:41 +02:00
ines 3484174e48 Add Language.path 2017-10-25 11:57:43 +02:00
Matthew Honnibal 65bf5e85bd Improve piping in language.pipe 2017-10-18 21:46:12 +02:00
Matthew Honnibal e35a83d142 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-17 18:22:06 +02:00
Matthew Honnibal 1cc85a89ef Allow reasonably efficient pickling of Language class, using to_bytes() and from_bytes(). 2017-10-17 18:18:49 +02:00
Ines Montani afa67de7ee Merge pull request #1428 from roanuz/develop
Fix trailing whitespace and Language.from_disk overwrites
2017-10-17 16:29:15 +02:00
Anto Binish Kaspar 8f5b60c168 Fix Language.from_disk overwrites the meta.json file. 2017-10-17 17:15:32 +05:30
ines 8ca344712d Add Language.has_pipe method 2017-10-17 11:20:07 +02:00
Matthew Honnibal 2bc06e4b22 Bump rolling buffer size to 10k 2017-10-16 19:38:29 +02:00
Matthew Honnibal 5c14f3f033 Create a rolling buffer for the StringStore in Language.pipe() 2017-10-16 19:22:40 +02:00
Ines Montani 37aa523a8e Merge pull request #1408 from explosion/feature/dot-underscore
💫 Custom attributes via Doc._, Token._ and Span._
2017-10-11 18:35:56 +02:00
ines 9620c1a640 Add lemma_lookup to Language defaults 2017-10-11 13:26:05 +02:00
ines 67350fa496 Use better logic for auto-generating component name
Instances don't have __name__, so we try __class__.__name__ as well,
before giving up and defaulting to repr(component).
2017-10-10 04:23:05 +02:00
Matthew Honnibal 0384f08218 Trigger nonproj.deprojectivize as a postprocess 2017-10-07 02:00:47 +02:00
ines e43530269c Update docstrings 2017-10-07 01:04:50 +02:00
ines 2586b61b15 Fix formatting, tidy up and remove unused imports 2017-10-07 00:26:05 +02:00
ines 212c8f0711 Implement new Language methods and pipeline API 2017-10-07 00:25:54 +02:00
Matthew Honnibal 96da86b3e5 Add support for verbose flag to Language 2017-10-03 09:14:57 -05:00
Matthew Honnibal 4ae9ea7684 Remove unused argument in Language 2017-09-26 05:41:35 -05:00
Matthew Honnibal 8716ffe57d Serialize vocab last 2017-09-24 05:01:45 -05:00
Matthew Honnibal 5a7fd0fd36 Fix vector linkage 2017-09-22 20:11:52 -05:00
Matthew Honnibal 4348c479fc Merge pre-trained vectors and noshare patches 2017-09-22 20:07:28 -05:00
Matthew Honnibal 7dc61b3f43 Whitespace 2017-09-22 20:00:50 -05:00
Matthew Honnibal 20193371f5 Don't share CNN, to reduce complexities 2017-09-21 14:59:48 +02:00
Matthew Honnibal b832f89ff8 Add resume_training function 2017-09-20 19:15:20 -05:00
Matthew Honnibal c858927271 Copy vectors to GPU on begin training 2017-09-18 18:04:16 -05:00
Matthew Honnibal 43210abacc Resolve fine-tuning conflict 2017-09-17 05:30:04 -05:00
Matthew Honnibal e37a50a436 Pass documents to tensorizer, not 'features' 2017-09-16 12:46:36 -05:00
Matthew Honnibal 70da88a3a7 Update comment on Language.begin_training 2017-09-14 16:18:30 +02:00
Matthew Honnibal 78a5f842e9 Fix update when update_shared=False 2017-08-20 15:58:34 -05:00
Matthew Honnibal 8875590081 Add optimizer in Language.update if sgd=None 2017-08-20 14:42:07 +02:00
Matthew Honnibal a3c51a0355 Fix creation of pipeline 2017-08-19 21:58:57 +02:00
Matthew Honnibal 97aabafb5f Document as_tuples keyword arg of Language.pipe 2017-08-19 12:21:33 +02:00
Matthew Honnibal 11c31d285c Restore changes from nn-beam-parser 2017-08-18 22:26:12 +02:00
Matthew Honnibal 52c180ecf5 Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad5, reversing
changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal 4363b4aa4a Fix redundant tokvecs updates during update 2017-08-13 12:36:55 +02:00
Matthew Honnibal 0acce0521b Fix Language.update for pipeline 2017-08-06 14:13:03 +02:00
Matthew Honnibal 0eec7c9e9b Fix Language.evaluate 2017-08-06 02:18:31 +02:00
Matthew Honnibal cc19ea0e7c Add update_tensors flag to Language.update. Experimental, re #1182 2017-08-06 02:17:10 +02:00
Matthew Honnibal 2e00361522 Fix update when 0 docs 2017-08-01 22:10:17 +02:00
Matthew Honnibal 523b0df2c9 Update text classification model 2017-07-25 18:57:59 +02:00
Matthew Honnibal d8aa721664 Compute Language.meta with a property 2017-07-23 00:50:18 +02:00
Matthew Honnibal baa3d81c35 Add text categorizer to Language 2017-07-22 01:13:36 +02:00
Matthew Honnibal 836bfa2d0f Add factory for experimental SimilarityHook component 2017-06-05 15:40:22 +02:00
Matthew Honnibal 2479cde446 Support disable keyword in Language.__init__ 2017-06-05 13:13:07 +02:00
Matthew Honnibal 8f8f90b46b Disable labeller if not parsing 2017-06-04 20:18:54 -05:00
Matthew Honnibal 939e8ed567 Add lookup properties for components in Language 2017-06-04 15:52:09 -05:00
Matthew Honnibal 92ae36f84e Improve way noun chunks iterator is looked up 2017-06-04 21:53:39 +02:00
Matthew Honnibal 21eef90dbc Support specifying which GPU 2017-06-03 16:10:23 -05:00
Matthew Honnibal fea1144e6d Set max batch size in evaluate 2017-06-03 13:31:33 -05:00
ines a3e4f91f4a Only load vocab if it exists 2017-06-01 14:38:35 +02:00
Matthew Honnibal 33e5ec737f Fix to/from disk methods 2017-05-31 13:43:10 +02:00
Matthew Honnibal 1e6df0a2a1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-29 14:30:12 -05:00
ines 6145fe6a93 Catch all kwargs on Language 2017-05-29 20:43:48 +02:00
Matthew Honnibal 9c9ee24411 Fix broken lambda scoping in Python 2 2017-05-29 13:23:28 -05:00
Matthew Honnibal aa4c33914b Work on serialization 2017-05-29 08:40:45 -05:00
Matthew Honnibal 7b06bb896e Fix for serialization 2017-05-29 13:42:55 +02:00
Matthew Honnibal 74235587ef Fix to serialization 2017-05-29 13:40:31 +02:00
Matthew Honnibal 59f355d525 Fixes for serialization 2017-05-29 13:38:20 +02:00
Matthew Honnibal ff26aa6c37 Work on to/from bytes/disk serialization methods 2017-05-29 11:45:45 +02:00
Matthew Honnibal 8a24c60c1e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-28 08:12:05 -05:00
Matthew Honnibal bc97bc292c Fix __call__ method 2017-05-28 08:11:58 -05:00
Matthew Honnibal b082f76494 Randomize pipeline order during training 2017-05-27 18:32:21 -05:00
Matthew Honnibal 73a643d32a Don't randomise pipeline for training, and don't update if no gradient 2017-05-27 08:20:13 -05:00
Matthew Honnibal 8af3100143 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-26 11:31:41 -05:00
ines 353f0ef8d7 Use disable argument (list) for serialization 2017-05-26 12:33:54 +02:00
Matthew Honnibal dbf2a4cf57 Update all models on each epoch 2017-05-25 19:46:56 -05:00
Matthew Honnibal 82b11b0320 Remove print statement 2017-05-25 17:15:59 -05:00
Matthew Honnibal f403c2cd5f Add env opts for optimizer 2017-05-25 11:19:26 -05:00
Matthew Honnibal 8500d9b1da Only train one task per iter, holding grads 2017-05-25 06:47:42 -05:00
Matthew Honnibal e6cc927ab1 Rearrange multi-task learning 2017-05-24 20:10:54 -05:00
Matthew Honnibal 9adfe9e8fc Don't hold gradient updates in language -- let the parser decide how to batch the updates. 2017-05-23 04:29:10 -05:00
Matthew Honnibal 3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal 532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
ines 54f04a9fe0 Update API docs with changes in spacy.gold and spacy.language 2017-05-22 12:29:30 +02:00
Matthew Honnibal 9262fc4829 Fix syntax error 2017-05-22 05:14:59 -05:00
Matthew Honnibal 2a5eb9f61e Make nonproj methods top-level functions, instead of class methods 2017-05-22 04:51:08 -05:00
Matthew Honnibal 5738d373d5 Add deprojectivize to pipeline 2017-05-22 04:51:08 -05:00
Matthew Honnibal 8d1e64be69 Add experimental NeuralLabeller 2017-05-22 04:51:08 -05:00
Matthew Honnibal 5db89053aa Merge docstrings 2017-05-21 13:46:23 -05:00
Matthew Honnibal 432b3499b3 Fix memory leak 2017-05-21 13:38:46 -05:00
Matthew Honnibal 4c9202249d Refactor training, to fix memory leak 2017-05-21 09:07:06 -05:00
ines d82ae9a585 Change "function" to "callable" in docs 2017-05-21 13:17:40 +02:00
Matthew Honnibal 3b7c108246 Pass tokvecs through as a list, instead of concatenated. Also fix padding 2017-05-20 13:23:32 -05:00
Matthew Honnibal 66ea9aebe7 Remove the state argument from Language 2017-05-19 13:25:42 -05:00
ines 2c8c9dc0c9 Update docstrings and API docs for Language 2017-05-19 18:47:24 +02:00
ines d42bc16868 Update docstrings and API docs for Language class 2017-05-18 23:57:38 +02:00
Matthew Honnibal c2c825127a Fix use_params and pipe methods 2017-05-18 08:30:59 -05:00
Matthew Honnibal 2713041571 Fix GPU usage in Language 2017-05-18 04:25:19 -05:00
Matthew Honnibal 793430aa7a Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
2017-05-17 12:04:50 +02:00
Matthew Honnibal 8cf097ca88 Redesign training to integrate NN components
* Obsolete .parser, .entity etc names in favour of .pipeline
* Components no longer create models on initialization
* Models created by loading method (from_disk(), from_bytes() etc), or
    .begin_training()
* Add .predict(), .set_annotations() methods in components
* Pass state through pipeline, to allow components to share information
    more flexibly.
2017-05-16 16:17:30 +02:00
Matthew Honnibal 5211645af3 Get data flowing through pipeline. Needs redesign 2017-05-16 11:21:59 +02:00
Matthew Honnibal a9edb3aa1d Improve integration of NN parser, to support unified training API 2017-05-15 21:53:27 +02:00
Matthew Honnibal 9e167b7bb6 Strip serializer from code 2017-05-09 17:28:50 +02:00
ines ea5fa46475 Import LEX_ATTRS from lang.lex_attrs 2017-05-09 00:58:10 +02:00
ines 6eb6306843 Fix language data imports 2017-05-08 23:58:31 +02:00
Matthew Honnibal d0e19267e8 Create directory if missing in save_to_directory 2017-04-23 21:24:43 +02:00
Matthew Honnibal 4d2a659c52 Fix json dump for Python3 2017-04-23 17:05:53 +02:00
ines ddd5194088 Update Language docs and docstrings 2017-04-17 01:52:13 +02:00
ines f62b740961 Use compat.json_dumps 2017-04-17 01:46:14 +02:00
ines 8e83f8e2fa Update docstrings 2017-04-17 01:40:26 +02:00
ines e2299dc389 Ensure path in save_to_directory 2017-04-17 01:40:14 +02:00
Matthew Honnibal 4efd6fb9d6 Fix training 2017-04-16 15:28:27 -05:00
Matthew Honnibal 89a4f262fc Fix training methods 2017-04-16 13:00:37 -05:00
ines c05ec4b89a Add compat functions and remove old workarounds
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Matthew Honnibal 33ba5066eb Refactor Language.end_training, making new save_to_directory method 2017-04-14 23:51:24 +02:00
oeg 010293fb2f fix(typo): Fixes typo in method calling PseudoProjectivity.deprojectivize, failing with new train cli 2017-04-06 17:33:15 +02:00
Matthew Honnibal 47a3ef06a6 Unhack deprojetivization, moving it into pipeline
Previously the deprojectivize() call was attached to the transition
system, and only called for German. Instead it should be a separate
process, called after the parser. This makes it available for any
language. Closes #898.
2017-03-31 12:31:50 +02:00
Matthew Honnibal 83ba6c247c Fix init of Language without model 2017-03-26 16:46:00 +02:00
Raphaël Bournhonesque f332bf05be Remove unused import statements 2017-03-21 21:08:54 +01:00