Commit Graph

4491 Commits

Author SHA1 Message Date
ines 3af281a334 Update test model name 2017-11-01 23:02:00 +01:00
Matthew Honnibal b30dd36179 Allow Tagger.add_label() before training 2017-11-01 21:49:24 +01:00
Matthew Honnibal eca41f0cf6 Fix filename conversion for conllu 2017-11-01 21:26:49 +01:00
Matthew Honnibal e237472cdc Fix tag and filename conversion for conllu 2017-11-01 21:25:33 +01:00
Matthew Honnibal b84d99b281 Revert tagger.add_label() changes, to fix model 2017-11-01 21:10:45 +01:00
Matthew Honnibal f5855e539b Fix tagger model loading 2017-11-01 20:42:36 +01:00
Matthew Honnibal 624644adfe Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 20:26:41 +01:00
ines 5f661a1b3a Remove tensorizer from pre-set pipe_names 2017-11-01 19:48:33 +01:00
Matthew Honnibal 190522efd3 Fix tagger when some tags aren't in Morphology 2017-11-01 19:27:49 +01:00
Matthew Honnibal e85e31cfbd Fix backprop of d_pad 2017-11-01 19:27:26 +01:00
Matthew Honnibal 759cc79185 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 19:00:19 +01:00
Matthew Honnibal 1ae40b50b4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 17:07:02 +01:00
Matthew Honnibal 7ae1aacdb8 Fix add_label methods 2017-11-01 17:06:43 +01:00
ines 8c2260e18c Move span tests to /doc 2017-11-01 16:56:35 +01:00
Matthew Honnibal 2ef7b59eb0 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 16:51:41 +01:00
ines 1d1f91a041 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 16:49:44 +01:00
ines 9659391944 Update deprecated methods and add warnings 2017-11-01 16:49:42 +01:00
ines 260cb37224 Catch deprecation warning 2017-11-01 16:49:18 +01:00
ines 5914faafbb Fix .merge tests to not use deprecated API 2017-11-01 16:49:11 +01:00
ines 705a4e3e4a Fix formatting 2017-11-01 16:44:08 +01:00
Matthew Honnibal d17a12c71d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 16:38:26 +01:00
Matthew Honnibal 9f9439667b Don't create low-data text classifier if no vectors 2017-11-01 16:34:09 +01:00
Matthew Honnibal e7a9174877 Add add_label methods to Tagger and TextCategorizer 2017-11-01 16:32:44 +01:00
ines 39e0586192 Add deprecated helper
Uses warning to show DeprecationWarning and custom stack trace
2017-11-01 16:32:36 +01:00
Matthew Honnibal a7bf38bf31 Remove misleading comment on util.get_cuda_stream() 2017-11-01 13:57:25 +01:00
Matthew Honnibal 273e96b63f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 13:27:35 +01:00
Matthew Honnibal 9e0ebee81c Add Token.is_sent_start property, so can deprecate Token.sent_start 2017-11-01 13:27:14 +01:00
Matthew Honnibal 7e7116cdf7 Fix Doc.to_array when only one string attr provided 2017-11-01 13:26:43 +01:00
Matthew Honnibal 301fb2bb60 Implement Span.n_lefts and Span.n_rights 2017-11-01 13:25:12 +01:00
Matthew Honnibal c047498f87 Fix vectors test 2017-11-01 13:24:47 +01:00
ines 9a5e7c6fe2 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 13:14:45 +01:00
ines bfe17b7df1 Fix begin_training if get_gold_tuples is None 2017-11-01 13:14:31 +01:00
ines affd3404ab Remove old model command (now "vocab") 2017-11-01 13:14:03 +01:00
Matthew Honnibal fdb4b8e456 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 02:07:17 +01:00
Matthew Honnibal c48dd0e1d3 Fix vector pruning 2017-11-01 02:06:58 +01:00
ines 37e62ab0e2 Update vector meta in meta.json 2017-11-01 01:25:09 +01:00
ines 96b4aef0bf Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 01:10:53 +01:00
Matthew Honnibal 86eba61fae Fix token.vector when vectors are missing 2017-11-01 00:47:35 +01:00
ines 5683fd65ed Update docstrings 2017-11-01 00:42:39 +01:00
Matthew Honnibal 44bce8e53f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-01 00:35:16 +01:00
Matthew Honnibal c16310d156 Update vectors with find method 2017-11-01 00:34:55 +01:00
Ines Montani d11659463b Merge pull request #1152 from jimregan/develop-irish
[WIP] attempt a port from #1147
2017-11-01 00:23:43 +01:00
ines 2ad2f09d12 Update docstrings and simplify most_similar 2017-11-01 00:18:08 +01:00
Jim O'Regan 08b0bfd153 merge 2017-10-31 22:55:59 +00:00
Jim O'Regan 00ecfa5417 Ó, not O 2017-10-31 22:54:42 +00:00
ines ba2e6c8c6f Update docstrings and formatting 2017-10-31 23:23:34 +01:00
Matthew Honnibal 0de8d213a3 Merge pull request #1475 from explosion/feature/sm-vectors
Improve and simplify Vectors class
2017-10-31 22:59:50 +01:00
Ines Montani 25b1d6cd91 Fix syntax error 2017-10-31 22:36:03 +01:00
Matthew Honnibal 92dc127569 Fix test for Python 3 2017-10-31 22:21:55 +01:00
Jim O'Regan fe4b10346a replace example sentence until I get around to adding a punctuation.py 2017-10-31 20:24:53 +00:00
Matthew Honnibal c5799ecc7b Remove print statement 2017-10-31 21:12:33 +01:00
ines 7e424a1804 Don't copy exception dicts if not necessary and tidy up 2017-10-31 21:05:29 +01:00
Matthew Honnibal c390f2d745 Make it easier to pass explicit no-pruning to vocab 2017-10-31 20:14:47 +01:00
Ines Montani 06c25a8882 Remove comma that caused list to wrap in tuple!
Also removed extra dict wrappings for performance (we used to have them in there, but they should only really exist if copying the dict is absolutely necessary)
2017-10-31 20:13:16 +01:00
Matthew Honnibal d90a22afe6 Fix loading previous vectors models 2017-10-31 19:58:35 +01:00
Ines Montani 147448b65b Add missing symbols 2017-10-31 19:34:45 +01:00
Matthew Honnibal 997a61557a Add vectors.n_keys property 2017-10-31 19:30:52 +01:00
Matthew Honnibal 8075726838 Restore vector usage in models 2017-10-31 19:21:17 +01:00
Matthew Honnibal 3659a807b0 Remove vector pruning arg from train CLI 2017-10-31 19:21:05 +01:00
Ines Montani 9b0de9fb43 Fix import of symbols (now nested one level lower) 2017-10-31 19:17:58 +01:00
Matthew Honnibal 59203a2e8a Move vector pruning command into spacy vocab cli tool 2017-10-31 19:10:01 +01:00
Matthew Honnibal 77d8f5de9a Revise and simplify Vectors class 2017-10-31 18:25:08 +01:00
Jim O'Regan d4a8160c36 change quotes 2017-10-31 15:15:44 +00:00
Jim O'Regan 34ca59691b no idea what is wrong here 2017-10-31 14:50:13 +00:00
Jim O'Regan 41dd29e48e merge 2017-10-31 14:07:45 +00:00
Matthew Honnibal cb5217012f Fix vector remapping 2017-10-31 11:40:46 +01:00
Matthew Honnibal 9c11ee4a1c WIP on vectors fixes 2017-10-31 11:22:56 +01:00
Matthew Honnibal ce876c551e Fix GPU usage 2017-10-31 02:33:34 +01:00
Matthew Honnibal 7698903617 Fix GPU usage 2017-10-31 02:33:16 +01:00
Matthew Honnibal 368fdb389a WIP on refactoring and fixing vectors 2017-10-31 02:00:26 +01:00
Matthew Honnibal 4e3006cec7 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-30 19:44:58 +01:00
Matthew Honnibal 4112a991ec Fix vector pruning 2017-10-30 19:44:40 +01:00
ines ec657c1ddc Update vocab docs and document Vocab.prune_vectors 2017-10-30 19:35:41 +01:00
ines 803e41bc66 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-30 18:39:51 +01:00
ines 8e02294241 Add vectors to Language.meta 2017-10-30 18:39:48 +01:00
ines abf8aa05d3 Populate --create-meta defaults from file if available
If meta.json is found in directory and user chooses to overwrite it, show existing data as defaults.
2017-10-30 18:39:38 +01:00
ines ce98fa7934 Fix formatting 2017-10-30 18:38:55 +01:00
ines 98c35d2585 Fix spacy vocab command 2017-10-30 18:38:41 +01:00
Matthew Honnibal e98451b5f7 Add -prune-vectors argument to spacy.cly.train 2017-10-30 18:00:10 +01:00
Matthew Honnibal e026b29ea9 Add prune_vectors method to Vocab 2017-10-30 17:59:43 +01:00
Explosion Bot d0cf12c8c7 Fix off-by-one error in vectors 2017-10-30 16:22:03 +01:00
Explosion Bot 05a1dd570e Fix vocab script 2017-10-30 16:19:22 +01:00
Explosion Bot b46bdce8d2 Add missing import 2017-10-30 16:18:10 +01:00
Explosion Bot 2d2cc294b4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-30 16:15:05 +01:00
Explosion Bot 0fc1209421 Wire up new vocab command 2017-10-30 16:14:50 +01:00
Explosion Bot aa64031751 Fix clear_vectors() method on Vocab 2017-10-30 16:09:04 +01:00
Explosion Bot 7b56b2f04b Add Vocab.cfg attr, to hold stuff like oov probs 2017-10-30 16:08:50 +01:00
Explosion Bot ab5d5ed880 Fix vectors.add() 2017-10-30 16:08:09 +01:00
Explosion Bot 41d0f1665a Fix add_attrs for cluster 2017-10-30 16:07:50 +01:00
ines 5453821a9f Update NER annotation scheme
Add note on training data sources and include coarse-grained Wikipedia scheme
2017-10-30 13:53:49 +01:00
Explosion Bot 5ede7cec9b Improve Lexeme.set_attrs method 2017-10-30 11:49:11 +01:00
Explosion Bot 72aea8f105 Update vectors.add() to allow setting keys to rows 2017-10-30 10:03:08 +01:00
Matthew Honnibal c43cc5361d Merge pull request #1467 from explosion/feature/better-parser
💫 Bug fixes to parser model (requires retraining)
2017-10-29 02:05:22 +02:00
ines 6c2d8d3b2a Use shortcuts-nightly.json to resolve model shortcuts 2017-10-29 01:28:31 +02:00
Matthew Honnibal a0c7dabb72 Fix bug in 8-token parser features 2017-10-28 23:01:35 +00:00
Matthew Honnibal b713d10d97 Switch to 13 features in parser 2017-10-28 23:01:14 +00:00
Matthew Honnibal 3b91097321 Whitespace 2017-10-28 17:05:11 +00:00
Matthew Honnibal 6ef72864fa Improve initialization for hidden layers 2017-10-28 17:05:01 +00:00
Matthew Honnibal 5414e2f14b Use missing features in parser 2017-10-28 16:45:54 +00:00
Matthew Honnibal df4803cc6d Add learned missing values for parser 2017-10-28 16:45:14 +00:00
Matthew Honnibal 64e4ff7c4b Merge 'tidy-up' changes into branch. Resolve conflicts 2017-10-28 13:16:06 +02:00
Explosion Bot fb0c96f39a Fix optimizer loading 2017-10-28 11:58:16 +02:00
Explosion Bot b22e42af7f Merge changes to parser and _ml 2017-10-28 11:52:10 +02:00
ines d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
ines a8e10f94e4 Tidy up Lexeme and update docs 2017-10-27 21:07:50 +02:00
ines ba5e646219 Tidy up pipeline 2017-10-27 20:29:08 +02:00
ines b4d226a3f1 Tidy up syntax 2017-10-27 19:45:57 +02:00
ines 5167a0cce2 Tidy up Vectors and docs 2017-10-27 19:45:19 +02:00
ines 7946464742 Remove spacy.tagger (now in pipeline) 2017-10-27 19:45:04 +02:00
ines 9c89e2cdef Remove unused syntax iterators (now in language data) 2017-10-27 18:09:53 +02:00
ines d2df81d907 Fix not implemented Span getters 2017-10-27 18:09:28 +02:00
ines 544a407b93 Tidy up Doc, Token and Span and add missing docs 2017-10-27 17:07:26 +02:00
ines a6135336f5 Tidy up gold 2017-10-27 17:02:55 +02:00
ines 6a0483b7aa Tidy up and document Doc, Token and Span 2017-10-27 15:41:45 +02:00
ines 1a559d4c95 Remove old, unused file 2017-10-27 15:34:35 +02:00
ines 91899d337b Tidy up language, lemmatizer and scorer 2017-10-27 14:40:14 +02:00
ines 778212efea Tidy up init and main 2017-10-27 14:39:51 +02:00
ines e33b7e0b3c Tidy up parser and ML 2017-10-27 14:39:30 +02:00
ines e3265998c0 Tidy up displaCy 2017-10-27 14:39:19 +02:00
ines ea4a41c8fb Tidy up util and helpers 2017-10-27 14:39:09 +02:00
ines d941fc3667 Tidy up CLI 2017-10-27 14:38:39 +02:00
Matthew Honnibal 531142a933 Merge remote-tracking branch 'origin/develop' into feature/better-parser 2017-10-27 12:34:48 +00:00
Matthew Honnibal 19a2b9bf27 Fix import of Optimizer 2017-10-27 12:33:42 +00:00
Matthew Honnibal 4d048e94d3 Add compat for thinc.neural.optimizers.Optimizer 2017-10-27 10:23:49 +00:00
Ines Montani 4033e70c71 Merge pull request #1461 from explosion/feature/disable-pipes
💫 Add Language.disable_pipes(), to temporarily edit pipeline and update code examples
2017-10-27 12:21:40 +02:00
Matthew Honnibal 75a637fa43 Remove redundant imports from _ml 2017-10-27 10:19:56 +00:00
Matthew Honnibal c9987cf131 Avoid use of numpy.tensordot 2017-10-27 10:18:36 +00:00
Matthew Honnibal f6fef30adc Remove dead code from spacy._ml 2017-10-27 10:16:41 +00:00
Matthew Honnibal b9616419e1 Add try/except around bz2 import 2017-10-27 01:18:05 +00:00
Matthew Honnibal 783c0c8795 Remove unnecessary bz2 import 2017-10-27 01:17:54 +00:00
Matthew Honnibal bb25bdcd92 Adjust call to scatter_add for the new version 2017-10-27 01:16:55 +00:00
Ines Montani 287a3ca256 Merge pull request #1466 from explosion/feature/rename-pipeline
💫 Clean up dead linear model code
2017-10-27 02:03:28 +02:00
ines 4eb5bd02e7 Update textcat pre-processing after to_array change 2017-10-27 00:32:12 +02:00
ines 2d6ec99884 Set 'model' as default model name to prevent meta.json errors 2017-10-26 16:12:23 +02:00
ines 9e372913e0 Remove old 'SP' condition in tag map 2017-10-26 16:11:57 +02:00
Matthew Honnibal c52671420c Remove old cfile import 2017-10-26 13:28:19 +02:00
Matthew Honnibal ea03f1ef64 Remove obsolete cfile code 2017-10-26 13:23:36 +02:00
Matthew Honnibal 90d1d9b230 Remove obsolete parser code 2017-10-26 13:22:45 +02:00
ines 6f78e29bed Add LAW entity label to glossary 2017-10-26 13:04:35 +02:00
ines 9bf78d5fb3 Update spacy.explain docs 2017-10-26 13:04:25 +02:00
Matthew Honnibal 33f8c58782 Remove obsolete parser.pyx 2017-10-26 12:42:05 +02:00
Matthew Honnibal a8abc47811 Rename BaseThincComponent --> Pipe 2017-10-26 12:40:40 +02:00
Matthew Honnibal b0f3ea2200 Fix names of pipeline components
NeuralDependencyParser --> DependencyParser
NeuralEntityRecognizer --> EntityRecognizer
TokenVectorEncoder     --> Tensorizer
NeuralLabeller         --> MultitaskObjective
2017-10-26 12:38:23 +02:00
Matthew Honnibal b6b4f1aaf7 Merge pull request #1462 from explosion/feature/vector-meta-data
💫 Add vector meta data to model meta.json on train/package and show in docs
2017-10-26 11:39:41 +02:00
Matthew Honnibal 35977bdbb9 Update better-parser branch with develop 2017-10-26 00:55:53 +00:00
Ines Montani 090bd00369 Merge pull request #1464 from mayukh18/develop_bengali_pronouns
added the bengali pronouns for v2.0
2017-10-25 21:55:25 +02:00
mayukh18 1bc07758fa added few bengali pronouns 2017-10-25 22:24:40 +05:30
ines de1e5f35d5 Merge branch 'develop' into feature/disable-pipes 2017-10-25 16:33:12 +02:00
ines 728b609bf9 Merge branch 'develop' into feature/vector-meta-data 2017-10-25 16:32:22 +02:00
ines c0b55ebdac Fix PhraseMatcher.__contains__ and add more tests 2017-10-25 16:31:11 +02:00
ines 91beacf5e3 Fix Matcher.__contains__ 2017-10-25 16:19:38 +02:00
ines 11e3f19764 Fix vectors data added after training (see #1457) 2017-10-25 16:08:26 +02:00
ines 057954695b Read pipeline and vector data off model in --generate-meta 2017-10-25 16:03:26 +02:00
ines 273e638183 Add vector data to model meta after training (see #1457) 2017-10-25 16:03:05 +02:00
ines 18aae423fb Remove import of non-existing function 2017-10-25 15:54:10 +02:00
ines 5117a7d24d Fix whitespace 2017-10-25 15:54:02 +02:00
ines 657a4d91bc Merge branch 'develop' into feature/disable-pipes 2017-10-25 15:19:05 +02:00
ines 1a722dac31 Merge branch 'develop' into feature/disable-pipes 2017-10-25 15:18:18 +02:00
ines 6a00de4f77 Fix check of unexpected pipe names in restore() 2017-10-25 14:56:35 +02:00
ines 7f03932477 Return self on __enter__ 2017-10-25 14:56:16 +02:00
Matthew Honnibal b5de768852 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-25 14:44:16 +02:00
Matthew Honnibal 094512fd47 Fix model-mark on regression test. 2017-10-25 14:44:00 +02:00
Matthew Honnibal e70f80f29e Add Language.disable_pipes() 2017-10-25 13:46:41 +02:00
Matthew Honnibal 075e8118ea Update from develop 2017-10-25 12:45:21 +02:00
ines 72497c8cb2 Remove comments and add TODO 2017-10-25 12:15:43 +02:00
ines 4d97efc3b5 Add missing docstrings 2017-10-25 12:10:16 +02:00
ines 1262aa0bf9 Implement PhraseMatcher.__contains__ 2017-10-25 12:10:04 +02:00
ines 9c733a8849 Implement PhraseMatcher.__len__ 2017-10-25 12:09:56 +02:00
ines 7eebeeaf85 Fix Matcher.__contains__ 2017-10-25 12:09:47 +02:00
ines 7bcec57462 Remove unused attribute 2017-10-25 12:08:54 +02:00
ines 0b1dcbac14 Remove unused function 2017-10-25 12:08:46 +02:00
ines 3484174e48 Add Language.path 2017-10-25 11:57:43 +02:00
Ines Montani d3bf488e16 Merge pull request #1171 from mollerhoj/support-danish
Improve basic support for Danish
2017-10-24 20:29:57 +02:00
Matthew Honnibal d9bb1e5de8 Increment version 2017-10-24 17:06:19 +02:00
Matthew Honnibal 908809d488 Update tests 2017-10-24 17:05:15 +02:00
Matthew Honnibal 66766c1454 Restore SP tag to English tag_map, until models migrate 2017-10-24 17:05:00 +02:00
Matthew Honnibal 30e67fa808 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-24 16:08:23 +02:00
Matthew Honnibal b0f6fd3f1d Disable tokenizer cache for special-cases. Fixes #1250 2017-10-24 16:08:05 +02:00
Matthew Honnibal 63f0bde749 Add test for #1250: Tokenizer cache clobbered special-case attrs 2017-10-24 16:07:18 +02:00
ines 8492d5be6d Always make lemmatizer return a list of lemmas, not a set 2017-10-24 16:00:56 +02:00
ines 95f866f99f Add lookup argument to Lemmatizer.load 2017-10-24 16:00:56 +02:00
ines 95f6174516 Remove tensorizer from model pipeline example in spacy package 2017-10-24 16:00:56 +02:00
ines 090aed940a Add test for currently failing span.as_doc case 2017-10-24 16:00:56 +02:00
ines 4ef81a9ebc Fix whitespace 2017-10-24 16:00:56 +02:00
Matthew Honnibal 18f1c1d0ba Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-24 14:29:43 +02:00
Matthew Honnibal 4bea65a1a8 Fix Issue #1450: Off-by-1 in * and ? matches
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
Matthew Honnibal 391d5ef0d1 Normalize imports in regression test 2017-10-24 14:25:49 +02:00
ines c55db0a4a1 Add example sentences for Japanese and Chinese (see #1107) 2017-10-24 13:02:24 +02:00
ines 66f8f9d4a0 Fix Japanese tokenizer
JapaneseTokenizer now returns a Doc, not individual words
2017-10-24 13:02:19 +02:00
Matthew Honnibal dd5b2d8fa3 Check for out-of-memory when calling calloc. Closes #1446 2017-10-24 12:40:47 +02:00
Matthew Honnibal b66b8f028b Fix #1375 -- out-of-bounds on token.nbor() 2017-10-24 12:10:39 +02:00
Matthew Honnibal a68d89a4f3 Add failing test for bug #1375 -- no out-of-bounds error for token.nbor() 2017-10-24 12:05:25 +02:00
Ines Montani facf77e541 Merge branch 'develop' into support-danish 2017-10-24 11:53:19 +02:00
Matthew Honnibal ccd2ab1a62 Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
Add LCA matrix for spans and docs
2017-10-24 11:22:46 +02:00
Matthew Honnibal ef3e5a361b Merge pull request #1442 from explosion/feature/fix-sp
💫Fix SP tag, tweak Vectors.__init__, fix Morphology
2017-10-24 10:24:07 +02:00
Matthew Honnibal fdf25d10ba Merge pull request #1440 from ramananbalakrishnan/develop
Support single value for attribute list in doc.to_array
2017-10-24 10:23:12 +02:00
Matthew Honnibal e7556ff048 Fix non-maxout parser 2017-10-23 18:16:23 +02:00
ines a31f048b4d Fix formatting 2017-10-23 10:38:06 +02:00
Matthew Honnibal 490ad3eaf0 Check that empty strings are handled. Closes #1242 2017-10-21 00:52:14 +02:00
Matthew Honnibal 8f8bccecb9 Patch deserialisation for invalid loads, to avoid model failure 2017-10-21 00:51:42 +02:00