3419 Commits

Author SHA1 Message Date
Matthew Honnibal 1fa2bfb600 Add model_to_bytes and model_from_bytes helpers. Probably belong in thinc. 2017-05-29 09:27:04 +02:00
Matthew Honnibal 6dad4117ad Work on serialization for models 2017-05-29 01:37:57 +02:00
ines 7b1ddcc04d Add test for vocab serialization 2017-05-29 01:09:52 +02:00
ines 00b2094dc3 Fix typos, long integers and tests 2017-05-29 01:09:52 +02:00
ines 804dbb8d25 Add StringStore test for API docs 2017-05-29 01:09:52 +02:00
Matthew Honnibal 6cd5730ee7 Fix lex struct setters for strings 2017-05-29 01:05:09 +02:00
Matthew Honnibal 2edd96ce47 Draft Vocab to/from disk/bytes 2017-05-28 23:34:12 +02:00
Matthew Honnibal 4ddff020c3 Fix compile error 2017-05-28 23:30:40 +02:00
Matthew Honnibal 6d3caeadd2 Fix type check for long 2017-05-28 23:22:45 +02:00
Matthew Honnibal 92dbf28c1e Hack a fixture in the vectors tests, for xfail 2017-05-28 20:28:32 +02:00
Matthew Honnibal 9239f06ed3 Fix german noun chunks iterator 2017-05-28 20:13:03 +02:00
Matthew Honnibal fd9b6722a9 Fix noun chunks iterator for new stringstore 2017-05-28 20:12:10 +02:00
ines 414193e9ba Update docs to reflect StringStore changes 2017-05-28 18:19:11 +02:00
Matthew Honnibal 7996d21717 Fixes for new StringStore 2017-05-28 11:09:27 -05:00
Matthew Honnibal 8a24c60c1e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-28 08:12:05 -05:00
Matthew Honnibal bc97bc292c Fix __call__ method 2017-05-28 08:11:58 -05:00
Matthew Honnibal 5cf47b847b Handle iob with no tag in converter 2017-05-28 08:11:39 -05:00
Matthew Honnibal fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
Matthew Honnibal b007a2b0d3 Update stringstore tests 2017-05-28 14:08:09 +02:00
Matthew Honnibal 84e66ca6d4 WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
Matthew Honnibal fe4a746300 Accomodate symbols in new string scheme 2017-05-28 13:03:16 +02:00
Matthew Honnibal f51e6a6c16 Adjust lexeme sizing for attr_t being 64 bit 2017-05-28 12:51:09 +02:00
Matthew Honnibal a5606c3eda Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
Matthew Honnibal 39293ab2ee Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-28 11:46:57 +02:00
Matthew Honnibal dd052572d4 Update arc eager for SBD changes 2017-05-28 11:46:51 +02:00
Matthew Honnibal 3ea98e2043 Remove vector member from lexeme 2017-05-28 11:46:24 +02:00
Matthew Honnibal 2445707f3c Re-delegate vectors to vocab 2017-05-28 11:46:10 +02:00
Matthew Honnibal 6863d01361 Remove vectors from lexeme 2017-05-28 11:45:48 +02:00
Matthew Honnibal 15f6efc127 Remove vectors from vocab 2017-05-28 11:45:32 +02:00
Matthew Honnibal c1263a844b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-27 18:32:57 -05:00
Matthew Honnibal 9e711c3476 Divide d_loss by batch size 2017-05-27 18:32:46 -05:00
Matthew Honnibal b082f76494 Randomize pipeline order during training 2017-05-27 18:32:21 -05:00
Matthew Honnibal a1d4c97fb7 Improve correctness of minibatching 2017-05-27 17:59:00 -05:00
ines 84189c1cab Add 'xx' language ID for multi-language support
Allows models to specify their language ID as 'xx'.
2017-05-28 00:58:59 +02:00
ines 33e332e67c Remove unused export 2017-05-28 00:57:59 +02:00
ines c1983621fb Update util functions for model loading 2017-05-28 00:22:40 +02:00
ines c8543c8237 Fix formatting and docstrings and remove deprecated function 2017-05-28 00:22:40 +02:00
Matthew Honnibal 49235017bf Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-27 16:34:28 -05:00
Matthew Honnibal 7ebd26b8aa Use ordered dict to specify transitions 2017-05-27 15:52:20 -05:00
Matthew Honnibal 3eea5383a1 Add move_names property to parser 2017-05-27 15:51:55 -05:00
Matthew Honnibal 8de9829f09 Don't overwrite model in initialization, when loading 2017-05-27 15:50:40 -05:00
Matthew Honnibal 99316fa631 Use ordered dict to specify actions 2017-05-27 15:50:21 -05:00
Matthew Honnibal 655ca58c16 Clarifying change to StateC.clone 2017-05-27 15:49:37 -05:00
Matthew Honnibal 5e4312feed Evaluate loaded class, to ensure save/load works 2017-05-27 15:47:02 -05:00
Matthew Honnibal 34bbad8e0e Add __reduce__ methods on parser subclasses. Fixes pickling. 2017-05-27 15:46:06 -05:00
Matthew Honnibal 7cc9c3e9a6 Fix convert CLI 2017-05-27 15:44:42 -05:00
ines 1203959625 Add pipeline setting to meta.json generator 2017-05-27 20:02:01 +02:00
ines 086a06e7d7 Fix CLI docstrings and add command as first argument
Workaround for Plac
2017-05-27 20:01:46 +02:00
ines a8e58e04ef Add symbols class to punctuation rules to handle emoji (see #1088)
Currently doesn't work for Hungarian, because of conflicts with the
custom punctuation rules. Also doesn't take multi-character emoji like
👩🏽‍💻 into account.
2017-05-27 17:57:10 +02:00
Matthew Honnibal dc07d72d80 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-27 08:20:40 -05:00
Matthew Honnibal de13fe0305 Remove length cap on sentences 2017-05-27 08:20:32 -05:00
Matthew Honnibal 73a643d32a Don't randomise pipeline for training, and don't update if no gradient 2017-05-27 08:20:13 -05:00
Matthew Honnibal 3d22fcaf0b Return None from parser if there are no annotations 2017-05-26 14:02:59 -05:00
Matthew Honnibal d06f235fc9 Fix conflict on convert.py 2017-05-26 11:33:29 -05:00
Matthew Honnibal 2e587c6417 Export iob_to_biluo utility 2017-05-26 11:32:55 -05:00
Matthew Honnibal 2b3b937a04 Fix converter CLI 2017-05-26 11:32:41 -05:00
Matthew Honnibal 5a87bcf35f Fix converters 2017-05-26 11:32:34 -05:00
Matthew Honnibal 8af3100143 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-26 11:31:41 -05:00
Matthew Honnibal 3d5a536eaa Improve efficiency of parser batching 2017-05-26 11:31:23 -05:00
Matthew Honnibal daac3e3573 Always shuffle gold data, and support length cap 2017-05-26 11:30:52 -05:00
Matthew Honnibal d65f99a720 Improve model saving in train script 2017-05-26 05:52:09 -05:00
ines 51882c4984 Fix formatting 2017-05-26 12:37:45 +02:00
ines 353f0ef8d7 Use disable argument (list) for serialization 2017-05-26 12:33:54 +02:00
Matthew Honnibal 22d7b448a5 Fix convert command 2017-05-25 19:47:12 -05:00
Matthew Honnibal dbf2a4cf57 Update all models on each epoch 2017-05-25 19:46:56 -05:00
Matthew Honnibal faff1c23fb Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-25 17:16:10 -05:00
Matthew Honnibal 82b11b0320 Remove print statement 2017-05-25 17:15:59 -05:00
Matthew Honnibal 80cf42e33b Fix compounding and decaying utils 2017-05-25 17:15:39 -05:00
Matthew Honnibal df8015f05d Tweaks to train script 2017-05-25 17:15:24 -05:00
Matthew Honnibal 3a6e59cc53 Add minibatch function in spacy.gold 2017-05-25 17:15:09 -05:00
Matthew Honnibal 702fe74a4d Clean up spacy.cli.train 2017-05-25 16:16:30 -05:00
Matthew Honnibal b9cea9cd93 Add compounding and decaying functions 2017-05-25 16:16:10 -05:00
Matthew Honnibal 2cb7cc2db7 Remove commented code from parser 2017-05-25 14:55:09 -05:00
Matthew Honnibal f403c2cd5f Add env opts for optimizer 2017-05-25 11:19:26 -05:00
Matthew Honnibal c245ff6b27 Rebatch parser inputs, with mid-sentence states 2017-05-25 11:18:59 -05:00
Matthew Honnibal 679efe79c8 Make parser update less hacky 2017-05-25 06:49:00 -05:00
Matthew Honnibal 8500d9b1da Only train one task per iter, holding grads 2017-05-25 06:47:42 -05:00
Matthew Honnibal b27c587800 Fix pieces argument to PrecomputedMaxout 2017-05-25 06:46:59 -05:00
Matthew Honnibal e1cb5be0c7 Adjust dropout, depth and multi-task in parser 2017-05-24 20:11:41 -05:00
Matthew Honnibal e6cc927ab1 Rearrange multi-task learning 2017-05-24 20:10:54 -05:00
Matthew Honnibal 135a13790c Disable gold preprocessing 2017-05-24 20:10:20 -05:00
Matthew Honnibal 467bbeadb8 Add hidden layers for tagger 2017-05-24 20:09:51 -05:00
ines 66088851dc Add Doc.to_disk() and Doc.from_disk() methods 2017-05-24 11:58:17 +02:00
Matthew Honnibal 620df0414f Fix dropout in parser 2017-05-23 15:20:45 -05:00
Matthew Honnibal 5b67bcbee0 Increase default embed size to 7500 2017-05-23 15:20:16 -05:00
Matthew Honnibal 48eef94f92 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-23 18:47:32 +02:00
Matthew Honnibal d44b1eafc4 Fix conflict artefacts 2017-05-23 18:47:11 +02:00
Matthew Honnibal 01e59e4e6e * Add Token.sent_start property, re Issue #235 2017-05-23 18:41:11 +02:00
Matthew Honnibal 4917cbb484 Include sent_start test 2017-05-23 18:40:37 +02:00
Matthew Honnibal d68dd1f251 Add SENT_START attribute, for custom sentence boundary detection 2017-05-23 18:37:58 +02:00
Matthew Honnibal 8026c183d0 Add hacky logic to accelerate depth=0 case in parser 2017-05-23 11:06:49 -05:00
Matthew Honnibal e7d3159d91 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-23 05:58:17 -05:00
Matthew Honnibal a8b6d11c5b Support optional maxout layer 2017-05-23 05:58:07 -05:00
Matthew Honnibal c55b8fa7c5 Fix bugs in parse_batch 2017-05-23 05:57:52 -05:00
ines fb0ff0272f xfail neural parser tests for now and remove test for deprecated method 2017-05-23 12:40:37 +02:00
Matthew Honnibal 964707d795 Restore support for deeper networks in parser 2017-05-23 05:31:13 -05:00
Matthew Honnibal e27262f431 Go back to previous matcher signature, with on_match positional 2017-05-23 04:37:40 -05:00
Matthew Honnibal 5418bcf5d7 Resolve conflict on test 2017-05-23 04:37:16 -05:00
ines e6acd3bbf2 Fix matcher tests and matcher docs 2017-05-23 11:36:02 +02:00
ines d0c6d4f76d Fix formatting 2017-05-23 11:32:00 +02:00
Matthew Honnibal f0bcc0bd8d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-23 04:29:28 -05:00
Matthew Honnibal 9adfe9e8fc Don't hold gradient updates in language -- let the parser decide how to batch the updates. 2017-05-23 04:29:10 -05:00
Matthew Honnibal 6b918cc58e Support making updates periodically during training 2017-05-23 04:23:29 -05:00
Matthew Honnibal 3f725ff7b3 Roll back changes to parser update 2017-05-23 04:23:05 -05:00
Matthew Honnibal 3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal 532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
Matthew Honnibal 8a9e318deb Put the parsing loop in a nogil prange block 2017-05-22 17:58:12 -05:00
ines a23f487b06 Tidy up displaCy and add "manual" option
Also don't require title in EntityRenderer
2017-05-22 18:48:20 +02:00
Matthew Honnibal 0264447c4d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-22 10:41:56 -05:00
Matthew Honnibal 6e8dce2c05 Fix train command line args 2017-05-22 10:41:39 -05:00
Matthew Honnibal a7ee63c0ac Fix labeller loss for unseen labels 2017-05-22 10:41:20 -05:00
Matthew Honnibal c9760b2104 Support sentence limits in GoldCorpus 2017-05-22 10:40:46 -05:00
Matthew Honnibal e2136232f9 Exclude states with no matching gold annotations from parsing 2017-05-22 10:30:12 -05:00
Matthew Honnibal 83ffd16474 Fix offset calculation for other negative values 2017-05-22 08:00:53 -05:00
ines b3c7ee0148 Fix tests and use the new Matcher API 2017-05-22 13:54:20 +02:00
Matthew Honnibal f00f821496 Fix pseudoprojectivity->nonproj 2017-05-22 06:14:42 -05:00
Matthew Honnibal ae8cf70dc1 Fix CLI train signature 2017-05-22 06:13:39 -05:00
Matthew Honnibal 187f370734 Update tests for matcher changes 2017-05-22 12:59:50 +02:00
Matthew Honnibal 5d59e74cf6 PseudoProjectivity->nonproj 2017-05-22 05:49:53 -05:00
Matthew Honnibal 7e2cdc0c81 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-22 12:39:34 +02:00
Matthew Honnibal 70a8c531cd Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-22 05:39:18 -05:00
Matthew Honnibal 2f78413a02 PseudoProjectivity->nonproj 2017-05-22 05:39:03 -05:00
Matthew Honnibal 89ebc5c3cd Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-22 12:38:15 +02:00
Matthew Honnibal d8bb5bb959 Implement StringStore serialization, and update tests 2017-05-22 12:38:00 +02:00
ines 54f04a9fe0 Update API docs with changes in spacy.gold and spacy.language 2017-05-22 12:29:30 +02:00
ines b5fb43fdd8 Allow sys.exit status as exits keyword arg in util.prints() 2017-05-22 12:29:15 +02:00
ines fc3ec733ea Reduce complexity in CLI
Remove now redundant model command and move plac annotations to cli
files
2017-05-22 12:28:58 +02:00
Matthew Honnibal b45b4aa392 PseudoProjectivity --> nonproj 2017-05-22 05:17:44 -05:00
Matthew Honnibal aae97f00e9 Fix nonproj import 2017-05-22 05:15:06 -05:00
Matthew Honnibal 9262fc4829 Fix syntax error 2017-05-22 05:14:59 -05:00
Matthew Honnibal 93a042253b Make GoldParse attributes writeable 2017-05-22 04:51:08 -05:00
Matthew Honnibal 2a5eb9f61e Make nonproj methods top-level functions, instead of class methods 2017-05-22 04:51:08 -05:00
Matthew Honnibal c998776c25 Make single array for features, to reduce GPU copies 2017-05-22 04:51:08 -05:00
Matthew Honnibal bc2294d7f1 Add support for fiddly hyper-parameters to train func 2017-05-22 04:51:08 -05:00
Matthew Honnibal 80e19a2399 Simplify CLI implementation for subcommands. Remove model command. 2017-05-22 04:51:08 -05:00
Matthew Honnibal 33e2222839 Remove unused code in deprojectivize 2017-05-22 04:51:08 -05:00
Matthew Honnibal 4e0988605a Pass through non-projective=True 2017-05-22 04:51:08 -05:00
Matthew Honnibal 025d9bbc37 Fix handling of non-projective deps 2017-05-22 04:51:08 -05:00
Matthew Honnibal 5738d373d5 Add deprojectivize to pipeline 2017-05-22 04:51:08 -05:00
Matthew Honnibal 1b5fa68996 Do pseudo-projective pre-processing for parser 2017-05-22 04:51:08 -05:00
Matthew Honnibal 1d5d9838a2 Fix action collection for parser 2017-05-22 04:51:08 -05:00
Matthew Honnibal 8d1e64be69 Add experimental NeuralLabeller 2017-05-22 04:51:08 -05:00
Matthew Honnibal 9b1b0742fd Fix prediction for tok2vec 2017-05-22 04:51:08 -05:00
Matthew Honnibal f13d6c7359 Support gold preprocessing and single gold files 2017-05-22 04:51:08 -05:00
Matthew Honnibal e14533757b Use averaged params for evaluation 2017-05-22 04:51:08 -05:00
Matthew Honnibal 7811d97339 Refactor CLI 2017-05-22 04:51:08 -05:00
Matthew Honnibal 5db89053aa Merge docstrings 2017-05-21 13:46:23 -05:00
Matthew Honnibal 432b3499b3 Fix memory leak 2017-05-21 13:38:46 -05:00
Matthew Honnibal 59fbfb3829 Remove train.py -- functions now in GoldCorpus and Language 2017-05-21 09:08:27 -05:00
Matthew Honnibal 8904814c0e Add missing import 2017-05-21 09:07:56 -05:00
Matthew Honnibal baf3ef0ddc Remove import of removed train_config script 2017-05-21 09:07:34 -05:00
Matthew Honnibal 4c9202249d Refactor training, to fix memory leak 2017-05-21 09:07:06 -05:00
Matthew Honnibal 4803b3b69e Add GoldCorpus class, to manage data streaming 2017-05-21 09:06:17 -05:00
Matthew Honnibal 180e5afede Fix tokvecs flattening in pipeline 2017-05-21 09:05:34 -05:00
Matthew Honnibal 0731971bfc Add itershuffle utility function. Maybe belongs in thinc 2017-05-21 09:05:05 -05:00
ines 2c5cfe8bbf Update docstrings and API docs for StringStore 2017-05-21 14:18:58 +02:00
ines 251346b59f Fix typos and formatting 2017-05-21 14:18:46 +02:00
ines 075f5ff87a Update docstrings and API docs for GoldParse 2017-05-21 13:53:46 +02:00
ines 99b631617d Reformat docstrings 2017-05-21 13:32:15 +02:00
ines 885e82c9b0 Update docstrings and remove deprecated load classmethod 2017-05-21 13:27:52 +02:00
ines c5a653fa48 Update docstrings and API docs for Tokenizer 2017-05-21 13:18:14 +02:00
ines f216422ac5 Remove deprecated load classmethod 2017-05-21 13:18:01 +02:00
ines d82ae9a585 Change "function" to "callable" in docs 2017-05-21 13:17:40 +02:00
ines 3871157d84 Update spacy.util documentation 2017-05-21 01:12:09 +02:00
ines 0c6c65aa3c Improve messaging if model linking fails after download 2017-05-21 00:28:37 +02:00
Matthew Honnibal 3b7c108246 Pass tokvecs through as a list, instead of concatenated. Also fix padding 2017-05-20 13:23:32 -05:00
ines 924e8506de Move Defaults subclass to module scope (necessary for pickling) 2017-05-20 19:02:27 +02:00
Matthew Honnibal d52b65aec2 Revert "Move to contiguous buffer for token_ids and d_vectors"
This reverts commit 3ff8c35a79.
2017-05-20 11:26:23 -05:00
ines 27de0834b2 Update docstrings and API docs for Lexeme 2017-05-20 15:13:42 +02:00
ines 7ed8a92ed1 Update docstrings and API docs for Token 2017-05-20 15:13:33 +02:00
ines 4ed6a36622 Update docstrings and API docs for Matcher 2017-05-20 14:43:10 +02:00
ines 39f36539f6 Update docstrings and API docs for Matcher 2017-05-20 14:32:34 +02:00
ines c00ff257be Update docstrings and API docs for Matcher 2017-05-20 14:26:10 +02:00
ines 790435e51c Update docstrings 2017-05-20 14:05:07 +02:00
ines f0cc642bb9 Update docstrings and API docs for Vocab 2017-05-20 14:00:41 +02:00
Matthew Honnibal ce9234f593 Update Matcher API 2017-05-20 13:54:53 +02:00
Matthew Honnibal b272890a8c Try to move parser to simpler PrecomputedAffine class. Currently broken -- maybe the previous change 2017-05-20 06:40:10 -05:00
ines e39ad78267 Resolve model name properly in cli.info
Use util.resolve_model_path() to also allow package names and paths.
2017-05-20 12:24:40 +02:00
Matthew Honnibal 3ff8c35a79 Move to contiguous buffer for token_ids and d_vectors 2017-05-20 04:17:30 -05:00
Matthew Honnibal 8b04b0af9f Remove freqs from transition_system 2017-05-20 02:20:48 -05:00
Matthew Honnibal 61fe55efba Move EnglishDefaults class out of English 2017-05-20 02:18:19 -05:00
Matthew Honnibal a1ba20e2b1 Fix over-run on parse_batch 2017-05-19 18:57:30 -05:00
ines 1d4d3d0ecd Add TODO 2017-05-20 01:38:04 +02:00
Matthew Honnibal 7ee1827af0 Disable data caching in parser 2017-05-19 18:17:11 -05:00
Matthew Honnibal e84de028b5 Remove 'rebatch' op, and remove min-batch cap 2017-05-19 18:16:36 -05:00
Matthew Honnibal 3376d4d6e8 Update the train script, fixing GPU memory leak 2017-05-19 18:15:50 -05:00
Matthew Honnibal 836fe1d880 Update neural net tests 2017-05-19 18:11:29 -05:00
ines fe5d8819ea Update Matcher docstrings and API docs 2017-05-19 21:47:06 +02:00
Matthew Honnibal 08766240c3 Add incomplete iob converter 2017-05-19 13:27:51 -05:00
Matthew Honnibal c12ab47a56 Remove state argument in pipeline. Other changes 2017-05-19 13:26:36 -05:00
Matthew Honnibal 66ea9aebe7 Remove the state argument from Language 2017-05-19 13:25:42 -05:00
Matthew Honnibal 09a877886b WIP on iob converter 2017-05-19 13:24:39 -05:00
ines a804045597 Use is_ancestor instead of deprecated is_ancestor_of 2017-05-19 20:23:40 +02:00
Matthew Honnibal 8d5e6d9f4f Rename no_ner arg to no_entities 2017-05-19 13:23:11 -05:00
ines e9e62b01b0 Update docstrings and API docs for Token 2017-05-19 18:47:56 +02:00
ines 62ceec4fc6 Update docstrings and API docs for Span 2017-05-19 18:47:46 +02:00
ines 23f9a3ccc8 Update docstrings and API docs for Doc 2017-05-19 18:47:39 +02:00
ines 2c8c9dc0c9 Update docstrings and API docs for Language 2017-05-19 18:47:24 +02:00
ines 0791f0aae6 Update docstrings and API docs for Span class 2017-05-19 00:31:31 +02:00
ines 8455cb1327 Update docstring for Doc.__getitem__ 2017-05-19 00:30:51 +02:00
ines 0fc05e54e4 Document TokenVectorEncoder 2017-05-19 00:00:02 +02:00
ines b687ad109d Update docstrings and API docs for Doc class 2017-05-18 23:59:44 +02:00
ines d42bc16868 Update docstrings and API docs for Language class 2017-05-18 23:57:38 +02:00
ines 593361ee3c Update docstrings for Span class 2017-05-18 22:17:41 +02:00
ines b87066ff10 Update docstrings and API docs for Doc class 2017-05-18 22:17:41 +02:00
Matthew Honnibal 238be0f16a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-18 08:32:22 -05:00
Matthew Honnibal c214c0decb Improve env_opt reporting 2017-05-18 08:32:03 -05:00
Matthew Honnibal bbb59e371c Fix GPU evaluation 2017-05-18 08:31:15 -05:00
Matthew Honnibal c2c825127a Fix use_params and pipe methods 2017-05-18 08:30:59 -05:00
Matthew Honnibal ca70b08661 Fix GPU training and evaluation 2017-05-18 08:30:33 -05:00
ines 489d2fb4ba Add is_in_jupyter() helper for displaCy (see #1058) 2017-05-18 14:13:14 +02:00
ines abf0188b0a Move cupy and CudaStream to compat 2017-05-18 14:12:45 +02:00
ines 33decd85b6 Reorganise and explicitly state what's importable 2017-05-18 14:12:31 +02:00
Matthew Honnibal a438cef8c5 Fix significant bug in feature calculation -- off by 1 2017-05-18 06:21:32 -05:00
Matthew Honnibal fc8d3a112c Add util.env_opt support: Can set hyper params through environment variables. 2017-05-18 04:36:53 -05:00
Matthew Honnibal d2626fdb45 Fix name error in nn parser 2017-05-18 04:31:01 -05:00
Matthew Honnibal b460533827 Bug fixes to pipeline 2017-05-18 04:29:51 -05:00
Matthew Honnibal 8815507f8e Move SpanishDefaults out of Language class, for pickle 2017-05-18 04:28:51 -05:00
Matthew Honnibal 2713041571 Fix GPU usage in Language 2017-05-18 04:25:19 -05:00
Matthew Honnibal 711ad5edc4 Cache features in doc2feats 2017-05-18 04:22:20 -05:00
Matthew Honnibal 39ea38c4b1 Add option to use gpu to spacy train 2017-05-18 04:21:49 -05:00
Matthew Honnibal a1d8e420b5 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-17 08:00:04 -05:00
Matthew Honnibal edfea3a513 Fix progress bar 2017-05-17 14:59:37 +02:00
Matthew Honnibal 0b7fd67408 Fix style check in displacy 2017-05-17 07:57:24 -05:00
Matthew Honnibal 55dab77de8 Add conversion rule for .conll 2017-05-17 13:13:48 +02:00
Matthew Honnibal 692bd2a186 Bug fix to tagger: wasnt backproping to token vectors 2017-05-17 13:13:14 +02:00
Matthew Honnibal 877f83807f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-17 12:09:29 +02:00
Matthew Honnibal 793430aa7a Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
2017-05-17 12:04:50 +02:00
Matthew Honnibal 3bf4a28d8d Use tag in CoNLL converter, not POS 2017-05-17 12:04:33 +02:00
ines 1a05078c79 Add language-specific syntax iterators to en and de 2017-05-17 12:04:03 +02:00
Matthew Honnibal c9a5d5d24b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-16 16:22:05 +02:00
Matthew Honnibal 8cf097ca88 Redesign training to integrate NN components
* Obsolete .parser, .entity etc names in favour of .pipeline
* Components no longer create models on initialization
* Models created by loading method (from_disk(), from_bytes() etc), or
    .begin_training()
* Add .predict(), .set_annotations() methods in components
* Pass state through pipeline, to allow components to share information
    more flexibly.
2017-05-16 16:17:30 +02:00
Matthew Honnibal 221b4c1ee8 Fix test for Python 3 2017-05-16 13:06:30 +02:00
Matthew Honnibal 5211645af3 Get data flowing through pipeline. Needs redesign 2017-05-16 11:21:59 +02:00
Matthew Honnibal 1d7c18e58a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-15 21:53:47 +02:00
Matthew Honnibal a9edb3aa1d Improve integration of NN parser, to support unified training API 2017-05-15 21:53:27 +02:00
ines 98354be150 Only get user_data if it exists on doc 2017-05-15 13:39:47 +02:00
ines c33bdeb564 Use uppercase for entity types 2017-05-15 01:24:57 +02:00
ines 4aaa607b8d Add xmlns:xlink so SVGs are rendered properly as individual files 2017-05-14 19:54:13 +02:00
ines 9dd13cd76a Update docstrings 2017-05-14 19:30:47 +02:00
ines a04550605a Add Jupyter notebook support (see #1058) 2017-05-14 18:39:01 +02:00
ines c31792aaec Add displaCy visualisers (see #1058) 2017-05-14 17:50:23 +02:00
ines b462076d80 Merge load_lang_class and get_lang_class 2017-05-14 01:31:10 +02:00
ines 36bebe7164 Update docstrings 2017-05-14 01:30:29 +02:00
Matthew Honnibal 4b9d69f428 Merge branch 'v2' into develop
* Move v2 parser into nn_parser.pyx
* New TokenVectorEncoder class in pipeline.pyx
* New spacy/_ml.py module

Currently the two parsers live side-by-side, until we figure out how to
organize them.
2017-05-14 01:10:23 +02:00
Matthew Honnibal 5cac951a16 Move new parser to nn_parser.pyx, and restore old parser, to make tests pass. 2017-05-14 00:55:01 +02:00
Matthew Honnibal f8c02b4341 Remove cupy imports from parser, so it can work on CPU 2017-05-14 00:37:53 +02:00
Matthew Honnibal 613ba79e2e Fiddle with sizings for parser 2017-05-13 17:20:23 -05:00
Matthew Honnibal e6d71e1778 Small fixes to parser 2017-05-13 17:19:04 -05:00