Commit Graph

37 Commits

Author SHA1 Message Date
Matthew Honnibal e2136232f9 Exclude states with no matching gold annotations from parsing 2017-05-22 10:30:12 -05:00
Matthew Honnibal 931feb3360 Allow beam parsing for NER 2017-03-11 11:12:01 -06:00
Matthew Honnibal bcf8f7ba40 * Add a parse_batch method to Parser, that releases the GIL around a batch of documents. 2016-02-01 08:34:55 +01:00
Matthew Honnibal 28e5ad62bc * Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents 2016-02-01 03:00:15 +01:00
Matthew Honnibal a47f00901b * Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents 2016-02-01 02:58:14 +01:00
Matthew Honnibal 10877a7791 * Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser 2016-01-30 14:31:36 +01:00
Matthew Honnibal 151aa0b0e2 * Allow users to add_label, in order to extend the entity recogniser to new classes. Does not by itself add a class to the model 2016-01-19 19:09:33 +01:00
Matthew Honnibal 20fd36a0f7 * Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125: allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve. 2015-10-13 13:44:41 +11:00
Matthew Honnibal 2a46c77324 * Whitespace 2015-08-08 23:35:59 +02:00
Matthew Honnibal 317cbbc015 * Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time. 2015-07-19 15:18:17 +02:00
Matthew Honnibal e29daea85f * Fix bint/int typing problem in TransitionSystem. In C++ bint* means bool*, but in C it means int*. So, type-casting to bint* is unsafe. 2015-07-17 22:37:24 +02:00
Matthew Honnibal 9a8db9743c * Remove gil from parser.call 2015-07-14 23:47:33 +02:00
Matthew Honnibal 75aeccc064 * Rejig parser interface to use new thinc.api.Example class, in prep of theano model. Comment out beam search 2015-06-28 11:02:34 +02:00
Matthew Honnibal e5570c9700 * Set nogil for oracle functions 2015-06-10 06:56:56 +02:00
Matthew Honnibal 04b1cd9b8c * Greedy parsing working with new StateClass. Beam parsing broken 2015-06-10 04:20:23 +02:00
Matthew Honnibal d68c686ec1 * Move StateClass into interface of transition functions 2015-06-10 01:35:28 +02:00
Matthew Honnibal 4b98b3e9c8 * Cost functions now take StateClass argument, instead of State*. 2015-06-10 00:40:43 +02:00
Matthew Honnibal e0cf61f591 * Move StateClass into the interface for is_valid 2015-06-09 23:23:28 +02:00
Matthew Honnibal 0895d454fb * Prepare to switch to using state class, instead of state struct 2015-06-09 21:20:14 +02:00
Matthew Honnibal 8f142c1838 * Refactor transition system oracles, to split out move and label cost. Preparing to add Unshift move. Will exclude non-monotonic. 2015-06-07 03:21:29 +02:00
Matthew Honnibal 6bf35cecc3 * Refactor transition system to use classes with staticmethods. 2015-06-05 02:27:17 +02:00
Matthew Honnibal 079dad28a7 * Update for faster beam training 2015-06-04 19:32:32 +02:00
Matthew Honnibal a513ec500f * Have oracle functions take a struct instead of a Python object 2015-06-02 20:01:06 +02:00
Matthew Honnibal 0786d9b3c7 * Refactor TransitionSystem, adding set_valid method 2015-06-02 18:38:07 +02:00
Matthew Honnibal c7876aa8b6 * Add get_valid method 2015-06-01 23:06:00 +02:00
Matthew Honnibal fc75210941 * Move spacy.syntax.conll to spacy.gold 2015-05-24 21:35:02 +02:00
Matthew Honnibal fb8d50b3d5 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-04-30 12:45:15 +02:00
Matthew Honnibal b3fd48c97b * Fix missing root labels bug identified in Issue #57 2015-04-28 20:45:51 +02:00
Jordan Suchow 3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Matthew Honnibal 31fad99518 * Use StringStore to encode label names, instead of label_ids 2015-03-26 16:44:45 +01:00
Matthew Honnibal 8057a95f20 * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. 2015-03-26 16:44:44 +01:00
Matthew Honnibal ae235e07b9 * Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc. 2015-03-26 16:44:44 +01:00
Matthew Honnibal b3eda03c9c * Tmp 2015-03-26 16:44:44 +01:00
Matthew Honnibal 10ed738df2 * Tmp commit 2015-03-26 16:44:43 +01:00
Matthew Honnibal 8c883cef58 * Refactored transition system code now compiling. Still need to hook up label oracle, and test 2015-03-26 16:44:43 +01:00
Matthew Honnibal 8eadb984cb * Refactor arc_eager to use new TransitionSystem base class. Need to fix oracle 2015-03-26 16:44:42 +01:00
Matthew Honnibal b063001596 * Add base TransitionSystem class. Still need to rethink how non-monotonic labelling will work for best_valid 2015-03-26 16:44:42 +01:00