62 Commits

Author SHA1 Message Date
ines 0739ae7b76 Tidy up and fix formatting and imports 2017-04-15 13:05:15 +02:00
Matthew Honnibal 354458484c WIP on add_label bug during NER training. Currently when a new label is introduced to NER during training, it causes the labels to be read in in an unexpected order. This invalidates the model. 2017-04-14 23:52:17 +02:00
Matthew Honnibal 2611ac2a89 Fix scorer bug for NER, related to ambiguity between missing annotations and misaligned tokens 2017-03-16 09:38:28 -05:00
Matthew Honnibal 931feb3360 Allow beam parsing for NER 2017-03-11 11:12:01 -06:00
Matthew Honnibal 159e8c46e1 Merge old training fixes with newer state 2016-11-25 09:16:36 -06:00
Matthew Honnibal 39341598bb Fix NER label calculation 2016-11-25 09:02:22 -06:00
Matthew Honnibal 301f3cc898 Fix Issue #429. Add an initialize_state method to the named entity recogniser that adds missing entity types. This is a messy place to add this, because it's strange to have the method mutate state. A better home for this logic could be found. 2016-10-27 18:01:55 +02:00
Matthew Honnibal f787cd29fe Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor. 2016-10-16 21:34:57 +02:00
Matthew Honnibal 9e09b39b9f Revert "Changes to transition systems for new StringStore scheme". This reverts commit 0442e0ab1e. 2016-09-30 20:11:49 +02:00
Matthew Honnibal 0442e0ab1e Changes to transition systems for new StringStore scheme 2016-09-30 19:58:51 +02:00
Matthew Honnibal a47f00901b * Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents 2016-02-01 02:58:14 +01:00
Matthew Honnibal daaad66448 * Now fully proxied 2016-02-01 02:37:08 +01:00
Matthew Honnibal 10877a7791 * Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser 2016-01-30 14:31:36 +01:00
Matthew Honnibal c8e0011ebc * Add iterators to the NER and parser transition systems, to get the action types 2016-01-19 19:07:43 +01:00
Matthew Honnibal 5623242b3e * Adjust NER rules, so that U entries in gazetteer don't become B moves to the model 2015-11-12 04:48:23 +11:00
Matthew Honnibal 44fbdc7260 * Fix bug in NER transition system, that sometimes left no valid moves 2015-11-08 16:19:12 +01:00
Matthew Honnibal e92371bb54 * Fix rule that made Last action invalid if there was a preset of O, since if the entity is already open, that ship has sailed. 2015-11-08 22:17:51 +11:00
Matthew Honnibal af70dc166a * Fix Last restriction, that was supposed to prevent conflicts with presets, but was incorrect. 2015-11-07 09:52:00 +11:00
Matthew Honnibal d24b8509e4 * Correct screw ups from the previous commits 2015-11-07 06:51:41 +11:00
Matthew Honnibal 5efad178b5 * Set ent tag when close entity 2015-11-07 06:09:25 +11:00
Matthew Honnibal 01ab464383 * Prevent Begin and In moves from applying in NER if we're at the last token of a sentence, as this would mean the entity would span over a sentence boundary. Re Issue #169 2015-11-07 05:30:44 +11:00
Matthew Honnibal fe43f8cf39 * Whitespace 2015-08-09 02:31:53 +02:00
Matthew Honnibal 59c3bf60a6 * Ensure entity recognizer doesn't over-write preset types 2015-08-06 16:09:08 +02:00
Matthew Honnibal 9c1724ecae * Gazetteer stuff working, now need to wire up to API 2015-08-06 00:35:40 +02:00
Matthew Honnibal d5255aad77 * Update freqs for missing tags in ner, for serializer 2015-07-23 01:17:11 +02:00
Matthew Honnibal 317cbbc015 * Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time. 2015-07-19 15:18:17 +02:00
Matthew Honnibal 75aeccc064 * Rejig parser interface to use new thinc.api.Example class, in prep of theano model. Comment out beam search 2015-06-28 11:02:34 +02:00
Matthew Honnibal 579735a095 * Remove import of _state module 2015-06-23 17:25:08 +02:00
Matthew Honnibal 15e177d7a1 * Fixes to unshift/fast-forward strategy. Getting 91.55 greedy on NW dev, gold preproc 2015-06-12 01:50:23 +02:00
Matthew Honnibal e2f9a80713 * Remove old _state imports 2015-06-10 07:09:17 +02:00
Matthew Honnibal 18cc326dc0 * Bug fixes to ner.pyx 2015-06-10 06:57:41 +02:00
Matthew Honnibal d68c686ec1 * Move StateClass into interface of transition functions 2015-06-10 01:35:28 +02:00
Matthew Honnibal 4b98b3e9c8 * Cost functions now take StateClass argument, instead of State*. 2015-06-10 00:40:43 +02:00
Matthew Honnibal e0cf61f591 * Move StateClass into the interface for is_valid 2015-06-09 23:23:28 +02:00
Matthew Honnibal 1fee7ade61 * Tweak to ner 2015-06-05 23:48:43 +02:00
Matthew Honnibal 33e70b167f * Remove dead code from ner.pyx 2015-06-05 17:12:47 +02:00
Matthew Honnibal 0114e7600d * Fix NER oracle 2015-06-05 17:11:26 +02:00
Matthew Honnibal 6bf35cecc3 * Refactor transition system to use classes with staticmethods. 2015-06-05 02:27:17 +02:00
Matthew Honnibal a513ec500f * Have oracle functions take a struct instead of a Python object 2015-06-02 20:01:06 +02:00
Matthew Honnibal 0786d9b3c7 * Refactor TransitionSystem, adding set_valid method 2015-06-02 18:38:07 +02:00
Matthew Honnibal c7876aa8b6 * Add get_valid method 2015-06-01 23:06:00 +02:00
Matthew Honnibal 76300bbb1b * Use updated JSON format, with sentences below paragraphs. Allows use of gold preprocessing flag. 2015-05-30 01:25:46 +02:00
Matthew Honnibal fc75210941 * Move spacy.syntax.conll to spacy.gold 2015-05-24 21:35:02 +02:00
Matthew Honnibal 20f1d868a3 * Tmp commit. Working on whole document parsing 2015-05-24 02:49:56 +02:00
Matthew Honnibal aff9359a8d * Update ner.pyx to expect brackets from gold_tuples 2015-05-12 20:27:55 +02:00
Matthew Honnibal fb8d50b3d5 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-04-30 12:45:15 +02:00
Matthew Honnibal b3fd48c97b * Fix missing root labels bug identified in Issue #57 2015-04-28 20:45:51 +02:00
Jordan Suchow 3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Matthew Honnibal 99dbf8a38c * Fix error type in lookup_transition 2015-04-16 01:36:22 +02:00
Matthew Honnibal 507048dc45 * Rename StandardError to Exception, for Python 3 compatibility 2015-04-12 07:28:34 +02:00