Commit Graph

89 Commits

Author SHA1 Message Date
Matthew Honnibal 6936ca1664 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-12-10 09:44:07 +01:00
Matthew Honnibal 4405b5c875 Fix resizing edge-case for NER 2018-12-10 06:25:17 +00:00
Matthew Honnibal 16c5861d29 Fix NER space constraints
Allow entities to end on spaces, to avoid stumping the oracle when we're
inside an entity, and there's a space just before a correct entity.
2018-12-09 08:06:45 +01:00
Matthew Honnibal 1e6725e9b7 Try to prevent spaces from being tagged as entities 2018-12-07 00:12:12 +00:00
Matthew Honnibal 817e1fc5e5 Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!

This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so much be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
2018-10-27 01:12:50 +02:00
Matthew Honnibal ee33de8652 Fix unpickling of NER parser 2018-05-21 17:42:40 +02:00
Matthew Honnibal 2c4a6d66fa Merge master into develop. Big merge, many conflicts -- need to review 2018-04-29 14:49:26 +02:00
Ines Montani 3141e04822
💫 New system for error messages and warnings (#2163)
* Add spacy.errors module

* Update deprecation and user warnings

* Replace errors and asserts with new error message system

* Remove redundant asserts

* Fix whitespace

* Add messages for print/util.prints statements

* Fix typo

* Fix typos

* Move CLI messages to spacy.cli._messages

* Add decorator to display error code with message

An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.

* Remove unused link in spacy.about

* Update errors for invalid pipeline components

* Improve error for unknown factories

* Add displaCy warnings

* Update formatting consistency

* Move error message to spacy.errors

* Update errors and check if doc returned by component is None
2018-04-03 15:50:31 +02:00
ines 3eb67bbe4b Allow entity types with dashes (resolves #1967) 2018-03-28 20:51:26 +02:00
Matthew Honnibal 1f7229f40f Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing
changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
Matthew Honnibal e361b4f82b Fix #1929: Incorrect NER when pre-set sentence boundaries. 2018-02-08 15:25:41 +01:00
Matthew Honnibal 2512ea9eeb Fix memory leak in beam parser 2017-11-14 02:11:40 +01:00
ines b4d226a3f1 Tidy up syntax 2017-10-27 19:45:57 +02:00
Matthew Honnibal 92c5d78b42 Unhack NER.add_action 2017-10-07 19:02:40 +02:00
Matthew Honnibal c003c561c3 Revert NER action loading change, for model compatibility 2017-09-17 05:46:03 -05:00
Matthew Honnibal 8c503487af Fix lookup of missing NER actions 2017-09-14 16:59:45 +02:00
Matthew Honnibal daf869ab3b Fix add_action for NER, so labelled 'O' actions aren't added 2017-09-14 16:16:41 +02:00
Matthew Honnibal 84b7ed49e4 Ensure updates aren't made if no gold available 2017-08-20 14:41:38 +02:00
Matthew Honnibal 27abc56e98 Add method to get beam entities 2017-07-29 21:59:02 +02:00
Matthew Honnibal 3da1063b36 Add beam decoding to parser, to allow NER uncertainties 2017-07-20 15:02:55 +02:00
Matthew Honnibal 0ca5832427 Improve negative example handling in NER oracle 2017-07-20 00:18:49 +02:00
Matthew Honnibal 7996d21717 Fixes for new StringStore 2017-05-28 11:09:27 -05:00
Matthew Honnibal 84e66ca6d4 WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
Matthew Honnibal 99316fa631 Use ordered dict to specify actions 2017-05-27 15:50:21 -05:00
Matthew Honnibal 3d5a536eaa Improve efficiency of parser batching 2017-05-26 11:31:23 -05:00
Matthew Honnibal e2136232f9 Exclude states with no matching gold annotations from parsing 2017-05-22 10:30:12 -05:00
Matthew Honnibal 8b04b0af9f Remove freqs from transition_system 2017-05-20 02:20:48 -05:00
ines 0739ae7b76 Tidy up and fix formatting and imports 2017-04-15 13:05:15 +02:00
Matthew Honnibal 354458484c WIP on add_label bug during NER training
Currently when a new label is introduced to NER during training,
it causes the labels to be read in in an unexpected order. This
invalidates the model.
2017-04-14 23:52:17 +02:00
Matthew Honnibal 2611ac2a89 Fix scorer bug for NER, related to ambiguity between missing annotations and misaligned tokens 2017-03-16 09:38:28 -05:00
Matthew Honnibal 931feb3360 Allow beam parsing for NER 2017-03-11 11:12:01 -06:00
Matthew Honnibal 159e8c46e1 Merge old training fixes with newer state 2016-11-25 09:16:36 -06:00
Matthew Honnibal 39341598bb Fix NER label calculation 2016-11-25 09:02:22 -06:00
Matthew Honnibal 301f3cc898 Fix Issue #429. Add an initialize_state method to the named entity recogniser that adds missing entity types. This is a messy place to add this, because it's strange to have the method mutate state. A better home for this logic could be found. 2016-10-27 18:01:55 +02:00
Matthew Honnibal f787cd29fe Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor. 2016-10-16 21:34:57 +02:00
Matthew Honnibal 9e09b39b9f Revert "Changes to transition systems for new StringStore scheme"
This reverts commit 0442e0ab1e.
2016-09-30 20:11:49 +02:00
Matthew Honnibal 0442e0ab1e Changes to transition systems for new StringStore scheme 2016-09-30 19:58:51 +02:00
Matthew Honnibal a47f00901b * Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents 2016-02-01 02:58:14 +01:00
Matthew Honnibal daaad66448 * Now fully proxied 2016-02-01 02:37:08 +01:00
Matthew Honnibal 10877a7791 * Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser 2016-01-30 14:31:36 +01:00
Matthew Honnibal c8e0011ebc * Add iterators to the NER and parser transition systems, to get the action types 2016-01-19 19:07:43 +01:00
Matthew Honnibal 5623242b3e * Adjust NER rules, so that U entries in gazetteer don't become B moves to the model 2015-11-12 04:48:23 +11:00
Matthew Honnibal 44fbdc7260 * Fix bug in NER transition system, that sometimes left no valid moves 2015-11-08 16:19:12 +01:00
Matthew Honnibal e92371bb54 * Fix rule that made Last action invalid if there was a preset of O, since if the entity is already open, that ship has sailed. 2015-11-08 22:17:51 +11:00
Matthew Honnibal af70dc166a * Fix Last restriction, that was supposed to prevent conflicts with presets, but was incorrect. 2015-11-07 09:52:00 +11:00
Matthew Honnibal d24b8509e4 * Correct screw ups from the previous commits 2015-11-07 06:51:41 +11:00
Matthew Honnibal 5efad178b5 * Set ent tag when close entity 2015-11-07 06:09:25 +11:00
Matthew Honnibal 01ab464383 * Prevent Begin and In moves from applying in NER if we're at the last token of a sentence, as this would mean the entity would span over a sentence boundary. Re Issue #169 2015-11-07 05:30:44 +11:00
Matthew Honnibal fe43f8cf39 * Whitespace 2015-08-09 02:31:53 +02:00
Matthew Honnibal 59c3bf60a6 * Ensure entity recognizer doesn't over-write preset types 2015-08-06 16:09:08 +02:00