Commit Graph

20 Commits

Author SHA1 Message Date
Matthew Honnibal 16122f566e Fix cpdef enum in attrs.pyx 2017-09-17 12:28:53 -05:00
Matthew Honnibal fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
Matthew Honnibal 84e66ca6d4 WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
Matthew Honnibal d68dd1f251 Add SENT_START attribute, for custom sentence boundary detection 2017-05-23 18:37:58 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Matthew Honnibal d864708072 Add more morphology names in attrs.pyx 2017-03-15 09:26:16 -05:00
Roman Inflianskas 66e1109b53 Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
Matthew Honnibal 3980f1b0cb Ignore more morphology attributes in deprecated mode of intify_attrs 2016-12-18 17:33:46 +01:00
Matthew Honnibal d58187ffa7 Filter out morphology keys in deprecated attrs 2016-12-18 16:50:26 +01:00
Matthew Honnibal 6dd3b94fa6 Filter out deprecated attributes when reading special-case tokenization rules. 2016-11-25 09:57:18 -06:00
Matthew Honnibal a335c6dcc2 Exclude morphs from deprecated token attributes for now 2016-11-25 16:17:32 +01:00
Matthew Honnibal 846e80f2f4 Exclude morphs from deprecated token attributes for now 2016-11-25 16:14:54 +01:00
Matthew Honnibal 53d8ca8f51 Add spacy.attrs.intify_attrs function, to normalize strings in token attribute dictionaries. 2016-11-25 11:34:30 +01:00
Wolfgang Seeker 03fb498dbe introduce lang field for LexemeC to hold language id
put noun_chunk logic into iterators.py for each language separately
2016-03-10 13:01:34 +01:00
Matthew Honnibal c4017a06d9 * Add placeholders for the new flags in attrs and symbols 2016-02-04 15:49:45 +01:00
Matthew Honnibal 22bd0095f5 * Map empty string to NULL_ATTR in attrs 2015-10-10 22:10:19 +11:00
Matthew Honnibal 94bafc1417 * Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS 2015-10-10 17:57:29 +11:00
Matthew Honnibal 064bd69ad0 * Refactor symbols, so that frequency rank can be derived from the orth id of a word. 2015-10-10 16:03:48 +11:00
Matthew Honnibal 44f39a876f * Add a blank attrs.pyx 2015-07-17 16:40:42 +02:00