Commit Graph

22 Commits

Author SHA1 Message Date
Ines Montani de11ea753a Merge branch 'master' into develop 2020-02-18 14:47:23 +01:00
adrianeboyd 5ee9d8c9b8 Add MORPH attr, add support in retokenizer (#4947) 2020-01-29 17:45:46 +01:00
    * Add MORPH attr / symbol for token attrs
    * Update retokenizer for MORPH
Sofie Van Landeghem a1b22e90cd serialize ENT_ID (#4852) 2020-01-06 14:57:34 +01:00
    * expand serialization test for custom token attribute
    * add failing test for issue 4849
    * define ENT_ID as attr and use in doc serialization
    * fix few typos
Matthew Honnibal ef666656b3 Fix attrs alignment 2019-07-12 17:59:47 +02:00
svlandeg 8608685543 ensure Span.as_doc keeps the entity links + unit test 2019-06-25 15:28:51 +02:00
Matthew Honnibal c0caf7cf27 Fix LANG symbol 2018-02-17 18:10:50 +01:00
Matthew Honnibal 0bf2f6be29 Add missing symbol for LANG attr. Fixes inconsistent numeric ID 2018-02-17 17:37:02 +01:00
4altinok 3deef1497a removed 18 and replaced 18 with is_currency 2018-02-11 18:51:09 +01:00
Matthew Honnibal 16122f566e Fix cpdef enum in attrs.pyx 2017-09-17 12:28:53 -05:00
Matthew Honnibal d68dd1f251 Add SENT_START attribute, for custom sentence boundary detection 2017-05-23 18:37:58 +02:00
Matthew Honnibal 1b31c05bf8 Whitespace 2016-12-18 16:51:40 +01:00
Wolfgang Seeker 03fb498dbe introduce lang field for LexemeC to hold language id 2016-03-10 13:01:34 +01:00
    put noun_chunk logic into iterators.py for each language separately
Matthew Honnibal c4017a06d9 * Add placeholders for the new flags in attrs and symbols 2016-02-04 15:49:45 +01:00
Matthew Honnibal 064bd69ad0 * Refactor symbols, so that frequency rank can be derived from the orth id of a word. 2015-10-10 16:03:48 +11:00
Matthew Honnibal c2d8edd0bd * Add PROB attribute in attrs.pxd 2015-08-26 19:14:19 +02:00
Matthew Honnibal 9c667b7f15 * Set a value in attrs.pxd on the first flag, to reduce bugs 2015-08-06 16:08:04 +02:00
Matthew Honnibal 8e4c69ee8c * Add is_oov property, and fix up handling of attributes 2015-07-27 01:50:06 +02:00
Matthew Honnibal 6bb96c122d * Host IS_ flags in attrs.pxd, and add properties for them on Token and Lexeme objects 2015-07-26 16:37:16 +02:00
Matthew Honnibal efa80096f1 * Upd attrs id list 2015-07-16 01:26:54 +02:00
Jordan Suchow 3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Matthew Honnibal 6640386b25 * Fix Issue #43: TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way. 2015-04-07 06:00:57 +02:00
Matthew Honnibal d4c99f7dec * Add attrs.pxd 2015-01-26 22:22:09 +11:00