spaCy/spacy/tokens/token.pxd

from numpy cimport ndarray
from ..vocab cimport Vocab
from ..structs cimport TokenC
from ..attrs cimport *
from ..typedefs cimport attr_t, flags_t
from ..parts_of_speech cimport univ_pos_t
from .doc cimport Doc
from ..lexeme cimport Lexeme


cdef class Token:
    cdef readonly Vocab vocab
    cdef TokenC* c
    cdef readonly int i
    cdef readonly Doc doc

    @staticmethod
    cdef inline Token cinit(Vocab vocab, const TokenC* token, int offset, Doc doc):
        # Reject offsets outside the document before constructing the Token.
        if offset < 0 or offset >= doc.length:
            msg = "Attempt to access token at %d, max length %d"
            raise IndexError(msg % (offset, doc.length))
        cdef Token self = Token.__new__(Token, vocab, doc, offset)
        return self

    #cdef inline TokenC struct_from_attrs(Vocab vocab, attrs):
    #    cdef TokenC token
    #    attrs = normalize_attrs(attrs)

    cpdef bint check_flag(self, attr_id_t flag_id) except -1

    @staticmethod
    cdef inline attr_t get_struct_attr(const TokenC* token, attr_id_t feat_name) nogil:
        # IDs below the bit width of flags_t are boolean lexeme flags.
        if feat_name < (sizeof(flags_t) * 8):
            return Lexeme.c_check_flag(token.lex, feat_name)
        elif feat_name == LEMMA:
            return token.lemma
        elif feat_name == POS:
            return token.pos
        elif feat_name == TAG:
            return token.tag
        elif feat_name == DEP:
            return token.dep
        elif feat_name == HEAD:
            return token.head
        elif feat_name == SPACY:
            return token.spacy
        elif feat_name == ENT_IOB:
            return token.ent_iob
        elif feat_name == ENT_TYPE:
            return token.ent_type
        elif feat_name == SENT_START:
            return token.sent_start
        else:
            # Any other attribute is read from the underlying lexeme.
            return Lexeme.get_struct_attr(token.lex, feat_name)

    @staticmethod
    cdef inline attr_t set_struct_attr(TokenC* token, attr_id_t feat_name,
                                       attr_t value) nogil:
        # Only the token-level attributes below are set; any other feat_name
        # is ignored.
        if feat_name == LEMMA:
            token.lemma = value
        elif feat_name == POS:
            token.pos = <univ_pos_t>value
        elif feat_name == TAG:
            token.tag = value
        elif feat_name == DEP:
            token.dep = value
        elif feat_name == HEAD:
            token.head = value
        elif feat_name == SPACY:
            token.spacy = value
        elif feat_name == ENT_IOB:
            token.ent_iob = value
        elif feat_name == ENT_TYPE:
            token.ent_type = value
        elif feat_name == SENT_START:
            token.sent_start = value