19 Commits

Author SHA1 Message Date
Matthew Honnibal a6a2159969 Add slot for text categories to Doc 2017-07-22 00:34:15 +02:00
Matthew Honnibal 6782eedf9b Tmp GPU code 2017-05-07 11:04:24 -05:00
Matthew Honnibal 5d5742b773 Add sentiment field to doc, rename getters_for_tokens and getters_for_spans, add user_hooks field to Doc. 2016-10-19 20:54:22 +02:00
Matthew Honnibal fbb7f3f15c Add user_data attribute to Doc object. 2016-10-17 11:43:22 +02:00
Matthew Honnibal ae11ea8240 Add getters_for_tokens and getters_for_spans attributes to Doc object. 2016-10-17 02:42:05 +02:00
Matthew Honnibal f3be9d0a9a Add tensor field to Lexeme, Token, Doc and Span, so that users have a place to hang neural network outputs 2016-10-14 03:24:13 +02:00
Matthew Honnibal 276fbe9996 * Fix assignment of iterator on Doc object 2016-05-02 15:26:24 +02:00
Wolfgang Seeker 5e2e8e951a add baseclass DocIterator for iterators over documents
add classes for English and German noun chunks

the respective iterators are set for the document when created by the parser
as they depend on the annotation scheme of the parsing model
2016-03-16 15:53:35 +01:00
Matthew Honnibal 6bb007d16e * Make set_parse nogil 2016-01-30 20:27:52 +01:00
Matthew Honnibal 56499d89ef * Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient 2015-11-07 08:55:34 +11:00
Matthew Honnibal 68f479e821 * Rename Doc.data to Doc.c 2015-11-04 00:15:14 +11:00
Matthew Honnibal 77856c4fcd * Try giving Doc and Span objects vector and vector_norm attributes, and .similarity functions. Turns out to be bad idea. 2015-09-17 11:50:11 +10:00
Matthew Honnibal c2307fa9ee * More work on language-generic parsing 2015-08-28 02:02:33 +02:00
Matthew Honnibal 9c1724ecae * Gazetteer stuff working, now need to wire up to API 2015-08-06 00:35:40 +02:00
Matthew Honnibal 6609fcf4b2 * Make mem and vocab python-visible in Doc 2015-07-28 20:46:59 +02:00
Matthew Honnibal 8214b74eec * Restore _py_tokens cache, to handle orphan tokens. 2015-07-13 22:28:10 +02:00
Matthew Honnibal 67641f3b58 * Refactor tokenizer, to set the 'spacy' field on TokenC instead of passing a string 2015-07-13 21:46:02 +02:00
Matthew Honnibal 6eef0bf9ab * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx 2015-07-13 20:20:58 +02:00
Matthew Honnibal 3ea8756c24 * Add spacy/tokens/doc.pyx, for Doc class in its own file 2015-07-13 19:58:26 +02:00