Commit Graph

12 Commits

Author SHA1 Message Date
Wolfgang Seeker 5e2e8e951a add baseclass DocIterator for iterators over documents
add classes for English and German noun chunks

the respective iterators are set for the document when created by the parser
as they depend on the annotation scheme of the parsing model
2016-03-16 15:53:35 +01:00
Matthew Honnibal 6bb007d16e * Make set_parse nogil 2016-01-30 20:27:52 +01:00
Matthew Honnibal 56499d89ef * Rework the Span-merge patch, to avoid extending the interface of Doc, and avoid virtualizing the Span.start and Span.end indices, to keep Span usage efficient 2015-11-07 08:55:34 +11:00
Matthew Honnibal 68f479e821 * Rename Doc.data to Doc.c 2015-11-04 00:15:14 +11:00
Matthew Honnibal 77856c4fcd * Try giving Doc and Span objects vector and vector_norm attributes, and .similarity functions. Turns out to be bad idea. 2015-09-17 11:50:11 +10:00
Matthew Honnibal c2307fa9ee * More work on language-generic parsing 2015-08-28 02:02:33 +02:00
Matthew Honnibal 9c1724ecae * Gazetteer stuff working, now need to wire up to API 2015-08-06 00:35:40 +02:00
Matthew Honnibal 6609fcf4b2 * Make mem and vocab python-visible in Doc 2015-07-28 20:46:59 +02:00
Matthew Honnibal 8214b74eec * Restore _py_tokens cache, to handle orphan tokens. 2015-07-13 22:28:10 +02:00
Matthew Honnibal 67641f3b58 * Refactor tokenizer, to set the 'spacy' field on TokenC instead of passing a string 2015-07-13 21:46:02 +02:00
Matthew Honnibal 6eef0bf9ab * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx 2015-07-13 20:20:58 +02:00
Matthew Honnibal 3ea8756c24 * Add spacy/tokens/doc.pyx, for Doc class in its own file 2015-07-13 19:58:26 +02:00