__init__.py | * Basic punct tests updated and passing | 2014-08-27 19:38:57 +02:00
de.pxd | * Add German tokenizer files | 2014-09-25 18:29:13 +02:00
de.pyx | * Add German tokenizer files | 2014-09-25 18:29:13 +02:00
en.pxd | * Refactor tokenization, splitting it into a clearer life-cycle. | 2014-09-16 13:16:02 +02:00
en.pyx | * Switch to new data model, tests passing | 2014-10-10 08:11:31 +11:00
lang.pxd | * Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding. | 2014-10-23 01:57:59 +11:00
lang.pyx | * Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding. | 2014-10-23 01:57:59 +11:00
lexeme.pxd | * Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding. | 2014-10-23 01:57:59 +11:00
lexeme.pyx | * Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding. | 2014-10-23 01:57:59 +11:00
orth.py | * Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang | 2014-10-14 16:17:45 +11:00
pos.pxd | * Add greedy pos tagger | 2014-10-22 10:17:26 +11:00
pos.pyx | * Upd for refactored Tokens class. Now gets 95.74, 185ms training on swbd_wsj_ewtb, eval on onto_web, Google POS tags. | 2014-10-23 03:20:02 +11:00
pos_util.py | * Add POS utilities | 2014-10-22 10:17:57 +11:00
ptb3.pxd | * Adding PTB3 tokenizer back in, so can understand how much boilerplate is in the docs for multiple tokenizers | 2014-08-29 02:30:27 +02:00
ptb3.pyx | * Switch to using a Python ref counted gateway to malloc/free, to prevent memory leaks | 2014-09-17 20:02:26 +02:00
tokens.pxd | * Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding. | 2014-10-23 01:57:59 +11:00
tokens.pyx | * Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding. | 2014-10-23 01:57:59 +11:00
typedefs.pxd | * Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang | 2014-10-14 16:17:45 +11:00
util.py | * Add function to read detokenization rules | 2014-10-22 12:54:59 +11:00
word.pxd | * Switch to new data model, tests passing | 2014-10-10 08:11:31 +11:00
word.pyx | * Slight cleaning of tokenizer code | 2014-10-10 19:17:22 +11:00