Matthew Honnibal | b0f6fd3f1d | Disable tokenizer cache for special-cases. Fixes #1250 | 2017-10-24 16:08:05 +02:00
Ines Montani | aa876884f0 | Revert "Revert "Merge remote-tracking branch 'origin/master'"". This reverts commit fb9d3bb022. | 2017-01-09 13:28:13 +01:00
Matthew Honnibal | fd65cf6cbb | Finish refactoring data loading | 2016-09-24 20:26:17 +02:00
Matthew Honnibal | 141639ea3a | * Fix bug in tokenizer that caused new tokens to be added for affixes | 2016-02-21 23:17:47 +00:00
Chris DuBois | dac8fe7bdb | Add __reduce__ to Tokenizer so that English pickles. Add tests to test_pickle and test_tokenizer that save to tempfiles. | 2015-10-23 22:24:03 -07:00
Matthew Honnibal | c2307fa9ee | * More work on language-generic parsing | 2015-08-28 02:02:33 +02:00
Matthew Honnibal | 119c0f8c3f | * Hack out morphology stuff from tokenizer, while morphology being reimplemented. | 2015-08-26 19:20:11 +02:00
Matthew Honnibal | 109106a949 | * Replace UniStr, using unicode objects instead | 2015-07-22 04:52:05 +02:00
Matthew Honnibal | cfd842769e | * Allow infix tokens to be variable length | 2015-07-18 22:45:00 +02:00
Matthew Honnibal | 67641f3b58 | * Refactor tokenizer, to set the 'spacy' field on TokenC instead of passing a string | 2015-07-13 21:46:02 +02:00
Matthew Honnibal | 6eef0bf9ab | * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx | 2015-07-13 20:20:58 +02:00
Matthew Honnibal | bb522496dd | * Rename Tokens to Doc | 2015-07-08 18:53:00 +02:00
Matthew Honnibal | 6c7e44140b | * Work on word vectors, and other stuff | 2015-01-17 16:21:17 +11:00
Matthew Honnibal | ce2edd6312 | * Tmp commit. Refactoring to create a Python Lexeme class. | 2015-01-12 10:26:22 +11:00
Matthew Honnibal | a60ae261ae | * Move tokenizer to its own file, and refactor | 2014-12-20 07:29:16 +11:00