mirror of https://github.com/explosion/spaCy.git
df01a88763
Add Huffman-code serialization, and do a lot of refactoring. Highlights include:

* Much more efficient StringStore
* Vocab maintains a by-orth mapping of Lexemes
* Avoid manually slicing Py_UNICODE buffers, simplifying tokenizer and vocab C APIs
* Remove various bits of dead code
* Work on removing GIL around parser
* Work on bridge to Theano

Conflicts:
    spacy/strings.pxd
    spacy/strings.pyx
    spacy/structs.pxd

||
---|---|---|
en | ||
munge | ||
serialize | ||
syntax | ||
tokens | ||
__init__.pxd | ||
__init__.py | ||
_ml.pxd | ||
_ml.pyx | ||
_nn.py | ||
_nn.pyx | ||
_theano.pxd | ||
_theano.pyx | ||
attrs.pxd | ||
attrs.pyx | ||
cfile.pxd | ||
cfile.pyx | ||
gold.pxd | ||
gold.pyx | ||
lexeme.pxd | ||
lexeme.pyx | ||
morphology.pxd | ||
morphology.pyx | ||
multi_words.py | ||
orth.pxd | ||
orth.pyx | ||
parts_of_speech.pxd | ||
parts_of_speech.pyx | ||
scorer.py | ||
senses.pxd | ||
senses.pyx | ||
strings.pxd | ||
strings.pyx | ||
structs.pxd | ||
tokenizer.pxd | ||
tokenizer.pyx | ||
typedefs.pxd | ||
typedefs.pyx | ||
util.py | ||
vocab.pxd | ||
vocab.pyx |