spaCy/spacy
Matthew Honnibal df01a88763 Merge branch 'refactor' (and serialization)
Add Huffman-code serialization, and do a lot of
refactoring. Highlights include:

* Much more efficient StringStore
* Vocab maintains a by-orth mapping of Lexemes
* Avoid manually slicing Py_UNICODE buffers,
  simplifying tokenizer and vocab C APIs
* Remove various bits of dead code
* Work on removing GIL around parser
* Work on bridge to Theano

Conflicts:
	spacy/strings.pxd
	spacy/strings.pyx
	spacy/structs.pxd
2015-07-23 02:18:35 +02:00
en * Set initial freqs, to avoid missing values in serializer 2015-07-23 01:16:27 +02:00
munge
serialize * Fix Packer API, so that it reads and writes bytes strings, instead of BitArray. Docs are always byte aligned anyway. 2015-07-23 01:13:02 +02:00
syntax * Update freqs for missing tags in ner, for serializer 2015-07-23 01:17:11 +02:00
tokens * Add working to/from bytes API to Doc 2015-07-23 01:14:45 +02:00
__init__.pxd
__init__.py
_ml.pxd
_ml.pyx
_nn.py
_nn.pyx
_theano.pxd
_theano.pyx * Begin reorganizing neuralnet work 2015-06-30 14:26:32 +02:00
attrs.pxd * Upd attrs id list 2015-07-16 01:26:54 +02:00
attrs.pyx
cfile.pxd * Add cfile.pyx 2015-07-23 01:10:36 +02:00
cfile.pyx * Add cfile.pyx 2015-07-23 01:10:36 +02:00
gold.pxd
gold.pyx
lexeme.pxd * Fix type declarations for attr_t. Remove unused id_t. 2015-07-18 22:39:57 +02:00
lexeme.pyx
morphology.pxd * Tmp commit. Refactoring to create a Python Lexeme class. 2015-01-12 10:26:22 +11:00
morphology.pyx
multi_words.py
orth.pxd * Make PyPy work 2015-01-05 17:54:38 +11:00
orth.pyx * Add length cap to word shape feature 2015-07-20 12:06:59 +02:00
parts_of_speech.pxd * Add SPACE part-of-speech tag, and train tagger to assign it. Also train tagger not to make whitespace an entity 2015-07-09 13:30:41 +02:00
parts_of_speech.pyx * Add SPACE part-of-speech tag, and train tagger to assign it. Also train tagger not to make whitespace an entity 2015-07-09 13:30:41 +02:00
scorer.py
senses.pxd
senses.pyx
strings.pxd * Replace UniStr, using unicode objects instead 2015-07-22 04:52:05 +02:00
strings.pyx Merge branch 'refactor' (and serialization) 2015-07-23 02:18:35 +02:00
structs.pxd * Remove UniStr struct 2015-07-22 13:39:17 +02:00
tokenizer.pxd * Replace UniStr, using unicode objects instead 2015-07-22 04:52:05 +02:00
tokenizer.pyx * Fix tokenizer 2015-07-22 14:10:30 +02:00
typedefs.pxd
typedefs.pyx
util.py * Remove read_encoding_freqs from util.py 2015-07-23 01:17:32 +02:00
vocab.pxd * Add serializer property to Vocab, and lazy-load it. Add get_by_orth method. 2015-07-23 01:18:19 +02:00
vocab.pyx * Add serializer property to Vocab, and lazy-load it. Add get_by_orth method. 2015-07-23 01:18:19 +02:00