spaCy/spacy
Matthew Honnibal 590f38bdb2 * Add hacky solution to Issue #220. Currently specials.json only supports literal patterns, which doesn't allow us to pre-tag whitespace with the correct token, SP, as a rule. The data-driven approach should be easy but for some reason fails here. Adding a hard code in Morphology isn't a good solution, but we do want to fix the behaviour right away, and don't want to wait for an architecturally better solution. 2016-01-19 03:35:20 +01:00
data add data dir 2015-11-18 11:48:55 +01:00
de access model via sputnik 2015-12-07 06:01:28 +01:00
en * Restore the LOCAL_DATA_DIR global in spacy/en/__init__.py, although this is now deprecated 2016-01-19 02:54:56 +01:00
fi
it
munge * Fix Python3 problem in align_raw 2015-07-28 16:06:53 +02:00
serialize
syntax * Unwind limit to sentence boundary detection that prevents it from inserting boundaries on whitespace. Replace it with a check for whitespace in StateClass.fast_forward, so that whitespace is LeftArced when it's on the stack. This should prevent the previous problem of whitespace-only sentences. Should fix Issue #184, but may cause further problems. Needs testing. 2016-01-19 02:54:15 +01:00
tests * Unwind limit to sentence boundary detection that prevents it from inserting boundaries on whitespace. Replace it with a check for whitespace in StateClass.fast_forward, so that whitespace is LeftArced when it's on the stack. This should prevent the previous problem of whitespace-only sentences. Should fix Issue #184, but may cause further problems. Needs testing. 2016-01-19 02:54:15 +01:00
tokens * Disprefer punctuation and spaces as heads of spans 2016-01-18 18:14:09 +01:00
__init__.pxd
__init__.py
about.py
attrs.pxd
attrs.pyx * Map empty string to NULL_ATTR in attrs 2015-10-13 13:44:40 +11:00
cfile.pxd
cfile.pyx
gold.pxd * Remove unused import 2015-07-25 18:11:16 +02:00
gold.pyx * Use io module instead of deprecated codecs module 2015-10-10 14:13:01 +11:00
language.py * Handle string paths in default_vocab, default_parser, default_entity in Language class 2016-01-18 22:37:24 +01:00
lemmatizer.py
lexeme.pxd * Fix ugly py_check_flag and py_set_flag functions in Lexeme 2015-09-15 13:06:18 +10:00
lexeme.pyx * Add .rank property to Token and Lexeme, for frequency rank 2015-11-08 16:18:25 +01:00
matcher.pyx
morphology.pxd
morphology.pyx
multi_words.py
orth.pxd
orth.pyx
parts_of_speech.pxd * Fix parts_of_speech now that symbols list has been reformed 2015-10-13 13:44:40 +11:00
parts_of_speech.pyx
scorer.py
strings.pxd
strings.pyx * Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc. 2015-11-07 03:24:30 +11:00
structs.pxd
symbols.pxd
symbols.pyx
tagger.pxd
tagger.pyx untangle data_path/via 2016-01-16 12:23:45 +01:00
tokenizer.pxd
tokenizer.pyx
typedefs.pxd
typedefs.pyx
util.py
vocab.pxd
vocab.pyx