spaCy

Matthew Honnibal 590f38bdb2 * Add hacky solution to Issue #220. Currently specials.json only supports literal patterns, which doesn't allow us to pre-tag whitespace with the correct token, SP, as a rule. The data-driven approach should be easy but for some reason fails here. Adding a hard code in Morphology isn't a good solution, but we do want to fix the behaviour right away, and don't want to wait for an architecturally better solution.		2016-01-19 03:35:20 +01:00
data	add data dir	2015-11-18 11:48:55 +01:00
de	access model via sputnik	2015-12-07 06:01:28 +01:00
en	* Restore the LOCAL_DATA_DIR global in spacy/en/__init__.py, although this is now deprecated	2016-01-19 02:54:56 +01:00
fi	…
it	…
munge	* Fix Python3 problem in align_raw	2015-07-28 16:06:53 +02:00
serialize	…
syntax	* Unwind limit to sentence boundary detection that prevents it from inserting boundaries on whitespace. Replace it with a check for whitespace in StateClass.fast_forward, so that whitespace is LeftArced when it's on the stack. This should prevent the previous problem of whitespace-only sentences. Should fix Issue #184, but may cause further problems. Needs testing.	2016-01-19 02:54:15 +01:00
tests	* Unwind limit to sentence boundary detection that prevents it from inserting boundaries on whitespace. Replace it with a check for whitespace in StateClass.fast_forward, so that whitespace is LeftArced when it's on the stack. This should prevent the previous problem of whitespace-only sentences. Should fix Issue #184, but may cause further problems. Needs testing.	2016-01-19 02:54:15 +01:00
tokens	* Disprefer punctuation and spaces as heads of spans	2016-01-18 18:14:09 +01:00
__init__.pxd	…
__init__.py	…
about.py	…
attrs.pxd	…
attrs.pyx	* Map empty string to NULL_ATTR in attrs	2015-10-13 13:44:40 +11:00
cfile.pxd	…
cfile.pyx	…
gold.pxd	* Remove unused import	2015-07-25 18:11:16 +02:00
gold.pyx	* Use io module instead of deprecated codecs module	2015-10-10 14:13:01 +11:00
language.py	* Handle string paths in default_vocab, default_parser, default_entity in Language class	2016-01-18 22:37:24 +01:00
lemmatizer.py	…
lexeme.pxd	* Fix ugly py_check_flag and py_set_flag functions in Lexeme	2015-09-15 13:06:18 +10:00
lexeme.pyx	* Add .rank property to Token and Lexeme, for frequency rank	2015-11-08 16:18:25 +01:00
matcher.pyx	…
morphology.pxd	…
morphology.pyx	…
multi_words.py	…
orth.pxd	…
orth.pyx	…
parts_of_speech.pxd	* Fix parts_of_speech now that symbols list has been reformed	2015-10-13 13:44:40 +11:00
parts_of_speech.pyx	…
scorer.py	…
strings.pxd	…
strings.pyx	* Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc.	2015-11-07 03:24:30 +11:00
structs.pxd	…
symbols.pxd	…
symbols.pyx	…
tagger.pxd	…
tagger.pyx	untangle data_path/via	2016-01-16 12:23:45 +01:00
tokenizer.pxd	…
tokenizer.pyx	…
typedefs.pxd	…
typedefs.pyx	…
util.py	…
vocab.pxd	…
vocab.pyx	…