spaCy/spacy
Ines Montani 78f754dd9a Merge pull request #705 from oroszgy/hu_tokenizer
Initial support for Hungarian
2016-12-27 00:48:13 +01:00
de Update tokenizer exceptions for German 2016-12-21 18:06:27 +01:00
en
es Fix formatting and consistency 2016-12-23 21:36:01 +01:00
fr Reorganise language data 2016-12-18 16:54:19 +01:00
hu Reformat stop words for better readability 2016-12-24 00:58:40 +01:00
it
language_data Add DET_LEMMA constant 2016-12-21 18:05:41 +01:00
munge * Fix Python3 problem in align_raw 2015-07-28 16:06:53 +02:00
nl
pt
serialize Fix Issue #459 -- failed to deserialize empty doc. 2016-10-23 16:31:05 +02:00
sv Added morph rules 2016-12-20 13:18:41 +01:00
syntax
tests
tokens
zh
__init__.pxd
__init__.py
about.py
attrs.pxd Whitespace 2016-12-18 16:51:40 +01:00
attrs.pyx
cfile.pxd
cfile.pyx
deprecated.py Finish refactoring data loading 2016-09-24 20:26:17 +02:00
download.py Let --data-path be specified when running download.py scripts 2016-11-20 15:48:04 +00:00
gold.pxd
gold.pyx Merge old training fixes with newer state 2016-11-25 09:16:36 -06:00
language.py
lemmatizer.py
lexeme.pxd
lexeme.pyx
matcher.pyx
morphology.pxd
morphology.pyx
multi_words.py * Fix Issue #50: Python 3 compatibility of v0.80 2015-04-13 05:59:43 +02:00
orth.pxd
orth.pyx introduce lang field for LexemeC to hold language id 2016-03-10 13:01:34 +01:00
parts_of_speech.pxd
parts_of_speech.pyx * Fix NAMES list in spacy/parts_of_speech.pyx 2015-10-13 14:18:45 +11:00
pipeline.pxd
pipeline.pyx
scorer.py
strings.pxd
strings.pyx
structs.pxd Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match. 2016-09-21 14:54:55 +02:00
symbols.pxd Update symbols.pxd 2016-12-12 10:09:58 +11:00
symbols.pyx
tagger.pxd
tagger.pyx
tokenizer.pxd
tokenizer.pyx Temporarily put back the tokenize_from_strings method, while tests aren't updated yet. 2016-11-04 19:18:07 +01:00
train.py
typedefs.pxd
typedefs.pyx * Move POS tag definitions to parts_of_speech.pxd 2015-01-25 16:31:07 +11:00
util.py Move update_exc to global language data utils 2016-12-17 12:29:02 +01:00
vocab.pxd
vocab.pyx