spaCy

Marc Puig 51268e9f21 Typo error fixed (#3284)	2019-02-17 17:51:02 +01:00
ar	Additions to Arabic stop words. (#2422)	2018-06-08 02:33:23 +02:00
bn	Update morph_rules.py (#3283)	2019-02-17 12:21:47 +01:00
ca	Typo error fixed (#3284)	2019-02-17 17:51:02 +01:00
da	Add Danish lemmatizer (#2184)	2018-04-07 19:07:28 +02:00
de	Also include lowercase norm exceptions	2018-10-13 15:37:30 +02:00
el	Optimize Greek language support (#2658)	2018-08-14 02:31:32 +02:00
en	quick typo fix	2018-03-24 17:26:35 +01:00
es	Fix Spanish noun_chunks (resolves #2210)	2018-04-18 18:44:01 -04:00
fa	Add Persian(Farsi) language support (#2797)	2018-10-13 15:31:49 +02:00
fi	Enhancement/lang fi examples (#2547)	2018-07-15 09:50:27 +02:00
fr	Improving the French lookup dictionnary for ambiguous words (#3185)	2019-01-31 23:53:45 +01:00
ga	Remove comma that caused list to wrap in tuple!	2017-10-31 20:13:16 +01:00
he	Don't make copies of language data components	2017-10-11 15:34:55 +02:00
hi	Fix missing comma	2018-10-28 00:09:16 +02:00
hr	Update stop_words.py	2018-03-24 17:31:24 +01:00
hu	Don't copy exception dicts if not necessary and tidy up	2017-10-31 21:05:29 +01:00
id	Update Indonesian model (#2752)	2018-09-14 12:30:32 +02:00
it	Fix syntax error in italian lemmatizer	2018-04-03 23:13:22 +02:00
ja	Making `lang/th/test_tokenizer.py` pass by creating `ThaiTokenizer` (#3078)	2019-01-10 15:40:37 +01:00
kn	Update stop_words.py	2019-02-14 12:25:19 +01:00
nb	Updated wordforms for Norwegian lemmatizer (#3007)	2018-12-06 15:46:18 +01:00
nl	Fix typo [ci skip]	2018-07-24 18:45:40 +02:00
pl	Improved polish tokenizer and stop words. (#2974)	2019-02-08 14:27:21 +11:00
pt	Update Portuguese Language (#2790)	2018-09-29 09:51:45 +02:00
ro	Updates to Romanian support (#2354)	2018-05-24 11:40:00 +02:00
ru	Ukrainian language added. Small fixes in Russian (#3241)	2019-02-07 21:05:11 +01:00
si	Adding "This is a sentence" example to Sinhala (#2846)	2018-10-14 00:06:40 +02:00
sv	Fixed tag map for Swedish Talbanken (#3186)	2019-02-08 14:28:59 +11:00
ta	Tamil (#3194)	2019-01-27 06:02:04 +01:00
te	Basic support for Telugu language (#2751)	2018-09-10 11:53:18 +02:00
th	Making `lang/th/test_tokenizer.py` pass by creating `ThaiTokenizer` (#3078)	2019-01-10 15:40:37 +01:00
tl	Added alpha support for Tagalog language (#3062)	2018-12-18 13:08:38 +01:00
tr	trilyon forgotten (#3083)	2018-12-27 14:44:23 +01:00
tt	Add Tatar Language Support (#2444)	2018-06-19 10:17:53 +02:00
uk	Ukrainian language added. Small fixes in Russian (#3241)	2019-02-07 21:05:11 +01:00
ur	Add Urdu Language Support (#2430)	2018-06-22 11:14:03 +02:00
vi	Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer (#2155)	2018-03-29 12:19:51 +02:00
xx	Tidy up language data	2017-10-11 02:22:49 +02:00
zh	Fix Chinese language related bugs (#2634)	2018-08-07 11:26:31 +02:00
__init__.py	Remove imports in /lang/__init__.py	2017-05-08 23:58:07 +02:00
char_classes.py	Ukrainian language added. Small fixes in Russian (#3241)	2019-02-07 21:05:11 +01:00
entity_rules.py	Reorganise entity rules	2017-05-09 01:37:10 +02:00
lex_attrs.py	Merge pull request #1891 from fucking-signup/master	2018-02-18 13:47:47 +01:00
norm_exceptions.py	Update base norm exceptions with more unicode characters	2017-10-14 14:58:52 +02:00
punctuation.py	Add symbols class to punctuation rules to handle emoji (see #1088)	2017-05-27 17:57:10 +02:00
tag_map.py	Fix formatting	2017-05-09 11:08:14 +02:00
tokenizer_exceptions.py	Tidy up tokenizer exceptions	2017-11-01 23:02:45 +01:00
