Commit Graph

248 Commits

Author SHA1 Message Date
Kevin Humphreys 7918fa4ef9 handle would've 2018-01-03 12:25:48 -08:00
zqhZY f27859fa99 add ChineseDefaults class for pickling 2017-12-28 17:13:58 +08:00
Søren Lind Kristiansen bef735aef7 Fix Danish abbreviation 'm.h.t.' 2017-12-21 09:24:31 +01:00
Ines Montani a3dd167d7f Merge branch 'master' into da_ud_tokenization 2017-12-20 21:05:34 +00:00
Ines Montani 97f100f69f Merge pull request #1742 from kimfalk/master: Two corrections in the da lan. 2017-12-20 21:02:00 +00:00
Ines Montani d682a8803e Merge pull request #1672 from cbilgili/master: Adds Turkish Lemmatization 2017-12-20 21:01:00 +00:00
Benjamin Peterson 9452134cd1 remove no-break spaces from Hindi example (fixes #1750) 2017-12-20 11:35:30 -08:00
Søren Lind Kristiansen 7a2f2f6f94 Fix formatting. 2017-12-20 18:37:37 +01:00
Søren Lind Kristiansen 15d13efafd Tune Danish tokenizer to more closely match tokenization in Universal Dependencies. 2017-12-20 17:36:52 +01:00
Kim FalkJørgensen 648dc60755 Remove the incorrect exception 'm.h.t' 2017-12-20 10:02:39 +01:00
Kim FalkJørgensen 9c9f4ef84a Fixing a translation error in examples.py; Adding an exception in the tokenizer_exceptions.py 2017-12-19 15:26:50 +01:00
ines 22dc744b48 Fix check for '@' in like_url (see #1715) 2017-12-16 13:48:43 +01:00
Ines Montani 6455b574fc Check for email address first 2017-12-12 10:25:13 +01:00
Bri-Will d77361d76c Update lex_attrs.py. Fix like_url from matching on e-mail 2017-12-11 14:13:28 -08:00
Matthew Honnibal 2ab0f2d186 Merge pull request #1664 from jimregan/italian-lemmatizer: BOM in Italian lemmatiser 2017-12-06 11:09:04 +01:00
Matthew Honnibal 3f247119d3 Merge pull request #1668 from sorenlind/da_morph: Add more Danish morph rules and clean up existing ones 2017-12-06 11:08:09 +01:00
ines f2ea6d4713 Add Dutch example sentences (see #1107) 2017-12-01 23:36:05 +01:00
Canbey Bilgili abe098b255 Adds Turkish Lemmatization 2017-12-01 17:04:32 +03:00
Søren Lind Kristiansen d86b537a38 Enable morph rules for Danish 2017-11-30 15:58:02 +01:00
Søren Lind Kristiansen 13a988adc3 Remove 'Number[psor]' 2017-11-30 15:55:04 +01:00
Søren Lind Kristiansen dd6fde18a9 Add more Danish morph rules and clean up existing ones 2017-11-30 11:17:19 +01:00
Vadim Mazaev 4ba7ddf651 Bugfixies 2017-11-30 12:29:38 +03:00
Matthew Honnibal f9ed9ea529 Merge pull request #1624 from GreenRiverRUS/russian: Add support for Russian 2017-11-29 23:10:01 +01:00
Jim O'Regan ba6a23fd11 BOM in Italian lemmatiser 2017-11-29 17:40:07 +00:00
Ines Montani 9052643e2c Merge pull request #1653 from sorenlind/da_example_typo: Fix typo 2017-11-27 14:47:42 +00:00
Søren Lind Kristiansen 5fe58b885b Fix typo 2017-11-27 15:36:18 +01:00
Ines Montani d52b1ab245 Add unicode_literals (hopefully fixes test failure on Python 2) 2017-11-27 15:16:54 +01:00
Søren Lind Kristiansen 0ffd27b0f6 Add several Danish alternative spellings 2017-11-27 13:35:41 +01:00
Vadim Mazaev cacd859dcd Added tag map, fixed tests fails, added more exceptions 2017-11-26 20:54:48 +03:00
Søren Lind Kristiansen ef03e9ea53 Remove unused import. 2017-11-25 13:04:02 +01:00
Søren Lind Kristiansen 6aa241bcec Add day of month tokenizer exceptions for Danish. 2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen 0c276ed020 Add weekday abbreviations and remove abiguous month abbreviations for Danish. 2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen 056547e989 Add multiple tokenizer exceptions for Danish. 2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen ac8116510d Fix tokenization of 'i.' for Danish. 2017-11-24 11:16:53 +01:00
Vadim Mazaev 81314f8659 Fixed tokenizer: added char classes; added first lemmatizer and tokenizer tests 2017-11-21 22:23:59 +03:00
Vadim Mazaev 52ee1f9bf9 Updated Russian Language, added lemmatizer, norm exceptions and lex attrs 2017-11-21 11:44:46 +03:00
Vadim Mazaev a0739a06d4 Returned russian support from v1.10 branch 2017-11-17 17:06:15 +03:00
ines c9d72de0fb Add dummy serialization methods for Japanese and missing lang getter (resolves #1557) 2017-11-15 12:44:02 +01:00
Mathias Deschamps c0691b2ab4 Add tokenizer exceptions for ing verbs; Extend list of tokenizing exceptions introduced in 123810b 2017-11-13 17:46:05 +01:00
Mathias Deschamps 288298ead9 Add norm exception for ing verbs; Some ing verbs are sometimes written in or in'. Make the NORM form correct 2017-11-13 17:46:05 +01:00
Abhinav Sharma 59f5740ede improved upon the list of included stop_words 2017-11-13 17:13:49 +05:30
ines 123810b6de Add "lovin'" to tokenizer exceptions (see #1248) 2017-11-09 17:09:30 +01:00
Ines Montani 42b241ccd0 Update language code in usage example in comment 2017-11-08 11:36:38 +01:00
Abhinav Sharma 84edade82d Create examples.py; Populated the file with the translations of English example sentences 2017-11-08 13:23:08 +05:30
ines bcf42b8846 Fix typo 2017-11-08 01:06:37 +01:00
ines acb9bdb852 Fix PRON_LEMMA imports 2017-11-06 17:41:53 +01:00
ines baa231745c Fix Dutch tag map 2017-11-05 21:41:50 +01:00
ines 507ecb67af Fix Spanish tag map 2017-11-05 19:23:34 +01:00
ines 975e1042ff Fix Italian tag map 2017-11-05 18:34:09 +01:00
ines 6b2d6e4937 Fix Portuguese tag map 2017-11-05 18:31:00 +01:00