Commit Graph

233 Commits

Author SHA1 Message Date
Matthew Honnibal 2ab0f2d186
Merge pull request #1664 from jimregan/italian-lemmatizer
BOM in Italian lemmatiser
2017-12-06 11:09:04 +01:00
Matthew Honnibal 3f247119d3
Merge pull request #1668 from sorenlind/da_morph
Add more Danish morph rules and clean up existing ones
2017-12-06 11:08:09 +01:00
ines f2ea6d4713 Add Dutch example sentences (see #1107) 2017-12-01 23:36:05 +01:00
Søren Lind Kristiansen d86b537a38 Enable morph rules for Danish 2017-11-30 15:58:02 +01:00
Søren Lind Kristiansen 13a988adc3 Remove 'Number[psor]' 2017-11-30 15:55:04 +01:00
Søren Lind Kristiansen dd6fde18a9 Add more Danish morph rules and clean up existing ones 2017-11-30 11:17:19 +01:00
Vadim Mazaev 4ba7ddf651 Bugfixies 2017-11-30 12:29:38 +03:00
Matthew Honnibal f9ed9ea529
Merge pull request #1624 from GreenRiverRUS/russian
Add support for Russian
2017-11-29 23:10:01 +01:00
Jim O'Regan ba6a23fd11 BOM in Italian lemmatiser 2017-11-29 17:40:07 +00:00
Ines Montani 9052643e2c
Merge pull request #1653 from sorenlind/da_example_typo
Fix typo
2017-11-27 14:47:42 +00:00
Søren Lind Kristiansen 5fe58b885b Fix typo 2017-11-27 15:36:18 +01:00
Ines Montani d52b1ab245 Add unicode_literals (hopefully fixes test failure on Python 2) 2017-11-27 15:16:54 +01:00
Søren Lind Kristiansen 0ffd27b0f6 Add several Danish alternative spellings 2017-11-27 13:35:41 +01:00
Vadim Mazaev cacd859dcd Added tag map, fixed tests fails, added more exceptions 2017-11-26 20:54:48 +03:00
Søren Lind Kristiansen ef03e9ea53 Remove unused import. 2017-11-25 13:04:02 +01:00
Søren Lind Kristiansen 6aa241bcec Add day of month tokenizer exceptions for Danish. 2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen 0c276ed020 Add weekday abbreviations and remove abiguous month abbreviations for Danish. 2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen 056547e989 Add multiple tokenizer exceptions for Danish. 2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen ac8116510d Fix tokenization of 'i.' for Danish. 2017-11-24 11:16:53 +01:00
Vadim Mazaev 81314f8659 Fixed tokenizer: added char classes; added first lemmatizer and tokenizer tests 2017-11-21 22:23:59 +03:00
Vadim Mazaev 52ee1f9bf9 Updated Russian Language, added lemmatizer, norm exceptions and lex attrs 2017-11-21 11:44:46 +03:00
Vadim Mazaev a0739a06d4 Returned russian support from v1.10 branch 2017-11-17 17:06:15 +03:00
ines c9d72de0fb Add dummy serialization methods for Japanese and missing lang getter (resolves #1557) 2017-11-15 12:44:02 +01:00
Mathias Deschamps c0691b2ab4 Add tokenizer exceptions for ing verbs
Extend list of tokenizing exceptions introduced in 123810b
2017-11-13 17:46:05 +01:00
Mathias Deschamps 288298ead9 Add norm exception for ing verbs
Some ing verbs are sometimes written in or in'. Make the NORM form correct
2017-11-13 17:46:05 +01:00
Abhinav Sharma 59f5740ede improved upon the list of included stop_words 2017-11-13 17:13:49 +05:30
ines 123810b6de Add "lovin'" to tokenizer exceptions (see #1248) 2017-11-09 17:09:30 +01:00
Ines Montani 42b241ccd0 Update language code in usage example in comment 2017-11-08 11:36:38 +01:00
Abhinav Sharma 84edade82d
Create examples.py
Populated the file with the translations of English example sentences
2017-11-08 13:23:08 +05:30
ines bcf42b8846 Fix typo 2017-11-08 01:06:37 +01:00
ines acb9bdb852 Fix PRON_LEMMA imports 2017-11-06 17:41:53 +01:00
ines baa231745c Fix Dutch tag map 2017-11-05 21:41:50 +01:00
ines 507ecb67af Fix Spanish tag map 2017-11-05 19:23:34 +01:00
ines 975e1042ff Fix Italian tag map 2017-11-05 18:34:09 +01:00
ines 6b2d6e4937 Fix Portuguese tag map 2017-11-05 18:31:00 +01:00
ines fa2687fded Fix Dutch tag map 2017-11-05 17:57:59 +01:00
ines fb8990d916 Fix Spanish tag map 2017-11-05 17:48:46 +01:00
ines 9d13288f73 Fix French tag map 2017-11-05 17:47:59 +01:00
ines 54579805c5 Fix French tag map 2017-11-05 17:44:05 +01:00
Matthew Honnibal 0d4bd6414e Fix Italian tag map 2017-11-05 14:11:03 +01:00
ines ef597622a6 Add Portuguese tag map 2017-11-05 13:58:34 +01:00
ines 793c62dfda Add Dutch tag map 2017-11-05 13:48:07 +01:00
ines f7485a09c8 Fix Italian tag map 2017-11-05 13:12:58 +01:00
ines 3cef901834 Add tag map for French and Italian 2017-11-04 23:32:51 +01:00
ines 6c15aafebd Fix formatting 2017-11-04 23:07:02 +01:00
ines 9baab241b4 Add skeleton language data for Turkish 2017-11-02 16:32:24 +01:00
ines c6fea3e5f6 Add Romanian and Croatian skeletons (experimental)
Add language data templates to make it easier for others to contribute to the language support
2017-11-01 23:04:28 +01:00
ines 18c859500b Add missing imports 2017-11-01 23:02:51 +01:00
ines 819e30a26e Tidy up tokenizer exceptions 2017-11-01 23:02:45 +01:00
ines 9659391944 Update deprecated methods and add warnings 2017-11-01 16:49:42 +01:00