36 Commits

Author SHA1 Message Date
Jens Dahl Møllerhøj e5055e3cf6 Add Danish lemmatizer (#2184)
* add danish lemmatizer

* fill contributor agreement
2018-04-07 19:07:28 +02:00
Kit 9bc524982e Find lowercased forms of numeric words 2018-01-08 03:25:08 +01:00
Søren Lind Kristiansen bef735aef7 Fix Danish abbreviation 'm.h.t.' 2017-12-21 09:24:31 +01:00
Ines Montani a3dd167d7f Merge branch 'master' into da_ud_tokenization 2017-12-20 21:05:34 +00:00
Søren Lind Kristiansen 7a2f2f6f94 Fix formatting. 2017-12-20 18:37:37 +01:00
Søren Lind Kristiansen 15d13efafd Tune Danish tokenizer to more closely match tokenization in Universal Dependencies. 2017-12-20 17:36:52 +01:00
Kim FalkJørgensen 648dc60755 Remove the incorrect exception 'm.h.t' 2017-12-20 10:02:39 +01:00
Kim FalkJørgensen 9c9f4ef84a Fixing a translation error in examples.py
Adding an exception in the tokenizer_exceptions.py
2017-12-19 15:26:50 +01:00
Søren Lind Kristiansen d86b537a38 Enable morph rules for Danish 2017-11-30 15:58:02 +01:00
Søren Lind Kristiansen 13a988adc3 Remove 'Number[psor]' 2017-11-30 15:55:04 +01:00
Søren Lind Kristiansen dd6fde18a9 Add more Danish morph rules and clean up existing ones 2017-11-30 11:17:19 +01:00
Ines Montani 9052643e2c Merge pull request #1653 from sorenlind/da_example_typo
Fix typo
2017-11-27 14:47:42 +00:00
Søren Lind Kristiansen 5fe58b885b Fix typo 2017-11-27 15:36:18 +01:00
Ines Montani d52b1ab245 Add unicode_literals (hopefully fixes test failure on Python 2) 2017-11-27 15:16:54 +01:00
Søren Lind Kristiansen 0ffd27b0f6 Add several Danish alternative spellings 2017-11-27 13:35:41 +01:00
Søren Lind Kristiansen ef03e9ea53 Remove unused import. 2017-11-25 13:04:02 +01:00
Søren Lind Kristiansen 6aa241bcec Add day of month tokenizer exceptions for Danish. 2017-11-24 15:03:24 +01:00
Søren Lind Kristiansen 0c276ed020 Add weekday abbreviations and remove ambiguous month abbreviations for Danish. 2017-11-24 14:43:29 +01:00
Søren Lind Kristiansen 056547e989 Add multiple tokenizer exceptions for Danish. 2017-11-24 11:51:26 +01:00
Søren Lind Kristiansen ac8116510d Fix tokenization of 'i.' for Danish. 2017-11-24 11:16:53 +01:00
ines acb9bdb852 Fix PRON_LEMMA imports 2017-11-06 17:41:53 +01:00
ines 819e30a26e Tidy up tokenizer exceptions 2017-11-01 23:02:45 +01:00
ines 7e424a1804 Don't copy exception dicts if not necessary and tidy up 2017-10-31 21:05:29 +01:00
Ines Montani facf77e541 Merge branch 'develop' into support-danish 2017-10-24 11:53:19 +02:00
ines 8ce6f96180 Don't make copies of language data components 2017-10-11 15:34:55 +02:00
ines 0c2343d73a Tidy up language data 2017-10-11 02:22:49 +02:00
ines 1fe5e1a4d1 Add language example sentences (see #1107)
da, de, en, es, fr, he, it, nb, pl, pt, sv
2017-08-19 12:22:29 +02:00
mollerhoj 85144835da Add Tag_map for Danish 2017-07-03 15:52:55 +02:00
mollerhoj 64c732918a Add Morph_rules. (TODO: Not working?) 2017-07-03 15:52:55 +02:00
mollerhoj 3b2cb107a3 Add like_num functionality to Danish 2017-07-03 15:49:51 +02:00
mollerhoj e8f40ceed8 Add short names of months to tokenizer_exceptions 2017-07-03 15:49:51 +02:00
mollerhoj dc5be7d2f3 Cleanup list of Danish stopwords 2017-07-03 15:40:58 +02:00
ines 4c643d74c5 Add norm exceptions to other Language classes 2017-06-03 22:29:21 +02:00
ines 924e8506de Move Defaults subclass to module scope (necessary for pickling) 2017-05-20 19:02:27 +02:00
ines 48177c4f92 Add missing tokenizer exceptions 2017-05-12 09:25:24 +02:00
ines bb8be3d194 Add Danish language data 2017-05-10 21:15:12 +02:00