Commit Graph

118 Commits

Author SHA1 Message Date
Matthew Honnibal b29e6bff46 Improve lemmatization rule for am|VBP 2017-09-04 15:18:10 +02:00
Matthew Honnibal 2e28982e28 Merge pull request #1288 from geovedi/indonesian: Indonesian language support 2017-08-26 21:31:13 +02:00
Matthew Honnibal cfc055734e Split % in units, for compatibility with corpus 2017-08-25 20:03:37 -05:00
Jim Geovedi 58d8078971 Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-25 09:21:49 +08:00
Matthew Honnibal bb2541ffd3 Fix PROB attr for OOV words 2017-08-23 12:11:52 +02:00
ines a68dc891ea Port over changes from #1281 2017-08-21 23:19:18 +02:00
Jim Geovedi f77443ab68 reworked 2017-08-20 13:43:21 +07:00
Jim Geovedi b7d83f37c8 indonesian abbr. 2017-08-20 12:16:50 +07:00
Jim Geovedi 7193c47f0b direct lookup 2017-08-20 11:57:52 +07:00
Jim Geovedi fdf802d505 added examples 2017-08-20 11:57:10 +07:00
Jim Geovedi fa544e6c9a Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-20 11:49:40 +07:00
ines 1fe5e1a4d1 Add language example sentences (see #1107): da, de, en, es, fr, he, it, nb, pl, pt, sv 2017-08-19 12:22:29 +02:00
Jim Geovedi 37f19f5ed2 added more currencies based on corpus data 2017-08-03 13:03:25 +07:00
Jim Geovedi 30fd068d42 hashtag prefix should be handled somewhere else 2017-08-03 13:03:02 +07:00
Jim Geovedi ba07e23c87 added USD in currency rules 2017-08-02 22:42:47 +07:00
Jim Geovedi bb08d696f9 added hashtag rule and fixed currency rules 2017-07-30 21:23:28 +07:00
Jim Geovedi e9af79a803 added u-\d+ rules (sports team) 2017-07-30 21:23:01 +07:00
Jim Geovedi e5adc26c72 simplified rules 2017-07-29 18:21:32 +07:00
Jim Geovedi 4d04898dea updated regexp 2017-07-29 17:44:57 +07:00
Jim Geovedi 7d96d477ea updated like_num 2017-07-29 17:44:46 +07:00
Jim Geovedi 3cca4ed798 added lex attrs rules 2017-07-29 17:22:21 +07:00
Jim Geovedi 8b814c63f1 more exceptions 2017-07-27 19:46:30 +07:00
Jim Geovedi 6c725e8dcf updated lemma 2017-07-27 19:46:21 +07:00
Jim Geovedi 547973b92a wip syntax iterators 2017-07-27 10:51:34 +07:00
Jim Geovedi bbc75da38d enable syntax iterator and lemma lookup 2017-07-27 10:51:15 +07:00
Jim Geovedi 24a8c8bf28 added wip lemma dict 2017-07-26 21:39:54 +07:00
Jim Geovedi 63f14ba46b added hyphen-suffix rules 2017-07-26 19:28:57 +07:00
Jim Geovedi f288964441 removed -el from suffix rules 2017-07-26 19:28:38 +07:00
Jim Geovedi 6eee7a7411 updated tokenizer exceptions 2017-07-26 19:13:47 +07:00
Jim Geovedi edec51b1b1 update punctuation rules 2017-07-26 19:13:36 +07:00
Jim Geovedi 62443d495a enable token match 2017-07-26 19:13:14 +07:00
Jim Geovedi c97f5ae0bb updated tokenizer exceptions 2017-07-26 19:12:52 +07:00
Jim Geovedi 73f6ac9d9b added hyhen 2017-07-24 15:56:31 +07:00
Jim Geovedi 68454c40bf added missing import 2017-07-24 14:12:34 +07:00
Jim Geovedi eaf9cbd708 cursed of copy & paste 2017-07-24 14:11:51 +07:00
Jim Geovedi 7aad6718bc enable tokenizer exceptions 2017-07-24 14:11:10 +07:00
Jim Geovedi ad56c9179a added tokenizer exceptions list 2017-07-24 14:10:16 +07:00
Jim Geovedi c1f3fe99fe updated punctuation rules 2017-07-24 13:57:21 +07:00
Jim Geovedi 37fa2c8c80 punctution rules 2017-07-24 06:17:18 +07:00
Jim Geovedi 082e94ac1c added inflix rules 2017-07-24 06:17:07 +07:00
Jim Geovedi d0ec484725 reverted 2017-07-24 06:16:29 +07:00
Jim Geovedi 0e590c711f added prefix & suffix rules 2017-07-23 23:46:40 +07:00
Jim Geovedi ba922e30e8 added ampere hour unit 2017-07-23 23:46:18 +07:00
Jim Geovedi 3b17eba27b added frequency units 2017-07-23 23:10:52 +07:00
Jim Geovedi d5fd32a572 added known currencies 2017-07-23 22:56:48 +07:00
Jim Geovedi f6f15678fb added lex_attrs 2017-07-23 22:55:22 +07:00
Jim Geovedi bed8162d00 added tokenizer_exceptions 2017-07-23 22:55:05 +07:00
Jim Geovedi b80c35bc9a added norm_exceptions 2017-07-23 22:54:49 +07:00
Jim Geovedi b5de329ea3 added norm_exceptions 2017-07-23 22:54:19 +07:00
Jim Geovedi 082e9ade46 fixed typo 2017-07-23 21:30:34 +07:00