Commit Graph

87 Commits

Author SHA1 Message Date
Jim Geovedi c97f5ae0bb updated tokenizer exceptions 2017-07-26 19:12:52 +07:00
Jim Geovedi 73f6ac9d9b added hyhen 2017-07-24 15:56:31 +07:00
Jim Geovedi 68454c40bf added missing import 2017-07-24 14:12:34 +07:00
Jim Geovedi eaf9cbd708 cursed of copy & paste 2017-07-24 14:11:51 +07:00
Jim Geovedi 7aad6718bc enable tokenizer exceptions 2017-07-24 14:11:10 +07:00
Jim Geovedi ad56c9179a added tokenizer exceptions list 2017-07-24 14:10:16 +07:00
Jim Geovedi c1f3fe99fe updated punctuation rules 2017-07-24 13:57:21 +07:00
Jim Geovedi 37fa2c8c80 punctution rules 2017-07-24 06:17:18 +07:00
Jim Geovedi 082e94ac1c added inflix rules 2017-07-24 06:17:07 +07:00
Jim Geovedi d0ec484725 reverted 2017-07-24 06:16:29 +07:00
Jim Geovedi 0e590c711f added prefix & suffix rules 2017-07-23 23:46:40 +07:00
Jim Geovedi ba922e30e8 added ampere hour unit 2017-07-23 23:46:18 +07:00
Jim Geovedi 3b17eba27b added frequency units 2017-07-23 23:10:52 +07:00
Jim Geovedi d5fd32a572 added known currencies 2017-07-23 22:56:48 +07:00
Jim Geovedi f6f15678fb added lex_attrs 2017-07-23 22:55:22 +07:00
Jim Geovedi bed8162d00 added tokenizer_exceptions 2017-07-23 22:55:05 +07:00
Jim Geovedi b80c35bc9a added norm_exceptions 2017-07-23 22:54:49 +07:00
Jim Geovedi b5de329ea3 added norm_exceptions 2017-07-23 22:54:19 +07:00
Jim Geovedi 082e9ade46 fixed typo 2017-07-23 21:30:34 +07:00
Jim Geovedi e2efeb186e added stopwords 2017-07-23 20:52:37 +07:00
Jim Geovedi da98676839 use template 2017-07-23 20:51:31 +07:00
Jim Geovedi c2b4dd7809 start working on Indonesian language 2017-07-23 20:50:56 +07:00
Ines Montani c91642efd5 Port over changes from #1168 2017-07-01 11:43:54 +02:00
Jim Regan d81ceb0cd5 Merge branch 'develop' into polish 2017-06-26 22:42:27 +01:00
Jim O'Regan 2f84c73585 a start 2017-06-26 22:40:04 +01:00
Jim O'Regan 28d7f0a672 reference 2017-06-26 22:38:28 +01:00
Matthew Honnibal 91e52543ef Merge pull request #1118 from Gregory-Howard/patch-2 2017-06-20 11:16:07 +02:00
    Update _tokenizer_exceptions_list (adding cities)
Tpt 7745b3ae04 Adds noun chunks to French syntax iterators 2017-06-12 15:29:58 +02:00
Grégory Howard cd974b32b7 Update _tokenizer_exceptions_list (adding cities) 2017-06-09 17:58:18 +02:00
Matthew Honnibal 55d0621532 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-04 15:53:25 -05:00
Matthew Honnibal e28f90b672 Fix syntax iterators 2017-06-04 15:51:50 -05:00
Ines Montani 112c5787eb Merge pull request #1101 from oroszgy/hu_tokenizer_fix 2017-06-04 22:37:51 +02:00
    More robust Hungarian tokenizer.
ines 9254a3dd78 Import and add Spanish syntax iterators 2017-06-04 21:42:15 +02:00
Matthew Honnibal 7ca215bc26 Resolve lex_attr_getters conflict 2017-06-03 16:12:01 -05:00
ines 4c643d74c5 Add norm exceptions to other Language classes 2017-06-03 22:29:21 +02:00
ines fa7e576c57 Change order of exception dicts 2017-06-03 21:52:06 +02:00
Matthew Honnibal 3f5c85d8de Reorder setting of lex attrs, to avoid clobbering 2017-06-03 14:47:55 -05:00
Matthew Honnibal aeb7520133 Make norm use lower-case 2017-06-03 14:47:38 -05:00
Matthew Honnibal de3954843e Populate norm exceptions with lower-case 2017-06-03 14:47:12 -05:00
ines e47eef5e03 Update German tokenizer exceptions and tests 2017-06-03 21:07:44 +02:00
ines 0d6fa8b241 Add German norm exceptions 2017-06-03 20:54:18 +02:00
ines 5bd311c77e Fix update of norm exceptions 2017-06-03 20:54:09 +02:00
ines 746653880c Add English norm exceptions to lex_attrs 2017-06-03 20:27:28 +02:00
ines 095eeeb12f Update English tokenizer exceptions and add norms 2017-06-03 20:27:16 +02:00
ines e5d426406a Add base norm exceptions 2017-06-03 20:27:05 +02:00
ines 2f1025a94c Port over Spanish changes from #1096 2017-06-02 19:09:58 +02:00
Gyorgy Orosz f0c3b09242 More robust Hungarian tokenizer. 2017-05-31 22:28:40 +02:00
Gyorgy Orosz 8c0b4b850e Fixed emoji handling for Hungarian 2017-05-30 21:34:46 +02:00
ines 84189c1cab Add 'xx' language ID for multi-language support 2017-05-28 00:58:59 +02:00
    Allows models to specify their language ID as 'xx'.
ines 33e332e67c Remove unused export 2017-05-28 00:57:59 +02:00