36 Commits

Author SHA1 Message Date
Mathias Deschamps c0691b2ab4 Add tokenizer exceptions for ing verbs
Extend list of tokenizing exceptions introduced in 123810b
2017-11-13 17:46:05 +01:00
Mathias Deschamps 288298ead9 Add norm exception for ing verbs
Some -ing verbs are sometimes written with "in" or "in'" (e.g. "lovin'"). Make the NORM form correct
2017-11-13 17:46:05 +01:00
ines 123810b6de Add "lovin'" to tokenizer exceptions (see #1248) 2017-11-09 17:09:30 +01:00
ines acb9bdb852 Fix PRON_LEMMA imports 2017-11-06 17:41:53 +01:00
ines 819e30a26e Tidy up tokenizer exceptions 2017-11-01 23:02:45 +01:00
ines 9659391944 Update deprecated methods and add warnings 2017-11-01 16:49:42 +01:00
ines 7e424a1804 Don't copy exception dicts if not necessary and tidy up 2017-10-31 21:05:29 +01:00
Ines Montani d3bf488e16 Merge pull request #1171 from mollerhoj/support-danish
Improve basic support for Danish
2017-10-24 20:29:57 +02:00
Matthew Honnibal 66766c1454 Restore SP tag to English tag_map, until models migrate 2017-10-24 17:05:00 +02:00
Ines Montani facf77e541 Merge branch 'develop' into support-danish 2017-10-24 11:53:19 +02:00
Matthew Honnibal 49895fbef6 Rename 'SP' special tag to '_SP'
Renaming the tag with an underscore lets us add it to the tag map
without worrying that we'll change the sequence of tags, which throws
off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag,
the "VERB" tag is pushed to a different class ID, and the model is all
messed up.
2017-10-20 14:01:12 +02:00
Matthew Honnibal 839de87ca9 Make lambda func a named function, for pickling 2017-10-17 18:21:20 +02:00
ines 38c756fd85 Port over changes from #1287 2017-10-14 13:16:21 +02:00
ines 8ce6f96180 Don't make copies of language data components 2017-10-11 15:34:55 +02:00
ines 417d45f5d0 Add lemmatizer data as variable on language data
Don't create lookup lemmatizer within Language class and just pass in
the data so it can be set on Token creation
2017-10-11 02:24:58 +02:00
ines 0c2343d73a Tidy up language data 2017-10-11 02:22:49 +02:00
Matthew Honnibal b29e6bff46 Improve lemmatization rule for am|VBP 2017-09-04 15:18:10 +02:00
ines a68dc891ea Port over changes from #1281 2017-08-21 23:19:18 +02:00
ines 1fe5e1a4d1 Add language example sentences (see #1107)
da, de, en, es, fr, he, it, nb, pl, pt, sv
2017-08-19 12:22:29 +02:00
mollerhoj 23025d3b05 Clean up a couple of strange English stopwords 2017-07-03 15:41:59 +02:00
Matthew Honnibal e28f90b672 Fix syntax iterators 2017-06-04 15:51:50 -05:00
Matthew Honnibal 3f5c85d8de Reorder setting of lex attrs, to avoid clobbering 2017-06-03 14:47:55 -05:00
Matthew Honnibal de3954843e Populate norm exceptions with lower-case 2017-06-03 14:47:12 -05:00
ines 5bd311c77e Fix update of norm exceptions 2017-06-03 20:54:09 +02:00
ines 746653880c Add English norm exceptions to lex_attrs 2017-06-03 20:27:28 +02:00
ines 095eeeb12f Update English tokenizer exceptions and add norms 2017-06-03 20:27:16 +02:00
ines 33e332e67c Remove unused export 2017-05-28 00:57:59 +02:00
Matthew Honnibal 5db89053aa Merge docstrings 2017-05-21 13:46:23 -05:00
ines 924e8506de Move Defaults subclass to module scope (necessary for pickling) 2017-05-20 19:02:27 +02:00
Matthew Honnibal 61fe55efba Move EnglishDefaults class out of English 2017-05-20 02:18:19 -05:00
ines 1a05078c79 Add language-specific syntax iterators to en and de 2017-05-17 12:04:03 +02:00
ines 2f870123bf Fix formatting 2017-05-12 15:37:20 +02:00
ines 12c3d5fbba Fix formatting 2017-05-09 01:15:28 +02:00
ines 88adeee548 Add English lex_attrs overrides 2017-05-09 01:09:52 +02:00
ines 73b577cb01 Fix relative imports 2017-05-08 22:29:04 +02:00
ines f46ffe3e89 Move language data to /lang module 2017-05-08 20:00:40 +02:00
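
Note on 49895fbef6 (Rename 'SP' special tag to '_SP'): the commit message says that inserting a 'SP' tag changes the sequence of tags and pushes "VERB" to a different class ID. The sketch below illustrates that effect under one assumption the message does not state: that class IDs are assigned by position in the sorted list of tag names. The helper `assign_tag_ids` and the four-tag set are hypothetical and are not taken from the repository.

```python
def assign_tag_ids(tag_names):
    """Map each tag to a class ID by its position in sorted order.

    Assumption: IDs follow sorted tag names. This is a toy scheme used
    only to illustrate the commit message, not the library's actual code.
    """
    return {tag: i for i, tag in enumerate(sorted(tag_names))}


# Toy tag set, not the real English tag map.
base = ["ADJ", "NOUN", "PUNCT", "VERB"]

ids_before = assign_tag_ids(base)
ids_with_sp = assign_tag_ids(base + ["SP"])
ids_with_underscore_sp = assign_tag_ids(base + ["_SP"])

# Tags whose ID changes once 'SP' is added: 'SP' sorts before 'VERB'.
shifted = [t for t in base if ids_before[t] != ids_with_sp[t]]

# '_' sorts after every uppercase ASCII letter, so '_SP' lands at the end
# and the uppercase tags keep their IDs.
stable = all(ids_before[t] == ids_with_underscore_sp[t] for t in base)

print(shifted)
print(stable)
```

In this toy setup only tags that sort after 'SP' move, which matches the "VERB" example in the commit message. The underscore trick depends on the existing tags sorting before '_'; tags that start with a lowercase letter or a backtick would sort after it.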