69 Commits

Author SHA1 Message Date
mollerhoj 64c732918a Add Morph_rules. (TODO: Not working?) 2017-07-03 15:52:55 +02:00
mollerhoj 3b2cb107a3 Add like_num functionality to Danish 2017-07-03 15:49:51 +02:00
mollerhoj e8f40ceed8 Add short names of months to tokenizer_exceptions 2017-07-03 15:49:51 +02:00
mollerhoj 23025d3b05 Clean up a couple of strange English stopwords 2017-07-03 15:41:59 +02:00
mollerhoj dc5be7d2f3 Cleanup list of Danish stopwords 2017-07-03 15:40:58 +02:00
Jim Regan d81ceb0cd5 Merge branch 'develop' into polish 2017-06-26 22:42:27 +01:00
Jim O'Regan 2f84c73585 a start 2017-06-26 22:40:04 +01:00
Jim O'Regan 28d7f0a672 reference 2017-06-26 22:38:28 +01:00
Matthew Honnibal 91e52543ef Merge pull request #1118 from Gregory-Howard/patch-2 2017-06-20 11:16:07 +02:00
    Update _tokenizer_exceptions_list (adding cities)
Tpt 7745b3ae04 Adds noun chunks to French syntax iterators 2017-06-12 15:29:58 +02:00
Grégory Howard cd974b32b7 Update _tokenizer_exceptions_list (adding cities) 2017-06-09 17:58:18 +02:00
Matthew Honnibal 55d0621532 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-06-04 15:53:25 -05:00
Matthew Honnibal e28f90b672 Fix syntax iterators 2017-06-04 15:51:50 -05:00
Ines Montani 112c5787eb Merge pull request #1101 from oroszgy/hu_tokenizer_fix 2017-06-04 22:37:51 +02:00
    More robust Hungarian tokenizer.
ines 9254a3dd78 Import and add Spanish syntax iterators 2017-06-04 21:42:15 +02:00
Matthew Honnibal 7ca215bc26 Resolve lex_attr_getters conflict 2017-06-03 16:12:01 -05:00
ines 4c643d74c5 Add norm exceptions to other Language classes 2017-06-03 22:29:21 +02:00
ines fa7e576c57 Change order of exception dicts 2017-06-03 21:52:06 +02:00
Matthew Honnibal 3f5c85d8de Reorder setting of lex attrs, to avoid clobbering 2017-06-03 14:47:55 -05:00
Matthew Honnibal aeb7520133 Make norm use lower-case 2017-06-03 14:47:38 -05:00
Matthew Honnibal de3954843e Populate norm exceptions with lower-case 2017-06-03 14:47:12 -05:00
ines e47eef5e03 Update German tokenizer exceptions and tests 2017-06-03 21:07:44 +02:00
ines 0d6fa8b241 Add German norm exceptions 2017-06-03 20:54:18 +02:00
ines 5bd311c77e Fix update of norm exceptions 2017-06-03 20:54:09 +02:00
ines 746653880c Add English norm exceptions to lex_attrs 2017-06-03 20:27:28 +02:00
ines 095eeeb12f Update English tokenizer exceptions and add norms 2017-06-03 20:27:16 +02:00
ines e5d426406a Add base norm exceptions 2017-06-03 20:27:05 +02:00
ines 2f1025a94c Port over Spanish changes from #1096 2017-06-02 19:09:58 +02:00
Gyorgy Orosz f0c3b09242 More robust Hungarian tokenizer. 2017-05-31 22:28:40 +02:00
Gyorgy Orosz 8c0b4b850e Fixed emoji handling for Hungarian 2017-05-30 21:34:46 +02:00
ines 84189c1cab Add 'xx' language ID for multi-language support 2017-05-28 00:58:59 +02:00
    Allows models to specify their language ID as 'xx'.
ines 33e332e67c Remove unused export 2017-05-28 00:57:59 +02:00
ines a8e58e04ef Add symbols class to punctuation rules to handle emoji (see #1088) 2017-05-27 17:57:10 +02:00
    Currently doesn't work for Hungarian, because of conflicts with the
    custom punctuation rules. Also doesn't take multi-character emoji like
    👩🏽‍💻 into account.
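The multi-character emoji limitation noted in this commit follows from how such an emoji is encoded. 👩🏽‍💻 is a sequence of four code points (WOMAN, a Fitzpatrick skin-tone modifier, ZERO WIDTH JOINER, PERSONAL COMPUTER), so any rule that matches one symbol character at a time sees several pieces rather than one emoji. The sketch below illustrates this with Python's standard `unicodedata` module. `naive_symbol_split` is a hypothetical stand-in for a per-character symbol rule, not spaCy's actual punctuation code.

```python
import unicodedata

# The emoji from the commit message, written as escapes:
# WOMAN + EMOJI MODIFIER FITZPATRICK TYPE-4 + ZERO WIDTH JOINER + PERSONAL COMPUTER
WOMAN_TECHNOLOGIST = "\U0001F469\U0001F3FD\u200d\U0001F4BB"


def naive_symbol_split(text):
    """Hypothetical rule: emit every symbol-category code point (So, Sk, ...)
    as its own token and group all other characters into runs."""
    tokens = []
    buf = ""
    for ch in text:
        if unicodedata.category(ch).startswith("S"):
            if buf:
                tokens.append(buf)
                buf = ""
            tokens.append(ch)
        else:
            buf += ch
    if buf:
        tokens.append(buf)
    return tokens


if __name__ == "__main__":
    print(len(WOMAN_TECHNOLOGIST))  # 4 code points
    for ch in WOMAN_TECHNOLOGIST:
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
    # The single visible emoji comes back as several separate tokens.
    print(naive_symbol_split(WOMAN_TECHNOLOGIST))
```

Keeping such sequences whole would require grapheme-cluster segmentation (Unicode UAX #29) instead of per-code-point matching.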
Matthew Honnibal 5db89053aa Merge docstrings 2017-05-21 13:46:23 -05:00
ines 924e8506de Move Defaults subclass to module scope (necessary for pickling) 2017-05-20 19:02:27 +02:00
Matthew Honnibal 61fe55efba Move EnglishDefaults class out of English 2017-05-20 02:18:19 -05:00
Matthew Honnibal 8815507f8e Move SpanishDefaults out of Language class, for pickle 2017-05-18 04:28:51 -05:00
ines 1a05078c79 Add language-specific syntax iterators to en and de 2017-05-17 12:04:03 +02:00
Matthew Honnibal 4b9d69f428 Merge branch 'v2' into develop 2017-05-14 01:10:23 +02:00
    * Move v2 parser into nn_parser.pyx
    * New TokenVectorEncoder class in pipeline.pyx
    * New spacy/_ml.py module

    Currently the two parsers live side-by-side, until we figure out how to
    organize them.
ines a4a37a783e Remove import from non-existing module 2017-05-13 16:00:09 +02:00
ines c13b3fa052 Add LEX_ATTRS 2017-05-12 15:37:45 +02:00
ines bca2ea9c72 Update Portuguese lexical attributes 2017-05-12 15:37:39 +02:00
ines 2f870123bf Fix formatting 2017-05-12 15:37:20 +02:00
ines ca65993d59 Add basic Polish Language class 2017-05-12 09:25:37 +02:00
ines 48177c4f92 Add missing tokenizer exceptions 2017-05-12 09:25:24 +02:00
ines bb8be3d194 Add Danish language data 2017-05-10 21:15:12 +02:00
ines a0b00624bb Make sure like_email returns bool 2017-05-09 11:37:29 +02:00
ines ea60932e1b Fix formatting 2017-05-09 11:08:14 +02:00
ines 02d0ac5cab Remove redundant function and fix formatting 2017-05-09 11:06:04 +02:00
ines b5ca50607e Reorganise entity rules 2017-05-09 01:37:10 +02:00