39 Commits

Author SHA1 Message Date
ines 84189c1cab Add 'xx' language ID for multi-language support 2017-05-28 00:58:59 +02:00
    Allows models to specify their language ID as 'xx'.
ines 33e332e67c Remove unused export 2017-05-28 00:57:59 +02:00
ines a8e58e04ef Add symbols class to punctuation rules to handle emoji (see #1088) 2017-05-27 17:57:10 +02:00
    Currently doesn't work for Hungarian, because of conflicts with the
    custom punctuation rules. Also doesn't take multi-character emoji like
    👩🏽‍💻 into account.
Matthew Honnibal 5db89053aa Merge docstrings 2017-05-21 13:46:23 -05:00
ines 924e8506de Move Defaults subclass to module scope (necessary for pickling) 2017-05-20 19:02:27 +02:00
Matthew Honnibal 61fe55efba Move EnglishDefaults class out of English 2017-05-20 02:18:19 -05:00
Matthew Honnibal 8815507f8e Move SpanishDefaults out of Language class, for pickle 2017-05-18 04:28:51 -05:00
ines 1a05078c79 Add language-specific syntax iterators to en and de 2017-05-17 12:04:03 +02:00
Matthew Honnibal 4b9d69f428 Merge branch 'v2' into develop 2017-05-14 01:10:23 +02:00
    * Move v2 parser into nn_parser.pyx
    * New TokenVectorEncoder class in pipeline.pyx
    * New spacy/_ml.py module

    Currently the two parsers live side-by-side, until we figure out how to
    organize them.
ines a4a37a783e Remove import from non-existing module 2017-05-13 16:00:09 +02:00
ines c13b3fa052 Add LEX_ATTRS 2017-05-12 15:37:45 +02:00
ines bca2ea9c72 Update Portuguese lexical attributes 2017-05-12 15:37:39 +02:00
ines 2f870123bf Fix formatting 2017-05-12 15:37:20 +02:00
ines ca65993d59 Add basic Polish Language class 2017-05-12 09:25:37 +02:00
ines 48177c4f92 Add missing tokenizer exceptions 2017-05-12 09:25:24 +02:00
ines bb8be3d194 Add Danish language data 2017-05-10 21:15:12 +02:00
ines a0b00624bb Make sure like_email returns bool 2017-05-09 11:37:29 +02:00
ines ea60932e1b Fix formatting 2017-05-09 11:08:14 +02:00
ines 02d0ac5cab Remove redundant function and fix formatting 2017-05-09 11:06:04 +02:00
ines b5ca50607e Reorganise entity rules 2017-05-09 01:37:10 +02:00
ines 12c3d5fbba Fix formatting 2017-05-09 01:15:28 +02:00
ines 2829a024ef Re-add basic like_num check to global lex_attrs 2017-05-09 01:15:23 +02:00
ines 88adeee548 Add English lex_attrs overrides 2017-05-09 01:09:52 +02:00
ines 8f3fbbb147 Fix typos 2017-05-09 01:09:37 +02:00
ines 2216e5f326 Reorganise lex_attrs and add dict 2017-05-09 00:57:54 +02:00
ines e666f14d20 Add global lex_attrs 2017-05-09 00:41:53 +02:00
ines 41972c43fe Use consistent regex imports 2017-05-09 00:34:31 +02:00
ines 9f0fd5963f Reorganise Hungarian punctuation rules 2017-05-09 00:01:59 +02:00
ines fc0d793360 Reorganise Bengali punctuation rules 2017-05-09 00:01:52 +02:00
ines e895d1afd7 Reorganise French punctuation rules 2017-05-09 00:00:54 +02:00
ines 014bda0ae3 Reorganise global punctuation rules 2017-05-09 00:00:46 +02:00
ines a91278cb32 Rename _URL_PATTERN to URL_PATTERN 2017-05-09 00:00:00 +02:00
ines 604f299cf6 Add char classes to global language data 2017-05-08 23:59:33 +02:00
ines f6f5d78cb9 Fix formatting 2017-05-08 23:59:17 +02:00
ines 3c0f85de8e Remove imports in /lang/__init__.py 2017-05-08 23:58:07 +02:00
ines 614aa09582 Tidy up Bengali tokenizer exceptions 2017-05-08 22:29:49 +02:00
ines 73b577cb01 Fix relative imports 2017-05-08 22:29:04 +02:00
ines ae99990f63 Fix formatting 2017-05-08 22:23:48 +02:00
ines f46ffe3e89 Move language data to /lang module 2017-05-08 20:00:40 +02:00
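Commit a0b00624bb makes like_email return a bool. In Python, the likely motivation is that a compiled regex's match() returns a Match object or None rather than True/False. The sketch below assumes a simplified, hypothetical email pattern (not the one in the repository) and wraps the match result in bool() so that callers always get a real boolean.

```python
import re

# Hypothetical, simplified email pattern for illustration only;
# it is not the regex used in the repository.
_EMAIL_RE = re.compile(r"[A-Za-z0-9][\w.+-]*@[\w-]+(\.[\w-]+)*\.[A-Za-z]{2,}")


def like_email(text: str) -> bool:
    # re.match() returns a Match object on success and None on failure;
    # bool() normalises both cases to True/False.
    return bool(_EMAIL_RE.match(text))


print(like_email("ines@example.com"))  # True
print(like_email("hello world"))       # False
```

Without the bool() wrapper, the first call would return a Match object. That object is truthy but is not equal to True, so it cannot be stored or compared as a plain flag.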
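Commit a8e58e04ef adds a symbols character class to the punctuation rules so that emoji can be split off from adjacent text. The following standalone approximation is not spaCy's tokenizer. It assumes that "symbol" means Unicode general category So (Symbol, other), which covers most emoji, and it uses the standard library's unicodedata module instead of the repository's character-class regexes. The helper name split_symbols is hypothetical.

```python
import unicodedata
from typing import List


def split_symbols(text: str) -> List[str]:
    """Hypothetical helper: emit every character whose Unicode general
    category is 'So' (Symbol, other) as its own token, keeping the runs
    of other characters between symbols intact."""
    tokens: List[str] = []
    buf = ""
    for ch in text:
        if unicodedata.category(ch) == "So":
            if buf:  # flush the text collected before the symbol
                tokens.append(buf)
                buf = ""
            tokens.append(ch)
        else:
            buf += ch
    if buf:  # trailing text after the last symbol
        tokens.append(buf)
    return tokens


simple = split_symbols("hi\U0001F600")  # "hi" followed by a grinning-face emoji
# Woman + skin-tone modifier + zero-width joiner + laptop (a ZWJ sequence)
zwj = split_symbols("\U0001F469\U0001F3FD\u200d\U0001F4BB")
```

Here `simple` becomes ['hi', '😀']. The `zwj` case reproduces the limitation noted in the commit message: a multi-character emoji such as 👩🏽‍💻 is broken into several pieces instead of being kept as one token, because each component character is inspected independently.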