Commit Graph

14 Commits

Author SHA1 Message Date
Ines Montani 09ecc39b4e Fix multi-line string of NUM_WORDS (resolves #759) 2017-01-20 15:11:48 +01:00
Wolfgang Seeker 03fb498dbe introduce lang field for LexemeC to hold language id
put noun_chunk logic into iterators.py for each language separately
2016-03-10 13:01:34 +01:00
Wolfgang Seeker bc9c62e279 replace Language functions with corresponding orth functions
implement punctuation functions in orth
2016-03-09 18:07:37 +01:00
Henning Peters 12d58a7099 remove text-unidecode dependency 2016-02-24 08:01:59 +01:00
Matthew Honnibal fe611132f0 * Add stubs for is_bracket/is_quote/is_left_punct/is_right_punct functions 2016-02-04 13:03:04 +01:00
Matthew Honnibal b125289f30 * Fix type declaration in asciied function 2015-10-09 13:46:57 +11:00
Matthew Honnibal f9d2a5b651 * Fix issue #112: Replace unidecode with text-unidecode, to avoid license problems. 2015-09-28 23:40:18 +10:00
Matthew Honnibal 6f1743692a * Work on language-independent refactoring 2015-08-23 20:49:18 +02:00
Matthew Honnibal 3879d28457 * Fix https for url detection 2015-08-23 02:40:35 +02:00
Matthew Honnibal 6bb96c122d * Host IS_ flags in attrs.pxd, and add properties for them on Token and Lexeme objects 2015-07-26 16:37:16 +02:00
Matthew Honnibal 06639dc497 * Add length cap to word shape feature 2015-07-20 12:06:59 +02:00
Matthew Honnibal 6c7e44140b * Work on word vectors, and other stuff 2015-01-17 16:21:17 +11:00
Matthew Honnibal 6b68f7ef75 * Finally get string types right for orth function 2015-01-06 03:17:39 +11:00
Matthew Honnibal 3f1944d688 * Make PyPy work 2015-01-05 17:54:38 +11:00