Commit Graph

80 Commits

Author SHA1 Message Date
Ines Montani 62b558ab72 💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325)
* Fix formatting and whitespace

* Add support for lexical attributes (closes #2390)

* Document lexical attribute setting during retokenization

* Assign variable oputside of nested loop
2019-02-24 21:13:51 +01:00
Matthew Honnibal 84e66ca6d4 WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
Matthew Honnibal f51e6a6c16 Adjust lexeme sizing for attr_t being 64 bit 2017-05-28 12:51:09 +02:00
Matthew Honnibal 793430aa7a Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
2017-05-17 12:04:50 +02:00
Wolfgang Seeker 03fb498dbe introduce lang field for LexemeC to hold language id
put noun_chunk logic into iterators.py for each language separately
2016-03-10 13:01:34 +01:00
Matthew Honnibal 193f127f81 * Fix ugly py_check_flag and py_set_flag functions in Lexeme 2015-09-15 13:06:18 +10:00
Matthew Honnibal e7e529edf4 * Fix Lexeme.check_flag 2015-09-10 14:45:43 +02:00
Matthew Honnibal 4f8e38271d * Fix merge errors in lexeme.pxd 2015-09-06 20:19:08 +02:00
Matthew Honnibal 86c888667f * Merge in changes from de branch 2015-09-06 19:49:28 +02:00
Matthew Honnibal d2fc104a26 * Begin merge of Gazetteer and DE branches 2015-09-06 19:45:15 +02:00
Matthew Honnibal e35bb36be7 * Ensure Lexeme.check_flag returns a boolean value 2015-09-06 17:52:32 +02:00
Matthew Honnibal 6f1743692a * Work on language-independent refactoring 2015-08-23 20:49:18 +02:00
Matthew Honnibal cad0cca4e3 * Tmp 2015-08-22 22:04:34 +02:00
Matthew Honnibal c263577424 * Fix lower attribute in lexeme.pxd 2015-08-06 16:07:41 +02:00
Matthew Honnibal 6bb96c122d * Host IS_ flags in attrs.pxd, and add properties for them on Token and Lexeme objects 2015-07-26 16:37:16 +02:00
Matthew Honnibal 4dddc8a69b * Fix type declarations for attr_t. Remove unused id_t. 2015-07-18 22:39:57 +02:00
Matthew Honnibal a6d040bd11 * Import Lexeme attrs from spacy.attrs, not spacy.typedefs 2015-07-16 11:20:08 +02:00
Matthew Honnibal 65251e7625 * Remove redundant attr_id_t from typedefs.pxd 2015-07-16 00:58:51 +02:00
Matthew Honnibal 78db7e32f7 * Remove has_sense method from Lexeme declaration 2015-07-08 19:41:20 +02:00
Matthew Honnibal b64c843861 * Remove senses attr 2015-07-08 19:26:24 +02:00
Matthew Honnibal 2b8459d9a8 * Add senses flag to Lexeme 2015-07-01 20:10:41 +02:00
Matthew Honnibal c04e6ebca6 * Allow user to load different sized vectors. 2015-06-05 16:26:39 +02:00
Jordan Suchow 3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Matthew Honnibal 321b402739 * Store the l2 norm of the word's vector 2015-02-07 08:42:16 -05:00
Matthew Honnibal fda94271af * Rename NORM1 and NORM2 attrs to lower and norm 2015-01-24 06:17:03 +11:00
Matthew Honnibal 5ed8b2b98f * Rename sic to orth 2015-01-23 02:08:25 +11:00
Matthew Honnibal 5e63c606ad * Rename vec to repvec 2015-01-22 02:03:54 +11:00
Matthew Honnibal 6c7e44140b * Work on word vectors, and other stuff 2015-01-17 16:21:17 +11:00
Matthew Honnibal 7d3c40de7d * Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme 2015-01-15 00:33:16 +11:00
Matthew Honnibal 0930892fc1 * Tmp. Working on refactor. Compiles, must hook up lexical feats. 2015-01-14 00:03:48 +11:00
Matthew Honnibal 46da3d74d2 * Tmp. Refactoring, introducing a Lexeme PyObject. 2015-01-12 11:23:44 +11:00
Matthew Honnibal ce2edd6312 * Tmp commit. Refactoring to create a Python Lexeme class. 2015-01-12 10:26:22 +11:00
Matthew Honnibal 4c4aa2c5c9 * Work on train 2014-12-22 07:25:43 +11:00
Matthew Honnibal f6556d8e5d * Refactor, move Lexeme struct to structs.pxd 2014-12-20 06:51:03 +11:00
Matthew Honnibal 9959a64f7b * Working morphology and lemmatisation. POS tagging quite fast. 2014-12-10 08:09:32 +11:00
Matthew Honnibal ef4398b204 * Rearrange POS stuff, so that language-specific stuff can live in language-specific modules 2014-12-07 23:52:41 +11:00
Matthew Honnibal 49f3780ff5 * Fiddle with lexeme attrs 2014-12-04 21:22:38 +11:00
Matthew Honnibal e1b1f45cc9 * Add STEM attribute to lexeme 2014-12-04 20:46:20 +11:00
Matthew Honnibal d70d31aa45 * Introduce first attempt at const-ness 2014-12-03 15:44:25 +11:00
Matthew Honnibal b463a7eb86 * Make flag-setting a language-specific thing 2014-12-03 11:04:32 +11:00
Matthew Honnibal 50309e6e49 * Fix context vector, importing all features 2014-11-05 22:11:39 +11:00
Matthew Honnibal 70ea862703 * Remove vocab10k field, and add flags for gazetteers 2014-11-03 00:13:51 +11:00
Matthew Honnibal 8335706321 * Add LIKE_URL and LIKE_NUMBER flag features 2014-11-02 13:19:23 +11:00
Matthew Honnibal 6c807aa45f * Restore id attribute to lexeme, and rename pos field to postype, to store clustered tag dictionaries 2014-10-31 17:43:00 +11:00
Matthew Honnibal 87c2418a89 * Fiddle with data types on Lexeme, to compress them to a much smaller size. 2014-10-30 15:42:15 +11:00
Matthew Honnibal e6b87766fe * Remove lexemes vector from Lexicon, and the id and hash attributes from Lexeme 2014-10-30 15:21:38 +11:00
Matthew Honnibal 13909a2e24 * Rewriting Lexeme serialization. 2014-10-29 23:19:38 +11:00
Matthew Honnibal 08ce602243 * Large refactor, particularly to Python API 2014-10-24 00:59:17 +11:00
Matthew Honnibal e5e951ae67 * Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding. 2014-10-23 01:57:59 +11:00
Matthew Honnibal 0a0e41f6c8 * Add prefix and suffix features 2014-10-22 12:56:09 +11:00