Ines Montani
|
62b558ab72
|
💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325)
* Fix formatting and whitespace
* Add support for lexical attributes (closes #2390)
* Document lexical attribute setting during retokenization
* Assign variable outside of nested loop
|
2019-02-24 21:13:51 +01:00 |
Matthew Honnibal
|
84e66ca6d4
|
WIP on stringstore change. 27 failures
|
2017-05-28 14:06:40 +02:00 |
Matthew Honnibal
|
f51e6a6c16
|
Adjust lexeme sizing for attr_t being 64 bit
|
2017-05-28 12:51:09 +02:00 |
Matthew Honnibal
|
793430aa7a
|
Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
|
2017-05-17 12:04:50 +02:00 |
Wolfgang Seeker
|
03fb498dbe
|
introduce lang field for LexemeC to hold language id
put noun_chunk logic into iterators.py for each language separately
|
2016-03-10 13:01:34 +01:00 |
Matthew Honnibal
|
193f127f81
|
* Fix ugly py_check_flag and py_set_flag functions in Lexeme
|
2015-09-15 13:06:18 +10:00 |
Matthew Honnibal
|
e7e529edf4
|
* Fix Lexeme.check_flag
|
2015-09-10 14:45:43 +02:00 |
Matthew Honnibal
|
4f8e38271d
|
* Fix merge errors in lexeme.pxd
|
2015-09-06 20:19:08 +02:00 |
Matthew Honnibal
|
86c888667f
|
* Merge in changes from de branch
|
2015-09-06 19:49:28 +02:00 |
Matthew Honnibal
|
d2fc104a26
|
* Begin merge of Gazetteer and DE branches
|
2015-09-06 19:45:15 +02:00 |
Matthew Honnibal
|
e35bb36be7
|
* Ensure Lexeme.check_flag returns a boolean value
|
2015-09-06 17:52:32 +02:00 |
Matthew Honnibal
|
6f1743692a
|
* Work on language-independent refactoring
|
2015-08-23 20:49:18 +02:00 |
Matthew Honnibal
|
cad0cca4e3
|
* Tmp
|
2015-08-22 22:04:34 +02:00 |
Matthew Honnibal
|
c263577424
|
* Fix lower attribute in lexeme.pxd
|
2015-08-06 16:07:41 +02:00 |
Matthew Honnibal
|
6bb96c122d
|
* Host IS_ flags in attrs.pxd, and add properties for them on Token and Lexeme objects
|
2015-07-26 16:37:16 +02:00 |
Matthew Honnibal
|
4dddc8a69b
|
* Fix type declarations for attr_t. Remove unused id_t.
|
2015-07-18 22:39:57 +02:00 |
Matthew Honnibal
|
a6d040bd11
|
* Import Lexeme attrs from spacy.attrs, not spacy.typedefs
|
2015-07-16 11:20:08 +02:00 |
Matthew Honnibal
|
65251e7625
|
* Remove redundant attr_id_t from typedefs.pxd
|
2015-07-16 00:58:51 +02:00 |
Matthew Honnibal
|
78db7e32f7
|
* Remove has_sense method from Lexeme declaration
|
2015-07-08 19:41:20 +02:00 |
Matthew Honnibal
|
b64c843861
|
* Remove senses attr
|
2015-07-08 19:26:24 +02:00 |
Matthew Honnibal
|
2b8459d9a8
|
* Add senses flag to Lexeme
|
2015-07-01 20:10:41 +02:00 |
Matthew Honnibal
|
c04e6ebca6
|
* Allow user to load different sized vectors.
|
2015-06-05 16:26:39 +02:00 |
Jordan Suchow
|
3a8d9b37a6
|
Remove trailing whitespace
|
2015-04-19 13:01:38 -07:00 |
Matthew Honnibal
|
321b402739
|
* Store the l2 norm of the word's vector
|
2015-02-07 08:42:16 -05:00 |
Matthew Honnibal
|
fda94271af
|
* Rename NORM1 and NORM2 attrs to lower and norm
|
2015-01-24 06:17:03 +11:00 |
Matthew Honnibal
|
5ed8b2b98f
|
* Rename sic to orth
|
2015-01-23 02:08:25 +11:00 |
Matthew Honnibal
|
5e63c606ad
|
* Rename vec to repvec
|
2015-01-22 02:03:54 +11:00 |
Matthew Honnibal
|
6c7e44140b
|
* Work on word vectors, and other stuff
|
2015-01-17 16:21:17 +11:00 |
Matthew Honnibal
|
7d3c40de7d
|
* Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme
|
2015-01-15 00:33:16 +11:00 |
Matthew Honnibal
|
0930892fc1
|
* Tmp. Working on refactor. Compiles, must hook up lexical feats.
|
2015-01-14 00:03:48 +11:00 |
Matthew Honnibal
|
46da3d74d2
|
* Tmp. Refactoring, introducing a Lexeme PyObject.
|
2015-01-12 11:23:44 +11:00 |
Matthew Honnibal
|
ce2edd6312
|
* Tmp commit. Refactoring to create a Python Lexeme class.
|
2015-01-12 10:26:22 +11:00 |
Matthew Honnibal
|
4c4aa2c5c9
|
* Work on train
|
2014-12-22 07:25:43 +11:00 |
Matthew Honnibal
|
f6556d8e5d
|
* Refactor, move Lexeme struct to structs.pxd
|
2014-12-20 06:51:03 +11:00 |
Matthew Honnibal
|
9959a64f7b
|
* Working morphology and lemmatisation. POS tagging quite fast.
|
2014-12-10 08:09:32 +11:00 |
Matthew Honnibal
|
ef4398b204
|
* Rearrange POS stuff, so that language-specific stuff can live in language-specific modules
|
2014-12-07 23:52:41 +11:00 |
Matthew Honnibal
|
49f3780ff5
|
* Fiddle with lexeme attrs
|
2014-12-04 21:22:38 +11:00 |
Matthew Honnibal
|
e1b1f45cc9
|
* Add STEM attribute to lexeme
|
2014-12-04 20:46:20 +11:00 |
Matthew Honnibal
|
d70d31aa45
|
* Introduce first attempt at const-ness
|
2014-12-03 15:44:25 +11:00 |
Matthew Honnibal
|
b463a7eb86
|
* Make flag-setting a language-specific thing
|
2014-12-03 11:04:32 +11:00 |
Matthew Honnibal
|
50309e6e49
|
* Fix context vector, importing all features
|
2014-11-05 22:11:39 +11:00 |
Matthew Honnibal
|
70ea862703
|
* Remove vocab10k field, and add flags for gazetteers
|
2014-11-03 00:13:51 +11:00 |
Matthew Honnibal
|
8335706321
|
* Add LIKE_URL and LIKE_NUMBER flag features
|
2014-11-02 13:19:23 +11:00 |
Matthew Honnibal
|
6c807aa45f
|
* Restore id attribute to lexeme, and rename pos field to postype, to store clustered tag dictionaries
|
2014-10-31 17:43:00 +11:00 |
Matthew Honnibal
|
87c2418a89
|
* Fiddle with data types on Lexeme, to compress them to a much smaller size.
|
2014-10-30 15:42:15 +11:00 |
Matthew Honnibal
|
e6b87766fe
|
* Remove lexemes vector from Lexicon, and the id and hash attributes from Lexeme
|
2014-10-30 15:21:38 +11:00 |
Matthew Honnibal
|
13909a2e24
|
* Rewriting Lexeme serialization.
|
2014-10-29 23:19:38 +11:00 |
Matthew Honnibal
|
08ce602243
|
* Large refactor, particularly to Python API
|
2014-10-24 00:59:17 +11:00 |
Matthew Honnibal
|
e5e951ae67
|
* Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding.
|
2014-10-23 01:57:59 +11:00 |
Matthew Honnibal
|
0a0e41f6c8
|
* Add prefix and suffix features
|
2014-10-22 12:56:09 +11:00 |