Commit Graph

34 Commits

Author SHA1 Message Date
Ines Montani 46568f40a7 Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
Tom Keefe ddf63b97a8
make idx available via to_array (#5030) 2020-02-22 14:13:06 +01:00
Ines Montani de11ea753a Merge branch 'master' into develop 2020-02-18 14:47:23 +01:00
adrianeboyd 5ee9d8c9b8
Add MORPH attr, add support in retokenizer (#4947)
* Add MORPH attr / symbol for token attrs

* Update retokenizer for MORPH
2020-01-29 17:45:46 +01:00
Sofie Van Landeghem a1b22e90cd serialize ENT_ID (#4852)
* expand serialization test for custom token attribute

* add failing test for issue 4849

* define ENT_ID as attr and use in doc serialization

* fix few typos
2020-01-06 14:57:34 +01:00
Ines Montani db55577c45
Drop Python 2.7 and 3.5 (#4828)
* Remove unicode declarations

* Remove Python 3.5 and 2.7 from CI

* Don't require pathlib

* Replace compat helpers

* Remove OrderedDict

* Use f-strings

* Set Cython compiler language level

* Fix typo

* Re-add OrderedDict for Table

* Update setup.cfg

* Revert CONTRIBUTING.md

* Revert lookups.md

* Revert top-level.md

* Small adjustments and docs [ci skip]
2019-12-22 01:53:56 +01:00
Sofie Van Landeghem 4e7259c6cf Bugfix initializing DocBin with attributes (#4368)
* docbin init fix + documentation fix + unit tests

* newline

* try with zlib instead of gzip (python 2 incompatibilities)
2019-10-03 14:48:45 +02:00
Matthew Honnibal bcd08f20af Merge changes from master 2019-08-21 14:18:52 +02:00
svlandeg 8608685543 ensure Span.as_doc keeps the entity links + unit test 2019-06-25 15:28:51 +02:00
Matthew Honnibal dd9ea478c5 Fix intify_attrs function for obsolete data 2019-03-07 21:59:03 +01:00
Matthew Honnibal 1f7229f40f Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing
changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
4altinok 94fb0b75e3 code for is_currency 2018-02-11 18:51:32 +01:00
Vadim Mazaev cacd859dcd Added tag map, fixed tests fails, added more exceptions 2017-11-26 20:54:48 +03:00
ines d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
Matthew Honnibal 16122f566e Fix cpdef enum in attrs.pyx 2017-09-17 12:28:53 -05:00
Matthew Honnibal fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
Matthew Honnibal 84e66ca6d4 WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
Matthew Honnibal d68dd1f251 Add SENT_START attribute, for custom sentence boundary detection 2017-05-23 18:37:58 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Matthew Honnibal d864708072 Add more morphology names in attrs.pyx 2017-03-15 09:26:16 -05:00
Roman Inflianskas 66e1109b53 Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
Matthew Honnibal 3980f1b0cb Ignore more morphology attributes in deprecated mode of intify_attrs 2016-12-18 17:33:46 +01:00
Matthew Honnibal d58187ffa7 Filter out morphology keys in deprecated attrs 2016-12-18 16:50:26 +01:00
Matthew Honnibal 6dd3b94fa6 Filter out deprecated attributes when reading special-case tokenization rules. 2016-11-25 09:57:18 -06:00
Matthew Honnibal a335c6dcc2 Exclude morphs from deprecated token attributes for now 2016-11-25 16:17:32 +01:00
Matthew Honnibal 846e80f2f4 Exclude morphs from deprecated token attributes for now 2016-11-25 16:14:54 +01:00
Matthew Honnibal 53d8ca8f51 Add spacy.attrs.intify_attrs function, to normalize strings in token attribute dictionaries. 2016-11-25 11:34:30 +01:00
Wolfgang Seeker 03fb498dbe introduce lang field for LexemeC to hold language id
put noun_chunk logic into iterators.py for each language separately
2016-03-10 13:01:34 +01:00
Matthew Honnibal c4017a06d9 * Add placeholders for the new flags in attrs and symbols 2016-02-04 15:49:45 +01:00
Matthew Honnibal 22bd0095f5 * Map empty string to NULL_ATTR in attrs 2015-10-10 22:10:19 +11:00
Matthew Honnibal 94bafc1417 * Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS 2015-10-10 17:57:29 +11:00
Matthew Honnibal 064bd69ad0 * Refactor symbols, so that frequency rank can be derived from the orth id of a word. 2015-10-10 16:03:48 +11:00
Matthew Honnibal 44f39a876f * Add a blank attrs.pyx 2015-07-17 16:40:42 +02:00