Daniël de Kok
e2b70df012
Configure isort to use the Black profile, recursively isort the `spacy` module (#12721)
* Use isort with Black profile
* isort all the things
* Fix import cycles as a result of import sorting
* Add DOCBIN_ALL_ATTRS type definition
* Add isort to requirements
* Remove isort from build dependencies check
* Typo
2023-06-14 17:48:41 +02:00
Adriane Boyd
40e1000db0
Restore Doc attr getter values in Doc.to_json (#11700)
2022-11-03 11:49:08 +01:00
Edward
d66ccb8eb0
Fix multiple entries per custom extension in doc json (#11551)
* Fix multiple extensions and character offset
* Rename token_start/end to start/end
* Refactor Doc.from_json based on review
* Iterate over user_data items
* Only add non-empty underscore entries
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-10-19 15:52:47 +02:00
Edward
5afa98aabf
Support custom attributes for tokens and spans in json conversion (#11125)
* Add token and span custom attributes to to_json()
* Change logic for to_json
* Add functionality to from_json
* Small adjustments
* Move token/span attributes to new dict key
* Fix test
* Fix the same test but much better
* Add backwards compatibility tests and adjust logic
* Add test to check if attributes not set in underscore are not saved in the json
* Add tests for json compatibility
* Adjust test names
* Fix tests and clean up code
* Fix assert json tests
* Small adjustment
* Adjust naming and code readability
* Adjust naming, add more tests, and change logic
* Fix typo
* Adjust errors, naming, and small test optimization
* Fix byte tests
* Fix bytes tests
* Change naming and json structure
* Update schema
* Update spacy/schemas.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/tokens/doc.pyx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/tokens/doc.pyx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/schemas.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update schema for underscore attributes
* Adjust underscore schema
* Adjust schema tests
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-23 10:05:02 +02:00
Raphael Mitsch
8387ce4c01
Add Doc.from_json() (#10688)
* Implement Doc.from_json: rough draft.
* Implement Doc.from_json: first draft with tests.
* Implement Doc.from_json: added documentation on website for Doc.to_json(), Doc.from_json().
* Implement Doc.from_json: formatting changes.
* Implement Doc.to_json(): reverting unrelated formatting changes.
* Implement Doc.to_json(): fixing entity and span conversion. Moving fixture and doc <-> json conversion tests into single file.
* Implement Doc.from_json(): replaced entity/span converters with doc.char_span() calls.
* Implement Doc.from_json(): handling sentence boundaries in spans.
* Implementing Doc.from_json(): added parser-free sentence boundaries transfer.
* Implementing Doc.from_json(): added parser-free sentence boundaries transfer.
* Implementing Doc.from_json(): incorporated various PR feedback.
* Renaming fixture for document without dependencies.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implementing Doc.from_json(): using two sent_starts instead of one.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implementing Doc.from_json(): doc_without_dependency_parser() -> doc_without_deps.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implementing Doc.from_json(): incorporating various PR feedback. Rebased on latest master.
* Implementing Doc.from_json(): refactored Doc.from_json() to work with annotation IDs instead of their string representations.
* Implement Doc.from_json(): reverting unwanted formatting/rebasing changes.
* Implement Doc.from_json(): added check for char_span() calculation for entities.
* Update spacy/tokens/doc.pyx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): minor refactoring, additional check for token attribute consistency with corresponding test.
* Implement Doc.from_json(): removed redundancy in annotation type key naming.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): Simplifying setting annotation values.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): renaming annot_types to token_attrs.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): adjustments for renaming of annot_types to token_attrs.
* Implement Doc.from_json(): removing default categories.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): simplifying lexeme initialization.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): simplifying lexeme initialization.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): refactoring to only have keys for present annotations.
* Implement Doc.from_json(): fix check for tokens' HEAD attributes.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): refactoring Doc.from_json().
* Implement Doc.from_json(): fixing span_group retrieval.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): fixing span retrieval.
* Implement Doc.from_json(): added schema for Doc JSON format. Minor refactoring in Doc.from_json().
* Implement Doc.from_json(): added comment regarding Token and Span extension support.
* Implement Doc.from_json(): renaming inconsistent_props to partial_attrs.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): adjusting error message.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): extending E1038 message.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): added params to E1038 raises.
* Implement Doc.from_json(): combined attribute collection with partial attributes check.
* Implement Doc.from_json(): added optional schema validation.
* Implement Doc.from_json(): fixed optional fields in schema, tests.
* Implement Doc.from_json(): removed redundant None check for DEP.
* Implement Doc.from_json(): added passing of schema validation message to E1037.
* Implement Doc.from_json(): removing redundant error E1040.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): changing message for E1037.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): adjusted website docs and docstring of Doc.from_json().
* Update spacy/tests/doc/test_json_doc_conversion.py
* Implement Doc.from_json(): docstring update.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): docstring update.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): website docs update.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): docstring formatting.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): docstring formatting.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): fixing Doc reference in website docs.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): reformatted website/docs/api/doc.md.
* Implement Doc.from_json(): bumped IDs of new errors to avoid merge conflicts.
* Implement Doc.from_json(): fixing bug in tests.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Implement Doc.from_json(): fix setting of sentence starts for docs without DEP.
* Implement Doc.from_json(): add check for valid char spans when manually setting sentence boundaries. Refactor sentence boundary setting slightly. Move error message for lack of support for partial token annotations to errors.py.
* Implement Doc.from_json(): simplify token sentence start manipulation.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Combine related error messages
* Update spacy/tests/doc/test_json_doc_conversion.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-02 14:03:47 +02:00