12096 Commits

Author SHA1 Message Date
Ines Montani 84fb3a3fb3 Auto-format and fix tuple 2020-07-03 15:20:10 +02:00
Matthew Honnibal e1b3e8ee11 Set version to v3.0.0a1 2020-07-03 13:21:08 +02:00
Matthew Honnibal a902b5f217
Record whether Doc objects are built from known spacing (#5697)
* Tell convert CLI to store user data for Doc

* Remove assert

* Add has_unknown_spaces flag on Doc

* Do not tokenize docs with unknown spaces in Corpus

* Handle conversion of unknown spaces in Example

* Fixes

* Fixes

* Draft has_known_spaces support in DocBin

* Add test for serialize has_unknown_spaces

* Fix DocBin serialization when has_unknown_spaces

* Use serialization in test
2020-07-03 12:58:16 +02:00
Adriane Boyd abad56db7d
Add conllu2docs converter (#5704)
Add conllu2docs converter adapted from conllu2json converter
2020-07-03 12:54:32 +02:00
Jan Jessewitsch e4dcac4a4b
Merging multiple docs into one (#5032)
* Add static method to Doc to allow merging of multiple docs.

* Add error description for the error that occurs if docs with different
vocabs (from different languages) are merged in Doc.from_docs().

* Add test for Doc.from_docs() implementation.

* Fix using numpy's concatenate in Doc.from_docs.

* Replace typing's type annotations in from_docs.

* Simply remove type annotations in from_docs.

* Add documentation for Doc.from_docs to api.

* Simplify from_docs, its test and the api doc for codebase consistency.

* Fix merging of Doc objects that end with whitespaces (Achieved by simply not setting the SPACY attribute on whitespace tokens). Remove two unnecessary imports of attributes.

* Add merging of user data from Doc objects in from_docs. Add user data test case to corresponding test. Add applicable warning messages.

* Fix incorrect setting of tokens idx by using concatenated spaces (again). Add test case to corresponding test.

* Add MORPH to attrs

* Update warnings calls

* Remove outdated error from merge

* Rename space_delimiter to ensure_whitespace

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-07-03 11:32:42 +02:00
Sofie Van Landeghem 41b65fd0f8
fix to pretrain script (#5699)
* fix to pretrain script

* remove unnecessary import
2020-07-02 21:48:01 +02:00
Adriane Boyd a723fa02a1
DocBin: add version number, missing attributes and strings (#5685)
* Add version number to DocBin

Add a version number to DocBin for future use.

* Add POS to all attributes in DocBin

* Add morph string to strings in DocBin

* Update DocBin API

* Add string for ENT_KB_ID in DocBin
2020-07-02 17:41:50 +02:00
Ines Montani b5268955d7 Update matcher usage examples [ci skip] 2020-07-02 15:39:45 +02:00
Ines Montani d36632553a
Merge pull request #5688 from explosion/remove-deprecated
Remove deprecated methods: Doc.print_tree, Doc.merge, Span.merge
2020-07-02 15:10:30 +02:00
Ines Montani 8a5b9a6d5f
Merge pull request #5693 from svlandeg/bugfix/nel-v3 2020-07-02 14:45:46 +02:00
Ines Montani ee8a830248
Merge pull request #5687 from svlandeg/bugfix/init-model
Fixing init_model
2020-07-02 14:10:28 +02:00
svlandeg 04ed4d60a8 raise error when links are not aligned to tokens 2020-07-02 13:57:35 +02:00
svlandeg f503817623 fix parsing entity links in new gold format 2020-07-02 13:48:11 +02:00
Ines Montani 60c2695131 Remove deprecated methods 2020-07-01 22:33:39 +02:00
Ines Montani a4cfe9fc33 Remove inline notes on v2 changes [ci skip] 2020-07-01 22:29:22 +02:00
Ines Montani 79540e1eea Remove bin/spacy from MANIFEST 2020-07-01 22:15:18 +02:00
Ines Montani 97342f3f99
Merge pull request #5686 from tiangolo/refactor/cli-completion 2020-07-01 22:14:48 +02:00
Sebastián Ramírez b0f425971e Remove shellingham from dependencies 2020-07-01 21:47:50 +02:00
Ines Montani 3dff412f58 Merge branch 'nightly.spacy.io' into develop [ci skip] 2020-07-01 21:33:47 +02:00
Ines Montani 2f07144f80 Update netlify.toml [ci skip] 2020-07-01 21:33:20 +02:00
Ines Montani fe4cfd0632 Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
svlandeg a30bc77415 bugfixing prune_vectors and vectors_loc 2020-07-01 21:00:47 +02:00
Sebastián Ramírez b985cc4025 📄 Add spaCy Contributor Agreement 2020-07-01 20:57:21 +02:00
Sebastián Ramírez 764499246e 🔧 Update spacy CLI script entrypoint to support completion 2020-07-01 20:21:05 +02:00
Sebastián Ramírez b02db67247 Add shellingham for automatic shell detection
and update Typer pinning
2020-07-01 20:20:04 +02:00
Matthw Honnibal 94a0cf46fd Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 18:45:45 +02:00
Matthw Honnibal 6a0a27e5c2 Fix max_steps 2020-07-01 18:08:14 +02:00
Ines Montani a4650761a8 Fix package name
We specify it twice because GitHub wouldn't recognise the spaCy repo as a package (e.g. for its "used by" stats) if it didn't specify the name inline
2020-07-01 16:47:26 +02:00
Ines Montani 49105034cb Auto-format 2020-07-01 16:46:56 +02:00
Ines Montani 8d90e44d74 Fix title 2020-07-01 15:38:01 +02:00
Ines Montani 8fb574900a Update parent package and version 2020-07-01 15:35:23 +02:00
Ines Montani 4f42bcdd13 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 15:33:57 +02:00
Ines Montani 38f226bda8 Update images [ci skip] 2020-07-01 15:33:54 +02:00
Matthew Honnibal 0ada186dda Set version to v3.0.0.dev14 2020-07-01 15:31:04 +02:00
Matthw Honnibal cb51bb637b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 15:17:27 +02:00
Matthw Honnibal 7734cbc34d Set batch size in begin_training 2020-07-01 15:16:59 +02:00
Matthw Honnibal 1f7709e9a6 Improve max length check in corpus 2020-07-01 15:16:43 +02:00
Matthw Honnibal 2fa56484b2 Fix eval batch size 2020-07-01 15:16:25 +02:00
Matthw Honnibal c5d12d1a22 Allow batch size to be set for evaluation in spacy train 2020-07-01 15:04:36 +02:00
Ines Montani 6e28760316 Fix 404 [ci skip] 2020-07-01 15:02:55 +02:00
Matthw Honnibal f5532757a3 Filter out 0-length examples in Corpus 2020-07-01 15:02:37 +02:00
Ines Montani 7037512e55 Handle robots.txt for nightly/special deploys [ci skip] 2020-07-01 14:50:58 +02:00
Ines Montani 997f6eeca7 Adjust nightly site url [ci skip] 2020-07-01 14:42:59 +02:00
Ines Montani e1eb48e932 Add nightly social image [ci skip] 2020-07-01 14:41:13 +02:00
Ines Montani 5d02f71653 Add nightly favicon and Binder [ci skip] 2020-07-01 14:33:33 +02:00
Ines Montani db12ee4da9 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 14:21:49 +02:00
Ines Montani bc87ba97e0
Merge pull request #5681 from svlandeg/bugfix/exec-cwd 2020-07-01 14:13:19 +02:00
Ines Montani dc6d9c2fac Auto-infer nightly state from branch 2020-07-01 14:05:11 +02:00
Ines Montani 02334aeafc Make alert more prominent 2020-07-01 13:25:13 +02:00
Ines Montani 360e54863a Update netlify.toml 2020-07-01 13:23:58 +02:00