spaCy

Jan Jessewitsch e4dcac4a4b Merging multiple docs into one (#5032)	2020-07-03 11:32:42 +02:00

* Add static method to Doc to allow merging of multiple docs.
* Add error description for the error that occurs if docs with different vocabs (from different languages) are merged in Doc.from_docs().
* Add test for Doc.from_docs() implementation.
* Fix using numpy's concatenate in Doc.from_docs.
* Replace typing's type annotations in from_docs.
* Simply remove type annotations in from_docs.
* Add documentation for Doc.from_docs to api.
* Simplify from_docs, its test and the api doc for codebase consistency.
* Fix merging of Doc objects that end with whitespaces (Achieved by simply not setting the SPACY attribute on whitespace tokens). Remove two unnecessary imports of attributes.
* Add merging of user data from Doc objects in from_docs. Add user data test case to corresponding test. Add applicable warning messages.
* Fix incorrect setting of tokens idx by using concatenated spaces (again). Add test case to corresponding test.
* Add MORPH to attrs
* Update warnings calls
* Remove out-dated error from merge
* Rename space_delimiter to ensure_whitespace

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
annotation.md	Update tag maps and docs for English and German (#4501)	2019-10-24 12:56:05 +02:00
cli.md	Remove inline notes on v2 changes [ci skip]	2020-07-01 22:29:22 +02:00
cython-classes.md	unicode -> str consistency	2020-05-24 17:23:00 +02:00
cython-structs.md	Documentation updates for v2.3.0 (#5593)	2020-06-16 15:37:35 +02:00
cython.md	💫 Update website (#3285)	2019-02-17 19:31:19 +01:00
dependencyparser.md	unicode -> str consistency	2020-05-24 17:23:00 +02:00
doc.md	Merging multiple docs into one (#5032)	2020-07-03 11:32:42 +02:00
docbin.md	DocBin: add version number, missing attributes and strings (#5685)	2020-07-02 17:41:50 +02:00
entitylinker.md	unicode -> str consistency	2020-05-24 17:23:00 +02:00
entityrecognizer.md	Merge branch 'develop' into master-tmp	2020-06-03 14:36:59 +02:00
entityruler.md	Start updating website for v3 [ci skip]	2020-07-01 21:26:39 +02:00
goldcorpus.md	unicode -> str consistency	2020-05-24 17:23:00 +02:00
goldparse.md	Start updating website for v3 [ci skip]	2020-07-01 21:26:39 +02:00
index.md	💫 Update website (#3285)	2019-02-17 19:31:19 +01:00
kb.md	unicode -> str consistency	2020-05-24 17:23:00 +02:00
language.md	Remove inline notes on v2 changes [ci skip]	2020-07-01 22:29:22 +02:00
lemmatizer.md	Start updating website for v3 [ci skip]	2020-07-01 21:26:39 +02:00
lexeme.md	Merge branch 'develop' into master-tmp	2020-06-20 15:52:00 +02:00
lookups.md	unicode -> str consistency	2020-05-24 17:23:00 +02:00
matcher.md	Update matcher usage examples [ci skip]	2020-07-02 15:39:45 +02:00
phrasematcher.md	Update matcher usage examples [ci skip]	2020-07-02 15:39:45 +02:00
pipeline-functions.md	unicode -> str consistency	2020-05-24 17:23:00 +02:00
scorer.md	Fix markup	2020-07-01 13:02:07 +02:00
sentencizer.md	Merge branch 'develop' into master-tmp	2020-06-20 15:52:00 +02:00
span.md	Start updating website for v3 [ci skip]	2020-07-01 21:26:39 +02:00
stringstore.md	Start updating website for v3 [ci skip]	2020-07-01 21:26:39 +02:00
tagger.md	unicode -> str consistency	2020-05-24 17:23:00 +02:00
textcategorizer.md	unicode -> str consistency	2020-05-24 17:23:00 +02:00
token.md	Start updating website for v3 [ci skip]	2020-07-01 21:26:39 +02:00
tokenizer.md	Merge branch 'develop' into master-tmp	2020-06-20 15:52:00 +02:00
top-level.md	Remove inline notes on v2 changes [ci skip]	2020-07-01 22:29:22 +02:00
vectors.md	Start updating website for v3 [ci skip]	2020-07-01 21:26:39 +02:00
vocab.md	Start updating website for v3 [ci skip]	2020-07-01 21:26:39 +02:00