spaCy

Commit Graph

Author	SHA1	Message	Date
svlandeg	e8fd0c1f1e	EL architectures documentation	2020-08-06 17:41:26 +02:00
svlandeg	f396f091dc	update EL API	2020-08-06 16:40:48 +02:00
svlandeg	81d0b1c390	update EL pipe arguments	2020-08-06 16:22:50 +02:00
svlandeg	0b4d1e1bc4	'debug data' instead of 'debug-data'	2020-08-06 15:47:31 +02:00
svlandeg	881e3f8fd0	add docbin explanation and example	2020-08-06 15:29:44 +02:00
Ines Montani	5d417d3b19	WIP: Update docs [ci skip]	2020-08-06 13:10:15 +02:00
Ines Montani	06e80d95cd	Sync develop with nightly docs state (#5883) Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2020-08-06 00:28:14 +02:00
Ines Montani	5cc0d89fad	Simplify config overrides in CLI and deserialization (#5880)	2020-08-05 23:35:09 +02:00
Ines Montani	50311a4d37	Update docs [ci skip]	2020-08-05 20:29:53 +02:00
Ines Montani	2a4d56e730	Update docs	2020-08-05 15:01:00 +02:00
Ines Montani	cdec46493f	Update docs	2020-08-05 15:00:54 +02:00
Ines Montani	4c055f0aa7	Add init CLI and init config (#5854) * Add init CLI and init config draft * Improve config validation * Auto-format * Don't export anything in debug config * Update docs	2020-08-02 15:18:30 +02:00
Ines Montani	b40f44419b	Simplify pipe analysis - remove unused code - don't print by default - integrate attrs info into analysis output	2020-08-01 13:40:06 +02:00
Ines Montani	98c6a85c8b	Update docs [ci skip]	2020-07-31 18:55:38 +02:00
Ines Montani	e9e8fa2466	Update docs and types	2020-07-31 17:02:54 +02:00
Ines Montani	6365837ca9	Merge pull request #5833 from explosion/feature/scorer-adjustments	2020-07-31 14:00:39 +02:00
Ines Montani	5a221f79c2	Revert "Remove keyword-only from Scorer API docs" [ci skip] This reverts commit `7a6ac47dc1`.	2020-07-31 14:00:21 +02:00
Ines Montani	160f1a5f94	Update docs [ci skip]	2020-07-31 13:26:39 +02:00
Adriane Boyd	9b509aa87f	Move Language.evaluate scorer config to new arg Move `Language.evaluate` scorer config from `component_cfg` to separate argument `scorer_cfg`.	2020-07-31 11:05:16 +02:00
Adriane Boyd	9d79916792	Merge branch 'develop' into feature/scorer-adjustments	2020-07-31 10:48:14 +02:00
Ines Montani	3449c45fd9	Update docs [ci skip]	2020-07-29 19:48:26 +02:00
Ines Montani	9c80cb673d	Update docs [ci skip]	2020-07-29 19:41:34 +02:00
Ines Montani	9f69afdd1e	Update docs [ci skip]	2020-07-29 19:09:44 +02:00
Ines Montani	7a21775cd0	Merge pull request #5834 from explosion/feature/vectors	2020-07-29 18:49:26 +02:00
Ines Montani	6a5c853edb	Fix docs [ci skip]	2020-07-29 18:45:12 +02:00
Ines Montani	158d8c1e48	Update docs [ci skip]	2020-07-29 18:44:10 +02:00
Matthew Honnibal	f7adc9d3b7	Start rewriting vectors docs	2020-07-29 17:10:06 +02:00
Ines Montani	b0f57a0cac	Update docs and consistency	2020-07-29 15:14:07 +02:00
Ines Montani	e0ffe36e79	Update docstrings, docs and types	2020-07-29 11:36:42 +02:00
Adriane Boyd	7a6ac47dc1	Remove keyword-only from Scorer API docs	2020-07-29 10:40:30 +02:00
Ines Montani	ac24adec73	Small adjustments to Scorer and docs	2020-07-28 21:39:42 +02:00
Ines Montani	256b24b720	Update arch docs WIP [ci skip]	2020-07-28 20:33:52 +02:00
Ines Montani	ae4d8a6ffd	Update docstrings, docs and pipe consistency	2020-07-28 13:37:31 +02:00
Ines Montani	0094cb0d04	Remove scores list from config and document	2020-07-28 11:22:24 +02:00
Ines Montani	894e20c466	Merge branch 'develop' into feature/component-scores	2020-07-27 18:14:39 +02:00
Ines Montani	d8b519c23c	API docs, docstrings and argument consistency	2020-07-27 18:11:45 +02:00
Ines Montani	10b84e1e27	Add flag to toggle sdist creation on package [ci skip]	2020-07-27 16:52:23 +02:00
Adriane Boyd	fdf09cb231	Update Scorer API docs for score_cats	2020-07-27 15:34:42 +02:00
Ines Montani	7dd53d0964	Fix typo [ci skip]	2020-07-27 00:34:00 +02:00
Ines Montani	7adbaf9a5b	Update docs [ci skip]	2020-07-27 00:29:45 +02:00
Matthew Honnibal	fb5dbe30b5	Trim training 101	2020-07-26 13:43:22 +02:00
Matthew Honnibal	e6a7deb7cc	Edits to the training 101 section	2020-07-26 13:42:08 +02:00
Ines Montani	c288dba8e7	Update docs [ci skip]	2020-07-25 18:51:12 +02:00
Ines Montani	eb9acae34d	Merge pull request #5791 from adrianeboyd/docs/morphology	2020-07-25 15:10:21 +02:00
Adriane Boyd	2bcceb80c4	Refactor the Scorer to improve flexibility (#5731) * Refactor the Scorer to improve flexibility Refactor the `Scorer` to improve flexibility for arbitrary pipeline components. * Individual pipeline components provide their own `evaluate` methods that score a list of `Example`s and return a dictionary of scores * `Scorer` is initialized either: * with a provided pipeline containing components to be scored * with a default pipeline containing the built-in statistical components (senter, tagger, morphologizer, parser, ner) * `Scorer.score` evaluates a list of `Example`s and returns a dictionary of scores referring to the scores provided by the components in the pipeline Significant differences: * `tags_acc` is renamed to `tag_acc` to be consistent with `token_acc` and the new `morph_acc`, `pos_acc`, and `lemma_acc` * Scoring is no longer cumulative: `Scorer.score` scores a list of examples rather than a single example and does not retain any state about previously scored examples * PRF values in the returned scores are no longer multiplied by 100 * Add kwargs to Morphologizer.evaluate * Create generalized scoring methods in Scorer * Generalized static scoring methods are added to `Scorer` * Methods require an attribute (either on Token or Doc) that is used to key the returned scores Naming differences: * `uas`, `las`, and `las_per_type` in the scores dict are renamed to `dep_uas`, `dep_las`, and `dep_las_per_type` Scoring differences: * `Doc.sents` is now scored as spans rather than on sentence-initial token positions so that `Doc.sents` and `Doc.ents` can be scored with the same method (this lowers scores since a single incorrect sentence start results in two incorrect spans) * Simplify / extend hasattr check for eval method * Add hasattr check to tokenizer scoring * Simplify to hasattr check for component scoring * Reset Example alignment if docs are set Reset the Example alignment if either doc is set in case the tokenization has changed. * Add PRF tokenization scoring for tokens as spans Add PRF scores for tokens as character spans. The scores are: * token_acc: # correct tokens / # gold tokens * token_p/r/f: PRF for (token.idx, token.idx + len(token)) * Add docstring to Scorer.score_tokenization * Rename component.evaluate() to component.score() * Update Scorer API docs * Update scoring for positive_label in textcat * Fix TextCategorizer.score kwargs * Update Language.evaluate docs * Update score names in default config	2020-07-25 12:53:02 +02:00
Adriane Boyd	8f44584bef	Update MorphAnalysis.get and related examples	2020-07-23 08:51:31 +02:00
Adriane Boyd	941b9e33f7	Add Token.morph_	2020-07-22 17:59:45 +02:00
Ines Montani	be476e495e	Merge pull request #5787 from adrianeboyd/docs/morphologizer Initial draft of Morphologizer API docs	2020-07-22 17:16:57 +02:00
Adriane Boyd	d3385f4be2	Add Morphology and MorphAnalysis to overview	2020-07-21 13:06:22 +02:00
Adriane Boyd	fcd3a4abe3	Add morph to Token API docs	2020-07-21 13:05:58 +02:00
Adriane Boyd	14df00ae98	Add Morphology and MorphAnalsysis API docs Add initial draft of `Morphology` and `MorphAnalysis` API docs.	2020-07-21 10:33:46 +02:00
Ines Montani	644074b954	Merge branch 'develop' into master-tmp	2020-07-20 14:58:04 +02:00
Adriane Boyd	986f7e4d69	Initial draft of Morphologizer API docs	2020-07-20 12:53:02 +02:00
Adriane Boyd	39ebcd9ec9	Refactor Chinese tokenizer configuration (#5736) * Refactor Chinese tokenizer configuration Refactor `ChineseTokenizer` configuration so that it uses a single `segmenter` setting to choose between character segmentation, jieba, and pkuseg. * replace `use_jieba`, `use_pkuseg`, `require_pkuseg` with the setting `segmenter` with the supported values: `char`, `jieba`, `pkuseg` * make the default segmenter plain character segmentation `char` (no additional libraries required) * Fix Chinese serialization test to use char default * Warn if attempting to customize other segmenter Add a warning if `Chinese.pkuseg_update_user_dict` is called when another segmenter is selected.	2020-07-19 13:34:37 +02:00
Adriane Boyd	cd5af72c9a	Update pkuseg version (#5774) * Update pkuseg version in Chinese tokenizer warnings * Update pkuseg version in `Makefile` * Remove warning about python3.8 wheels in docs	2020-07-19 11:09:49 +02:00
Ines Montani	872938ec76	Merge pull request #5747 from explosion/feature/refactor-config-args	2020-07-14 00:00:22 +02:00
Ines Montani	5f6f4ff594	Remove object subclassing	2020-07-12 14:03:23 +02:00
Ines Montani	c96535e338	Update command docstrings and docs	2020-07-12 13:53:49 +02:00
Ines Montani	3f948b9c74	Update docs	2020-07-12 12:32:28 +02:00
Ines Montani	11bbc82c24	Update cli.md [ci skip]	2020-07-10 23:37:52 +02:00
Ines Montani	9455b060d2	Update cli.md	2020-07-10 22:57:22 +02:00
Ines Montani	7b5717cac3	Merge branch 'develop' into feature/refactor-config-args	2020-07-10 22:50:07 +02:00
Ines Montani	e6a6587a9a	Update projects.md [ci skip]	2020-07-10 22:41:27 +02:00
Ines Montani	f2cd982e7b	Update training.md	2020-07-10 22:34:27 +02:00
Ines Montani	52e9b5b472	Fix formatting	2020-07-09 23:25:58 +02:00
Ines Montani	28cdae898a	Update projects.md	2020-07-09 22:35:54 +02:00
Ines Montani	7bcf9f7cfb	Document new features	2020-07-09 21:10:36 +02:00
Ines Montani	ea01831f6a	Update projects docs etc.	2020-07-09 19:43:25 +02:00
Ines Montani	175d34d8f9	Update sidebar menu	2020-07-09 11:44:09 +02:00
Ines Montani	9ee5b71412	Update cli.md	2020-07-09 11:44:00 +02:00
Ines Montani	9ae4040183	Update API docs	2020-07-08 13:34:35 +02:00
svlandeg	c94279ac1b	remove tensors, fix predict, get_loss and set_annotations	2020-07-08 13:11:54 +02:00
svlandeg	90b100c39f	remove component.Model, update constructor, losses is return value of update	2020-07-08 12:14:30 +02:00
Ines Montani	2298e129e6	Update example and training docs	2020-07-07 20:30:12 +02:00
svlandeg	2b60e894cb	fix component constructors, update, begin_training, reference to GoldParse	2020-07-07 19:17:19 +02:00
svlandeg	14a796e3f9	add Example API with examples of Example usage	2020-07-07 14:46:41 +02:00
Ines Montani	bb3ee38cf9	Update WIP	2020-07-06 22:22:37 +02:00
Ines Montani	44da24ddd0	Update doc.md	2020-07-06 18:17:00 +02:00
Ines Montani	44790c1c32	Update docs and add keyword-only tag	2020-07-06 18:14:57 +02:00
Ines Montani	a35236e5f0	Update v3 docs WIP [ci skip]	2020-07-06 15:57:44 +02:00
Ines Montani	63247cbe87	Update v3 docs [ci skip]	2020-07-05 16:11:16 +02:00
Matthew Honnibal	3e78e82a83	Experimental character-based pretraining (#5700) * Use cosine loss in Cloze multitask * Fix char_embed for gpu * Call resume_training for base model in train CLI * Fix bilstm_depth default in pretrain command * Implement character-based pretraining objective * Use chars loss in ClozeMultitask * Add method to decode predicted characters * Fix number characters * Rescale gradients for mlm * Fix char embed+vectors in ml * Fix pipes * Fix pretrain args * Move get_characters_loss * Fix import * Fix import * Mention characters loss option in pretrain * Remove broken 'self attention' option in pretrain * Revert "Remove broken 'self attention' option in pretrain" This reverts commit `56b820f6af`. * Document 'characters' objective of pretrain	2020-07-05 15:48:39 +02:00
Ines Montani	dc8c9d912f	Update docs [ci skip]	2020-07-04 16:47:24 +02:00
Ines Montani	4498dfe99d	Update docs	2020-07-04 16:25:30 +02:00
Ines Montani	1e0d54edd1	Update docs	2020-07-04 14:23:10 +02:00
Ines Montani	fe224dc2dd	Merge branch 'develop' into nightly.spacy.io	2020-07-03 16:48:27 +02:00
Ines Montani	06f1ecb308	Update v3 docs	2020-07-03 16:48:21 +02:00
Ines Montani	cdf9ee1716	Add stub for Example API docs [ci skip]	2020-07-03 15:46:10 +02:00
Ines Montani	fa8e097c04	Update convert docs [ci skip]	2020-07-03 15:42:04 +02:00
Jan Jessewitsch	e4dcac4a4b	Merging multiple docs into one (#5032) * Add static method to Doc to allow merging of multiple docs. * Add error description for the error that occurs if docs with different vocabs (from different languages) are merged in Doc.from_docs(). * Add test for Doc.from_docs() implementation. * Fix using numpy's concatenate in Doc.from_docs. * Replace typing's type annotations in from_docs. * Simply remove type annotations in from_docs. * Add documentation for Doc.from_docs to api. * Simplify from_docs, its test and the api doc for codebase consistency. * Fix merging of Doc objects that end with whitespaces (Achieved by simply not setting the SPACY attribute on whitespace tokens). Remove two unnecessary imports of attributes. * Add merging of user data from Doc objects in from_docs. Add user data test case to corresponding test. Add applicable warning messages. * Fix incorrect setting of tokens idx by using concatenated spaces (again). Add test case to corresponding test. * Add MORPH to attrs * Update warnings calls * Remove out-dated error from merge * Rename space_delimiter to ensure_whitespace Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2020-07-03 11:32:42 +02:00
Adriane Boyd	a723fa02a1	DocBin: add version number, missing attributes and strings (#5685) * Add version number to DocBin Add a version number to DocBin for future use. * Add POS to all attributes in DocBin * Add morph string to strings in DocBin * Update DocBin API * Add string for ENT_KB_ID in DocBin	2020-07-02 17:41:50 +02:00
Ines Montani	b5268955d7	Update matcher usage examples [ci skip]	2020-07-02 15:39:45 +02:00
Ines Montani	a4cfe9fc33	Remove inline notes on v2 changes [ci skip]	2020-07-01 22:29:22 +02:00
Ines Montani	fe4cfd0632	Start updating website for v3 [ci skip]	2020-07-01 21:26:39 +02:00
Ines Montani	26df4efa94	Add new in v3.0	2020-07-01 13:02:17 +02:00
Ines Montani	18a900abc2	Fix markup	2020-07-01 13:02:07 +02:00
Ines Montani	414dc7ace1	Merge branch 'spacy.io' into spacy.io-develop	2020-07-01 11:47:47 +02:00
Álvaro Abella Bascarán	7111b9de2e	Fix in docs: pipe(docs) instead of pipe(texts) (#5680) Very minor fix in docs, specifically in this part: ``` matcher = PhraseMatcher(nlp.vocab) > for doc in matcher.pipe(texts, batch_size=50): > pass ``` `texts` suggests the input is an iterable of strings. I replaced it for `docs`.	2020-06-30 20:01:12 +02:00
Álvaro Abella Bascarán	ff0dbe5c64	Fix in docs: pipe(docs) instead of pipe(texts) (#5680) Very minor fix in docs, specifically in this part: ``` matcher = PhraseMatcher(nlp.vocab) > for doc in matcher.pipe(texts, batch_size=50): > pass ``` `texts` suggests the input is an iterable of strings. I replaced it for `docs`.	2020-06-30 20:00:50 +02:00
Matthias Hertel	305221f3e5	Website: fixed the token span in the text about the rule-based matching example (#5669) * fixed token span in pattern matcher example * contributor agreement	2020-06-30 19:58:55 +02:00
Matthias Hertel	8b0f749606	Website: fixed the token span in the text about the rule-based matching example (#5669) * fixed token span in pattern matcher example * contributor agreement	2020-06-30 19:58:23 +02:00
Adriane Boyd	d777d9cc38	Extend v2.3 migration guide (#5653) * Extend preloaded vocab section * Add section on tag maps	2020-06-26 14:13:01 +02:00
Adriane Boyd	c4d0209472	Extend v2.3 migration guide (#5653) * Extend preloaded vocab section * Add section on tag maps	2020-06-26 14:12:29 +02:00
Adriane Boyd	a2660bd9c6	Fix backslashes in warnings config diff (#5640) Fix backslashes in warnings config diff in v2.3 migration section.	2020-06-24 10:26:57 +02:00
Adriane Boyd	fd4287c178	Fix backslashes in warnings config diff (#5640) Fix backslashes in warnings config diff in v2.3 migration section.	2020-06-24 10:26:12 +02:00
Adriane Boyd	4f73ced914	Extend what's new in v2.3 with vocab / is_oov (#5635)	2020-06-23 16:50:43 +02:00
Adriane Boyd	7ce451c211	Extend what's new in v2.3 with vocab / is_oov (#5635)	2020-06-23 16:48:59 +02:00
Adriane Boyd	fcdecefacf	Add warnings example in v2.3 migration guide (#5627)	2020-06-22 14:38:06 +02:00
Adriane Boyd	bc1cb30b21	Add warnings example in v2.3 migration guide (#5627)	2020-06-22 14:37:24 +02:00
Ines Montani	52728d8fa3	Merge branch 'develop' into master-tmp	2020-06-20 15:52:00 +02:00
Adriane Boyd	66889de166	Warning for sudachipy 0.4.5 (#5611)	2020-06-19 13:45:23 +02:00
Adriane Boyd	931d80de72	Warning for sudachipy 0.4.5 (#5611)	2020-06-19 12:43:41 +02:00
Ines Montani	6d712f3e06	Merge pull request #5599 from adrianeboyd/docs/v2.3.0-minor	2020-06-16 13:49:25 -07:00
Adriane Boyd	02369f91d3	Fix spacy convert argument	2020-06-16 20:41:17 +02:00
Adriane Boyd	f0fd77648f	Change example title to Dr. Change example title to Dr. so the current model does exclude the title in the initial example.	2020-06-16 20:36:21 +02:00
Adriane Boyd	a6abdfbc3c	Fix numpy.zeros() dtype for Doc.from_array	2020-06-16 20:35:45 +02:00
Adriane Boyd	9aff317ca7	Update POS in tagging example	2020-06-16 20:26:57 +02:00
Adriane Boyd	457babfa0c	Update alignment example for new gold.align	2020-06-16 20:22:03 +02:00
Ines Montani	44af53bdd9	Add pkuseg warnings and auto-format [ci skip]	2020-06-16 17:13:35 +02:00
Ines Montani	a9e5b840ee	Fix typos and auto-format [ci skip]	2020-06-16 16:38:45 +02:00
Adriane Boyd	d5110ffbf2	Documentation updates for v2.3.0 (#5593) * Update website models for v2.3.0 * Add docs for Chinese word segmentation * Tighten up Chinese docs section * Merge branch 'master' into docs/v2.3.0 [ci skip] * Merge branch 'master' into docs/v2.3.0 [ci skip] * Auto-format and update version * Update matcher.md * Update languages and sorting * Typo in landing page * Infobox about token_match behavior * Add meta and basic docs for Japanese * POS -> TAG in models table * Add info about lookups for normalization * Updates to API docs for v2.3 * Update adding norm exceptions for adding languages * Add --omit-extra-lookups to CLI API docs * Add initial draft of "What's New in v2.3" * Add new in v2.3 tags to Chinese and Japanese sections * Add tokenizer to migration section * Add new in v2.3 flags to init-model * Typo * More what's new in v2.3 Co-authored-by: Ines Montani <ines@ines.io>	2020-06-16 15:37:35 +02:00
Sofie Van Landeghem	c0f4a1e43b	train is from-config by default (#5575) * verbose and tag_map options * adding init_tok2vec option and only changing the tok2vec that is specified * adding omit_extra_lookups and verifying textcat config * wip * pretrain bugfix * add replace and resume options * train_textcat fix * raw text functionality * improve UX when KeyError or when input data can't be parsed * avoid unnecessary access to goldparse in TextCat pipe * save performance information in nlp.meta * add noise_level to config * move nn_parser's defaults to config file * multitask in config - doesn't work yet * scorer offering both F and AUC options, need to be specified in config * add textcat verification code from old train script * small fixes to config files * clean up * set default config for ner/parser to allow create_pipe to work as before * two more test fixes * small fixes * cleanup * fix NER pickling + additional unit test * create_pipe as before	2020-06-12 02:02:07 +02:00
Sofie Van Landeghem	4d1ba6feb4	add tag variant for 2.3 (#5542)	2020-06-04 19:16:33 +02:00
Ines Montani	810fce3bb1	Merge branch 'develop' into master-tmp	2020-06-03 14:36:59 +02:00
svlandeg	5f0a91cf37	fix conv-depth parameter	2020-05-29 09:56:29 +02:00
Ines Montani	262d306eaa	unicode -> str consistency	2020-05-24 17:23:00 +02:00
Ines Montani	5d3806e059	unicode -> str consistency	2020-05-24 17:20:58 +02:00
Jannis	aa53ce6996	Documentation Typo Fix (#5492) * Fix typo Change 'realize' to 'realise' * Add contributer agreement	2020-05-22 19:50:26 +02:00
Matthew Honnibal	f6078d866a	Merge pull request #5121 from adrianeboyd/bugfix/revert-token-match Revert token_match priority changes from #4374 and extend token match options	2020-05-22 14:42:51 +02:00
Ines Montani	65c7e82de2	Auto-format and remove 2.3 feature [ci skip]	2020-05-22 13:50:30 +02:00
Adriane Boyd	e4a1b5dab1	Rename to url_match Rename to `url_match` and update docs.	2020-05-22 12:41:03 +02:00
Adriane Boyd	730fa493a4	Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match	2020-05-22 12:18:00 +02:00
Ines Montani	24f72c669c	Merge branch 'develop' into master-tmp	2020-05-21 18:39:06 +02:00
Sofie Van Landeghem	0d94737857	Feature toggle_pipes (#5378) * make disable_pipes deprecated in favour of the new toggle_pipes * rewrite disable_pipes statements * update documentation * remove bin/wiki_entity_linking folder * one more fix * remove deprecated link to documentation * few more doc fixes * add note about name change to the docs * restore original disable_pipes * small fixes * fix typo * fix error number to W096 * rename to select_pipes * also make changes to the documentation Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-05-18 22:27:10 +02:00
Ines Montani	f333c2a011	Merge pull request #5386 from svlandeg/fix/nel-docs	2020-05-10 12:00:09 +02:00
adrianeboyd	4a15b559ba	Clarify Token.pos as UPOS (#5419)	2020-05-08 10:36:25 +02:00
adrianeboyd	a2345618f1	Fix Token API docs from #5375 (#5418)	2020-05-08 10:25:02 +02:00
Adriane Boyd	565e0eef73	Add tokenizer option for token match with affixes To fix the slow tokenizer URL (#4374) and allow `token_match` to take priority over prefixes and suffixes by default, introduce a new tokenizer option for a token match pattern that's applied after prefixes and suffixes but before infixes.	2020-05-05 10:35:33 +02:00
Adriane Boyd	792c8af8cf	Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match	2020-05-05 09:25:57 +02:00
svlandeg	ebaed7dcfa	Few more updates to the EL documentation	2020-04-30 10:17:06 +02:00
adrianeboyd	bdff76dede	Various updates/additions to CLI scripts (#5362) * `debug-data`: determine coverage of provided vectors * `evaluate`: support `blank:lg` model to make it possible to just evaluate tokenization * `init-model`: add option to truncate vectors to N most frequent vectors from word2vec file * `train`: * if training on GPU, only run evaluation/timing on CPU in the first iteration * if training is aborted, exit with a non-0 exit status	2020-04-29 12:56:46 +02:00
Sofie Van Landeghem	cfdaf99b80	Fix passing of component configuration (#5374) * add kwargs to to_disk methods in docs - otherwise crashes on 'exclude' argument * add fix and test for Issue 5137	2020-04-29 12:56:17 +02:00
Sofie Van Landeghem	f67343295d	Update NEL examples and documentation (#5370) * simplify creation of KB by skipping dim reduction * small fixes to train EL example script * add KB creation and NEL training example scripts to example section * update descriptions of example scripts in the documentation * moving wiki_entity_linking folder from bin to projects * remove test for wiki NEL functionality that is being moved	2020-04-29 12:53:53 +02:00
adrianeboyd	a6e521cd79	Add is_sent_end token property (#5375) Reconstruction of the original PR #4697 by @MiniLau. Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema because the Matcher is only going to be able to support `IS_SENT_START`.	2020-04-29 12:53:16 +02:00
adrianeboyd	90ce34db42	Add cuda101 and cuda102 options to setup (#5377) * Add cuda101 and cuda102 options to setup * Update cudaNNN options in docs	2020-04-29 12:51:12 +02:00
adrianeboyd	792aa7b6ab	Remove references to textcat spans (#5360) Remove references to unimplemented `TextCategorizer` span labels in `GoldParse` and `Doc`.	2020-04-27 18:01:12 +02:00
adrianeboyd	90c754024f	Update nlp.vectors to nlp.vocab.vectors (#5357)	2020-04-27 10:53:05 +02:00
Mike	481574cbc8	[minor doc change] embedding vis. link is broken in `website/docs/usage/examples.md` (#5325) * The embedding vis. link is broken The first link seems to be reasonable for now unless someone has an updated embedding vis they want to share? * contributor agreement * Update Mlawrence95.md * Update website/docs/usage/examples.md Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2020-04-21 20:35:12 +02:00
laszabine	fb73d4943a	Amend documentation to Language.evaluate (#5319) * Specified usage of arguments to Language.evaluate * Created contributor agreement	2020-04-16 20:00:18 +02:00
Sofie Van Landeghem	a3965ec13d	tag-map-path since 2.2.4 instead of 2.2.3 (#5289)	2020-04-14 14:53:47 +02:00

987 Commits