373 Commits

Author SHA1 Message Date
Sofie Van Landeghem 6b012cebff Make pos/tag distinction more clear in docs (#4246)
* make distinction between tag and pos more prominent in docs

* out of the 101
2019-09-06 10:31:21 +02:00
adrianeboyd 8fe7bdd0fa Improve token pattern checking without validation (#4105)
* Fix typo in rule-based matching docs

* Improve token pattern checking without validation

Add more detailed token pattern checks without full JSON pattern validation and
provide more detailed error messages.

Addresses #4070 (also related: #4063, #4100).

* Check whether top-level attributes in patterns and attr for PhraseMatcher are
  in token pattern schema

* Check whether attribute value types are supported in general (as opposed to
  per attribute with full validation)

* Report various internal error types (OverflowError, AttributeError, KeyError)
  as ValueError with standard error messages

* Check for tagger/parser in PhraseMatcher pipeline for attributes TAG, POS,
  LEMMA, and DEP

* Add error messages with relevant details on how to use validate=True or nlp()
  instead of nlp.make_doc()

* Support attr=TEXT for PhraseMatcher

* Add NORM to schema

* Expand tests for pattern validation, Matcher, PhraseMatcher, and EntityRuler

* Remove unnecessary .keys()

* Rephrase error messages

* Add another type check to Matcher

Add another type check to Matcher for more understandable error messages
in some rare cases.

* Support phrase_matcher_attr=TEXT for EntityRuler

* Don't use spacy.errors in examples and bin scripts

* Fix error code

* Auto-format

Also try to get Azure Pipelines to finally start a build :(

* Update errors.py


Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2019-08-21 14:00:37 +02:00
Ines Montani 3134a9b6e0 Add section on expanding regex match to token boundaries (see #4158) [ci skip] 2019-08-21 12:53:31 +02:00
Ines Montani 66aba2d676 Improve regex matching docs [ci skip] 2019-08-19 13:59:41 +02:00
Sofie Van Landeghem cc66f47893 Make enabling/disabling jupyter mode more explicit (#4144)
* make enabling/disabling jupyter mode more explicit

* markup fix
2019-08-19 11:53:34 +02:00
Ines Montani e520eb3f6c Make visualized NER examples more clear (closes #4104) [ci skip] 2019-08-18 16:29:29 +02:00
Ines Montani 1362f793cf Improve docs on phrase pattern attributes (closes #4100) [ci skip] 2019-08-11 11:13:49 +02:00
Ines Montani 8b4a0fabbb Adjust docs example [ci skip] 2019-08-07 00:46:47 +02:00
adrianeboyd 69aca7d839 Add validate option to EntityRuler (#4089)
* Add validate option to EntityRuler

* Add validate to EntityRuler, passed to Matcher and PhraseMatcher

* Add validate to usage and API docs

* Update website/docs/usage/rule-based-matching.md

Co-Authored-By: Ines Montani <ines@ines.io>

* Update website/docs/usage/rule-based-matching.md

Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-07 00:40:53 +02:00
Ines Montani 4ae320e5c2 Use consistent casing for entity ruler patterns (see #4063) [ci skip] 2019-08-06 12:20:22 +02:00
Ines Montani 223bde5cf6 Improve docs on matcher attributes [ci skip] (closes #4063) 2019-08-06 12:13:42 +02:00
Ines Montani 2bfae0b167 Auto-format 2019-08-06 12:13:31 +02:00
Ines Montani bd39e5e630 Add "Processing text" section [ci skip] 2019-07-25 17:38:03 +02:00
Ines Montani a5e3d2f318 Improve section on disabling pipes [ci skip] 2019-07-25 14:25:34 +02:00
Ines Montani 02e444ec7c Add section on special tokenizer component [ci skip] 2019-07-25 14:25:03 +02:00
Ines Montani 1fa6d6ba55 Improve consistency of docs examples [ci skip] 2019-07-25 14:24:56 +02:00
Ines Montani 1167c303a0 Fix typos [ci skip] 2019-07-19 13:08:18 +02:00
Ines Montani c3ead02ea5 Adjust wording [ci skip] 2019-07-17 16:06:25 +02:00
Ines Montani 1d5ff3e455 Add infobox 2019-07-17 15:29:36 +02:00
Ines Montani 114cb18892 Improve wording 2019-07-17 15:27:53 +02:00
Ines Montani 7522beef9e Add "Things to try" prompts 2019-07-17 15:25:02 +02:00
Ines Montani 9f02e3c027 Adjust example
Not actually supported in this alignment interpretation
2019-07-17 15:13:50 +02:00
Ines Montani 1ea472468a Add usage docs for aligning tokenization 2019-07-17 15:08:33 +02:00
pmbaumgartner 9a86d95ea2 fix custom attribute links 2019-07-14 20:23:54 -04:00
Ines Montani ebe58e7fa1 Document gold.docs_to_json [ci skip] 2019-07-10 10:27:33 +02:00
Ines Montani 881f5bc401 Auto-format 2019-07-10 10:27:29 +02:00
Ines Montani d361e380b8 Fix matcher callback example (closes #3862) 2019-06-26 14:47:26 +02:00
Alejandro Alcalde 4866a7ee9e Changed learning rate by its param name. (#3855)
* Changed learning rate by its param name.

I spent a while searching for how the learning rate parameter is named. With `beta1` and `beta2` it's easy, as they are marked as code, but the learning rate wasn't. I think writing the actual parameter name would be helpful.

* Signing SCA
2019-06-20 10:29:20 +02:00
Ramanan Balakrishnan eb12703d10 minor fix to broken link in documentation (#3819) [ci skip] 2019-06-04 11:15:35 +02:00
Ines Montani 0c74506c9c Fix typos in docs (closes #3802) [ci skip] 2019-06-01 11:35:01 +02:00
mak 89379a7fa4 Corrected example model URL in requirements.txt (#3786)
The URL used to show how to add a model to the requirements.txt had the old release path (excl. explosion).
2019-05-29 10:51:55 +02:00
Aaron Kub 719a15f23d fixing regex matcher examples (#3708) (#3719) 2019-05-10 14:23:52 +02:00
张晓飞 ba1ff00370 update response after calling add_pipe (#3661)
* update response after calling add_pipe

The print_info component is appended last, so it should be shown at the end of the pipeline.

* Create henry860916.md
2019-05-01 12:02:18 +02:00
Ramiro Gómez 8ee4100f8f Remove dangling M (#3657)
I assume this is a typo. Sorry if it has a meaning that I'm not aware of.
2019-04-29 19:44:43 +02:00
Amit Chaudhary 167d63af31 Fix broken link to Dive Into Python 3 website (#3656)
* Fix broken link to Dive Into Python 3 website

* Sign spaCy Contributor Agreement
2019-04-29 19:44:00 +02:00
Ivan Tham fa94f83697 Improve redundant variable name (#3643)
* Improve redundant variable name

* Apply suggestions from code review

Co-Authored-By: pickfire <pickfire@riseup.net>
2019-04-26 16:50:14 +02:00
Ines Montani 0dce4585b1 Add course to 101 2019-04-19 15:59:51 +02:00
Ines Montani 38395d9518 Merge branch 'spacy.io' 2019-04-19 15:26:20 +02:00
Ines Montani 7ac5bb0a7b Update landing and feature overview 2019-04-19 15:23:08 +02:00
fizban99 f2f2df6e78 entity types for colors should be in uppercase (#3599)
Although the text indicates the entity types should be in lowercase, the sample code shows uppercase, which is the correct format.
2019-04-17 11:22:56 +02:00
Ines Montani 9e7deeaf48 Remove Datacamp 2019-04-13 17:46:32 +02:00
Ines Montani 2f0f439c54 Remove non-existent example (closes #3533) 2019-04-03 09:59:17 +02:00
Ines Montani 200d8bdb3c Merge branch 'spacy.io' [ci skip] 2019-03-23 16:46:34 +01:00
Ines Montani 06bf130890 💫 Add better and serializable sentencizer (#3471)
* Add better serializable sentencizer component

* Replace default factory

* Add tests

* Tidy up

* Pass test

* Update docs
2019-03-23 15:45:02 +01:00
Ines Montani b532386a60 Fix typo [ci skip] 2019-03-22 18:36:17 +01:00
Ines Montani 5073ce63fd Merge branch 'spacy.io' [ci skip] 2019-03-22 15:17:11 +01:00
Ines Montani 0712efc6b3 Update version requirements [ci skip] 2019-03-21 10:23:54 +01:00
Ines Montani d4eed4a84f Add note on unicode build to troubleshooting guide (see #3421) [ci skip] 2019-03-19 10:27:02 +01:00
Ines Montani a611b32fbf Update model docs [ci skip] 2019-03-17 11:48:18 +01:00
Ines Montani cbcba699dd Fix missing ids 2019-03-14 17:56:53 +01:00