Commit Graph

446 Commits

Author SHA1 Message Date
Ines Montani 685e4b2554 Update v2-2.md [ci skip] 2019-09-27 16:35:01 +02:00
Em Zhan aafa091541 Fix typo in documentation (#4322)
* Fix typo 'probj' instead of 'pobj'

* Add spaCy contributor agreement for zqianem
2019-09-25 19:42:18 +02:00
Ines Montani 197406de1d Update v2-2.md [ci skip] 2019-09-19 14:33:58 +02:00
Ines Montani ddc09b08ed Update v2-2.md [ci skip] 2019-09-19 00:58:30 +02:00
Ines Montani 9c940eab94 Update version in examples [ci skip] 2019-09-18 21:23:26 +02:00
Ines Montani f873548f6c Add backwards incompatibility [ci skip] 2019-09-18 21:21:48 +02:00
Ines Montani dd1810f05a Update DocBin and add docs 2019-09-18 20:23:21 +02:00
Ines Montani d62690b3ba Update examples 2019-09-18 19:57:36 +02:00
Matthew Honnibal 931e96b6c7 DocPallet->DocBin in docs 2019-09-18 15:17:26 +02:00
Matthew Honnibal f537cbeacc Update v2-2 docs 2019-09-18 14:07:55 +02:00
Ines Montani 16c2522791 Merge branch 'master' into develop 2019-09-14 16:42:01 +02:00
Ines Montani 86befc80bf WIP: Add v2.2 page [ci skip] 2019-09-14 16:41:48 +02:00
Ines Montani 04d36d2471 Remove unused link [ci skip] 2019-09-14 16:41:19 +02:00
Ines Montani 5c8b5e68ec Fix docs consistency [ci skip] 2019-09-14 16:23:37 +02:00
Ines Montani bbf7337eaf Update adding languages docs [ci skip] 2019-09-14 15:32:15 +02:00
Ines Montani 25b2b3ff45 Remove LEMMA from exception examples [ci skip] 2019-09-12 16:26:27 +02:00
Ines Montani 82c16b7943 Remove u-strings and fix formatting [ci skip] 2019-09-12 16:11:15 +02:00
Ines Montani a31e9e1cd5 Update training docs [ci skip] 2019-09-12 15:32:39 +02:00
Ines Montani b544dcb3c5 Document debug-data [ci skip] 2019-09-12 15:26:20 +02:00
Ines Montani c0a4cab178 Update "Adding languages" docs [ci skip] 2019-09-12 14:53:06 +02:00
Ines Montani e7c20ad1d2 Update colors entry points docs [ci skip] 2019-09-12 12:59:10 +02:00
Ines Montani 7b59a919e6 Update entry points docs [ci skip] 2019-09-12 12:52:06 +02:00
Sofie Van Landeghem 0b4b4f1819 Documentation for Entity Linking (#4065)
* document token ent_kb_id

* document span kb_id

* update pipeline documentation

* prior and context weights as bool's instead

* entitylinker api documentation

* drop for both models

* finish entitylinker documentation

* small fixes

* documentation for KB

* candidate documentation

* links to api pages in code

* small fix

* frequency examples as counts for consistency

* consistent documentation about tensors returned by predict

* add entity linking to usage 101

* add entity linking infobox and KB section to 101

* entity-linking in linguistic features

* small typo corrections

* training example and docs for entity_linker

* predefined nlp and kb

* revert back to similarity encodings for simplicity (for now)

* set prior probabilities to 0 when excluded

* code clean up

* bugfix: deleting kb ID from tokens when entities were removed

* refactor train el example to use either model or vocab

* pretrain_kb example for example kb generation

* add to training docs for KB + EL example scripts

* small fixes

* error numbering

* ensure the language of vocab and nlp stay consistent across serialization

* equality with =

* avoid conflict in errors file

* add error 151

* final adjustements to the train scripts - consistency

* update of goldparse documentation

* small corrections

* push commit

* typo fix

* add candidate API to kb documentation

* update API sidebar with EntityLinker and KnowledgeBase

* remove EL from 101 docs

* remove entity linker from 101 pipelines / rephrase

* custom el model instead of existing model

* set version to 2.2 for EL functionality

* update documentation for 2 CLI scripts
2019-09-12 11:38:34 +02:00
Sofie Van Landeghem 6b012cebff Make pos/tag distinction more clear in docs (#4246)
* make distinction between tag and pos more prominent in docs

* out of the 101
2019-09-06 10:31:21 +02:00
adrianeboyd 8fe7bdd0fa Improve token pattern checking without validation (#4105)
* Fix typo in rule-based matching docs

* Improve token pattern checking without validation

Add more detailed token pattern checks without full JSON pattern validation and
provide more detailed error messages.

Addresses #4070 (also related: #4063, #4100).

* Check whether top-level attributes in patterns and attr for PhraseMatcher are
  in token pattern schema

* Check whether attribute value types are supported in general (as opposed to
  per attribute with full validation)

* Report various internal error types (OverflowError, AttributeError, KeyError)
  as ValueError with standard error messages

* Check for tagger/parser in PhraseMatcher pipeline for attributes TAG, POS,
  LEMMA, and DEP

* Add error messages with relevant details on how to use validate=True or nlp()
  instead of nlp.make_doc()

* Support attr=TEXT for PhraseMatcher

* Add NORM to schema

* Expand tests for pattern validation, Matcher, PhraseMatcher, and EntityRuler

* Remove unnecessary .keys()

* Rephrase error messages

* Add another type check to Matcher

Add another type check to Matcher for more understandable error messages
in some rare cases.

* Support phrase_matcher_attr=TEXT for EntityRuler

* Don't use spacy.errors in examples and bin scripts

* Fix error code

* Auto-format

Also try get Azure pipelines to finally start a build :(

* Update errors.py


Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2019-08-21 14:00:37 +02:00
Ines Montani 3134a9b6e0 Add section on expanding regex match to token boundaries (see #4158) [ci skip] 2019-08-21 12:53:31 +02:00
Ines Montani 66aba2d676 Improve regex matching docs [ci skip] 2019-08-19 13:59:41 +02:00
Sofie Van Landeghem cc66f47893 Make enabling/disabling jupyter mode more explicit (#4144)
* make enabling/disabling jupyter mode more explicit

* markup fix
2019-08-19 11:53:34 +02:00
Ines Montani e520eb3f6c Make visualized NER examples more clear (closes #4104) [ci skip] 2019-08-18 16:29:29 +02:00
Ines Montani 1362f793cf Improve docs on phrase pattern attributes (closes #4100) [ci skip] 2019-08-11 11:13:49 +02:00
Ines Montani 8b4a0fabbb Adjust docs example [ci skip] 2019-08-07 00:46:47 +02:00
adrianeboyd 69aca7d839 Add validate option to EntityRuler (#4089)
* Add validate option to EntityRuler

* Add validate to EntityRuler, passed to Matcher and PhraseMatcher

* Add validate to usage and API docs

* Update website/docs/usage/rule-based-matching.md

Co-Authored-By: Ines Montani <ines@ines.io>

* Update website/docs/usage/rule-based-matching.md

Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-07 00:40:53 +02:00
Ines Montani 4ae320e5c2 Use consistent casing for entity ruler patterns (see #4063) [ci skip] 2019-08-06 12:20:22 +02:00
Ines Montani 223bde5cf6 Improve docs on matcher attributes [ci skip] (closes #4063) 2019-08-06 12:13:42 +02:00
Ines Montani 2bfae0b167 Auto-format 2019-08-06 12:13:31 +02:00
Ines Montani bd39e5e630 Add "Processing text" section [ci skip] 2019-07-25 17:38:03 +02:00
Ines Montani a5e3d2f318 Improve section on disabling pipes [ci skip] 2019-07-25 14:25:34 +02:00
Ines Montani 02e444ec7c Add section on special tokenizer component [ci skip] 2019-07-25 14:25:03 +02:00
Ines Montani 1fa6d6ba55 Improve consistency of docs examples [ci skip] 2019-07-25 14:24:56 +02:00
Ines Montani 1167c303a0 Fix typos [ci skip] 2019-07-19 13:08:18 +02:00
Ines Montani c3ead02ea5 Adjust wording [ci skip] 2019-07-17 16:06:25 +02:00
Ines Montani 1d5ff3e455 Add infobox 2019-07-17 15:29:36 +02:00
Ines Montani 114cb18892 Improve wording 2019-07-17 15:27:53 +02:00
Ines Montani 7522beef9e Add "Things to try" prompts 2019-07-17 15:25:02 +02:00
Ines Montani 9f02e3c027 Adjust example
Not actually supported in this alignment interpretation
2019-07-17 15:13:50 +02:00
Ines Montani 1ea472468a Add usage docs for aligning tokenization 2019-07-17 15:08:33 +02:00
pmbaumgartner 9a86d95ea2 fix custom attribute links 2019-07-14 20:23:54 -04:00
Ines Montani ebe58e7fa1 Document gold.docs_to_json [ci skip] 2019-07-10 10:27:33 +02:00
Ines Montani 881f5bc401 Auto-format 2019-07-10 10:27:29 +02:00
Ines Montani d361e380b8 Fix matcher callback example (closes #3862) 2019-06-26 14:47:26 +02:00
Alejandro Alcalde 4866a7ee9e Changed learning rate by its param name. (#3855)
* Changed learning rate by its param name.

I've been searching for a while how the parameter learning rate was named, with `beta1` and `beta2` its easy as they are marked as code, but learning rate wasn't. I think writing the actual parameter name would be helpful.

* Signing SCA
2019-06-20 10:29:20 +02:00
Ramanan Balakrishnan eb12703d10 minor fix to broken link in documentation (#3819) [ci skip] 2019-06-04 11:15:35 +02:00
Ines Montani 0c74506c9c Fix typos in docs (closes #3802) [ci skip] 2019-06-01 11:35:01 +02:00
mak 89379a7fa4 Corrected example model URL in requirements.txt (#3786)
The URL used to show how to add a model to the requirements.txt had the old release path (excl. explosion).
2019-05-29 10:51:55 +02:00
Aaron Kub 719a15f23d fixing regex matcher examples (#3708) (#3719) 2019-05-10 14:23:52 +02:00
张晓飞 ba1ff00370 update response after calling add_pipe (#3661)
* update response after calling add_pipe

component:print_info is appened in the last, so need show it at the end of  pipeline

* Create henry860916.md
2019-05-01 12:02:18 +02:00
Ramiro Gómez 8ee4100f8f Remove dangling M (#3657)
I assume this is a typo. Sorry if it has a meaning that I'm not aware of.
2019-04-29 19:44:43 +02:00
Amit Chaudhary 167d63af31 Fix broken link to Dive Into Python 3 website (#3656)
* Fix broken link to Dive Into Python 3 website

* Sign spaCy Contributor Agreement
2019-04-29 19:44:00 +02:00
Ivan Tham fa94f83697 Improve redundant variable name (#3643)
* Improve redundant variable name

* Apply suggestions from code review

Co-Authored-By: pickfire <pickfire@riseup.net>
2019-04-26 16:50:14 +02:00
Ines Montani 0dce4585b1 Add course to 101 2019-04-19 15:59:51 +02:00
Ines Montani 38395d9518 Merge branch 'spacy.io' 2019-04-19 15:26:20 +02:00
Ines Montani 7ac5bb0a7b Update landing and feature overview 2019-04-19 15:23:08 +02:00
fizban99 f2f2df6e78 entity types for colors should be in uppercase (#3599)
although the text indicates the entity types should be in lowercase, the sample code shows uppercase, which is the correct format.
2019-04-17 11:22:56 +02:00
Ines Montani 9e7deeaf48 Remove Datacamp 2019-04-13 17:46:32 +02:00
Ines Montani 2f0f439c54 Remove non-existent example (closes #3533) 2019-04-03 09:59:17 +02:00
Ines Montani 200d8bdb3c Merge branch 'spacy.io' [ci skip] 2019-03-23 16:46:34 +01:00
Ines Montani 06bf130890 💫 Add better and serializable sentencizer (#3471)
* Add better serializable sentencizer component

* Replace default factory

* Add tests

* Tidy up

* Pass test

* Update docs
2019-03-23 15:45:02 +01:00
Ines Montani b532386a60 Fix typo [ci skip] 2019-03-22 18:36:17 +01:00
Ines Montani 5073ce63fd Merge branch 'spacy.io' [ci skip] 2019-03-22 15:17:11 +01:00
Ines Montani 0712efc6b3 Update version requirements [ci skip] 2019-03-21 10:23:54 +01:00
Ines Montani d4eed4a84f Add note on unicode build to troubleshooting guide (see #3421) [ci skip] 2019-03-19 10:27:02 +01:00
Ines Montani a611b32fbf Update model docs [ci skip] 2019-03-17 11:48:18 +01:00
Ines Montani cbcba699dd Fix missing ids 2019-03-14 17:56:53 +01:00
Ines Montani 4cfe4aa224 Fix small issues in the docs [ci skip] 2019-03-12 22:57:15 +01:00
Ines Montani ba7eb2d131 Update section [ci skip] 2019-03-12 16:18:34 +01:00
Ines Montani cecc31b765 Don't auto-slugify accordion links [ci skip] 2019-03-12 15:30:49 +01:00
Ines Montani 72fb324d95 Add vector training script to bin [ci skip] 2019-03-12 12:07:56 +01:00
Ines Montani 3abf0e6b9f Replace dev-resources links with real examples 2019-03-12 12:07:40 +01:00
Ines Montani 59c0620487 Auto-format 2019-03-12 12:07:11 +01:00
Ines Montani 7c05ca01e8 💫 Support mutable default values for extension attributes (#3389)
* Support mutable default values in extensions

* Update documentation
2019-03-11 12:50:44 +01:00
Ines Montani 8dbf1e9037 Also fix #3387 on develop 2019-03-10 23:36:28 +01:00
Ines Montani 9a8f169e5c Update v2-1.md 2019-03-10 18:58:51 +01:00
Ines Montani 296446a1c8
Tidy up and improve docs and docstrings (#3370)
<!--- Provide a general summary of your changes in the title. -->

## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs

### Types of change
enhancement, docs

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Ines Montani 48a206a95f Fix displaCy visualizations in docs (closes #3357) [ci skip] 2019-03-06 13:20:44 +01:00
Ines Montani c478a2ccb6 Update backwards incompat [ci skip] 2019-02-27 11:56:56 +01:00
Ines Montani 1b6238101a Add table explaining training metrics [closes #2644] 2019-02-25 10:03:43 +01:00
Ines Montani 62b558ab72 💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325)
* Fix formatting and whitespace

* Add support for lexical attributes (closes #2390)

* Document lexical attribute setting during retokenization

* Assign variable oputside of nested loop
2019-02-24 21:13:51 +01:00
Ines Montani aa52305461 Improve pipeline model and meta example [ci skip] 2019-02-24 18:45:39 +01:00
Ines Montani df19e2bff6
💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324)
<!--- Provide a general summary of your changes in the title. -->

## Description

This PR adds the abilility to override custom extension attributes during merging. This will only work for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attribute that have both a getter *and* a setter implemented.

```python
Token.set_extension('is_musician', default=False)

doc = nlp("I like David Bowie.")
with doc.retokenize() as retokenizer:
    attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}}
    retokenizer.merge(doc[2:4], attrs=attrs)

assert doc[2].text == "David Bowie"
assert doc[2].lemma_ == "David Bowie"
assert doc[2]._.is_musician
```

### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-24 18:38:47 +01:00
Ines Montani 403b9cd58b Add docs on adding to existing tokenizer rules [ci skip] 2019-02-24 18:35:19 +01:00
Ines Montani 383e2e1f12 Update Python versions [ci skip] 2019-02-24 11:49:45 +01:00
Ines Montani b624cb4b89 Update v2-1.md 2019-02-24 11:49:27 +01:00
Ines Montani 0fc908d7a5 Add note on merging speed in v2.1 (see #3300) [ci skip] 2019-02-21 12:34:18 +01:00
Ines Montani 236aa94ded Update v2-1.md 2019-02-21 12:33:56 +01:00
Sofie 9a478b6db8 Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)
* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue #3002 which now works

* partial fix for issue #2070

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue #2656

* Fix issue #2822 with custom Italian exception

* Fix issue #2926 by allowing numbers right before infix /

* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue #3002 which now works

* partial fix for issue #2070

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue #2656

* Fix issue #2822 with custom Italian exception

* Fix issue #2926 by allowing numbers right before infix /

* remove duplicate

* remove xfail for Issue #2179 fixed by Matt

* adjust documentation and remove reference to regex lib
2019-02-20 22:10:13 +01:00
Ines Montani 57ae71ea95 Add docs on serializing the pipeline (see #3289) [ci skip] 2019-02-18 14:13:29 +01:00
Ines Montani 38e4422c0d Improve matcher example (resolves #3287) 2019-02-18 13:26:37 +01:00
Ines Montani 660cfe44c5 Fix formatting 2019-02-18 13:26:22 +01:00
Ines Montani 212ff359ef Fix links [ci skip] 2019-02-17 22:25:50 +01:00
Ines Montani e597110d31
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 19:31:19 +01:00