Commit Graph

41 Commits

Author SHA1 Message Date
Sofie Van Landeghem deb143fa70
Token sent attributes more consistent (#10164)
* remove duplicate line

* add sent start/end token attributes to the docs

* let has_annotation work with IS_SENT_END

* elif instead of if

* add has_annotation test for sent attributes

* fix typo

* remove duplicate is_sent_start entry in docs
2022-02-08 08:35:37 +01:00
Paul O'Leary McCann b53e39455e
Fix UD POS docs links (fix #9013) (#9407)
* Fix UD POS docs links (fix #9013)

The previous link seems to have been for UD v1.

* Fix link
2021-10-11 11:51:19 +02:00
Adriane Boyd 4d1ef8f695 Tidy up docs 2021-06-28 12:08:15 +02:00
Sofie Van Landeghem 0fd0d949c4
fix 's typo's across code base (#8384) 2021-06-15 10:57:08 +02:00
Adriane Boyd f68fc29130
Update sent_starts in Example.from_dict (#7847)
* Update sent_starts in Example.from_dict

Update `sent_starts` for `Example.from_dict` so that `Optional[bool]`
values have the same meaning as for `Token.is_sent_start`.

Use `Optional[bool]` as the type for sent start values in the docs.

* Use helper function for conversion to ternary ints
2021-04-22 11:32:45 +02:00
Ines Montani 230e651ad6 Merge branch 'develop' into master-tmp 2021-01-27 13:26:29 +11:00
jganseman 1f2b0ec168
proposing a more concise explanation for is_oov
proposing a more concise explanation for is_oov
2021-01-26 10:53:39 +01:00
Adriane Boyd 9328dd5625
Handle unset token.morph in Morphologizer (#6704)
* Handle unset token.morph in Morphologizer

Handle unset `token.morph` in `Morphologizer.initialize` and
`Morphologizer.get_loss`. If both `token.morph` and `token.pos` are
unset, treat the annotation as missing rather than empty.

* Add token.has_morph()
2021-01-15 17:20:10 +01:00
Ines Montani df06f7a792 Update docs [ci skip] 2020-10-02 13:24:33 +02:00
Adriane Boyd fd09e6b140 Update docs for Token.morph / Token.set_morph 2020-10-02 09:05:15 +02:00
walterhenry 3360825e00 Proofreading
Another round of proofreading. All the API docs have been read through and I've grazed the Usage docs.
2020-09-28 16:50:15 +02:00
Ines Montani 52bd3a8b48 Update docs [ci skip] 2020-08-21 13:22:59 +02:00
Ines Montani 3ae5e02f4f Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
Ines Montani 023ba7ae26 Update docs 2020-08-10 17:13:11 +02:00
Ines Montani c099f6eece Add Token.lex 2020-08-10 16:43:52 +02:00
Ines Montani e0ffe36e79 Update docstrings, docs and types 2020-07-29 11:36:42 +02:00
Adriane Boyd 941b9e33f7 Add Token.morph_ 2020-07-22 17:59:45 +02:00
Adriane Boyd fcd3a4abe3 Add morph to Token API docs 2020-07-21 13:05:58 +02:00
Ines Montani 1e0d54edd1 Update docs 2020-07-04 14:23:10 +02:00
Ines Montani fe4cfd0632 Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
Adriane Boyd d5110ffbf2
Documentation updates for v2.3.0 (#5593)
* Update website models for v2.3.0

* Add docs for Chinese word segmentation

* Tighten up Chinese docs section

* Merge branch 'master' into docs/v2.3.0 [ci skip]

* Merge branch 'master' into docs/v2.3.0 [ci skip]

* Auto-format and update version

* Update matcher.md

* Update languages and sorting

* Typo in landing page

* Infobox about token_match behavior

* Add meta and basic docs for Japanese

* POS -> TAG in models table

* Add info about lookups for normalization

* Updates to API docs for v2.3

* Update adding norm exceptions for adding languages

* Add --omit-extra-lookups to CLI API docs

* Add initial draft of "What's New in v2.3"

* Add new in v2.3 tags to Chinese and Japanese sections

* Add tokenizer to migration section

* Add new in v2.3 flags to init-model

* Typo

* More what's new in v2.3

Co-authored-by: Ines Montani <ines@ines.io>
2020-06-16 15:37:35 +02:00
Ines Montani 65c7e82de2 Auto-format and remove 2.3 feature [ci skip] 2020-05-22 13:50:30 +02:00
adrianeboyd 4a15b559ba
Clarify Token.pos as UPOS (#5419) 2020-05-08 10:36:25 +02:00
adrianeboyd a2345618f1
Fix Token API docs from #5375 (#5418) 2020-05-08 10:25:02 +02:00
adrianeboyd a6e521cd79
Add is_sent_end token property (#5375)
Reconstruction of the original PR #4697 by @MiniLau.

Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema
because the Matcher is only going to be able to support `IS_SENT_START`.
2020-04-29 12:53:16 +02:00
Adriane Boyd 3853d385fa Fix formatting in Token API 2020-02-20 13:41:24 +01:00
Tclack88 ab8dc2732c Update token.md (#4767)
* Update token.md

documentation is confusing: A '?' is a right punct, but '¿' is a left punct

* Update token.md

add quotations around parentheses in `is_left_punct` and `is_right_punct` for clarrification, ensuring the question mark that follows is not percieved as an example of left and right punctuation

* Move quotes into code block [ci skip]
2019-12-06 19:22:02 +01:00
Ines Montani cbacb0f1a4 Update shape docs and examples (resolves #4615) [ci skip] 2019-11-23 17:16:55 +01:00
Ines Montani 82c16b7943 Remove u-strings and fix formatting [ci skip] 2019-09-12 16:11:15 +02:00
Sofie Van Landeghem 0b4b4f1819 Documentation for Entity Linking (#4065)
* document token ent_kb_id

* document span kb_id

* update pipeline documentation

* prior and context weights as bool's instead

* entitylinker api documentation

* drop for both models

* finish entitylinker documentation

* small fixes

* documentation for KB

* candidate documentation

* links to api pages in code

* small fix

* frequency examples as counts for consistency

* consistent documentation about tensors returned by predict

* add entity linking to usage 101

* add entity linking infobox and KB section to 101

* entity-linking in linguistic features

* small typo corrections

* training example and docs for entity_linker

* predefined nlp and kb

* revert back to similarity encodings for simplicity (for now)

* set prior probabilities to 0 when excluded

* code clean up

* bugfix: deleting kb ID from tokens when entities were removed

* refactor train el example to use either model or vocab

* pretrain_kb example for example kb generation

* add to training docs for KB + EL example scripts

* small fixes

* error numbering

* ensure the language of vocab and nlp stay consistent across serialization

* equality with =

* avoid conflict in errors file

* add error 151

* final adjustements to the train scripts - consistency

* update of goldparse documentation

* small corrections

* push commit

* typo fix

* add candidate API to kb documentation

* update API sidebar with EntityLinker and KnowledgeBase

* remove EL from 101 docs

* remove entity linker from 101 pipelines / rephrase

* custom el model instead of existing model

* set version to 2.2 for EL functionality

* update documentation for 2 CLI scripts
2019-09-12 11:38:34 +02:00
Ines Montani ce4c3e5204 Document force flag on set_extension (closes #4148) 2019-08-19 19:22:07 +02:00
Ines Montani 0f76e0022d Update .tensor docs [ci skip] 2019-08-01 18:37:09 +02:00
Nipun Sadvilkar 1f13005751 Incorrect Token attribute ent_iob_ description (#3800)
* Incorrect Token attribute ent_iob_ description

* Add spaCy contributor agreement
2019-05-31 16:50:45 +02:00
Ines Montani 321c9f5acc Fix lex_id docs (closes #3743) 2019-05-16 23:15:58 +02:00
Ines Montani 25f5592d57 Improve Token.prob and Lexeme.prob docs (resolves #3701) 2019-05-11 15:23:41 +02:00
pierremonico 0d26bfe677 Removes duplicate in table (#3550)
* Removes duplicate in table

Just fixing typos.

* Remove newline


Co-authored-by: Ines Montani <ines@ines.io>
2019-04-08 10:30:42 +02:00
Ines Montani cdd418b93e Auto-format [ci skip] 2019-03-11 17:10:50 +01:00
Matthew Honnibal b0b990e405 Fix token.conjuncts (closes #795) (#3392)
* Implement conjuncts method

* Add span.conjuncts property

* Un-xfail token.conjuncts tests

* Update docs for token.conjuncts and span.conjuncts

* Fix merge error in token.conjuncts
2019-03-11 17:05:45 +01:00
Ines Montani 296446a1c8
Tidy up and improve docs and docstrings (#3370)
<!--- Provide a general summary of your changes in the title. -->

## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs

### Types of change
enhancement, docs

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Matthew Honnibal 4a3371acd5
Make doc[0].is_sent_start == True (closes #2869) (#3340)
* Make doc[0] have sent_start True. Closes #2869

* Document that doc[0].is_sent_start defaults True.
2019-02-27 11:17:17 +01:00
Ines Montani e597110d31
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 19:31:19 +01:00