Document `spacy-llm`'s `RawTask` (#13180)

* Add section on RawTask.

* Fix API docs.

* Update website/docs/api/large-language-models.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
This commit is contained in:
Raphael Mitsch 2023-12-11 17:14:12 +01:00 committed by GitHub
parent a25a3b996b
commit e79a9c5acd
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 61 additions and 0 deletions

View File

@ -236,6 +236,67 @@ objects. This depends on the return type of the [model](#models).
| `responses` | The generated prompts. ~~Iterable[Any]~~ | | `responses` | The generated prompts. ~~Iterable[Any]~~ |
| **RETURNS** | The annotated documents. ~~Iterable[Doc]~~ | | **RETURNS** | The annotated documents. ~~Iterable[Doc]~~ |
### Raw prompting {id="raw"}
Different to all other tasks `spacy.Raw.vX` doesn't provide a specific prompt,
wrapping doc data, to the model. Instead it instructs the model to reply to the
doc content. This is handy for use cases like question answering (where each doc
contains one question) or if you want to include customized prompts for each doc.
#### spacy.Raw.v1 {id="raw-v1"}
Note that since this task may request arbitrary information, it doesn't do any
parsing per se - the model response is stored in a custom `Doc` attribute (i. e.
can be accessed via `doc._.{field}`).
It supports both zero-shot and few-shot prompting.
> #### Example config
>
> ```ini
> [components.llm.task]
> @llm_tasks = "spacy.Raw.v1"
> examples = null
> ```
| Argument | Description |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `template` | Custom prompt template to send to LLM model. Defaults to [raw.v1.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/raw.v1.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[RawTask]]~~ |
| `prompt_example_type` | Type to use for fewshot examples. Defaults to `RawExample`. ~~Optional[Type[FewshotExample]]~~ |
| `field` | Name of extension attribute to store model reply in (i. e. the reply will be available in `doc._.{field}`). Defaults to `reply`. ~~str~~ |
To perform [few-shot learning](/usage/large-language-models#few-shot-prompts),
you can write down a few examples in a separate file, and provide these to be
injected into the prompt to the LLM. The default reader `spacy.FewShotReader.v1`
supports `.yml`, `.yaml`, `.json` and `.jsonl`.
```yaml
# Each example can follow an arbitrary pattern. It might help the prompt performance though if the examples resemble
# the actual docs' content.
- text: "3 + 5 = x. What's x?"
reply: '8'
- text: 'Write me a limerick.'
reply:
"There was an Old Man with a beard, Who said, 'It is just as I feared! Two
Owls and a Hen, Four Larks and a Wren, Have all built their nests in my
beard!"
- text: "Analyse the sentiment of the text 'This is great'."
reply: "'This is great' expresses a very positive sentiment."
```
```ini
[components.llm.task]
@llm_tasks = "spacy.Raw.v1"
field = "llm_reply"
[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "raw_examples.yml"
```
### Summarization {id="summarization"} ### Summarization {id="summarization"}
A summarization task takes a document as input and generates a summary that is A summarization task takes a document as input and generates a summary that is