💫 Update website (#3285)
## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with
[Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/).
This allows authoring content in **straightforward Markdown** without the
usual limitations. Standard elements can be overwritten with powerful
[React](http://reactjs.org/) components and wherever Markdown syntax isn't
enough, JSX components can be used. Hopefully, this update will also make it
much easier to contribute to the docs.

Once this PR is merged, I'll implement auto-deployment via
[Netlify](https://netlify.com) on a specific branch (to avoid building the
website on every PR). There's a bunch of other cool stuff that the new setup
will allow us to do – including writing front-end tests, service workers,
offline support, implementing a search and so on.

This PR also includes various new docs pages and content.

Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.

### Types of change

enhancement

## Checklist

- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do,
  I've added all required information.
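For reference, Netlify deployments of a Gatsby site are typically driven by a
config along these lines – a hypothetical sketch only, since the deployment
setup is explicitly not part of this PR (the build command and details below
are placeholders):

```toml
# netlify.toml – hypothetical sketch, not included in this PR
[build]
  base    = "website"        # the site lives in the website/ subdirectory
  publish = "public"         # Gatsby's compiled output directory
  command = "npm run build"  # placeholder build command
```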
@ -5,9 +5,15 @@ corpora/
keys/

# Website
website/.cache/
website/public/
website/node_modules
website/.npm
website/logs
*.log
npm-debug.log*
website/www/
website/_deploy.sh
website/.gitignore

# Cython / C extensions
cythonize.json
@ -0,0 +1,38 @@
{
    "semi": false,
    "singleQuote": true,
    "trailingComma": "es5",
    "tabWidth": 4,
    "printWidth": 100,
    "overrides": [
        {
            "files": "*.sass",
            "options": {
                "printWidth": 999
            }
        },
        {
            "files": "*.mdx",
            "options": {
                "tabWidth": 2,
                "printWidth": 80,
                "proseWrap": "always"
            }
        },
        {
            "files": "*.md",
            "options": {
                "tabWidth": 2,
                "printWidth": 80,
                "proseWrap": "always",
                "htmlWhitespaceSensitivity": "strict"
            }
        },
        {
            "files": "*.html",
            "options": {
                "htmlWhitespaceSensitivity": "strict"
            }
        }
    ]
}
@ -1,12 +0,0 @@
|
|||
//- 💫 404 ERROR
|
||||
|
||||
include _includes/_mixins
|
||||
|
||||
+landing-header
|
||||
h1.c-landing__title.u-heading-0
|
||||
| Ooops, this page#[br]
|
||||
| does not exist!
|
||||
|
||||
h2.c-landing__title.u-heading-3.u-padding-small
|
||||
+button(false, true, "secondary-light")(href="javascript:history.go(-1)")
|
||||
| Click here to go back
|
|
@ -1,143 +1,559 @@
<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>

<Comment>

# spacy.io website and docs

_This page contains the documentation and styleguide for the spaCy website.
Its rendered version is available at https://spacy.io/styleguide._

---

</Comment>
The [spacy.io](https://spacy.io) website is implemented using
[Gatsby](https://www.gatsbyjs.org) with
[Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This
allows authoring content in **straightforward Markdown** without the usual
limitations. Standard elements can be overwritten with powerful
[React](http://reactjs.org/) components and wherever Markdown syntax isn't
enough, JSX components can be used.
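For example, a docs page can mix prose and components freely. A minimal sketch
(the headline ID and infobox text here are made up for illustration):

```markdown
## Installation {#install}

Regular **Markdown** works as expected, including [links](https://spacy.io)
and `inline code`.

import Infobox from 'components/infobox'

<Infobox title="Important note">Where Markdown isn't enough, a JSX
component can be dropped right into the page.</Infobox>
```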
> #### Contributing to the site
>
> The docs can always use another example or more detail, and they should
> always be up to date and not misleading. We always appreciate a
> [pull request](https://github.com/explosion/spaCy/pulls). To quickly find the
> correct file to edit, simply click on the "Suggest edits" button at the
> bottom of a page.
>
> For more details on editing the site locally, see the installation
> instructions and markdown reference below.
## Logo {#logo source="website/src/images/logo.svg"}

import { Logos } from 'widgets/styleguide'

If you would like to use the spaCy logo on your site, please get in touch and
ask us first. However, if you want to show support and tell others that your
project is using spaCy, you can grab one of our
[spaCy badges](/usage/spacy-101#faq-project-with-spacy).

<Logos />

## Colors {#colors}

import { Colors, Patterns } from 'widgets/styleguide'

<Colors />

### Patterns

<Patterns />

## Typography {#typography}

import { H1, H2, H3, H4, H5, Label, InlineList, Comment } from
'components/typography'

> #### Markdown
>
> ```markdown_
> ## Headline 2
> ## Headline 2 {#some_id}
> ## Headline 2 {#some_id tag="method"}
> ```
>
> #### JSX
>
> ```jsx
> <H2>Headline 2</H2>
> <H2 id="some_id">Headline 2</H2>
> <H2 id="some_id" tag="method">Headline 2</H2>
> ```

Headlines are set in
[HK Grotesk](http://cargocollective.com/hanken/HK-Grotesk-Open-Source-Font) by
Hanken Design. All other body text and code uses the best-matching default
system font to provide a "native" reading experience.

<Infobox title="Important note" variant="warning">

Level 2 headings are automatically wrapped in `<section>` elements at compile
time, using a custom
[Markdown transformer](https://github.com/explosion/spaCy/tree/master/website/plugins/remark-wrap-section.js).
This makes it easier to highlight the section that's currently in the viewport
in the sidebar menu.

</Infobox>
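As a rough sketch of the structure the transformer produces – the exact markup
and attributes are simplified here:

```html
<!-- "## Colors {#colors}" and everything up to the next level 2 headline -->
<section data-section="colors">
    <h2 id="colors">Colors</h2>
    <p>...</p>
</section>
```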
<div>
    <H1>Headline 1</H1>
    <H2>Headline 2</H2>
    <H3>Headline 3</H3>
    <H4>Headline 4</H4>
    <H5>Headline 5</H5>
    <Label>Label</Label>
</div>

---

The following optional attributes can be set on the headline to modify it. For
example, to add a tag for the documented type or mark features that have been
introduced in a specific version or require statistical models to be loaded.
Tags are also available as standalone `<Tag />` components.

| Argument | Example                    | Result                                    |
| -------- | -------------------------- | ----------------------------------------- |
| `tag`    | `{tag="method"}`           | <Tag>method</Tag>                         |
| `new`    | `{new="2"}`                | <Tag variant="new">2</Tag>                |
| `model`  | `{model="tagger, parser"}` | <Tag variant="model">tagger, parser</Tag> |
| `hidden` | `{hidden="true"}`          |                                           |
## Elements {#elements}

### Links {#links}

> #### Markdown
>
> ```markdown
> [I am a link](https://spacy.io)
> ```
>
> #### JSX
>
> ```jsx
> <Link to="https://spacy.io">I am a link</Link>
> ```

Special link styles are used depending on the link URL.

- [I am a regular external link](https://explosion.ai)
- [I am a link to the documentation](/api/doc)
- [I am a link to GitHub](https://github.com/explosion/spaCy)

### Abbreviations {#abbr}

import { Abbr } from 'components/typography'

> #### JSX
>
> ```jsx
> <Abbr title="Explanation">Abbreviation</Abbr>
> ```

Some text with <Abbr title="Explanation here">an abbreviation</Abbr>. On small
screens, I collapse and the explanation text is displayed next to the
abbreviation.

### Tags {#tags}

import Tag from 'components/tag'

> ```jsx
> <Tag>method</Tag>
> <Tag variant="new">2.1</Tag>
> <Tag variant="model">tagger, parser</Tag>
> ```

Tags can be used together with headlines, or next to properties across the
documentation, and combined with tooltips to provide additional information. An
optional `variant` argument can be used for special tags. `variant="new"` makes
the tag take a version number to mark new features. Using the component,
visibility of this tag can later be toggled once the feature isn't considered
new anymore. Setting `variant="model"` takes a description of model capabilities
and can be used to mark features that require a respective model to be
installed.

<InlineList>

<Tag>method</Tag> <Tag variant="new">2</Tag> <Tag variant="model">tagger,
parser</Tag>

</InlineList>

### Buttons {#buttons}

import Button from 'components/button'

> ```jsx
> <Button to="#" variant="primary">Primary small</Button>
> <Button to="#" variant="secondary">Secondary small</Button>
> ```

Link buttons come in two variants, `primary` and `secondary`, and two sizes,
with an optional `large` size modifier. Since they're mostly used as enhanced
links, the buttons are implemented as styled links instead of native button
elements.

<InlineList><Button to="#" variant="primary">Primary small</Button>
<Button to="#" variant="secondary">Secondary small</Button></InlineList>

<InlineList><Button to="#" variant="primary" large>Primary large</Button>
<Button to="#" variant="secondary" large>Secondary large</Button></InlineList>

## Components

### Table

> #### Markdown
>
> ```markdown_
> | Header 1 | Header 2 |
> | -------- | -------- |
> | Column 1 | Column 2 |
> ```
>
> #### JSX
>
> ```markup
> <Table>
>     <Tr><Th>Header 1</Th><Th>Header 2</Th></Tr>
>     <Tr><Td>Column 1</Td><Td>Column 2</Td></Tr>
> </Table>
> ```

Tables are used to present data and API documentation. Certain keywords can be
used to mark a footer row with a distinct style, for example to visualise the
return values of a documented function.

| Header 1    | Header 2 | Header 3 | Header 4 |
| ----------- | -------- | :------: | -------: |
| Column 1    | Column 2 | Column 3 | Column 4 |
| Column 1    | Column 2 | Column 3 | Column 4 |
| Column 1    | Column 2 | Column 3 | Column 4 |
| Column 1    | Column 2 | Column 3 | Column 4 |
| **RETURNS** | Column 2 | Column 3 | Column 4 |

### List

> #### Markdown
>
> ```markdown_
> 1. One
> 2. Two
> ```
>
> #### JSX
>
> ```markup
> <Ol>
>     <Li>One</Li>
>     <Li>Two</Li>
> </Ol>
> ```

Lists are available as bulleted and numbered. Markdown lists are transformed
automatically.

- I am a bulleted list
- I have nice bullets
- Lorem ipsum dolor
- consectetur adipiscing elit

1. I am an ordered list
2. I have nice numbers
3. Lorem ipsum dolor
4. consectetur adipiscing elit

### Aside

> #### Markdown
>
> ```markdown_
> > #### Aside title
> > This is aside text.
> ```
>
> #### JSX
>
> ```jsx
> <Aside title="Aside title">This is aside text.</Aside>
> ```

Asides can be used to display additional notes and content in the right-hand
column. Asides can contain text, code and other elements if needed. Visually,
asides are moved to the side on the X-axis, and displayed at the same level they
were inserted. On small screens, they collapse and are rendered in their
original position, in between the text.

To make them easier to use in Markdown, paragraphs formatted as blockquotes will
turn into asides by default. Level 4 headlines (with a leading `####`) will
become aside titles.

### Code Block

> #### Markdown
>
> ````markdown_
> ```python
> ### This is a title
> import spacy
> ```
> ````
>
> #### JSX
>
> ```jsx
> <CodeBlock title="This is a title" lang="python">
>     import spacy
> </CodeBlock>
> ```

Code blocks use the [Prism](http://prismjs.com/) syntax highlighter with a
custom theme. The language can be set individually on each block, and defaults
to raw text with no highlighting. An optional label can be added as the first
line with the prefix `####` (Python-like) or `///` (JavaScript-like). When
using the JSX component, it will render the indented block as plain text and
preserve whitespace.

```python
### Using spaCy
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"This is a sentence.")
for token in doc:
    print(token.text, token.pos_)
```

Code blocks can also specify an optional range of line numbers to highlight by
adding `{highlight="..."}` to the headline. Acceptable ranges are spans like
`5-7`, but also `5-7,10` or `5-7,10,13-14`.

> #### Markdown
>
> ````markdown_
> ```python
> ### This is a title {highlight="1-2"}
> import spacy
> nlp = spacy.load("en_core_web_sm")
> ```
> ````

```python
### Using the matcher {highlight="5-7"}
import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)
pattern = [{'LOWER': 'hello'}, {'IS_PUNCT': True}, {'LOWER': 'world'}]
matcher.add('HelloWorld', None, pattern)
doc = nlp(u'Hello, world! Hello world!')
matches = matcher(doc)
```

Adding `{executable="true"}` to the title turns the code into an executable
block, powered by [Binder](https://mybinder.org) and
[Juniper](https://github.com/ines/juniper). If JavaScript is disabled, the
interactive widget defaults to a regular code block.

> #### Markdown
>
> ````markdown_
> ```python
> ### {executable="true"}
> import spacy
> nlp = spacy.load("en_core_web_sm")
> ```
> ````

```python
### {executable="true"}
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"This is a sentence.")
for token in doc:
    print(token.text, token.pos_)
```

If a code block only contains a URL to a GitHub file, the raw file contents are
embedded automatically and syntax highlighting is applied. The link to the
original file is shown at the top of the widget.

> #### Markdown
>
> ````markdown_
> ```python
> https://github.com/...
> ```
> ````
>
> #### JSX
>
> ```jsx
> <GitHubCode url="https://github.com/..." lang="python" />
> ```

```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_countries_api.py
```

### Infobox

import Infobox from 'components/infobox'

> #### JSX
>
> ```jsx
> <Infobox title="Information">Regular infobox</Infobox>
> <Infobox title="Important note" variant="warning">This is a warning.</Infobox>
> <Infobox title="Be careful!" variant="danger">This is dangerous.</Infobox>
> ```

Infoboxes can be used to add notes, updates, warnings or additional information
to a page or section. Semantically, they're implemented and interpreted as an
`aside` element. Infoboxes can take an optional `title` argument, as well as an
optional `variant` (either `"warning"` or `"danger"`).

<Infobox title="This is an infobox">

If needed, an infobox can contain regular text, `inline code`, lists and other
blocks.

</Infobox>

<Infobox title="This is a warning" variant="warning">

If needed, an infobox can contain regular text, `inline code`, lists and other
blocks.

</Infobox>

<Infobox title="This is dangerous" variant="danger">

If needed, an infobox can contain regular text, `inline code`, lists and other
blocks.

</Infobox>

### Accordion

import Accordion from 'components/accordion'

> #### JSX
>
> ```jsx
> <Accordion title="This is an accordion">
>     Accordion content goes here.
> </Accordion>
> ```

Accordions are collapsible sections that are mostly used for lengthy tables,
like the tag and label annotation schemes for different languages. They all need
to be presented – but chances are the user doesn't actually care about _all_ of
them, especially not at the same time. So it's fairly reasonable to hide them
behind a click. This particular implementation was inspired by the amazing
[Inclusive Components blog](https://inclusive-components.design/collapsible-sections/).

<Accordion title="This is an accordion">

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque enim ante,
pretium a orci eget, varius dignissim augue. Nam eu dictum mauris, id tincidunt
nisi. Integer commodo pellentesque tincidunt. Nam at turpis finibus tortor
gravida sodales tincidunt sit amet est. Nullam euismod arcu in tortor auctor,
sit amet dignissim justo congue.

</Accordion>
## Setup and installation {#setup}

Before running the setup, make sure your versions of
[Node](https://nodejs.org/en/) and [npm](https://www.npmjs.com/) are up to date.

```bash
# Clone the repository
git clone https://github.com/explosion/spaCy
cd spaCy/website

# Install Gatsby's command-line tool
npm install --global gatsby-cli

# Install the dependencies
npm install

# Start the development server
npm run dev
```

If you are planning on making edits to the site, you should also set up the
[Prettier](https://prettier.io/) code formatter. It takes care of formatting
Markdown and other files automatically.
[See here](https://prettier.io/docs/en/editors.html) for the available
extensions for your code editor. The
[`.prettierrc`](https://github.com/explosion/spaCy/tree/master/website/.prettierrc)
file in the root defines the settings used in this codebase.
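Prettier can also be run directly from the command line. A minimal sketch,
assuming Prettier is available in the project (the exact glob is illustrative):

```bash
# Format all Markdown files under docs/ in place
npx prettier --write "docs/**/*.md"
```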
## Markdown reference {#markdown}

All page content and page meta lives in the `.md` files in the `/docs`
directory. The frontmatter block at the top of each file defines the page title
and other settings like the sidebar menu.

````markdown
---
title: Page title
---

## Headline starting a section {#some_id}

This is a regular paragraph with a [link](https://spacy.io) and **bold text**.

> #### This is an aside title
>
> This is aside text.

### Subheadline

| Header 1 | Header 2 |
| -------- | -------- |
| Column 1 | Column 2 |

```python
### Code block title {highlight="2-3"}
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hello world")
```

<Infobox title="Important note" variant="warning">

This is content in the infobox.

</Infobox>
````

In addition to the native markdown elements, you can use the components
[`<Infobox />`][infobox], [`<Accordion />`][accordion], [`<Abbr />`][abbr] and
[`<Tag />`][tag] via their JSX syntax.

[infobox]: https://spacy.io/styleguide#infobox
[accordion]: https://spacy.io/styleguide#accordion
[abbr]: https://spacy.io/styleguide#abbr
[tag]: https://spacy.io/styleguide#tag

## Project structure {#structure}

```yaml
### Directory structure
├── docs                 # the actual markdown content
├── meta                 # JSON-formatted site metadata
|   ├── languages.json   # supported languages and statistical models
|   ├── logos.json       # logos and links for landing page
|   ├── sidebars.json    # sidebar navigations for different sections
|   ├── site.json        # general site metadata
|   └── universe.json    # data for the spaCy universe section
├── public               # compiled site
├── src                  # source
|   ├── components       # React components
|   ├── fonts            # webfonts
|   ├── images           # images used in the layout
|   ├── plugins          # custom plugins to transform Markdown
|   ├── styles           # CSS modules and global styles
|   ├── templates        # page layouts
|   |   ├── docs.js      # layout template for documentation pages
|   |   ├── index.js     # global layout template
|   |   ├── models.js    # layout template for model pages
|   |   └── universe.js  # layout templates for universe
|   └── widgets          # non-reusable components with content, e.g. changelog
├── gatsby-browser.js    # browser-specific hooks for Gatsby
├── gatsby-config.js     # Gatsby configuration
├── gatsby-node.js       # Node-specific hooks for Gatsby
└── package.json         # package settings and dependencies
```

---

**Removed with this update – the previous Harp/Jade version of this README:**

The [spacy.io](https://spacy.io) website is implemented in
[Jade (aka Pug)](https://www.jade-lang.org), and is built or served by
[Harp](https://harpjs.com). Jade is an extensible templating language with a
readable syntax, that compiles to HTML. The website source makes extensive use
of Jade mixins, so that the design system is abstracted away from the content
you're writing. You can read more about our approach in our blog post,
["Rebuilding a Website with Modular Markup"](https://explosion.ai/blog/modular-markup).

## Viewing the site locally

```bash
sudo npm install --global harp
git clone https://github.com/explosion/spaCy
cd spaCy/website
harp server
```

This will serve the site on [http://localhost:9000](http://localhost:9000).

## Making changes to the site

The docs can always use another example or more detail, and they should always
be up to date and not misleading. If you see something, say something – we
always appreciate a [pull request](https://github.com/explosion/spaCy/pulls).
To quickly find the correct file to edit, simply click on the "Suggest edits"
button at the bottom of a page.

### File structure

While all page content lives in the `.jade` files, article meta (page titles,
sidebars etc.) is stored as JSON. Each folder contains a `_data.json` with all
required meta for its files.

### Markup language and conventions

Jade/Pug is a whitespace-sensitive markup language that compiles to HTML.
Indentation is used to nest elements, and for template logic, like `if`/`else`
or `for`, mainly used to iterate over objects and arrays in the meta data. It
also allows inline JavaScript expressions.

For an overview of Harp and Jade, see
[this blog post](https://ines.io/blog/the-ultimate-guide-static-websites-harp-jade).
For more info on the Jade/Pug syntax, check out their
[documentation](https://pugjs.org).

In the [spacy.io](https://spacy.io) source, we use 4 spaces to indent and
hard-wrap at 80 characters.

```pug
p This is a very short paragraph. It stays inline.

p
    |  This is a much longer paragraph. It's hard-wrapped at 80 characters to
    |  make it easier to read on GitHub and in editors that do not have soft
    |  wrapping enabled. To prevent Jade from interpreting each line as a new
    |  element, it's prefixed with a pipe and two spaces. This ensures that no
    |  spaces are dropped – for example, if your editor strips out trailing
    |  whitespace by default. Inline links are added using the inline syntax,
    |  like this: #[+a("https://google.com") Google].
```

Note that for external links, `+a("...")` is used instead of `a(href="...")` –
it's a mixin that takes care of adding all required attributes. If possible,
always use a mixin instead of regular HTML elements. The only plain HTML
elements we use are:

| Element  | Description       |
| -------- | ----------------- |
| `p`      | paragraphs        |
| `code`   | inline `code`     |
| `em`     | _italicized_ text |
| `strong` | **bold** text     |

### Mixins

Each file includes a collection of [custom mixins](_includes/_mixins.jade)
that make it easier to add content components – no HTML or class names
required.

For example:

```pug
//- Bulleted list

+list
    +item This is a list item.
    +item This is another list item.

//- Table with header

+table([ "Header one", "Header two" ])
    +row
        +cell Table cell
        +cell Another one

    +row
        +cell And one more.
        +cell And the last one.

//- Headlines with optional permalinks

+h(2, "link-id") Headline 2 with link to #link-id
```

Code blocks are implemented using `+code` or `+aside-code` (to display them in
the right sidebar). A `.` is added after the mixin call to preserve whitespace:

```pug
+code("This is a label").
    import spacy
    en_nlp = spacy.load('en')
    en_doc = en_nlp(u'Hello, world. Here are two sentences.')
```

You can find the documentation for the available mixins in
[`_includes/_mixins.jade`](_includes/_mixins.jade).

### Helpers for linking to content

Aside from the `+a()` mixin, there are three other helpers to make linking to
content more convenient.

#### Linking to GitHub

Since GitHub links can be long and tricky, you can use the `gh()` function to
generate them automatically for spaCy and all repositories owned by
[explosion](https://github.com/explosion):

```javascript
// Syntax: gh(repo, [file], [branch])

gh("spaCy", "spacy/matcher.pyx")
// https://github.com/explosion/spaCy/blob/master/spacy/matcher.pyx
```

#### Linking to source

`+src()` generates a link with a little source icon to indicate it's linking to
a code source. Ideally, it's used in combination with `gh()`:

```pug
+src(gh("spaCy", "spacy/matcher.pyx")) matcher.pyx
```

#### Linking to API reference

`+api()` generates a link to a page in the API docs, with an added icon. It
should only be used across the workflows in the usage section, and only on the
first mention of the respective class.

It takes the slug of an API page as the argument. You can also use anchors to
link to specific sections – they're usually the method or property names.

```pug
+api("tokenizer") #[code Tokenizer]
+api("doc#similarity") #[code Doc.similarity()]
```

### Most common causes of compile errors

| Problem | Fix |
| ------- | --- |
| JSON formatting errors | make sure last elements of objects don't end with commas and/or use a JSON linter |
| unescaped characters like `<` or `>` and sometimes `'` in inline elements | replace with encoded version: `&lt;`, `&gt;` etc. |
| "Cannot read property 'call' of undefined" / "foo is not a function" | make sure mixin names are spelled correctly and mixins file is included with the correct path |
| "no closing bracket found" | make sure inline elements end with a `]`, like `#[code spacy.load('en')]` and for nested inline elements, make sure they're all on the same line and contain spaces between them (**bad:** `#[+api("doc")#[code Doc]]`) |

If Harp fails and throws a Jade error, don't take the reported line number at
face value – it's often wrong, as the page is compiled from templates and
several files.
@ -1,59 +0,0 @@
{
    "index": {
        "landing": true,
        "logos": [
            {
                "airbnb": [ "https://www.airbnb.com", 150, 45],
                "quora": [ "https://www.quora.com", 120, 34 ],
                "retriever": [ "https://www.retriever.no", 150, 33 ],
                "stitchfix": [ "https://www.stitchfix.com", 150, 18 ]
            },
            {
                "chartbeat": [ "https://chartbeat.com", 180, 25 ],
                "allenai": [ "https://allenai.org", 220, 37 ]
            }
        ],
        "features": [
            {
                "recode": ["https://www.recode.net/2017/6/22/15855492/ai-artificial-intelligence-nonprofit-good-human-chatbots-machine-learning", 100, 25],
                "wapo": ["https://www.washingtonpost.com/news/wonk/wp/2016/05/18/googles-new-artificial-intelligence-cant-understand-these-sentences-can-you/", 100, 77],
                "bbc": ["http://www.bbc.co.uk/rd/blog/2017-08-irfs-weeknotes-number-250", 90, 26],
                "microsoft": ["https://www.microsoft.com/developerblog/2016/09/13/training-a-classifier-for-relation-extraction-from-medical-literature/", 130, 28]
            },
            {
                "venturebeat": ["https://venturebeat.com/2017/01/27/4-ai-startups-that-analyze-customer-reviews/", 150, 19],
                "thoughtworks": ["https://www.thoughtworks.com/radar/tools", 150, 28]
            }
        ]
    },

    "robots.txt": {
        "layout": false
    },

    "404": {
        "title": "404 Error",
        "landing": true
    },

    "styleguide": {
        "title": "Styleguide",
        "sidebar": {
            "Styleguide": { "": "styleguide" },
            "Resources": {
                "Website Source": "https://github.com/explosion/spacy/tree/master/website",
                "Contributing Guide": "https://github.com/explosion/spaCy/blob/master/CONTRIBUTING.md"
            }
        },
        "menu": {
            "Introduction": "intro",
            "Logo": "logo",
            "Colors": "colors",
            "Typography": "typography",
            "Elements": "elements",
            "Components": "components",
            "Embeds": "embeds",
            "Markup Reference": "markup"
        }
    }
}
@ -1,97 +0,0 @@
{
    "globals": {
        "title": "spaCy",
        "description": "spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.",

        "SITENAME": "spaCy",
        "SLOGAN": "Industrial-strength Natural Language Processing in Python",
        "SITE_URL": "https://spacy.io",
        "EMAIL": "contact@explosion.ai",

        "COMPANY": "Explosion AI",
        "COMPANY_URL": "https://explosion.ai",
        "DEMOS_URL": "https://explosion.ai/demos",
        "MODELS_REPO": "explosion/spacy-models",

        "SPACY_VERSION": "2.1",
        "BINDER_VERSION": "2.0.16",

        "SOCIAL": {
            "twitter": "spacy_io",
            "github": "explosion",
            "reddit": "spacynlp",
            "codepen": "explosion",
            "gitter": "explosion/spaCy"
        },

        "NAVIGATION": {
            "Usage": "/usage",
            "Models": "/models",
            "API": "/api",
            "Universe": "/universe"
        },

        "FOOTER": {
            "spaCy": {
                "Usage": "/usage",
                "Models": "/models",
                "API Reference": "/api",
                "Universe": "/universe"
            },
            "Support": {
                "Issue Tracker": "https://github.com/explosion/spaCy/issues",
                "Stack Overflow": "http://stackoverflow.com/questions/tagged/spacy",
                "Reddit Usergroup": "https://www.reddit.com/r/spacynlp/",
                "Gitter Chat": "https://gitter.im/explosion/spaCy"
            },
            "Connect": {
                "Twitter": "https://twitter.com/spacy_io",
                "GitHub": "https://github.com/explosion/spaCy",
                "Blog": "https://explosion.ai/blog",
                "Contact": "mailto:contact@explosion.ai"
            }
        },

        "QUICKSTART": [
            { "id": "os", "title": "Operating system", "options": [
                { "id": "mac", "title": "macOS / OSX", "checked": true },
                { "id": "windows", "title": "Windows" },
                { "id": "linux", "title": "Linux" }]
            },
            { "id": "package", "title": "Package manager", "options": [
                { "id": "pip", "title": "pip", "checked": true },
                { "id": "conda", "title": "conda" },
                { "id": "source", "title": "from source" }]
            },
            { "id": "python", "title": "Python version", "options": [
                { "id": 2, "title": "2.x" },
                { "id": 3, "title": "3.x", "checked": true }]
            },
            { "id": "config", "title": "Configuration", "multiple": true, "options": [
                {"id": "venv", "title": "virtualenv", "help": "Use a virtual environment and install spaCy into a user directory" }]
            },
            { "id": "model", "title": "Models", "multiple": true }
        ],

        "QUICKSTART_MODELS": [
            { "id": "lang", "title": "Language"},
            { "id": "load", "title": "Loading style", "options": [
                { "id": "spacy", "title": "Use spacy.load()", "checked": true, "help": "Use spaCy's built-in loader to load the model by name." },
                { "id": "module", "title": "Import as module", "help": "Import the model explicitly as a Python module." }]
            },
            { "id": "config", "title": "Options", "multiple": true, "options": [
                { "id": "example", "title": "Show usage example" }]
            }
        ],

        "V_CSS": "2.2.1",
        "V_JS": "2.2.4",
        "DEFAULT_SYNTAX": "python",
        "ANALYTICS": "UA-58931649-1",
        "MAILCHIMP": {
            "user": "spacy.us12",
            "id": "83b0498b1e7fa3c91ce68c3f1",
            "list": "89ad33e698"
        }
    }
}
@ -1,28 +0,0 @@
//- 💫 INCLUDES > FOOTER

footer.o-footer.u-text
    +grid.o-content
        each group, label in FOOTER
            +grid-col("quarter")
                ul
                    li.u-text-label.u-color-subtle=label

                    each url, item in group
                        li
                            +a(url)=item

        if SECTION == "index"
            +grid-col("quarter")
                include _newsletter

    if SECTION != "index"
        .o-content.o-block.u-border-dotted
            include _newsletter

    .o-inline-list.u-text-center.u-text-tiny.u-color-subtle
        span © 2016-#{new Date().getFullYear()} #[+a(COMPANY_URL, true)=COMPANY]

        +a(COMPANY_URL, true)(aria-label="Explosion AI")
            +icon("explosion", 45).o-icon.u-color-theme.u-grayscale

        +a(COMPANY_URL + "/legal", true) Legal / Imprint
@ -1,95 +0,0 @@
//- 💫 INCLUDES > FUNCTIONS

//- Descriptive variables, available in the global scope

- CURRENT = current.source
- SECTION = current.path[0]
- LANGUAGES = public.models._data.LANGUAGES
- MODELS = public.models._data.MODELS
- CURRENT_MODELS = MODELS[current.source] || []

- MODEL_COUNT = Object.keys(MODELS).map(m => Object.keys(MODELS[m]).length).reduce((a, b) => a + b)
- MODEL_LANG_COUNT = Object.keys(MODELS).length
- LANG_COUNT = Object.keys(LANGUAGES).length - 1

- MODEL_META = public.models._data.MODEL_META
- MODEL_LICENSES = public.models._data.MODEL_LICENSES
- MODEL_BENCHMARKS = public.models._data.MODEL_BENCHMARKS
- EXAMPLE_SENT_LANGS = public.models._data.EXAMPLE_SENT_LANGS
- EXAMPLE_SENTENCES = public.models._data.EXAMPLE_SENTENCES

- IS_PAGE = (SECTION != "index") && !landing
- IS_MODELS = (SECTION == "models" && LANGUAGES[current.source])
- HAS_MODELS = IS_MODELS && CURRENT_MODELS.length

//- Get page URL

- function getPageUrl() {
-     var path = current.path;
-     if(path[path.length - 1] == 'index') path = path.slice(0, path.length - 1);
-     return `${SITE_URL}/${path.join('/')}`;
- }

//- Get pretty page title depending on section

- function getPageTitle() {
-     var sections = ['api', 'usage', 'models'];
-     if (sections.includes(SECTION)) {
-         var titleSection = (SECTION == "api") ? 'API' : SECTION.charAt(0).toUpperCase() + SECTION.slice(1);
-         return `${title} · ${SITENAME} ${titleSection} Documentation`;
-     }
-     else if (SECTION != 'index') return `${title} · ${SITENAME}`;
-     return `${SITENAME} · ${SLOGAN}`;
- }

//- Get social image based on section and settings

- function getPageImage() {
-     var img = (SECTION == 'api') ? 'api' : 'default';
-     return `${SITE_URL}/assets/img/social/preview_${preview || img}.jpg`;
- }

//- Add prefixes to items of an array (for modifier CSS classes)
    array - [array] list of class names or options, e.g. ["foot"]
    prefix - [string] prefix to add to each class, e.g. "c-table__row"
    RETURNS - [array] list of modified class names

- function prefixArgs(array, prefix) {
-     return array.map(arg => prefix + '--' + arg).join(' ');
- }


//- Convert API paths (semi-temporary fix for renamed sections)
    path - [string] link path supplied to +api mixin
    RETURNS - [string] new link path to correct location

- function convertAPIPath(path) {
-     if (path.startsWith('spacy#') || path.startsWith('displacy#') || path.startsWith('util#')) {
-         var comps = path.split('#');
-         return "top-level#" + comps[0] + '.' + comps[1];
-     }
-     return path;
- }


//- Get model components from ID. Components can then be looked up in LANGUAGES
    and MODEL_META respectively, to get their human-readable form.
    id - [string] model ID, e.g. "en_core_web_sm"
    RETURNS - [object] object keyed by components lang, type, genre and size

- function getModelComponents(id) {
-     var comps = id.split('_');
-     return {'lang': comps[0], 'type': comps[1], 'genre': comps[2], 'size': comps[3]}
- }


//- Generate GitHub links
    repo - [string] name of repo owned by explosion
    filepath - [string] logical path to file relative to repository root
    branch - [string] optional branch, defaults to "master"
    RETURNS - [string] the correct link to the file on GitHub

- function gh(repo, filepath, branch) {
-     var branch = ALPHA ? 'develop' : branch
-     return 'https://github.com/' + SOCIAL.github + '/' + (repo || '') + (filepath ? '/blob/' + (branch || 'master') + '/' + filepath : '' );
- }
@ -1,749 +0,0 @@
//- 💫 INCLUDES > MIXINS

include _functions


//- Section
    id - [string] anchor assigned to section (used for breadcrumb navigation)

mixin section(id)
    section.o-section(id=id ? "section-" + id : null data-section=id)&attributes(attributes)
        block


//- Accordion (collapsible sections)
    title - [string] Section title.
    id - [string] Optional section ID for permalinks.
    level - [integer] Headline level for section title.

mixin accordion(title, id, level)
    section.o-accordion.o-block
        +h(level || 4).o-no-block(id=id)
            button.o-accordion__button.o-grid.o-grid--vcenter.o-grid--space.js-accordion(aria-expanded="false")=title
                svg.o-accordion__icon(width="20" height="20" viewBox="0 0 10 10" aria-hidden="true" focusable="false")
                    rect.o-accordion__hide(height="8" width="2" y="1" x="4")
                    rect(height="2" width="8" y="4" x="1")

        .o-accordion__content(hidden="")
            block


//- Headlines Helper Mixin
    level - [integer] 1, 2, 3, 4, or 5

mixin headline(level)
    if level == 1
        h1.u-heading-1&attributes(attributes)
            block

    else if level == 2
        h2.u-heading-2&attributes(attributes)
            block

    else if level == 3
        h3.u-heading-3&attributes(attributes)
            block

    else if level == 4
        h4.u-heading-4&attributes(attributes)
            block

    else if level == 5
        h5.u-heading-5&attributes(attributes)
            block


//- Headlines
    level - [integer] headline level, corresponds to h1, h2, h3 etc.
    id - [string] unique identifier, creates permalink (optional)

mixin h(level, id, source)
    +headline(level).u-heading(id=id)&attributes(attributes)
        +permalink(id)
            block

        if source
            +button(gh("spacy", source), false, "secondary", "small").u-nowrap.u-float-right
                span Source #[+icon("code", 14).o-icon--inline]


//- Permalink rendering
    id - [string] permalink ID used for link anchor

mixin permalink(id)
    if id
        a.u-permalink(href="##{id}")
            block

    else
        block


//- External links
    url - [string] link href
    trusted - [boolean] if not set / false, rel="noopener nofollow" is added
        info: https://mathiasbynens.github.io/rel-noopener/

mixin a(url, trusted)
    - external = url.includes("http")
    a(href=url target=external ? "_blank" : null rel=external && !trusted ? "noopener nofollow" : null)&attributes(attributes)
        block


//- Source link (with added icon for "code")
    url - [string] link href, can also be gh() function to generate GitHub link
        see _functions.jade for more info

mixin src(url)
    span.u-inline-block.u-nowrap
        +a(url)
            block

        | #[+icon("code", 16).o-icon--inline.u-color-theme]


//- API link (with added tag and automatically generated path)
    path - [string] path to API docs page relative to /api/

mixin api(path)
    - path = convertAPIPath(path)
    +a("/api/" + path, true)(target="_self").u-no-border.u-inline-block.u-nowrap
        block

        | #[+icon("book", 16).o-icon--inline.u-color-theme]


//- Help icon with tooltip
    tooltip - [string] Tooltip text
    icon_size - [integer] Optional size of help icon in px.

mixin help(tooltip, icon_size)
    span(data-tooltip=tooltip)&attributes(attributes)
        if tooltip
            span.u-hidden(aria-role="tooltip")=tooltip
        +icon("help_o", icon_size || 16).o-icon--inline


//- Abbreviation

mixin abbr(title)
    abbr.o-abbr(data-tooltip=title data-tooltip-style="code" aria-label=title)&attributes(attributes)
        block

//- Aside wrapper
    label - [string] aside label

mixin aside-wrapper(label, emoji)
    aside.c-aside
        .c-aside__content(role="complementary")&attributes(attributes)
            if label
                h4.u-text-label.u-text-label--dark
                    if emoji
                        span.o-emoji=emoji
                    | #{label}

            block


//- Aside for text
    label - [string] aside title (optional)

mixin aside(label, emoji)
    +aside-wrapper(label, emoji)
        .c-aside__text.u-text-small&attributes(attributes)
            block


//- Aside for code
    label - [string] aside title (optional or false for no label)
    language - [string] language for syntax highlighting (default: "python")
        supports basic relevant languages available for PrismJS
    prompt - [string] prompt displayed before first line, e.g. "$"

mixin aside-code(label, language, prompt)
    +aside-wrapper(label)&attributes(attributes)
        +code(false, language, prompt).o-no-block
            block


//- Infobox
    label - [string] infobox title (optional or false for no title)
    emoji - [string] optional emoji displayed before the title, necessary as
        argument to be able to wrap it for spacing

mixin infobox(label, emoji)
    aside.o-box.o-block.u-text-small&attributes(attributes)
        if label
            h3.u-heading.u-text-label.u-color-theme
                if emoji
                    span.o-emoji=emoji
                | #{label}

        block


//- Logos displayed in the top corner of some infoboxes
    logos - [array] List of icon ID, width, height and link.

mixin infobox-logos(...logos)
    .o-box__logos.u-text-right.u-float-right
        for logo in logos
            if logo[3]
                | #[+a(logo[3]).u-inline-block.u-hide-link.u-padding-small #[+icon(logo[0], logo[1], logo[2]).u-color-dark]]
            else
                | #[+icon(logo[0], logo[1], logo[2]).u-color-dark]


//- SVG from map (uses embedded SVG sprite)
    name - [string] SVG symbol id
    width - [integer] width in px
    height - [integer] height in px (default: same as width)

mixin svg(name, width, height)
    svg(aria-hidden="true" viewBox="0 0 #{width} #{height || width}" width=width height=(height || width))&attributes(attributes)
        use(xlink:href="#svg_#{name}")


//- Icon
    name - [string] icon name (will be used as symbol id: #svg_{name})
    width - [integer] icon width (default: 20)
    height - [integer] icon height (defaults to width)

mixin icon(name, width, height)
    - var width = width || 20
    - var height = height || width
    +svg(name, width, height).o-icon(style="min-width: #{width}px")&attributes(attributes)


//- Pro/Con/Neutral icon
    icon - [string] "pro", "con" or "neutral" (default: "neutral")
    size - [integer] icon size (optional)

mixin procon(icon, label, show_label, size)
    - var colors = { yes: "green", no: "red", neutral: "subtle" }
    span.u-nowrap
        +icon(icon, size || 20)(class="u-color-#{colors[icon] || 'subtle'}").o-icon--inline&attributes(attributes)
        span.u-text-small(class=show_label ? null : "u-hidden")=(label || icon)


//- Link button
    url - [string] link href
    trusted - [boolean] if not set / false, rel="noopener nofollow" is added
        info: https://mathiasbynens.github.io/rel-noopener/
    ...style - all other arguments are added as class names c-button--argument
        see assets/css/_components/_buttons.sass

mixin button(url, trusted, ...style)
    - external = url && url.includes("http")
    a.c-button.u-text-label(href=url class=prefixArgs(style, "c-button") role="button" target=external ? "_blank" : null rel=external && !trusted ? "noopener nofollow" : null)&attributes(attributes)
        block


//- Code block
    label - [string] aside title (optional or false for no label)
    language - [string] language for syntax highlighting (default: "python")
        supports basic relevant languages available for PrismJS
    prompt - [string] prompt displayed before first line, e.g. "$"
    height - [integer] optional height to clip code block to
    icon - [string] icon displayed next to code block (e.g. "accept" for new code)
    wrap - [boolean] wrap text and disable horizontal scrolling

mixin code(label, language, prompt, height, icon, wrap)
    - var lang = (language != "none") ? (language || DEFAULT_SYNTAX) : null
    - var lang_class = (language != "none") ? "lang-" + (language || DEFAULT_SYNTAX) : null
    pre.c-code-block.o-block(data-language=lang class=lang_class class=icon ? "c-code-block--has-icon" : null style=height ? "height: #{height}px" : null)&attributes(attributes)
        if label
            h4.u-text-label.u-text-label--dark=label
        if icon
            - var classes = {'accept': 'u-color-green', 'reject': 'u-color-red'}
            .c-code-block__icon(class=classes[icon] || null class=classes[icon] ? "c-code-block__icon--border" : null)
                +icon(icon, 18)

        code.c-code-block__content(class=wrap ? "u-wrap" : null data-prompt=prompt)
            block

//- Executable code

mixin code-exec(label, large)
    - label = (label || "Editable code example") + " (experimental)"
    +terminal-wrapper(label, !large)
        figure.juniper-wrapper
            span.juniper-wrapper__text.u-text-tiny v#{BINDER_VERSION} · Python 3 · via #[+a("https://mybinder.org/").u-hide-link Binder]
            +code(data-executable="true")&attributes(attributes)
                block

//- Wrapper for code blocks to display old/new versions

mixin code-wrapper()
    span.u-inline-block.u-padding-top.u-width-full
        block

//- Code blocks to display old/new versions
    label - [string] ARIA label for block. Defaults to "correct"/"incorrect".

mixin code-old(label, lang, prompt)
    - var label = label || 'incorrect'
    +code(false, lang, prompt, false, "reject").o-block-small(aria-label=label)
        block

mixin code-new(label, lang, prompt)
    - var label = label || 'correct'
    +code(false, lang, prompt, false, "accept").o-block-small(aria-label=label)
        block


//- CodePen embed
    slug - [string] ID of CodePen demo (taken from URL)
    height - [integer] height of demo embed iframe
    default_tab - [string] code tab(s) visible on load (default: "result")

mixin codepen(slug, height, default_tab)
    figure.o-block(style="min-height: #{height}px")&attributes(attributes)
        .codepen(data-height=height data-theme-id="31335" data-slug-hash=slug data-default-tab=(default_tab || "result") data-embed-version="2" data-user=SOCIAL.codepen)
            +a("https://codepen.io/" + SOCIAL.codepen + "/" + slug) View on CodePen

        script(async src="https://assets.codepen.io/assets/embed/ei.js")


//- GitHub embed
    repo - [string] repository owned by explosion organization
    file - [string] logical path to file, relative to repository root
    alt_file - [string] alternative file path used in footer and link button
    height - [integer] height of code preview in px

mixin github(repo, file, height, alt_file, language)
    - var branch = ALPHA ? "develop" : "master"
    - var height = height || 250

    figure.o-block
        pre.c-code-block.o-block-small(class="lang-#{(language || DEFAULT_SYNTAX)}" style="height: #{height}px; min-height: #{height}px")
            code.c-code-block__content(data-gh-embed="#{repo}/#{branch}/#{file}").
                Can't fetch code example from GitHub :(

                Please use the link below to view the example. If you've come across
                a broken link, we always appreciate a pull request to the repository,
                or a report on the issue tracker. Thanks!

        footer.o-grid.u-text
            .o-block-small.u-flex-full.u-padding-small #[+icon("github")] #[code.u-break.u-break--all=repo + '/' + (alt_file || file)]
            div
                +button(gh(repo, alt_file || file), false, "primary", "small") View on GitHub


//- Youtube video embed
    id - [string] ID of YouTube video.
    ratio - [string] Video ratio, "16x9" or "4x3".

mixin youtube(id, ratio)
    figure.o-video.o-block(class="o-video--" + (ratio || "16x9"))
        iframe.o-video__iframe(src="https://www.youtube.com/embed/#{id}" frameborder="0" height="500" allowfullscreen)


//- Images / figures
    url - [string] url or path to image
    width - [integer] image width in px, for better rendering (default: 500)
    caption - [string] image caption
    alt - [string] alternative image text, defaults to caption

mixin image(url, width, caption, alt)
    figure.o-block&attributes(attributes)
        if url
            img(src=url alt=(alt || caption) width="#{width || 500}")

        if caption
            +image-caption=caption

        block


//- Image caption

mixin image-caption()
    figcaption.u-text-small.u-color-subtle.u-padding-small&attributes(attributes)
        block


//- Graphic or illustration with button
    original - [string] Path to original image

mixin graphic(original)
    +image
        block
        if original
            .u-text-right
                +button(original, false, "secondary", "small") View large graphic


//- Chart.js
    id - [string] chart ID, will be assigned as #chart_{id}

mixin chart(id, height)
    figure.o-block&attributes(attributes)
        canvas(id="chart_#{id}" width="800" height=(height || "400") style="max-width: 100%")


//- Labels

mixin label()
    .u-text-label.u-color-dark&attributes(attributes)
        block


mixin label-inline()
    strong.u-text-label.u-color-dark&attributes(attributes)
        block


//- Tag
    tooltip - [string] optional tooltip text.
    hide_icon - [boolean] hide tooltip icon

mixin tag(tooltip, hide_icon)
    div.u-text-tag.u-text-tag--spaced(data-tooltip=tooltip)&attributes(attributes)
        block
        if tooltip
            if !hide_icon
                |  #[+icon("help", 12).o-icon--tag]
            |  #[span.u-hidden(aria-role="tooltip")=tooltip]


//- "Requires model" tag with tooltip and list of capabilities
    ...capabs - [string] Required model capabilities, e.g. "vectors".

mixin tag-model(...capabs)
    - var intro = "To use this functionality, spaCy needs a model to be installed"
    - var ext = capabs.length ? " that supports the following capabilities: " + capabs.join(', ') : ""
    +tag(intro + ext + ".") Needs model


//- "New" tag to label features new in a specific version
    By using a separate mixin with a version ID, it becomes easy to quickly
    enable/disable tags without having to modify the markup in the docs.
    version - [string or integer] version number, without "v" prefix

mixin tag-new(version)
    - var version = (typeof version == 'number') ? version.toFixed(1) : version
    - var tooltip = "This feature is new and was introduced in spaCy v" + version
    +tag(tooltip, true) v#{version}


//- List
    type - [string] "numbers", "letters", "roman" (bulleted list if none set)
    start - [integer] start number

mixin list(type, start)
    if type
        ol.c-list.o-block.u-text(class="c-list--#{type}" style=(start === 0 || start) ? "counter-reset: li #{(start - 1)}" : null)&attributes(attributes)
            block

    else
        ul.c-list.c-list--bullets.o-block.u-text&attributes(attributes)
            block


//- List item (only used within +list)

mixin item()
    li.c-list__item&attributes(attributes)
        block


//- Table
    head - [array] table headings (should match number of columns)

mixin table(head)
    table.c-table.o-block&attributes(attributes)

        if head
            +row("head")
                each column in head
                    +head-cell=column

        block


//- Table row (only used within +table)

mixin row(...style)
    tr.c-table__row(class=prefixArgs(style, "c-table__row"))&attributes(attributes)
        block


//- Header table cell (only used within +row)

mixin head-cell()
    th.c-table__head-cell.u-text-label&attributes(attributes)
        block


//- Table cell (only used within +row in +table)

mixin cell(...style)
    td.c-table__cell.u-text(class=prefixArgs(style, "c-table__cell"))&attributes(attributes)
        block


//- Grid Container
    ...style - all arguments are added as class names o-grid--argument
        see assets/css/_base/_grid.sass

mixin grid(...style)
    .o-grid.o-block(class=prefixArgs(style, "o-grid"))&attributes(attributes)
        block


//- Grid Column (only used within +grid)
    width - [string] "quarter", "third", "half", "two-thirds", "three-quarters"
        see $grid in assets/css/_variables.sass

mixin grid-col(...style)
    .o-grid__col(class=prefixArgs(style, "o-grid__col"))&attributes(attributes)
        block


//- Card (only used within +grid)
    title - [string] card title
    url - [string] link for card
    author - [string] optional author, displayed as byline at the bottom
    icon - [string] optional ID of icon displayed with card
    width - [string] optional width of grid column, defaults to "half"

mixin card(title, url, author, icon, width)
    +grid-col(width || "half").o-box.o-grid.o-grid--space.u-text&attributes(attributes)
        +a(url)
            h4.u-heading.u-text-label
                if icon
                    +icon(icon, 25).u-float-right
                if title
                    span.u-color-dark=title
            .o-block-small.u-text-small
                block
        if author
            .u-color-subtle.u-text-tiny by #{author}


//- Table of contents, to be used with +item mixins for links
    col - [string] width of column (see +grid-col)

mixin table-of-contents(col)
    +grid-col(col || "half")
        +infobox
            +label.o-block-small Table of contents
            +list("numbers").u-text-small.o-no-block
                block


//- Bibliography
    id - [string] ID of bibliography component, for anchor links. Can be used if
        there's more than one bibliography on one page.

mixin bibliography(id)
    section(id=id || "bibliography")
        +infobox
            +label.o-block-small Bibliography
            +list("numbers").u-text-small.o-no-block
                block


//- Footnote
    id - [string / integer] ID of footnote.
    bib_id - [string] ID of bibliography component, defaults to "bibliography".
    tooltip - [string] optional text displayed as tooltip

mixin fn(id, bib_id, tooltip)
    sup.u-padding-small(id="bib" + id data-tooltip=tooltip)
        span.u-text-tag
            +a("#" + (bib_id || "bibliography")).u-hide-link #{id}


//- Table rows for annotation specs

mixin pos-row(tag, pos, morph, desc)
    +row
        +cell #[code(class=(tag.length > 10) ? "u-break u-break--all" : null)=tag]
        +cell #[code=pos]
        +cell
            - var morphs = morph.includes("|") ? morph.split("|") : morph.split(" ")
            for m in morphs
                if m
                    | #[code=m]
        +cell.u-text-small=desc

mixin ud-row(tag, desc, example)
    +row
        +cell #[code=tag]
        +cell.u-text-small=desc
        if example
            +cell.u-text-small
                em=example

mixin dep-row(label, desc)
    +row
        +cell #[code=label]
        +cell=desc


//- Table rows for linguistic annotations
    annots - [array] array of cell content
    style - [array] array of 1 (display as code) or 0 (display as text)

mixin annotation-row(annots, style)
    +row
        for cell, i in annots
            if style && style[i]
                - cell = (typeof(cell) != 'boolean') ? cell : cell ? 'True' : 'False'
                +cell #[code=cell]
            else
                +cell=cell
        block


//- spaCy logo

mixin logo()
    +svg("spacy", 675, 215).o-logo&attributes(attributes)


//- Gitter chat button and widget
    button - [string] text shown on button
|
||||
label - [string] title of chat window (default: same as button)
|
||||
|
||||
mixin gitter(button, label)
|
||||
aside.js-gitter.c-chat.is-collapsed(data-title=(label || button))
|
||||
|
||||
button.js-gitter-button.c-chat__button.u-text-tag
|
||||
+icon("chat", 16).o-icon--inline
|
||||
!=button
|
||||
|
||||
|
||||
//- Badge
|
||||
image - [string] path to badge image
|
||||
url - [string] badge link
|
||||
|
||||
mixin badge(image, url)
|
||||
+a(url).u-padding-small.u-hide-link&attributes(attributes)
|
||||
img.o-badge(src=image alt=url height="20")
|
||||
|
||||
|
||||
//- Quickstart widget
|
||||
quickstart.js with manual markup, inspired by PyTorch's "Getting started"
|
||||
groups - [object] option groups, uses global variable QUICKSTART
|
||||
headline - [string] optional text to be rendered as widget headline
|
||||
|
||||
mixin quickstart(groups, headline, description, hide_results)
|
||||
.c-quickstart.o-block-small#qs
|
||||
.c-quickstart__content
|
||||
if headline
|
||||
+h(2)=headline
|
||||
if description
|
||||
p=description
|
||||
for group in groups
|
||||
.c-quickstart__group.u-text-small(data-qs-group=group.id)
|
||||
if group.title
|
||||
.c-quickstart__legend=group.title
|
||||
if group.help
|
||||
| #[+help(group.help)]
|
||||
.c-quickstart__fields
|
||||
for option in group.options
|
||||
input.c-quickstart__input(class="c-quickstart__input--" + (group.input_style ? group.input_style : group.multiple ? "check" : "radio") type=group.multiple ? "checkbox" : "radio" name=group.id id="qs-#{option.id}" value=option.id checked=option.checked)
|
||||
label.c-quickstart__label.u-text-tiny(for="qs-#{option.id}")!=option.title
|
||||
if option.meta
|
||||
| #[span.c-quickstart__label__meta (#{option.meta})]
|
||||
if option.help
|
||||
| #[+help(option.help)]
|
||||
|
||||
if hide_results
|
||||
block
|
||||
else
|
||||
pre.c-code-block
|
||||
code.c-code-block__content.c-quickstart__code(data-qs-results="")
|
||||
block
|
||||
|
||||
|
||||
//- Quickstart code item
|
||||
data - [object] Rendering conditions (keyed by option group ID, value: option)
|
||||
style - [string] modifier ID for line style
|
||||
|
||||
mixin qs(data, style)
|
||||
- args = {}
|
||||
for value, setting in data
|
||||
- args['data-qs-' + setting] = value
|
||||
span.c-quickstart__line(class="c-quickstart__line--#{style || 'bash'}")&attributes(args)
|
||||
block
|
||||
|
||||
|
||||
//- Terminal-style code window
|
||||
label - [string] title displayed in top bar of terminal window
|
||||
|
||||
mixin terminal-wrapper(label, small)
|
||||
.x-terminal(class=small ? "x-terminal--small" : null)
|
||||
.x-terminal__icons(class=small ? "x-terminal__icons--small" : null): span
|
||||
.u-padding-small.u-text-center(class=small ? "u-text-tiny" : "u-text")
|
||||
strong=label
|
||||
block
|
||||
|
||||
mixin terminal(label, button_text, button_url, exec)
|
||||
+terminal-wrapper(label)
|
||||
+code.x-terminal__code(data-executable=exec ? "" : null)
|
||||
block
|
||||
|
||||
if button_text && button_url
|
||||
+button(button_url, true, "primary", "small").x-terminal__button=button_text
|
||||
|
||||
|
||||
//- Landing
|
||||
|
||||
mixin landing-header()
|
||||
header.c-landing
|
||||
.c-landing__wrapper
|
||||
.c-landing__content
|
||||
block
|
||||
|
||||
mixin landing-banner(headline, label)
|
||||
.c-landing__banner.u-padding.o-block.u-color-light
|
||||
+grid.c-landing__banner__content.o-no-block
|
||||
+grid-col("third")
|
||||
h3.u-heading.u-heading-1
|
||||
if label
|
||||
div
|
||||
span.u-text-label.u-text-label--light=label
|
||||
!=headline
|
||||
|
||||
+grid-col("two-thirds").c-landing__banner__text
|
||||
block
|
||||
|
||||
|
||||
mixin landing-logos(title, logos)
|
||||
.o-content.u-text-center&attributes(attributes)
|
||||
h3.u-heading.u-text-label.u-color-dark=title
|
||||
|
||||
each row, i in logos
|
||||
- var is_last = i == logos.length - 1
|
||||
+grid("center").o-inline-list.o-no-block(class=is_last ? "o-no-block" : null)
|
||||
each details, name in row
|
||||
+a(details[0]).u-padding-medium
|
||||
+icon(name, details[1], details[2])
|
||||
|
||||
if is_last
|
||||
block
|
||||
|
||||
|
||||
//- Under construction (temporary)
|
||||
Marks sections that still need to be completed for the v2.0 release.
|
||||
|
||||
mixin under-construction()
|
||||
+infobox("Under construction", "🚧")
|
||||
| This section is still being written and will be updated as soon as
|
||||
| possible. Is there anything that you think should definitely
|
||||
| mentioned or explained here? Any examples you'd like to see?
|
||||
| #[strong Let us know] on the #[+a(gh("spacy") + "/issues") issue tracker]!
|
||||
|
||||
|
||||
//- Legacy docs
|
||||
|
||||
mixin legacy()
|
||||
+aside("Looking for the old docs?", "📖")
|
||||
| To help you make the transition from v1.x to v2.0, we've uploaded the
|
||||
| old website to #[strong #[+a("https://legacy.spacy.io/docs") legacy.spacy.io]].
|
||||
| Wherever possible, the new docs also include notes on features that have
|
||||
| changed in v2.0, and features that were introduced in the new version.
|
|
@ -1,16 +0,0 @@
//- 💫 INCLUDES > TOP NAVIGATION

nav.c-nav.u-text.js-nav(class=landing ? "c-nav--theme" : null)
    a(href="/" aria-label=SITENAME) #[+logo]

    ul.c-nav__menu
        - var current_url = '/' + current.path[0]
        each url, item in NAVIGATION
            - var is_active = (current_url == url)
            li.c-nav__menu__item(class=is_active ? "is-active" : null)
                +a(url)(tabindex=is_active ? "-1" : null)=item

        li.c-nav__menu__item
            +a(gh("spaCy"))(aria-label="GitHub") #[+icon("github", 20)]

    progress.c-progress.js-progress(value="0" max="1")

@ -1,16 +0,0 @@
//- 💫 INCLUDES > NEWSLETTER

ul.o-block-small
    li.u-text-label.u-color-subtle Stay in the loop!
    li Receive updates about new releases, tutorials and more.

form.o-grid#mc-embedded-subscribe-form(action="//#{MAILCHIMP.user}.list-manage.com/subscribe/post?u=#{MAILCHIMP.id}&id=#{MAILCHIMP.list}" method="post" name="mc-embedded-subscribe-form" target="_blank" novalidate)

    //- MailChimp spam protection
    div(style="position: absolute; left: -5000px;" aria-hidden="true")
        input(type="text" name="b_#{MAILCHIMP.id}_#{MAILCHIMP.list}" tabindex="-1" value="")

    .o-grid-col.o-grid.o-grid--nowrap.o-field.u-padding-small
        div
            input#mce-EMAIL.o-field__input.u-text(type="email" name="EMAIL" placeholder="Your email" aria-label="Your email")
        button#mc-embedded-subscribe.o-field__button.u-text-label.u-color-theme.u-nowrap(type="submit" name="subscribe") Sign up

@ -1,54 +0,0 @@
//- 💫 INCLUDES > DOCS PAGE TEMPLATE

- sidebar_content = (public[SECTION] ? public[SECTION]._data.sidebar : public._data[SECTION] ? public._data[SECTION].sidebar : false) || FOOTER

include _sidebar

main.o-main.o-main--sidebar.o-main--aside
    article.o-content
        +grid.o-no-block
            +h(1).u-heading--title=title.replace("'", "’")
                if tag
                    +tag=tag
                if tag_new
                    +tag-new(tag_new)

            if teaser
                .u-heading__teaser.u-text-small.u-color-dark=teaser
            else if IS_MODELS
                .u-heading__teaser.u-text-small.u-color-dark
                    | Available statistical models for
                    | #[code=current.source] (#{LANGUAGES[current.source]}).

            if source
                .o-block.u-text-right
                    +button(gh("spacy", source), false, "secondary", "small").u-nowrap
                        | Source #[+icon("code", 14)]

        if IS_MODELS
            include _page_models
        else
            !=yield

    +grid.o-content.u-text
        +grid-col("half")
            if !IS_MODELS
                .o-inline-list
                    +button(gh("spacy", "website/" + current.path.join('/') + ".jade"), false, "secondary", "small")
                        | #[span.o-icon Suggest edits] #[+icon("code", 14)]

        +grid-col("half").u-text-right
            if next && public[SECTION]._data[next]
                - data = public[SECTION]._data[next]

                +grid("vcenter")
                    +a(next).u-text-small.u-flex-full
                        h4.u-text-label.u-color-dark Read next
                        | #{data.title}

                    +a(next).c-icon-button.c-icon-button--right(aria-hidden="true")
                        +icon("arrow-right", 24)

    +gitter("spaCy chat")

include _footer

@ -1,109 +0,0 @@
//- 💫 INCLUDES > MODELS PAGE TEMPLATE

for id in CURRENT_MODELS
    - var comps = getModelComponents(id)
    +section(id)
        section(data-vue=id data-model=id)
            +grid("vcenter").o-no-block(id=id)
                +grid-col("two-thirds")
                    +h(2)
                        +a("#" + id).u-permalink=id

                +grid-col("third").u-text-right
                    .u-color-subtle.u-text-tiny
                        +button(gh("spacy-models") + "/releases", true, "secondary", "small")(v-bind:href="releaseUrl")
                            | Release details
                        .u-padding-small Latest: #[code(v-text="version") n/a]

            +aside-code("Installation", "bash", "$").
                python -m spacy download #{id}

            p(v-if="description" v-text="description")

            +infobox(v-if="error")
                | Unable to load model details from GitHub. To find out more
                | about this model, see the overview of the
                | #[+a(gh("spacy-models") + "/releases") latest model releases].

            +table.o-block-small(v-bind:data-loading="loading")
                +row
                    +cell #[+label Language]
                    +cell #[+tag=comps.lang] #{LANGUAGES[comps.lang]}
                for comp, label in {"Type": comps.type, "Genre": comps.genre}
                    +row
                        +cell #[+label=label]
                        +cell #[+tag=comp] #{MODEL_META[comp]}
                +row
                    +cell #[+label Size]
                    +cell #[+tag=comps.size] #[span(v-text="sizeFull" v-if="sizeFull")] #[em(v-else="") n/a]

                +row(v-if="pipeline && pipeline.length" v-cloak="")
                    +cell
                        +label Pipeline #[+help(MODEL_META.pipeline).u-color-subtle]
                    +cell
                        span(v-for="(pipe, index) in pipeline" v-if="pipeline")
                            code(v-text="pipe")
                            span(v-if="index != pipeline.length - 1") ,

                +row(v-if="vectors" v-cloak="")
                    +cell
                        +label Vectors #[+help(MODEL_META.vectors).u-color-subtle]
                    +cell(v-text="vectors")

                +row(v-if="sources && sources.length" v-cloak="")
                    +cell
                        +label Sources #[+help(MODEL_META.sources).u-color-subtle]
                    +cell
                        span(v-for="(source, index) in sources") {{ source }}
                            span(v-if="index != sources.length - 1") ,

                +row(v-if="author" v-cloak="")
                    +cell #[+label Author]
                    +cell
                        +a("")(v-bind:href="url" v-if="url" v-text="author")
                        span(v-else="" v-text="author") {{ model.author }}

                +row(v-if="license" v-cloak="")
                    +cell #[+label License]
                    +cell
                        +a("")(v-bind:href="modelLicenses[license]" v-if="modelLicenses[license]") {{ license }}
                        span(v-else="") {{ license }}

                +row(v-cloak="")
                    +cell #[+label Compat #[+help(MODEL_META.compat).u-color-subtle]]
                    +cell
                        .o-field.u-float-left
                            select.o-field__select.u-text-small(v-model="spacyVersion")
                                option(v-for="version in orderedCompat" v-bind:value="version") spaCy v{{ version }}
                        code(v-if="compatVersion" v-text="compatVersion")
                        em(v-else="") not compatible

            +grid.o-block-small(v-cloak="" v-if="hasAccuracy")
                for keys, label in MODEL_BENCHMARKS
                    .u-flex-full.u-padding-small
                        +table.o-block-small
                            +row("head")
                                +head-cell(colspan="2")=(MODEL_META["benchmark_" + label] || label)
                            for label, field in keys
                                +row
                                    +cell.u-nowrap
                                        +label=label
                                        if MODEL_META[field]
                                            | #[+help(MODEL_META[field]).u-color-subtle]
                                    +cell("num")
                                        span(v-if="#{field}" v-text="#{field}")
                                        em(v-if="!#{field}") n/a

            p.u-text-small.u-color-dark(v-if="notes" v-text="notes" v-cloak="")

    if comps.size == "sm" && EXAMPLE_SENT_LANGS.includes(comps.lang)
        section
            +code-exec("Test the model live").
                import spacy
                from spacy.lang.#{comps.lang}.examples import sentences

                nlp = spacy.load('#{id}')
                doc = nlp(sentences[0])
                print(doc.text)
                for token in doc:
                    print(token.text, token.pos_, token.dep_)

@ -1,28 +0,0 @@
//- 💫 INCLUDES > SCRIPTS

- scripts = ["vendor/prism.min", "vendor/vue.min"]
- if (SECTION == "universe") scripts.push("vendor/vue-markdown.min")
- if (quickstart) scripts.push("vendor/quickstart.min")
- if (IS_PAGE) scripts.push("vendor/in-view.min")
- if (IS_PAGE || SECTION == "index") scripts.push("vendor/juniper.min")

for script in scripts
    script(src="/assets/js/" + script + ".js")
script(src="/assets/js/main.js?v#{V_JS}" type=(environment == "deploy") ? null : "module")

if environment == "deploy"
    script(src="https://www.google-analytics.com/analytics.js", async)
    script
        | window.ga=window.ga||function(){
        | (ga.q=ga.q||[]).push(arguments)}; ga.l=+new Date;
        | ga('create', '#{ANALYTICS}', 'auto'); ga('send', 'pageview');

if IS_PAGE
    script(src="https://sidecar.gitter.im/dist/sidecar.v1.js" async defer)
    script
        | ((window.gitter = {}).chat = {}).options = {
        |     useStyles: false,
        |     activationElement: '.js-gitter-button',
        |     targetElement: '.js-gitter',
        |     room: '!{SOCIAL.gitter}'
        | };

@ -1,23 +0,0 @@
//- 💫 INCLUDES > SIDEBAR

menu.c-sidebar.js-sidebar.u-text
    if sidebar_content
        each items, sectiontitle in sidebar_content
            ul.c-sidebar__section.o-block-small
                li.u-text-label.u-color-dark=sectiontitle

                each url, item in items
                    - var is_current = CURRENT == url || (CURRENT == "index" && url == "./")
                    li.c-sidebar__item
                        +a(url)(class=is_current ? "is-active" : null tabindex=is_current ? "-1" : null data-sidebar-active=is_current ? "" : null)=item

                        if is_current
                            if IS_MODELS && CURRENT_MODELS.length
                                - menu = Object.assign({}, ...CURRENT_MODELS.map(id => ({ [id]: id })))
                            if menu
                                ul.c-sidebar__crumb.u-hidden-sm
                                    - var counter = 0
                                    for id, title in menu
                                        - counter++
                                        li.c-sidebar__crumb__item(data-nav=id)
                                            +a("#section-" + id)=title

@ -1,57 +0,0 @@
//- 💫 GLOBAL LAYOUT

include _includes/_mixins

- title = IS_MODELS ? LANGUAGES[current.source] || title : title

- PAGE_URL = getPageUrl()
- PAGE_TITLE = getPageTitle()
- PAGE_IMAGE = getPageImage()

doctype html
html(lang="en")
    head
        title=PAGE_TITLE
        meta(charset="utf-8")
        meta(name="viewport" content="width=device-width, initial-scale=1.0")
        meta(name="referrer" content="always")
        meta(name="description" content=description)

        meta(property="og:type" content="website")
        meta(property="og:site_name" content=sitename)
        meta(property="og:url" content=PAGE_URL)
        meta(property="og:title" content=PAGE_TITLE)
        meta(property="og:description" content=description)
        meta(property="og:image" content=PAGE_IMAGE)

        meta(name="twitter:card" content="summary_large_image")
        meta(name="twitter:site" content="@" + SOCIAL.twitter)
        meta(name="twitter:title" content=PAGE_TITLE)
        meta(name="twitter:description" content=description)
        meta(name="twitter:image" content=PAGE_IMAGE)

        link(rel="shortcut icon" href="/assets/img/favicon.ico")
        link(rel="icon" type="image/x-icon" href="/assets/img/favicon.ico")

        if SECTION == "api"
            link(href="/assets/css/style_green.css?v#{V_CSS}" rel="stylesheet")
        else if SECTION == "universe"
            link(href="/assets/css/style_purple.css?v#{V_CSS}" rel="stylesheet")
        else
            link(href="/assets/css/style.css?v#{V_CSS}" rel="stylesheet")

    body
        include _includes/_svg
        include _includes/_navigation

        if !landing
            include _includes/_page-docs

        else if SECTION == "universe"
            !=yield

        else
            main!=yield
            include _includes/_footer

        include _includes/_scripts

@ -1,43 +0,0 @@
//- 💫 DOCS > API > ANNOTATION > BILUO

+table(["Tag", "Description"])
    +row
        +cell #[code #[span.u-color-theme B] EGIN]
        +cell The first token of a multi-token entity.

    +row
        +cell #[code #[span.u-color-theme I] N]
        +cell An inner token of a multi-token entity.

    +row
        +cell #[code #[span.u-color-theme L] AST]
        +cell The final token of a multi-token entity.

    +row
        +cell #[code #[span.u-color-theme U] NIT]
        +cell A single-token entity.

    +row
        +cell #[code #[span.u-color-theme O] UT]
        +cell A non-entity token.

+aside("Why BILUO, not IOB?")
    | There are several coding schemes for encoding entity annotations as
    | token tags. These coding schemes are equally expressive, but not
    | necessarily equally learnable.
    | #[+a("http://www.aclweb.org/anthology/W09-1119") Ratinov and Roth]
    | showed that the minimal #[strong Begin], #[strong In], #[strong Out]
    | scheme was more difficult to learn than the #[strong BILUO] scheme that
    | we use, which explicitly marks boundary tokens.

p
    | spaCy translates the character offsets into this scheme, in order to
    | decide the cost of each action given the current state of the entity
    | recogniser. The costs are then used to calculate the gradient of the
    | loss, to train the model. The exact algorithm is a pastiche of
    | well-known methods, and is not currently described in any single
    | publication. The model is a greedy transition-based parser guided by a
    | linear model whose weights are learned using the averaged perceptron
    | loss, via the #[+a("http://www.aclweb.org/anthology/C12-1059") dynamic oracle]
    | imitation learning strategy. The transition system is equivalent to the
    | BILUO tagging scheme.
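
p
    | As a minimal sketch of how character offsets map to these tags, the
    | #[+api("goldparse#biluo_tags_from_offsets") #[code biluo_tags_from_offsets]]
    | helper can be applied directly to a tokenized #[code Doc]:

+code("Example").
    from spacy.lang.en import English
    from spacy.gold import biluo_tags_from_offsets

    nlp = English()  # tokenizer only, no statistical model required
    doc = nlp(u'I like London.')
    # one entity, annotated as (start_char, end_char, label)
    tags = biluo_tags_from_offsets(doc, [(7, 13, 'LOC')])
    assert tags == ['O', 'O', 'U-LOC', 'O']
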
@ -1,158 +0,0 @@
//- 💫 DOCS > API > ANNOTATION > DEPENDENCY LABELS

p
    | This section lists the syntactic dependency labels assigned by
    | spaCy's #[+a("/models") models]. The individual labels are
    | language-specific and depend on the training corpus.
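
p
    | The label assigned to each token is available as #[code Token.dep_].
    | A minimal sketch, assuming an English model such as
    | #[code en_core_web_sm] is installed:

+code("Example").
    import spacy

    nlp = spacy.load('en_core_web_sm')  # assumes this model is installed
    doc = nlp(u'Autonomous cars shift insurance liability toward manufacturers')
    print([(token.text, token.dep_) for token in doc])
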
+accordion("Universal Dependency Labels")
|
||||
p
|
||||
| The #[+a("http://universaldependencies.org/u/dep/") Universal Dependencies scheme]
|
||||
| is used in all languages trained on Universal Dependency Corpora.
|
||||
|
||||
+table(["Dep", "Description"])
|
||||
+ud-row("acl", "clausal modifier of noun (adjectival clause)")
|
||||
+ud-row("advcl", "adverbial clause modifier")
|
||||
+ud-row("advmod", "adverbial modifier")
|
||||
+ud-row("amod", "adjectival modifier")
|
||||
+ud-row("appos", "appositional modifier")
|
||||
+ud-row("aux", "auxiliary")
|
||||
+ud-row("case", "case marking")
|
||||
+ud-row("cc", "coordinating conjunction")
|
||||
+ud-row("ccomp", "clausal complement")
|
||||
+ud-row("clf", "classifier")
|
||||
+ud-row("compound", "compound")
|
||||
+ud-row("conj", "conjunct")
|
||||
+ud-row("cop", "copula")
|
||||
+ud-row("csubj", "clausal subject")
|
||||
+ud-row("dep", "unspecified dependency")
|
||||
+ud-row("det", "determiner")
|
||||
+ud-row("discourse", "discourse element")
|
||||
+ud-row("dislocated", "dislocated elements")
|
||||
+ud-row("expl", "expletive")
|
||||
+ud-row("fixed", "fixed multiword expression")
|
||||
+ud-row("flat", "flat multiword expression")
|
||||
+ud-row("goeswith", "goes with")
|
||||
+ud-row("iobj", "indirect object")
|
||||
+ud-row("list", "list")
|
||||
+ud-row("mark", "marker")
|
||||
+ud-row("nmod", "nominal modifier")
|
||||
+ud-row("nsubj", "nominal subject")
|
||||
+ud-row("nummod", "numeric modifier")
|
||||
+ud-row("obj", "object")
|
||||
+ud-row("obl", "oblique nominal")
|
||||
+ud-row("orphan", "orphan")
|
||||
+ud-row("parataxis", "parataxis")
|
||||
+ud-row("punct", "punctuation")
|
||||
+ud-row("reparandum", "overridden disfluency")
|
||||
+ud-row("root", "root")
|
||||
+ud-row("vocative", "vocative")
|
||||
+ud-row("xcomp", "open clausal complement")
|
||||
|
||||
+accordion("English", "dependency-parsing-english")
|
||||
p
|
||||
| The English dependency labels use the
|
||||
| #[+a("https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md") CLEAR Style]
|
||||
| by #[+a("http://www.clearnlp.com") ClearNLP].
|
||||
|
||||
+table(["Label", "Description"])
|
||||
+dep-row("acl", "clausal modifier of noun (adjectival clause)")
|
||||
+dep-row("acomp", "adjectival complement")
|
||||
+dep-row("advcl", "adverbial clause modifier")
|
||||
+dep-row("advmod", "adverbial modifier")
|
||||
+dep-row("agent", "agent")
|
||||
+dep-row("amod", "adjectival modifier")
|
||||
+dep-row("appos", "appositional modifier")
|
||||
+dep-row("attr", "attribute")
|
||||
+dep-row("aux", "auxiliary")
|
||||
+dep-row("auxpass", "auxiliary (passive)")
|
||||
+dep-row("case", "case marking")
|
||||
+dep-row("cc", "coordinating conjunction")
|
||||
+dep-row("ccomp", "clausal complement")
|
||||
+dep-row("compound", "compound")
|
||||
+dep-row("conj", "conjunct")
|
||||
+dep-row("cop", "copula")
|
||||
+dep-row("csubj", "clausal subject")
|
||||
+dep-row("csubjpass", "clausal subject (passive)")
|
||||
+dep-row("dative", "dative")
|
||||
+dep-row("dep", "unclassified dependent")
|
||||
+dep-row("det", "determiner")
|
||||
+dep-row("dobj", "direct object")
|
||||
+dep-row("expl", "expletive")
|
||||
+dep-row("intj", "interjection")
|
||||
+dep-row("mark", "marker")
|
||||
+dep-row("meta", "meta modifier")
|
||||
+dep-row("neg", "negation modifier")
|
||||
+dep-row("nn", "noun compound modifier")
|
||||
+dep-row("nounmod", "modifier of nominal")
|
||||
+dep-row("npmod", "noun phrase as adverbial modifier")
|
||||
+dep-row("nsubj", "nominal subject")
|
||||
+dep-row("nsubjpass", "nominal subject (passive)")
|
||||
+dep-row("nummod", "numeric modifier")
|
||||
+dep-row("oprd", "object predicate")
|
||||
+dep-row("obj", "object")
|
||||
+dep-row("obl", "oblique nominal")
|
||||
+dep-row("parataxis", "parataxis")
|
||||
+dep-row("pcomp", "complement of preposition")
|
||||
+dep-row("pobj", "object of preposition")
|
||||
+dep-row("poss", "possession modifier")
|
||||
+dep-row("preconj", "pre-correlative conjunction")
|
||||
+dep-row("prep", "prepositional modifier")
|
||||
+dep-row("prt", "particle")
|
||||
+dep-row("punct", "punctuation")
|
||||
+dep-row("quantmod", "modifier of quantifier")
|
||||
+dep-row("relcl", "relative clause modifier")
|
||||
+dep-row("root", "root")
|
||||
+dep-row("xcomp", "open clausal complement")
|
||||
|
||||
+accordion("German", "dependency-parsing-german")
|
||||
p
|
||||
| The German dependency labels use the
|
||||
| #[+a("http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/index.html") TIGER Treebank]
|
||||
| annotation scheme.
|
||||
|
||||
+table(["Label", "Description"])
|
||||
+dep-row("ac", "adpositional case marker")
|
||||
+dep-row("adc", "adjective component")
|
||||
+dep-row("ag", "genitive attribute")
|
||||
+dep-row("ams", "measure argument of adjective")
|
||||
+dep-row("app", "apposition")
|
||||
+dep-row("avc", "adverbial phrase component")
|
||||
+dep-row("cc", "comparative complement")
|
||||
+dep-row("cd", "coordinating conjunction")
|
||||
+dep-row("cj", "conjunct")
|
||||
+dep-row("cm", "comparative conjunction")
|
||||
+dep-row("cp", "complementizer")
|
||||
+dep-row("cvc", "collocational verb construction")
|
||||
+dep-row("da", "dative")
|
||||
+dep-row("dh", "discourse-level head")
|
||||
+dep-row("dm", "discourse marker")
|
||||
+dep-row("ep", "expletive es")
|
||||
+dep-row("hd", "head")
|
||||
+dep-row("ju", "junctor")
|
||||
+dep-row("mnr", "postnominal modifier")
|
||||
+dep-row("mo", "modifier")
|
||||
+dep-row("ng", "negation")
|
||||
+dep-row("nk", "noun kernel element")
|
||||
+dep-row("nmc", "numerical component")
|
||||
+dep-row("oa", "accusative object")
|
||||
+dep-row("oa", "second accusative object")
        +dep-row("oc", "clausal object")
        +dep-row("og", "genitive object")
        +dep-row("op", "prepositional object")
        +dep-row("par", "parenthetical element")
        +dep-row("pd", "predicate")
        +dep-row("pg", "phrasal genitive")
        +dep-row("ph", "placeholder")
        +dep-row("pm", "morphological particle")
        +dep-row("pnc", "proper noun component")
        +dep-row("rc", "relative clause")
        +dep-row("re", "repeated element")
        +dep-row("rs", "reported speech")
        +dep-row("sb", "subject")
        +dep-row("sbp", "passivised subject")
        +dep-row("sp", "subject or predicate")
        +dep-row("svp", "separable verb prefix")
        +dep-row("uc", "unit component")
        +dep-row("vo", "vocative")
        +dep-row("ROOT", "root")

@ -1,109 +0,0 @@
//- 💫 DOCS > API > ANNOTATION > NAMED ENTITIES

p
    | Models trained on the
    | #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] corpus
    | support the following entity types:

+table(["Type", "Description"])
    +row
        +cell #[code PERSON]
        +cell People, including fictional.

    +row
        +cell #[code NORP]
        +cell Nationalities or religious or political groups.

    +row
        +cell #[code FAC]
        +cell Buildings, airports, highways, bridges, etc.

    +row
        +cell #[code ORG]
        +cell Companies, agencies, institutions, etc.

    +row
        +cell #[code GPE]
        +cell Countries, cities, states.

    +row
        +cell #[code LOC]
        +cell Non-GPE locations, mountain ranges, bodies of water.

    +row
        +cell #[code PRODUCT]
        +cell Objects, vehicles, foods, etc. (Not services.)

    +row
        +cell #[code EVENT]
        +cell Named hurricanes, battles, wars, sports events, etc.

    +row
        +cell #[code WORK_OF_ART]
        +cell Titles of books, songs, etc.

    +row
        +cell #[code LAW]
        +cell Named documents made into laws.

    +row
        +cell #[code LANGUAGE]
        +cell Any named language.

    +row
        +cell #[code DATE]
        +cell Absolute or relative dates or periods.

    +row
        +cell #[code TIME]
        +cell Times smaller than a day.

    +row
        +cell #[code PERCENT]
        +cell Percentage, including "%".

    +row
        +cell #[code MONEY]
        +cell Monetary values, including unit.

    +row
        +cell #[code QUANTITY]
        +cell Measurements, as of weight or distance.

    +row
        +cell #[code ORDINAL]
        +cell "first", "second", etc.

    +row
        +cell #[code CARDINAL]
        +cell Numerals that do not fall under another type.
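
p
    | A minimal sketch of accessing these types via #[code Doc.ents],
    | assuming #[code en_core_web_sm] is installed:

+code("Example").
    import spacy

    nlp = spacy.load('en_core_web_sm')  # assumes this model is installed
    doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion')
    print([(ent.text, ent.label_) for ent in doc.ents])
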
+h(4, "ner-wikipedia-scheme") Wikipedia scheme
|
||||
|
||||
p
|
||||
| Models trained on Wikipedia corpus
|
||||
| (#[+a("http://www.sciencedirect.com/science/article/pii/S0004370212000276") Nothman et al., 2013])
|
||||
| use a less fine-grained NER annotation scheme and recognise the
|
||||
| following entities:
|
||||
|
||||
+table(["Type", "Description"])
|
||||
+row
|
||||
+cell #[code PER]
|
||||
+cell Named person or family.
|
||||
|
||||
+row
|
||||
+cell #[code LOC]
|
||||
+cell
|
||||
| Name of politically or geographically defined location (cities,
|
||||
| provinces, countries, international regions, bodies of water,
|
||||
| mountains).
|
||||
|
||||
+row
|
||||
+cell #[code ORG]
|
||||
+cell Named corporate, governmental, or other organizational entity.
|
||||
|
||||
+row
|
||||
+cell #[code MISC]
|
||||
+cell
|
||||
| Miscellaneous entities, e.g. events, nationalities, products or
|
||||
| works of art.
|
|
@ -1,179 +0,0 @@
//- 💫 DOCS > API > ANNOTATION > POS TAGS

p
    | This section lists the fine-grained and coarse-grained part-of-speech
    | tags assigned by spaCy's #[+a("/models") models]. The individual mapping
    | is specific to the training corpus and can be defined in the respective
    | language data's #[+a("/usage/adding-languages#tag-map") #[code tag_map.py]].
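
p
    | Both tag sets can be inspected on each token. A minimal sketch,
    | assuming #[code en_core_web_sm] is installed:

+code("Example").
    import spacy

    nlp = spacy.load('en_core_web_sm')  # assumes this model is installed
    doc = nlp(u'Give it back! He pleaded.')
    # token.tag_ is the fine-grained tag, token.pos_ the coarse-grained one
    print([(token.text, token.tag_, token.pos_) for token in doc])
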
+accordion("Universal Part-of-speech Tags")
|
||||
p
|
||||
| spaCy also maps all language-specific part-of-speech tags to a small,
|
||||
| fixed set of word type tags following the
|
||||
| #[+a("http://universaldependencies.org/u/pos/") Universal Dependencies scheme].
|
||||
| The universal tags don't code for any morphological features and only
|
||||
| cover the word type. They're available as the
|
||||
| #[+api("token#attributes") #[code Token.pos]] and
|
||||
| #[+api("token#attributes") #[code Token.pos_]] attributes.
|
||||
|
||||
+table(["POS", "Description", "Examples"])
|
||||
+ud-row("ADJ", "adjective", "big, old, green, incomprehensible, first")
|
||||
+ud-row("ADP", "adposition", "in, to, during")
|
||||
+ud-row("ADV", "adverb", "very, tomorrow, down, where, there")
|
||||
+ud-row("AUX", "auxiliary", "is, has (done), will (do), should (do)")
|
||||
+ud-row("CONJ", "conjunction", "and, or, but")
|
||||
+ud-row("CCONJ", "coordinating conjunction", "and, or, but")
|
||||
+ud-row("DET", "determiner", "a, an, the")
|
||||
+ud-row("INTJ", "interjection", "psst, ouch, bravo, hello")
|
||||
+ud-row("NOUN", "noun", "girl, cat, tree, air, beauty")
|
||||
+ud-row("NUM", "numeral", "1, 2017, one, seventy-seven, IV, MMXIV")
|
||||
+ud-row("PART", "particle", "'s, not, ")
|
||||
+ud-row("PRON", "pronoun", "I, you, he, she, myself, themselves, somebody")
|
||||
+ud-row("PROPN", "proper noun", "Mary, John, London, NATO, HBO")
|
||||
+ud-row("PUNCT", "punctuation", "., (, ), ?")
|
||||
+ud-row("SCONJ", "subordinating conjunction", "if, while, that")
|
||||
+ud-row("SYM", "symbol", "$, %, §, ©, +, −, ×, ÷, =, :), 😝")
|
||||
+ud-row("VERB", "verb", "run, runs, running, eat, ate, eating")
|
||||
+ud-row("X", "other", "sfpksdpsxmsa")
|
||||
+ud-row("SPACE", "space", "")
|
||||
|
||||
+accordion("English", "pos-en")
|
||||
p
|
||||
| The English part-of-speech tagger uses the
|
||||
| #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] version of
|
||||
| the Penn Treebank tag set. We also map the tags to the simpler Google
|
||||
| Universal POS tag set.
|
||||
|
||||
+table(["Tag", "POS", "Morphology", "Description"])
|
||||
+pos-row("-LRB-", "PUNCT", "PunctType=brck PunctSide=ini", "left round bracket")
|
||||
+pos-row("-RRB-", "PUNCT", "PunctType=brck PunctSide=fin", "right round bracket")
|
||||
+pos-row(",", "PUNCT", "PunctType=comm", "punctuation mark, comma")
|
||||
+pos-row(":", "PUNCT", "", "punctuation mark, colon or ellipsis")
|
||||
+pos-row(".", "PUNCT", "PunctType=peri", "punctuation mark, sentence closer")
|
||||
+pos-row("''", "PUNCT", "PunctType=quot PunctSide=fin", "closing quotation mark")
|
||||
+pos-row("\"\"", "PUNCT", "PunctType=quot PunctSide=fin", "closing quotation mark")
|
||||
+pos-row("#", "SYM", "SymType=numbersign", "symbol, number sign")
|
||||
+pos-row("``", "PUNCT", "PunctType=quot PunctSide=ini", "opening quotation mark")
|
||||
+pos-row("$", "SYM", "SymType=currency", "symbol, currency")
|
||||
+pos-row("ADD", "X", "", "email")
|
||||
+pos-row("AFX", "ADJ", "Hyph=yes", "affix")
|
||||
+pos-row("BES", "VERB", "", 'auxiliary "be"')
|
||||
+pos-row("CC", "CONJ", "ConjType=coor", "conjunction, coordinating")
|
||||
+pos-row("CD", "NUM", "NumType=card", "cardinal number")
|
||||
+pos-row("DT", "DET", "determiner")
|
||||
+pos-row("EX", "ADV", "AdvType=ex", "existential there")
|
||||
+pos-row("FW", "X", "Foreign=yes", "foreign word")
|
||||
+pos-row("GW", "X", "", "additional word in multi-word expression")
|
||||
+pos-row("HVS", "VERB", "", 'forms of "have"')
|
||||
+pos-row("HYPH", "PUNCT", "PunctType=dash", "punctuation mark, hyphen")
|
||||
+pos-row("IN", "ADP", "", "conjunction, subordinating or preposition")
|
||||
+pos-row("JJ", "ADJ", "Degree=pos", "adjective")
|
||||
+pos-row("JJR", "ADJ", "Degree=comp", "adjective, comparative")
|
||||
+pos-row("JJS", "ADJ", "Degree=sup", "adjective, superlative")
|
||||
+pos-row("LS", "PUNCT", "NumType=ord", "list item marker")
|
||||
+pos-row("MD", "VERB", "VerbType=mod", "verb, modal auxiliary")
|
||||
+pos-row("NFP", "PUNCT", "", "superfluous punctuation")
|
||||
+pos-row("NIL", "", "", "missing tag")
|
||||
+pos-row("NN", "NOUN", "Number=sing", "noun, singular or mass")
|
||||
+pos-row("NNP", "PROPN", "NounType=prop Number=sign", "noun, proper singular")
|
||||
+pos-row("NNPS", "PROPN", "NounType=prop Number=plur", "noun, proper plural")
|
||||
+pos-row("NNS", "NOUN", "Number=plur", "noun, plural")
|
||||
+pos-row("PDT", "ADJ", "AdjType=pdt PronType=prn", "predeterminer")
|
||||
+pos-row("POS", "PART", "Poss=yes", "possessive ending")
|
||||
+pos-row("PRP", "PRON", "PronType=prs", "pronoun, personal")
|
||||
+pos-row("PRP$", "ADJ", "PronType=prs Poss=yes", "pronoun, possessive")
|
||||
+pos-row("RB", "ADV", "Degree=pos", "adverb")
|
||||
+pos-row("RBR", "ADV", "Degree=comp", "adverb, comparative")
|
||||
+pos-row("RBS", "ADV", "Degree=sup", "adverb, superlative")
|
||||
+pos-row("RP", "PART", "", "adverb, particle")
|
||||
+pos-row("_SP", "SPACE", "", "space")
|
||||
+pos-row("SYM", "SYM", "", "symbol")
|
||||
+pos-row("TO", "PART", "PartType=inf VerbForm=inf", "infinitival to")
|
||||
+pos-row("UH", "INTJ", "", "interjection")
|
||||
+pos-row("VB", "VERB", "VerbForm=inf", "verb, base form")
|
||||
+pos-row("VBD", "VERB", "VerbForm=fin Tense=past", "verb, past tense")
|
||||
+pos-row("VBG", "VERB", "VerbForm=part Tense=pres Aspect=prog", "verb, gerund or present participle")
|
||||
+pos-row("VBN", "VERB", "VerbForm=part Tense=past Aspect=perf", "verb, past participle")
|
||||
+pos-row("VBP", "VERB", "VerbForm=fin Tense=pres", "verb, non-3rd person singular present")
|
||||
+pos-row("VBZ", "VERB", "VerbForm=fin Tense=pres Number=sing Person=3", "verb, 3rd person singular present")
|
||||
+pos-row("WDT", "ADJ", "PronType=int|rel", "wh-determiner")
|
||||
+pos-row("WP", "NOUN", "PronType=int|rel", "wh-pronoun, personal")
|
||||
+pos-row("WP$", "ADJ", "Poss=yes PronType=int|rel", "wh-pronoun, possessive")
|
||||
+pos-row("WRB", "ADV", "PronType=int|rel", "wh-adverb")
|
||||
+pos-row("XX", "X", "", "unknown")
|
||||
|
||||
+accordion("German", "pos-de")
|
||||
p
|
||||
| The German part-of-speech tagger uses the
|
||||
| #[+a("http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/index.html") TIGER Treebank]
|
||||
| annotation scheme. We also map the tags to the simpler Google
|
||||
| Universal POS tag set.
|
||||
|
||||
+table(["Tag", "POS", "Morphology", "Description"])
|
||||
+pos-row("$(", "PUNCT", "PunctType=brck", "other sentence-internal punctuation mark")
|
||||
+pos-row("$,", "PUNCT", "PunctType=comm", "comma")
|
||||
+pos-row("$.", "PUNCT", "PunctType=peri", "sentence-final punctuation mark")
|
||||
+pos-row("ADJA", "ADJ", "", "adjective, attributive")
|
||||
+pos-row("ADJD", "ADJ", "Variant=short", "adjective, adverbial or predicative")
|
||||
+pos-row("ADV", "ADV", "", "adverb")
|
||||
+pos-row("APPO", "ADP", "AdpType=post", "postposition")
|
||||
+pos-row("APPR", "ADP", "AdpType=prep", "preposition; circumposition left")
|
||||
+pos-row("APPRART", "ADP", "AdpType=prep PronType=art", "preposition with article")
|
||||
+pos-row("APZR", "ADP", "AdpType=circ", "circumposition right")
|
||||
+pos-row("ART", "DET", "PronType=art", "definite or indefinite article")
|
||||
+pos-row("CARD", "NUM", "NumType=card", "cardinal number")
|
||||
+pos-row("FM", "X", "Foreign=yes", "foreign language material")
|
||||
+pos-row("ITJ", "INTJ", "", "interjection")
|
||||
+pos-row("KOKOM", "CONJ", "ConjType=comp", "comparative conjunction")
|
||||
+pos-row("KON", "CONJ", "", "coordinate conjunction")
|
||||
+pos-row("KOUI", "SCONJ", "", 'subordinate conjunction with "zu" and infinitive')
|
||||
+pos-row("KOUS", "SCONJ", "", "subordinate conjunction with sentence")
|
||||
+pos-row("NE", "PROPN", "", "proper noun")
|
||||
+pos-row("NNE", "PROPN", "", "proper noun")
|
||||
+pos-row("NN", "NOUN", "", "noun, singular or mass")
|
||||
+pos-row("PAV", "ADV", "PronType=dem", "pronominal adverb")
|
||||
+pos-row("PROAV", "ADV", "PronType=dem", "pronominal adverb")
|
||||
+pos-row("PDAT", "DET", "PronType=dem", "attributive demonstrative pronoun")
|
||||
+pos-row("PDS", "PRON", "PronType=dem", "substituting demonstrative pronoun")
|
||||
+pos-row("PIAT", "DET", "PronType=ind|neg|tot", "attributive indefinite pronoun without determiner")
|
||||
+pos-row("PIDAT", "DET", "AdjType=pdt PronType=ind|neg|tot", "attributive indefinite pronoun with determiner")
|
||||
+pos-row("PIS", "PRON", "PronType=ind|neg|tot", "substituting indefinite pronoun")
|
||||
+pos-row("PPER", "PRON", "PronType=prs", "non-reflexive personal pronoun")
|
||||
+pos-row("PPOSAT", "DET", "Poss=yes PronType=prs", "attributive possessive pronoun")
|
||||
+pos-row("PPOSS", "PRON", "PronType=rel", "substituting possessive pronoun")
|
||||
+pos-row("PRELAT", "DET", "PronType=rel", "attributive relative pronoun")
|
||||
+pos-row("PRELS", "PRON", "PronType=rel", "substituting relative pronoun")
|
||||
+pos-row("PRF", "PRON", "PronType=prs Reflex=yes", "reflexive personal pronoun")
|
||||
+pos-row("PTKA", "PART", "", "particle with adjective or adverb")
|
||||
+pos-row("PTKANT", "PART", "PartType=res", "answer particle")
|
||||
+pos-row("PTKNEG", "PART", "Negative=yes", "negative particle")
|
||||
+pos-row("PTKVZ", "PART", "PartType=vbp", "separable verbal particle")
|
||||
+pos-row("PTKZU", "PART", "PartType=inf", '"zu" before infinitive')
|
||||
+pos-row("PWAT", "DET", "PronType=int", "attributive interrogative pronoun")
|
||||
+pos-row("PWAV", "ADV", "PronType=int", "adverbial interrogative or relative pronoun")
|
||||
+pos-row("PWS", "PRON", "PronType=int", "substituting interrogative pronoun")
|
||||
+pos-row("TRUNC", "X", "Hyph=yes", "word remnant")
|
||||
+pos-row("VAFIN", "AUX", "Mood=ind VerbForm=fin", "finite verb, auxiliary")
|
||||
+pos-row("VAIMP", "AUX", "Mood=imp VerbForm=fin", "imperative, auxiliary")
|
||||
+pos-row("VAINF", "AUX", "VerbForm=inf", "infinitive, auxiliary")
|
||||
+pos-row("VAPP", "AUX", "Aspect=perf VerbForm=fin", "perfect participle, auxiliary")
|
||||
+pos-row("VMFIN", "VERB", "Mood=ind VerbForm=fin VerbType=mod", "finite verb, modal")
|
||||
+pos-row("VMINF", "VERB", "VerbForm=fin VerbType=mod", "infinitive, modal")
|
||||
+pos-row("VMPP", "VERB", "Aspect=perf VerbForm=part VerbType=mod", "perfect participle, modal")
|
||||
+pos-row("VVFIN", "VERB", "Mood=ind VerbForm=fin", "finite verb, full")
|
||||
+pos-row("VVIMP", "VERB", "Mood=imp VerbForm=fin", "imperative, full")
|
||||
+pos-row("VVINF", "VERB", "VerbForm=inf", "infinitive, full")
|
||||
+pos-row("VVIZU", "VERB", "VerbForm=inf", 'infinitive with "zu", full')
|
||||
+pos-row("VVPP", "VERB", "Aspect=perf VerbForm=part", "perfect participle, full")
|
||||
+pos-row("XY", "X", "", "non-word containing non-letter")
|
||||
+pos-row("SP", "SPACE", "", "space")
|
||||
|
||||
for _, lang in MODELS
|
||||
- var exclude = ["en", "de", "xx"]
|
||||
if !exclude.includes(lang)
|
||||
- var lang_name = LANGUAGES[lang]
|
||||
- var file_path = "lang/" + lang + "/tag_map.py"
|
||||
+accordion(lang_name, "pos-" + lang)
|
||||
p
|
||||
| For details on the #{lang_name} tag map, see
|
||||
| #[+src(gh("spacy", "spacy/" + file_path)) #[code=file_path]].
|
|
@ -1,55 +0,0 @@
//- 💫 DOCS > API > ANNOTATION > TEXT PROCESSING

+aside-code("Example").
    from spacy.lang.en import English
    nlp = English()
    tokens = nlp('Some\nspaces  and\ttab characters')
    tokens_text = [t.text for t in tokens]
    assert tokens_text == ['Some', '\n', 'spaces', '  ', 'and',
                           '\t', 'tab', 'characters']

p
    | Tokenization standards are based on the
    | #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] corpus.
    | The tokenizer differs from most by including
    | #[strong tokens for significant whitespace]. Any sequence of
    | whitespace characters beyond a single space (#[code ' ']) is included
    | as a token. The whitespace tokens are useful for much the same reason
    | punctuation is – it's often an important delimiter in the text. By
    | preserving it in the token output, we are able to maintain a simple
    | alignment between the tokens and the original string, and we ensure
    | that #[strong no information is lost] during processing.
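
p
    | Because whitespace is preserved, the original text can always be
    | reconstructed from the tokens, e.g. via #[code Token.text_with_ws] (a
    | minimal sketch):

+code("Example").
    from spacy.lang.en import English

    nlp = English()
    text = u'Some\nspaces  and\ttab characters'
    doc = nlp(text)
    # joining the tokens with their trailing whitespace restores the input
    assert ''.join(token.text_with_ws for token in doc) == text
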
+h(3, "lemmatization") Lemmatization
|
||||
|
||||
+aside("Examples")
|
||||
| In English, this means:#[br]
|
||||
| #[strong Adjectives]: happier, happiest → happy#[br]
|
||||
| #[strong Adverbs]: worse, worst → badly#[br]
|
||||
| #[strong Nouns]: dogs, children → dog, child#[br]
|
||||
| #[strong Verbs]: writes, wirting, wrote, written → write
|
||||
|
||||
|
||||
p
    | A lemma is the uninflected form of a word. The English lemmatization
    | data is taken from #[+a("https://wordnet.princeton.edu") WordNet].
    | Lookup tables are taken from
    | #[+a("http://www.lexiconista.com/datasets/lemmatization/") Lexiconista].
    | spaCy also adds a #[strong special case for pronouns]: all pronouns
    | are lemmatized to the special token #[code -PRON-].
+infobox("About spaCy's custom pronoun lemma", "⚠️")
|
||||
| Unlike verbs and common nouns, there's no clear base form of a personal
|
||||
| pronoun. Should the lemma of "me" be "I", or should we normalize person
|
||||
| as well, giving "it" — or maybe "he"? spaCy's solution is to introduce a
|
||||
| novel symbol, #[code -PRON-], which is used as the lemma for
|
||||
| all personal pronouns.
|
||||
|
||||
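
p
    | A minimal sketch, assuming #[code en_core_web_sm] is installed:

+code("Example").
    import spacy

    nlp = spacy.load('en_core_web_sm')  # assumes this model is installed
    doc = nlp(u'I was reading the paper.')
    print([token.lemma_ for token in doc])
    # ['-PRON-', 'be', 'read', 'the', 'paper', '.']
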
+h(3, "sentence-boundary") Sentence boundary detection
|
||||
|
||||
p
|
||||
| Sentence boundaries are calculated from the syntactic parse tree, so
|
||||
| features such as punctuation and capitalisation play an important but
|
||||
| non-decisive role in determining the sentence boundaries. Usually this
|
||||
| means that the sentence boundaries will at least coincide with clause
|
||||
| boundaries, even given poorly punctuated text.
|
|
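
p
    | The resulting boundaries are exposed as #[code Doc.sents]. A minimal
    | sketch, assuming a model with a parser such as #[code en_core_web_sm]
    | is installed:

+code("Example").
    import spacy

    nlp = spacy.load('en_core_web_sm')  # assumes this model is installed
    doc = nlp(u'This is a sentence. This is another one.')
    for sent in doc.sents:
        print(sent.text)
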
@ -1,104 +0,0 @@
//- 💫 DOCS > API > ANNOTATION > TRAINING

+h(3, "json-input") JSON input format for training

p
    | spaCy takes training data in JSON format. The built-in
    | #[+api("cli#convert") #[code convert]] command helps you convert the
    | #[code .conllu] format used by the
    | #[+a("https://github.com/UniversalDependencies") Universal Dependencies corpora]
    | to spaCy's training format.

+aside("Annotating entities")
    | Named entities are provided in the #[+a("/api/annotation#biluo") BILUO]
    | notation. Tokens outside an entity are set to #[code "O"] and tokens
    | that are part of an entity are set to the entity label, prefixed by the
    | BILUO marker. For example #[code "B-ORG"] describes the first token of
    | a multi-token #[code ORG] entity and #[code "U-PERSON"] a single
    | token representing a #[code PERSON] entity. The
    | #[+api("goldparse#biluo_tags_from_offsets") #[code biluo_tags_from_offsets]]
    | function can help you convert entity offsets to the right format.

+code("Example structure").
    [{
        "id": int,                      # ID of the document within the corpus
        "paragraphs": [{                # list of paragraphs in the corpus
            "raw": string,              # raw text of the paragraph
            "sentences": [{             # list of sentences in the paragraph
                "tokens": [{            # list of tokens in the sentence
                    "id": int,          # index of the token in the document
                    "dep": string,      # dependency label
                    "head": int,        # offset of token head relative to token index
                    "tag": string,      # part-of-speech tag
                    "orth": string,     # verbatim text of the token
                    "ner": string       # BILUO label, e.g. "O" or "B-ORG"
                }],
                "brackets": [{          # phrase structure (NOT USED by current models)
                    "first": int,       # index of first token
                    "last": int,        # index of last token
                    "label": string     # phrase label
                }]
            }]
        }]
    }]

p
    | Here's an example of dependencies, part-of-speech tags and named
    | entities, taken from the English Wall Street Journal portion of the Penn
    | Treebank:

+github("spacy", "examples/training/training-data.json", false, false, "json")
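
p
    | As a sketch, BILUO tags like the above can also be supplied directly
    | when creating a #[+api("goldparse") #[code GoldParse]] (the sentence
    | and labels here are purely illustrative):

+code("Example").
    import spacy
    from spacy.gold import GoldParse

    nlp = spacy.blank('en')
    doc = nlp.make_doc(u'Facebook released React in 2014 .')
    # one BILUO tag per token in the Doc
    gold = GoldParse(doc, entities=['U-ORG', 'O', 'U-PRODUCT', 'O', 'U-DATE', 'O'])
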
+h(3, "vocab-jsonl") Lexical data for vocabulary
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| To populate a model's vocabulary, you can use the
|
||||
| #[+api("cli#vocab") #[code spacy vocab]] command and load in a
|
||||
| #[+a("https://jsonlines.readthedocs.io/en/latest/") newline-delimited JSON]
|
||||
| (JSONL) file containing one lexical entry per line. The first line
|
||||
| defines the language and vocabulary settings. All other lines are
|
||||
| expected to be JSON objects describing an individual lexeme. The lexical
|
||||
| attributes will be then set as attributes on spaCy's
|
||||
| #[+api("lexeme#attributes") #[code Lexeme]] object. The #[code vocab]
|
||||
| command outputs a ready-to-use spaCy model with a #[code Vocab]
|
||||
| containing the lexical data.
|
||||
|
||||
+code("First line").
|
||||
{"lang": "en", "settings": {"oov_prob": -20.502029418945312}}
|
||||
|
||||
+code("Entry structure").
|
||||
{
|
||||
"orth": string,
|
||||
"id": int,
|
||||
"lower": string,
|
||||
"norm": string,
|
||||
"shape": string
|
||||
"prefix": string,
|
||||
"suffix": string,
|
||||
"length": int,
|
||||
"cluster": string,
|
||||
"prob": float,
|
||||
"is_alpha": bool,
|
||||
"is_ascii": bool,
|
||||
"is_digit": bool,
|
||||
"is_lower": bool,
|
||||
"is_punct": bool,
|
||||
"is_space": bool,
|
||||
"is_title": bool,
|
||||
"is_upper": bool,
|
||||
"like_url": bool,
|
||||
"like_num": bool,
|
||||
"like_email": bool,
|
||||
"is_stop": bool,
|
||||
"is_oov": bool,
|
||||
"is_quote": bool,
|
||||
"is_left_punct": bool,
|
||||
"is_right_punct": bool
|
||||
}
|
||||
|
||||
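
p
    | A minimal sketch of producing one such entry from an existing
    | #[code Lexeme] (only a subset of the attributes is shown):

+code("Example").
    import json
    from spacy.lang.en import English

    nlp = English()
    lex = nlp.vocab[u'spaCy']
    # illustrative subset of the lexical attributes listed above
    entry = {
        'orth': lex.orth_,
        'lower': lex.lower_,
        'shape': lex.shape_,
        'length': len(lex.text),
        'is_alpha': lex.is_alpha,
    }
    print(json.dumps(entry))
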

p
    | Here's an example of the 20 most frequent lexemes in the English
    | training data:

+github("spacy", "examples/training/vocab-data.jsonl", false, false, "json")

@ -1,71 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES > DOC

p
    | The #[code Doc] object holds an array of
    | #[+api("cython-structs#tokenc") #[code TokenC]] structs.

+infobox
    | This section documents the extra C-level attributes and methods that
    | can't be accessed from Python. For the Python documentation, see
    | #[+api("doc") #[code Doc]].

+h(3, "doc_attributes") Attributes

+table(["Name", "Type", "Description"])
    +row
        +cell #[code mem]
        +cell #[code cymem.Pool]
        +cell
            | A memory pool. Allocated memory will be freed once the
            | #[code Doc] object is garbage collected.

    +row
        +cell #[code vocab]
        +cell #[code Vocab]
        +cell A reference to the shared #[code Vocab] object.

    +row
        +cell #[code c]
        +cell #[code TokenC*]
        +cell
            | A pointer to a #[+api("cython-structs#tokenc") #[code TokenC]]
            | struct.

    +row
        +cell #[code length]
        +cell #[code int]
        +cell The number of tokens in the document.

    +row
        +cell #[code max_length]
        +cell #[code int]
        +cell The underlying size of the #[code Doc.c] array.

+h(3, "doc_push_back") Doc.push_back
    +tag method

p
    | Append a token to the #[code Doc]. The token can be provided as a
    | #[+api("cython-structs#lexemec") #[code LexemeC]] or
    | #[+api("cython-structs#tokenc") #[code TokenC]] pointer, using Cython's
    | #[+a("http://cython.readthedocs.io/en/latest/src/userguide/fusedtypes.html") fused types].

+aside-code("Example").
    from spacy.tokens cimport Doc
    from spacy.vocab cimport Vocab

    doc = Doc(Vocab())
    lexeme = doc.vocab.get(u'hello')
    doc.push_back(lexeme, True)
    assert doc.text == u'hello '

+table(["Name", "Type", "Description"])
    +row
        +cell #[code lex_or_tok]
        +cell #[code LexemeOrToken]
        +cell The word to append to the #[code Doc].

    +row
        +cell #[code has_space]
        +cell #[code bint]
        +cell Whether the word has trailing whitespace.

@ -1,30 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES > LEXEME

p
    | A Cython class providing access and methods for an entry in the
    | vocabulary.

+infobox
    | This section documents the extra C-level attributes and methods that
    | can't be accessed from Python. For the Python documentation, see
    | #[+api("lexeme") #[code Lexeme]].

+h(3, "lexeme_attributes") Attributes

+table(["Name", "Type", "Description"])
    +row
        +cell #[code c]
        +cell #[code LexemeC*]
        +cell
            | A pointer to a #[+api("cython-structs#lexemec") #[code LexemeC]]
            | struct.

    +row
        +cell #[code vocab]
        +cell #[code Vocab]
        +cell A reference to the shared #[code Vocab] object.

    +row
        +cell #[code orth]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell ID of the verbatim text content.

@ -1,200 +0,0 @@
//- 💫 DOCS > API > CYTHON > STRUCTS > LEXEMEC

p
    | Struct holding information about a lexical type. #[code LexemeC]
    | structs are usually owned by the #[code Vocab], and accessed through a
    | read-only pointer on the #[code TokenC] struct.

+aside-code("Example").
    lex = doc.c[3].lex

+table(["Name", "Type", "Description"])
    +row
        +cell #[code flags]
        +cell #[+abbr("uint64_t") #[code flags_t]]
        +cell Bit-field for binary lexical flag values.

    +row
        +cell #[code id]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell
            | Usually used to map lexemes to rows in a matrix, e.g. for word
            | vectors. Does not need to be unique, so currently misnamed.

    +row
        +cell #[code length]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell Number of unicode characters in the lexeme.

    +row
        +cell #[code orth]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell ID of the verbatim text content.

    +row
        +cell #[code lower]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell ID of the lowercase form of the lexeme.

    +row
        +cell #[code norm]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell ID of the lexeme's norm, i.e. a normalised form of the text.

    +row
        +cell #[code shape]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell Transform of the lexeme's string, to show orthographic features.

    +row
        +cell #[code prefix]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell
            | Length-N substring from the start of the lexeme. Defaults to
            | #[code N=1].

    +row
        +cell #[code suffix]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell
            | Length-N substring from the end of the lexeme. Defaults to
            | #[code N=3].

    +row
        +cell #[code cluster]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell Brown cluster ID.

    +row
        +cell #[code prob]
        +cell #[code float]
        +cell Smoothed log probability estimate of the lexeme's type.

    +row
        +cell #[code sentiment]
        +cell #[code float]
        +cell A scalar value indicating positivity or negativity.

+h(3, "lexeme_get_struct_attr", "spacy/lexeme.pxd") Lexeme.get_struct_attr
    +tag staticmethod
    +tag nogil

p Get the value of an attribute from the #[code LexemeC] struct by attribute ID.

+aside-code("Example").
    from spacy.attrs cimport IS_ALPHA
    from spacy.lexeme cimport Lexeme

    lexeme = doc.c[3].lex
    is_alpha = Lexeme.get_struct_attr(lexeme, IS_ALPHA)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code lex]
        +cell #[code const LexemeC*]
        +cell A pointer to a #[code LexemeC] struct.

    +row
        +cell #[code feat_name]
        +cell #[code attr_id_t]
        +cell
            | The ID of the attribute to look up. The attributes are
            | enumerated in #[code spacy.typedefs].

    +row("foot")
        +cell returns
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell The value of the attribute.

+h(3, "lexeme_set_struct_attr", "spacy/lexeme.pxd") Lexeme.set_struct_attr
    +tag staticmethod
    +tag nogil

p Set the value of an attribute of the #[code LexemeC] struct by attribute ID.

+aside-code("Example").
    from spacy.attrs cimport NORM
    from spacy.lexeme cimport Lexeme

    lexeme = doc.c[3].lex
    Lexeme.set_struct_attr(lexeme, NORM, lexeme.lower)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code lex]
        +cell #[code const LexemeC*]
        +cell A pointer to a #[code LexemeC] struct.

    +row
        +cell #[code feat_name]
        +cell #[code attr_id_t]
        +cell
            | The ID of the attribute to look up. The attributes are
            | enumerated in #[code spacy.typedefs].

    +row
        +cell #[code value]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell The value to set.

+h(3, "lexeme_c_check_flag", "spacy/lexeme.pxd") Lexeme.c_check_flag
    +tag staticmethod
    +tag nogil

p Check the value of a binary flag attribute.

+aside-code("Example").
    from spacy.attrs cimport IS_STOP
    from spacy.lexeme cimport Lexeme

    lexeme = doc.c[3].lex
    is_stop = Lexeme.c_check_flag(lexeme, IS_STOP)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code lexeme]
        +cell #[code const LexemeC*]
        +cell A pointer to a #[code LexemeC] struct.

    +row
        +cell #[code flag_id]
        +cell #[code attr_id_t]
        +cell
            | The ID of the flag to look up. The flag IDs are enumerated in
            | #[code spacy.typedefs].

    +row("foot")
        +cell returns
        +cell #[code bint]
        +cell The boolean value of the flag.

+h(3, "lexeme_c_set_flag", "spacy/lexeme.pxd") Lexeme.c_set_flag
    +tag staticmethod
    +tag nogil

p Set the value of a binary flag attribute.

+aside-code("Example").
    from spacy.attrs cimport IS_STOP
    from spacy.lexeme cimport Lexeme

    lexeme = doc.c[3].lex
    Lexeme.c_set_flag(lexeme, IS_STOP, 0)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code lexeme]
        +cell #[code const LexemeC*]
        +cell A pointer to a #[code LexemeC] struct.

    +row
        +cell #[code flag_id]
        +cell #[code attr_id_t]
        +cell
            | The ID of the flag to look up. The flag IDs are enumerated in
            | #[code spacy.typedefs].

    +row
        +cell #[code value]
        +cell #[code bint]
        +cell The value to set.

@@ -1,43 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES > SPAN

p
    | A Cython class providing access and methods for a slice of a #[code Doc]
    | object.

+infobox
    | This section documents the extra C-level attributes and methods that
    | can't be accessed from Python. For the Python documentation, see
    | #[+api("span") #[code Span]].

+h(3, "span_attributes") Attributes

+table(["Name", "Type", "Description"])
    +row
        +cell #[code doc]
        +cell #[code Doc]
        +cell The parent document.

    +row
        +cell #[code start]
        +cell #[code int]
        +cell The index of the first token of the span.

    +row
        +cell #[code end]
        +cell #[code int]
        +cell The index of the first token after the span.

    +row
        +cell #[code start_char]
        +cell #[code int]
        +cell The index of the first character of the span.

    +row
        +cell #[code end_char]
        +cell #[code int]
        +cell The index of the last character of the span.

    +row
        +cell #[code label]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell A label to attach to the span, e.g. for named entities.
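
p
    | As an illustrative sketch, the #[code start]/#[code end] token indices
    | and the character offsets line up as follows (assuming a loaded
    | #[code nlp] object; the variable names are hypothetical):

+aside-code("Example").
    # Sketch: a span's character offsets line up with its token indices
    doc = nlp(u'Hello big wide world')
    span = doc[1:3]                       # "big wide"
    assert span.start == 1 and span.end == 3
    assert span.start_char == doc[1].idx  # offset of the first character
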
@@ -1,23 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES > STRINGSTORE

p A lookup table to retrieve strings by 64-bit hashes.

+infobox
    | This section documents the extra C-level attributes and methods that
    | can't be accessed from Python. For the Python documentation, see
    | #[+api("stringstore") #[code StringStore]].

+h(3, "stringstore_attributes") Attributes

+table(["Name", "Type", "Description"])
    +row
        +cell #[code mem]
        +cell #[code cymem.Pool]
        +cell
            | A memory pool. Allocated memory will be freed once the
            | #[code StringStore] object is garbage collected.

    +row
        +cell #[code keys]
        +cell #[+abbr("vector[uint64_t]") #[code vector[hash_t]]]
        +cell A list of hash values in the #[code StringStore].
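
p
    | For illustration, the Python-level #[code StringStore] API resolves
    | strings to their 64-bit hashes and back (a minimal sketch):

+aside-code("Example").
    # Sketch: strings and their hashes are interchangeable keys
    from spacy.strings import StringStore

    stringstore = StringStore([u'apple'])
    apple_hash = stringstore[u'apple']      # look up the hash
    assert stringstore[apple_hash] == u'apple'
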
@@ -1,73 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES > TOKEN

p
    | A Cython class providing access and methods for a
    | #[+api("cython-structs#tokenc") #[code TokenC]] struct. Note that the
    | #[code Token] object does not own the struct. It only receives a pointer
    | to it.

+infobox
    | This section documents the extra C-level attributes and methods that
    | can't be accessed from Python. For the Python documentation, see
    | #[+api("token") #[code Token]].

+h(3, "token_attributes") Attributes

+table(["Name", "Type", "Description"])
    +row
        +cell #[code vocab]
        +cell #[code Vocab]
        +cell A reference to the shared #[code Vocab] object.

    +row
        +cell #[code c]
        +cell #[code TokenC*]
        +cell
            | A pointer to a #[+api("cython-structs#tokenc") #[code TokenC]]
            | struct.

    +row
        +cell #[code i]
        +cell #[code int]
        +cell The offset of the token within the document.

    +row
        +cell #[code doc]
        +cell #[code Doc]
        +cell The parent document.

+h(3, "token_cinit") Token.cinit
    +tag method

p Create a #[code Token] object from a #[code TokenC*] pointer.

+aside-code("Example").
    token = Token.cinit(&doc.c[3], doc, 3)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code vocab]
        +cell #[code Vocab]
        +cell A reference to the shared #[code Vocab].

    +row
        +cell #[code c]
        +cell #[code TokenC*]
        +cell
            | A pointer to a #[+api("cython-structs#tokenc") #[code TokenC]]
            | struct.

    +row
        +cell #[code offset]
        +cell #[code int]
        +cell The offset of the token within the document.

    +row
        +cell #[code doc]
        +cell #[code Doc]
        +cell The parent document.

    +row("foot")
        +cell returns
        +cell #[code Token]
        +cell The newly constructed object.

@@ -1,270 +0,0 @@
//- 💫 DOCS > API > CYTHON > STRUCTS > TOKENC

p
    | Cython data container for the #[code Token] object.

+aside-code("Example").
    token = &doc.c[3]
    token_ptr = &doc.c[3]

+table(["Name", "Type", "Description"])
    +row
        +cell #[code lex]
        +cell #[code const LexemeC*]
        +cell A pointer to the lexeme for the token.

    +row
        +cell #[code morph]
        +cell #[code uint64_t]
        +cell An ID allowing lookup of morphological attributes.

    +row
        +cell #[code pos]
        +cell #[code univ_pos_t]
        +cell Coarse-grained part-of-speech tag.

    +row
        +cell #[code spacy]
        +cell #[code bint]
        +cell A binary value indicating whether the token has trailing whitespace.

    +row
        +cell #[code tag]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell Fine-grained part-of-speech tag.

    +row
        +cell #[code idx]
        +cell #[code int]
        +cell The character offset of the token within the parent document.

    +row
        +cell #[code lemma]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell Base form of the token, with no inflectional suffixes.

    +row
        +cell #[code sense]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell Space for storing a word sense ID, currently unused.

    +row
        +cell #[code head]
        +cell #[code int]
        +cell Offset of the syntactic parent relative to the token.

    +row
        +cell #[code dep]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell Syntactic dependency relation.

    +row
        +cell #[code l_kids]
        +cell #[code uint32_t]
        +cell Number of left children.

    +row
        +cell #[code r_kids]
        +cell #[code uint32_t]
        +cell Number of right children.

    +row
        +cell #[code l_edge]
        +cell #[code uint32_t]
        +cell Offset of the leftmost token of this token's syntactic descendants.

    +row
        +cell #[code r_edge]
        +cell #[code uint32_t]
        +cell Offset of the rightmost token of this token's syntactic descendants.

    +row
        +cell #[code sent_start]
        +cell #[code int]
        +cell
            | Ternary value indicating whether the token is the first word of
            | a sentence. #[code 0] indicates a missing value, #[code -1]
            | indicates #[code False] and #[code 1] indicates #[code True].
            | The default value, #[code 0], is interpreted as no sentence
            | break. Sentence boundary detectors will usually set #[code 0]
            | for all tokens except tokens that follow a sentence boundary.

    +row
        +cell #[code ent_iob]
        +cell #[code int]
        +cell
            | IOB code of named entity tag. #[code 0] indicates a missing
            | value, #[code 1] indicates #[code I], #[code 2] indicates
            | #[code O] and #[code 3] indicates #[code B].

    +row
        +cell #[code ent_type]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell Named entity type.

    +row
        +cell #[code ent_id]
        +cell #[+abbr("uint64_t") #[code hash_t]]
        +cell
            | ID of the entity the token is an instance of, if any. Currently
            | not used, but potentially for coreference resolution.
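
p
    | For illustration, here is how the #[code ent_iob] codes map onto a
    | tagged sentence (a hypothetical sketch; the entity annotations depend
    | on the loaded model):

+aside-code("Example").
    # Sketch: "San Francisco" as a two-token entity
    # "San"       -> ent_iob == 3 (B: begins an entity)
    # "Francisco" -> ent_iob == 1 (I: inside an entity)
    # "rocks"     -> ent_iob == 2 (O: outside any entity)
    doc = nlp(u'San Francisco rocks')
    print([(t.text, t.ent_iob) for t in doc])
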
+h(3, "token_get_struct_attr", "spacy/tokens/token.pxd") Token.get_struct_attr
    +tag staticmethod
    +tag nogil

p Get the value of an attribute from the #[code TokenC] struct by attribute ID.

+aside-code("Example").
    from spacy.attrs cimport IS_ALPHA
    from spacy.tokens cimport Token

    is_alpha = Token.get_struct_attr(&doc.c[3], IS_ALPHA)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code token]
        +cell #[code const TokenC*]
        +cell A pointer to a #[code TokenC] struct.

    +row
        +cell #[code feat_name]
        +cell #[code attr_id_t]
        +cell
            | The ID of the attribute to look up. The attributes are
            | enumerated in #[code spacy.typedefs].

    +row("foot")
        +cell returns
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell The value of the attribute.

+h(3, "token_set_struct_attr", "spacy/tokens/token.pxd") Token.set_struct_attr
    +tag staticmethod
    +tag nogil

p Set the value of an attribute of the #[code TokenC] struct by attribute ID.

+aside-code("Example").
    from spacy.attrs cimport TAG
    from spacy.tokens cimport Token

    token = &doc.c[3]
    Token.set_struct_attr(token, TAG, 0)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code token]
        +cell #[code const TokenC*]
        +cell A pointer to a #[code TokenC] struct.

    +row
        +cell #[code feat_name]
        +cell #[code attr_id_t]
        +cell
            | The ID of the attribute to look up. The attributes are
            | enumerated in #[code spacy.typedefs].

    +row
        +cell #[code value]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell The value to set.

+h(3, "token_by_start", "spacy/tokens/doc.pxd") token_by_start
    +tag function

p Find a token in a #[code TokenC*] array by the offset of its first character.

+aside-code("Example").
    from spacy.tokens.doc cimport Doc, token_by_start
    from spacy.vocab cimport Vocab

    doc = Doc(Vocab(), words=[u'hello', u'world'])
    assert token_by_start(doc.c, doc.length, 6) == 1
    assert token_by_start(doc.c, doc.length, 4) == -1

+table(["Name", "Type", "Description"])
    +row
        +cell #[code tokens]
        +cell #[code const TokenC*]
        +cell A #[code TokenC*] array.

    +row
        +cell #[code length]
        +cell #[code int]
        +cell The number of tokens in the array.

    +row
        +cell #[code start_char]
        +cell #[code int]
        +cell The start index to search for.

    +row("foot")
        +cell returns
        +cell #[code int]
        +cell The index of the token in the array or #[code -1] if not found.

+h(3, "token_by_end", "spacy/tokens/doc.pxd") token_by_end
    +tag function

p Find a token in a #[code TokenC*] array by the offset of its final character.

+aside-code("Example").
    from spacy.tokens.doc cimport Doc, token_by_end
    from spacy.vocab cimport Vocab

    doc = Doc(Vocab(), words=[u'hello', u'world'])
    assert token_by_end(doc.c, doc.length, 5) == 0
    assert token_by_end(doc.c, doc.length, 1) == -1

+table(["Name", "Type", "Description"])
    +row
        +cell #[code tokens]
        +cell #[code const TokenC*]
        +cell A #[code TokenC*] array.

    +row
        +cell #[code length]
        +cell #[code int]
        +cell The number of tokens in the array.

    +row
        +cell #[code end_char]
        +cell #[code int]
        +cell The end index to search for.

    +row("foot")
        +cell returns
        +cell #[code int]
        +cell The index of the token in the array or #[code -1] if not found.

+h(3, "set_children_from_heads", "spacy/tokens/doc.pxd") set_children_from_heads
    +tag function

p
    | Set attributes that allow lookup of syntactic children on a
    | #[code TokenC*] array. This function must be called after making changes
    | to the #[code TokenC.head] attribute, in order to make the parse tree
    | navigation consistent.

+aside-code("Example").
    from spacy.tokens.doc cimport Doc, set_children_from_heads
    from spacy.vocab cimport Vocab

    doc = Doc(Vocab(), words=[u'Baileys', u'from', u'a', u'shoe'])
    doc.c[0].head = 0
    doc.c[1].head = 0
    doc.c[2].head = 3
    doc.c[3].head = 1
    set_children_from_heads(doc.c, doc.length)
    assert doc.c[3].l_kids == 1

+table(["Name", "Type", "Description"])
    +row
        +cell #[code tokens]
        +cell #[code const TokenC*]
        +cell A #[code TokenC*] array.

    +row
        +cell #[code length]
        +cell #[code int]
        +cell The number of tokens in the array.

@@ -1,88 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES > VOCAB

p
    | A Cython class providing access and methods for a vocabulary and other
    | data shared across a language.

+infobox
    | This section documents the extra C-level attributes and methods that
    | can't be accessed from Python. For the Python documentation, see
    | #[+api("vocab") #[code Vocab]].

+h(3, "vocab_attributes") Attributes

+table(["Name", "Type", "Description"])
    +row
        +cell #[code mem]
        +cell #[code cymem.Pool]
        +cell
            | A memory pool. Allocated memory will be freed once the
            | #[code Vocab] object is garbage collected.

    +row
        +cell #[code strings]
        +cell #[code StringStore]
        +cell
            | A #[code StringStore] that maps string to hash values and vice
            | versa.

    +row
        +cell #[code length]
        +cell #[code int]
        +cell The number of entries in the vocabulary.
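
p
    | For illustration, the #[code strings] and #[code length] attributes are
    | also visible from Python (a minimal sketch, assuming a loaded
    | #[code nlp] object):

+aside-code("Example").
    # Sketch: the vocab's string store maps both ways
    apple_hash = nlp.vocab.strings[u'apple']
    assert nlp.vocab.strings[apple_hash] == u'apple'
    print(len(nlp.vocab))  # number of entries in the vocabulary
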
+h(3, "vocab_get") Vocab.get
    +tag method

p
    | Retrieve a #[+api("cython-structs#lexemec") #[code LexemeC*]] pointer
    | from the vocabulary.

+aside-code("Example").
    lexeme = vocab.get(vocab.mem, u'hello')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code mem]
        +cell #[code cymem.Pool]
        +cell
            | A memory pool. Allocated memory will be freed once the
            | #[code Vocab] object is garbage collected.

    +row
        +cell #[code string]
        +cell #[code unicode]
        +cell The string of the word to look up.

    +row("foot")
        +cell returns
        +cell #[code const LexemeC*]
        +cell The lexeme in the vocabulary.

+h(3, "vocab_get_by_orth") Vocab.get_by_orth
    +tag method

p
    | Retrieve a #[+api("cython-structs#lexemec") #[code LexemeC*]] pointer
    | from the vocabulary.

+aside-code("Example").
    lexeme = vocab.get_by_orth(doc[0].lex.norm)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code mem]
        +cell #[code cymem.Pool]
        +cell
            | A memory pool. Allocated memory will be freed once the
            | #[code Vocab] object is garbage collected.

    +row
        +cell #[code orth]
        +cell #[+abbr("uint64_t") #[code attr_t]]
        +cell ID of the verbatim text content.

    +row("foot")
        +cell returns
        +cell #[code const LexemeC*]
        +cell The lexeme in the vocabulary.

@@ -1,251 +0,0 @@
{
    "sidebar": {
        "Overview": {
            "Architecture": "./",
            "Annotation Specs": "annotation",
            "Command Line": "cli",
            "Functions": "top-level"
        },

        "Containers": {
            "Doc": "doc",
            "Token": "token",
            "Span": "span",
            "Lexeme": "lexeme"
        },

        "Pipeline": {
            "Language": "language",
            "Pipe": "pipe",
            "Tagger": "tagger",
            "DependencyParser": "dependencyparser",
            "EntityRecognizer": "entityrecognizer",
            "TextCategorizer": "textcategorizer",
            "Tokenizer": "tokenizer",
            "Lemmatizer": "lemmatizer",
            "Matcher": "matcher",
            "PhraseMatcher": "phrasematcher"
        },

        "Other": {
            "Vocab": "vocab",
            "StringStore": "stringstore",
            "Vectors": "vectors",
            "GoldParse": "goldparse",
            "GoldCorpus": "goldcorpus"
        },

        "Cython": {
            "Architecture": "cython",
            "Structs": "cython-structs",
            "Classes": "cython-classes"
        }
    },

    "index": {
        "title": "Architecture",
        "next": "annotation",
        "menu": {
            "Basics": "basics",
            "Neural Network Model": "nn-model"
        }
    },

    "cli": {
        "title": "Command Line Interface",
        "teaser": "Download, train and package models, and debug spaCy.",
        "source": "spacy/cli"
    },

    "top-level": {
        "title": "Top-level Functions",
        "menu": {
            "spacy": "spacy",
            "displacy": "displacy",
            "Utility Functions": "util",
            "Compatibility": "compat"
        }
    },

    "language": {
        "title": "Language",
        "tag": "class",
        "teaser": "A text-processing pipeline.",
        "source": "spacy/language.py"
    },

    "doc": {
        "title": "Doc",
        "tag": "class",
        "teaser": "A container for accessing linguistic annotations.",
        "source": "spacy/tokens/doc.pyx"
    },

    "token": {
        "title": "Token",
        "tag": "class",
        "source": "spacy/tokens/token.pyx"
    },

    "span": {
        "title": "Span",
        "tag": "class",
        "source": "spacy/tokens/span.pyx"
    },

    "lexeme": {
        "title": "Lexeme",
        "tag": "class",
        "source": "spacy/lexeme.pyx"
    },

    "vocab": {
        "title": "Vocab",
        "teaser": "A storage class for vocabulary and other data shared across a language.",
        "tag": "class",
        "source": "spacy/vocab.pyx"
    },

    "stringstore": {
        "title": "StringStore",
        "tag": "class",
        "source": "spacy/strings.pyx"
    },

    "matcher": {
        "title": "Matcher",
        "teaser": "Match sequences of tokens, based on pattern rules.",
        "tag": "class",
        "source": "spacy/matcher.pyx"
    },

    "phrasematcher": {
        "title": "PhraseMatcher",
        "teaser": "Match sequences of tokens, based on documents.",
        "tag": "class",
        "tag_new": 2,
        "source": "spacy/matcher.pyx"
    },

    "pipe": {
        "title": "Pipe",
        "teaser": "Abstract base class defining the API for pipeline components.",
        "tag": "class",
        "tag_new": 2,
        "source": "spacy/pipeline.pyx"
    },

    "entityrecognizer": {
        "title": "EntityRecognizer",
        "teaser": "Annotate named entities on documents.",
        "tag": "class",
        "source": "spacy/pipeline.pyx"
    },

    "textcategorizer": {
        "title": "TextCategorizer",
        "teaser": "Add text categorization models to spaCy pipelines.",
        "tag": "class",
        "tag_new": 2,
        "source": "spacy/pipeline.pyx"
    },

    "dependencyparser": {
        "title": "DependencyParser",
        "teaser": "Annotate syntactic dependencies on documents.",
        "tag": "class",
        "source": "spacy/pipeline.pyx"
    },

    "tokenizer": {
        "title": "Tokenizer",
        "teaser": "Segment text into words, punctuation marks etc.",
        "tag": "class",
        "source": "spacy/tokenizer.pyx"
    },

    "lemmatizer": {
        "title": "Lemmatizer",
        "teaser": "Assign the base forms of words.",
        "tag": "class",
        "source": "spacy/lemmatizer.py"
    },

    "tagger": {
        "title": "Tagger",
        "teaser": "Annotate part-of-speech tags on documents.",
        "tag": "class",
        "source": "spacy/pipeline.pyx"
    },

    "goldparse": {
        "title": "GoldParse",
        "tag": "class",
        "source": "spacy/gold.pyx"
    },

    "goldcorpus": {
        "title": "GoldCorpus",
        "teaser": "An annotated corpus, using the JSON file format.",
        "tag": "class",
        "tag_new": 2,
        "source": "spacy/gold.pyx"
    },

    "vectors": {
        "title": "Vectors",
        "teaser": "Store, save and load word vectors.",
        "tag": "class",
        "tag_new": 2,
        "source": "spacy/vectors.pyx"
    },

    "annotation": {
        "title": "Annotation Specifications",
        "teaser": "Schemes used for labels, tags and training data.",
        "menu": {
            "Text Processing": "text-processing",
            "POS Tagging": "pos-tagging",
            "Dependencies": "dependency-parsing",
            "Named Entities": "named-entities",
            "Models & Training": "training"
        }
    },

    "cython": {
        "title": "Cython Architecture",
        "next": "cython-structs",
        "menu": {
            "Overview": "overview",
            "Conventions": "conventions"
        }
    },

    "cython-structs": {
        "title": "Cython Structs",
        "teaser": "C-language objects that let you group variables together in a single contiguous block.",
        "next": "cython-classes",
        "menu": {
            "TokenC": "tokenc",
            "LexemeC": "lexemec"
        }
    },

    "cython-classes": {
        "title": "Cython Classes",
        "menu": {
            "Doc": "doc",
            "Token": "token",
            "Span": "span",
            "Lexeme": "lexeme",
            "Vocab": "vocab",
            "StringStore": "stringstore"
        }
    }
}

@@ -1,84 +0,0 @@
//- 💫 DOCS > API > TOP-LEVEL > COMPATIBILITY

p
    | All Python code is written in an
    | #[strong intersection of Python 2 and Python 3]. This is easy in Cython,
    | but somewhat ugly in Python. Logic that deals with Python or platform
    | compatibility only lives in #[code spacy.compat]. To distinguish them
    | from the builtin functions, replacement functions are suffixed with an
    | underscore, e.g. #[code unicode_].

+aside-code("Example").
    from spacy.compat import unicode_

    compatible_unicode = unicode_('hello world')

+table(["Name", "Python 2", "Python 3"])
    +row
        +cell #[code compat.bytes_]
        +cell #[code str]
        +cell #[code bytes]

    +row
        +cell #[code compat.unicode_]
        +cell #[code unicode]
        +cell #[code str]

    +row
        +cell #[code compat.basestring_]
        +cell #[code basestring]
        +cell #[code str]

    +row
        +cell #[code compat.input_]
        +cell #[code raw_input]
        +cell #[code input]

    +row
        +cell #[code compat.path2str]
        +cell #[code str(path)] with #[code .decode('utf8')]
        +cell #[code str(path)]
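
p
    | For illustration, #[code path2str] is handy whenever a #[code Path]
    | needs to become a plain string on both Python versions (a minimal
    | sketch; the path is hypothetical):

+aside-code("Example").
    # Sketch: convert a pathlib.Path to str consistently on Python 2 and 3
    from pathlib import Path
    from spacy.compat import path2str

    model_path = Path('/path/to/model')
    print(path2str(model_path))
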
+h(3, "is_config") compat.is_config
    +tag function

p
    | Check if a specific configuration of Python version and operating system
    | matches the user's setup. Mostly used to display targeted error messages.

+aside-code("Example").
    from spacy.compat import is_config

    if is_config(python2=True, windows=True):
        print("You are using Python 2 on Windows.")

+table(["Name", "Type", "Description"])
    +row
        +cell #[code python2]
        +cell bool
        +cell spaCy is executed with Python 2.x.

    +row
        +cell #[code python3]
        +cell bool
        +cell spaCy is executed with Python 3.x.

    +row
        +cell #[code windows]
        +cell bool
        +cell spaCy is executed on Windows.

    +row
        +cell #[code linux]
        +cell bool
        +cell spaCy is executed on Linux.

    +row
        +cell #[code osx]
        +cell bool
        +cell spaCy is executed on OS X or macOS.

    +row("foot")
        +cell returns
        +cell bool
        +cell Whether the specified configuration matches the user's platform.

@@ -1,259 +0,0 @@
//- 💫 DOCS > API > TOP-LEVEL > DISPLACY

p
    | As of v2.0, spaCy comes with a built-in visualization suite. For more
    | info and examples, see the usage guide on
    | #[+a("/usage/visualizers") visualizing spaCy].

+h(3, "displacy.serve") displacy.serve
    +tag method
    +tag-new(2)

p
    | Serve a dependency parse tree or named entity visualization to view it
    | in your browser. Will run a simple web server.

+aside-code("Example").
    import spacy
    from spacy import displacy
    nlp = spacy.load('en')
    doc1 = nlp(u'This is a sentence.')
    doc2 = nlp(u'This is another sentence.')
    displacy.serve([doc1, doc2], style='dep')

+table(["Name", "Type", "Description", "Default"])
    +row
        +cell #[code docs]
        +cell list, #[code Doc], #[code Span]
        +cell Document(s) to visualize.
        +cell

    +row
        +cell #[code style]
        +cell unicode
        +cell Visualization style, #[code 'dep'] or #[code 'ent'].
        +cell #[code 'dep']

    +row
        +cell #[code page]
        +cell bool
        +cell Render markup as full HTML page.
        +cell #[code True]

    +row
        +cell #[code minify]
        +cell bool
        +cell Minify HTML markup.
        +cell #[code False]

    +row
        +cell #[code options]
        +cell dict
        +cell #[+a("#options") Visualizer-specific options], e.g. colors.
        +cell #[code {}]

    +row
        +cell #[code manual]
        +cell bool
        +cell
            | Don't parse #[code Doc] and instead, expect a dict or list of
            | dicts. #[+a("/usage/visualizers#manual-usage") See here]
            | for formats and examples.
        +cell #[code False]

    +row
        +cell #[code port]
        +cell int
        +cell Port to serve visualization.
        +cell #[code 5000]

    +row
        +cell #[code host]
        +cell unicode
        +cell Host to serve visualization.
        +cell #[code '0.0.0.0']

+h(3, "displacy.render") displacy.render
    +tag method
    +tag-new(2)

p Render a dependency parse tree or named entity visualization.

+aside-code("Example").
    import spacy
    from spacy import displacy
    nlp = spacy.load('en')
    doc = nlp(u'This is a sentence.')
    html = displacy.render(doc, style='dep')

+table(["Name", "Type", "Description", "Default"])
    +row
        +cell #[code docs]
        +cell list, #[code Doc], #[code Span]
        +cell Document(s) to visualize.
        +cell

    +row
        +cell #[code style]
        +cell unicode
        +cell Visualization style, #[code 'dep'] or #[code 'ent'].
        +cell #[code 'dep']

    +row
        +cell #[code page]
        +cell bool
        +cell Render markup as full HTML page.
        +cell #[code False]

    +row
        +cell #[code minify]
        +cell bool
        +cell Minify HTML markup.
        +cell #[code False]

    +row
        +cell #[code jupyter]
        +cell bool
        +cell
            | Explicitly enable "#[+a("http://jupyter.org/") Jupyter] mode" to
            | return markup ready to be rendered in a notebook.
        +cell detected automatically

    +row
        +cell #[code options]
        +cell dict
        +cell #[+a("#options") Visualizer-specific options], e.g. colors.
        +cell #[code {}]

    +row
        +cell #[code manual]
        +cell bool
        +cell
            | Don't parse #[code Doc] and instead, expect a dict or list of
            | dicts. #[+a("/usage/visualizers#manual-usage") See here]
            | for formats and examples.
        +cell #[code False]

    +row("foot")
        +cell returns
        +cell unicode
        +cell Rendered HTML markup.
        +cell

+h(3, "displacy_options") Visualizer options

p
    | The #[code options] argument lets you specify additional settings for
    | each visualizer. If a setting is not present in the options, the default
    | value will be used.

+h(4, "options-dep") Dependency Visualizer options

+aside-code("Example").
    options = {'compact': True, 'color': 'blue'}
    displacy.serve(doc, style='dep', options=options)

+table(["Name", "Type", "Description", "Default"])
    +row
        +cell #[code collapse_punct]
        +cell bool
        +cell
            | Attach punctuation to tokens. Can make the parse more readable,
            | as it avoids the long arcs needed to attach punctuation.
        +cell #[code True]

    +row
        +cell #[code collapse_phrases]
        +cell bool
        +cell Merge noun phrases into one token.
        +cell #[code False]

    +row
        +cell #[code compact]
        +cell bool
        +cell "Compact mode" with square arrows that takes up less space.
        +cell #[code False]

    +row
        +cell #[code color]
        +cell unicode
        +cell Text color (HEX, RGB or color names).
        +cell #[code '#000000']

    +row
        +cell #[code bg]
        +cell unicode
        +cell Background color (HEX, RGB or color names).
        +cell #[code '#ffffff']

    +row
        +cell #[code font]
        +cell unicode
        +cell Font name or font family for all text.
        +cell #[code 'Arial']

    +row
        +cell #[code offset_x]
        +cell int
        +cell Spacing on left side of the SVG in px.
        +cell #[code 50]

    +row
        +cell #[code arrow_stroke]
        +cell int
        +cell Width of arrow path in px.
        +cell #[code 2]

    +row
        +cell #[code arrow_width]
        +cell int
        +cell Width of arrow head in px.
        +cell #[code 10] / #[code 8] (compact)

    +row
        +cell #[code arrow_spacing]
        +cell int
        +cell Spacing between arrows in px to avoid overlaps.
        +cell #[code 20] / #[code 12] (compact)

    +row
        +cell #[code word_spacing]
        +cell int
        +cell Vertical spacing between words and arcs in px.
        +cell #[code 45]

    +row
        +cell #[code distance]
        +cell int
        +cell Distance between words in px.
        +cell #[code 175] / #[code 85] (compact)

+h(4, "displacy_options-ent") Named Entity Visualizer options

+aside-code("Example").
    options = {'ents': ['PERSON', 'ORG', 'PRODUCT'],
               'colors': {'ORG': 'yellow'}}
    displacy.serve(doc, style='ent', options=options)

+table(["Name", "Type", "Description", "Default"])
    +row
        +cell #[code ents]
        +cell list
        +cell Entity types to highlight (#[code None] for all types).
        +cell #[code None]

    +row
        +cell #[code colors]
        +cell dict
        +cell
            | Color overrides. Entity types in uppercase should be mapped to
            | color names or values.
        +cell #[code {}]

p
    | By default, displaCy comes with colours for all
    | #[+a("/api/annotation#named-entities") entity types supported by spaCy].
    | If you're using custom entity types, you can use the #[code colors]
    | setting to add your own colours for them.
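
p
    | For illustration, overriding the colour of a hypothetical custom entity
    | type might look like this (a minimal sketch; #[code MY_ENTITY] is an
    | assumed label, not one shipped with spaCy):

+aside-code("Example").
    # Sketch: map a custom entity label to a colour of your choice
    colors = {'MY_ENTITY': 'linear-gradient(90deg, #aa9cfc, #fc9ce7)'}
    options = {'ents': ['MY_ENTITY'], 'colors': colors}
    displacy.serve(doc, style='ent', options=options)
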
@@ -1,201 +0,0 @@
//- 💫 DOCS > API > TOP-LEVEL > SPACY

+h(3, "spacy.load") spacy.load
    +tag function
    +tag-model

p
    | Load a model via its #[+a("/usage/models#usage") shortcut link],
    | the name of an installed
    | #[+a("/usage/training#models-generating") model package], a unicode
    | path or a #[code Path]-like object. spaCy will try resolving the load
    | argument in this order. If a model is loaded from a shortcut link or
    | package name, spaCy will assume it's a Python package and import it and
    | call the model's own #[code load()] method. If a model is loaded from a
    | path, spaCy will assume it's a data directory, read the language and
    | pipeline settings off the meta.json and initialise the #[code Language]
    | class. The data will be loaded in via
    | #[+api("language#from_disk") #[code Language.from_disk()]].

+aside-code("Example").
    nlp = spacy.load('en') # shortcut link
    nlp = spacy.load('en_core_web_sm') # package
    nlp = spacy.load('/path/to/en') # unicode path
    nlp = spacy.load(Path('/path/to/en')) # pathlib Path

    nlp = spacy.load('en', disable=['parser', 'tagger'])

+table(["Name", "Type", "Description"])
    +row
        +cell #[code name]
        +cell unicode or #[code Path]
        +cell Model to load, i.e. shortcut link, package name or path.

    +row
        +cell #[code disable]
        +cell list
        +cell
            | Names of pipeline components to
            | #[+a("/usage/processing-pipelines#disabling") disable].

    +row("foot")
        +cell returns
        +cell #[code Language]
        +cell A #[code Language] object with the loaded model.

p
    | Essentially, #[code spacy.load()] is a convenience wrapper that reads
    | the language ID and pipeline components from a model's #[code meta.json],
    | initialises the #[code Language] class, loads in the model data and
    | returns it.

+code("Abstract example").
    cls = util.get_lang_class(lang) # get language for ID, e.g. 'en'
    nlp = cls() # initialise the language
    for name in pipeline:
        component = nlp.create_pipe(name) # create each pipeline component
        nlp.add_pipe(component) # add component to pipeline
    nlp.from_disk(model_data_path) # load in model data

+infobox("Changed in v2.0", "⚠️")
    | As of spaCy 2.0, the #[code path] keyword argument is deprecated. spaCy
    | will also raise an error if no model could be loaded and never just
    | return an empty #[code Language] object. If you need a blank language,
    | you can use the new function #[+api("spacy#blank") #[code spacy.blank()]]
    | or import the class explicitly, e.g.
    | #[code from spacy.lang.en import English].

    +code-wrapper
        +code-new nlp = spacy.load('/model')
        +code-old nlp = spacy.load('en', path='/model')

+h(3, "spacy.blank") spacy.blank
    +tag function
    +tag-new(2)

p
    | Create a blank model of a given language class. This function is the
    | twin of #[code spacy.load()].

+aside-code("Example").
    nlp_en = spacy.blank('en')
    nlp_de = spacy.blank('de')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code name]
        +cell unicode
        +cell
            | #[+a("https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes") ISO code]
            | of the language class to load.

    +row
        +cell #[code disable]
        +cell list
        +cell
            | Names of pipeline components to
            | #[+a("/usage/processing-pipelines#disabling") disable].

    +row("foot")
        +cell returns
        +cell #[code Language]
        +cell An empty #[code Language] object of the appropriate subclass.

+h(3, "spacy.info") spacy.info
    +tag function

p
    | The same as the #[+api("cli#info") #[code info] command]. Pretty-print
    | information about your installation, models and local setup from within
    | spaCy. To get the model meta data as a dictionary instead, you can
    | use the #[code meta] attribute on your #[code nlp] object with a
    | loaded model, e.g. #[code nlp.meta].

+aside-code("Example").
    spacy.info()
    spacy.info('en')
    spacy.info('de', markdown=True)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code model]
        +cell unicode
        +cell A model, i.e. shortcut link, package name or path (optional).

    +row
        +cell #[code markdown]
        +cell bool
        +cell Print information as Markdown.

+h(3, "spacy.explain") spacy.explain
    +tag function

p
    | Get a description for a given POS tag, dependency label or entity type.
    | For a list of available terms, see
    | #[+src(gh("spacy", "spacy/glossary.py")) #[code glossary.py]].

+aside-code("Example").
    spacy.explain(u'NORP')
    # Nationalities or religious or political groups

    doc = nlp(u'Hello world')
    for word in doc:
        print(word.text, word.tag_, spacy.explain(word.tag_))
    # Hello UH interjection
    # world NN noun, singular or mass

+table(["Name", "Type", "Description"])
    +row
        +cell #[code term]
        +cell unicode
        +cell Term to explain.

    +row("foot")
        +cell returns
        +cell unicode
        +cell The explanation, or #[code None] if not found in the glossary.

+h(3, "spacy.prefer_gpu") spacy.prefer_gpu
    +tag function
    +tag-new("2.0.14")

p
    | Allocate data and perform operations on #[+a("/usage/#gpu") GPU], if
    | available. If data has already been allocated on CPU, it will not be
    | moved. Ideally, this function should be called right after
    | importing spaCy and #[em before] loading any models.

+aside-code("Example").
    import spacy
    activated = spacy.prefer_gpu()
    nlp = spacy.load('en_core_web_sm')

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell bool
        +cell Whether the GPU was activated.

+h(3, "spacy.require_gpu") spacy.require_gpu
    +tag function
    +tag-new("2.0.14")

p
    | Allocate data and perform operations on #[+a("/usage/#gpu") GPU]. Will
    | raise an error if no GPU is available. If data has already been allocated
    | on CPU, it will not be moved. Ideally, this function should be called
    | right after importing spaCy and #[em before] loading any models.

+aside-code("Example").
    import spacy
    spacy.require_gpu()
    nlp = spacy.load('en_core_web_sm')

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell bool
        +cell #[code True]

@@ -1,454 +0,0 @@
//- 💫 DOCS > API > TOP-LEVEL > UTIL

p
    | spaCy comes with a small collection of utility functions located in
    | #[+src(gh("spaCy", "spacy/util.py")) #[code spacy/util.py]].
    | Because utility functions are mostly intended for
    | #[strong internal use within spaCy], their behaviour may change with
    | future releases. The functions documented on this page should be safe
    | to use and we'll try to ensure backwards compatibility. However, we
    | recommend having additional tests in place if your application depends on
    | any of spaCy's utilities.

+h(3, "util.get_data_path") util.get_data_path
    +tag function

p
    | Get path to the data directory where spaCy looks for models. Defaults to
    | #[code spacy/data].

+table(["Name", "Type", "Description"])
    +row
        +cell #[code require_exists]
        +cell bool
        +cell Only return path if it exists, otherwise return #[code None].

    +row("foot")
        +cell returns
        +cell #[code Path] / #[code None]
        +cell Data path or #[code None].
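
p
    | For illustration (a minimal sketch; the exact path depends on your
    | installation):

+aside-code("Example").
    from spacy import util

    data_path = util.get_data_path()
    # e.g. PosixPath('.../site-packages/spacy/data')
    print(data_path)
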
+h(3, "util.set_data_path") util.set_data_path
    +tag function

p
    | Set custom path to the data directory where spaCy looks for models.

+aside-code("Example").
    util.set_data_path('/custom/path')
    util.get_data_path()
    # PosixPath('/custom/path')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code path]
        +cell unicode or #[code Path]
        +cell Path to new data directory.

+h(3, "util.get_lang_class") util.get_lang_class
    +tag function

p
    | Import and load a #[code Language] class. Allows lazy-loading
    | #[+a("/usage/adding-languages") language data] and importing
    | languages using the two-letter language code. To add a language code
    | for a custom language class, you can use the
    | #[+api("top-level#util.set_lang_class") #[code set_lang_class]] helper.

+aside-code("Example").
    for lang_id in ['en', 'de']:
        lang_class = util.get_lang_class(lang_id)
        lang = lang_class()
        tokenizer = lang.Defaults.create_tokenizer()

+table(["Name", "Type", "Description"])
    +row
        +cell #[code lang]
        +cell unicode
        +cell Two-letter language code, e.g. #[code 'en'].

    +row("foot")
        +cell returns
        +cell #[code Language]
        +cell Language class.

+h(3, "util.set_lang_class") util.set_lang_class
    +tag function

p
    | Set a custom #[code Language] class name that can be loaded via
    | #[+api("top-level#util.get_lang_class") #[code get_lang_class]]. If
    | your model uses a custom language, this is required so that spaCy can
    | load the correct class from the two-letter language code.

+aside-code("Example").
    from spacy.lang.xy import CustomLanguage

    util.set_lang_class('xy', CustomLanguage)
    lang_class = util.get_lang_class('xy')
    nlp = lang_class()

+table(["Name", "Type", "Description"])
    +row
        +cell #[code name]
        +cell unicode
        +cell Two-letter language code, e.g. #[code 'en'].

    +row
        +cell #[code cls]
        +cell #[code Language]
        +cell The language class, e.g. #[code English].

+h(3, "util.load_model") util.load_model
    +tag function
    +tag-new(2)

p
    | Load a model from a shortcut link, package or data path. If called with a
    | shortcut link or package name, spaCy will assume the model is a Python
    | package and import and call its #[code load()] method. If called with a
    | path, spaCy will assume it's a data directory, read the language and
    | pipeline settings from the meta.json and initialise a #[code Language]
    | class. The model data will then be loaded in via
    | #[+api("language#from_disk") #[code Language.from_disk()]].

+aside-code("Example").
    nlp = util.load_model('en')
    nlp = util.load_model('en_core_web_sm', disable=['ner'])
    nlp = util.load_model('/path/to/data')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code name]
        +cell unicode
        +cell Package name, shortcut link or model path.

    +row
        +cell #[code **overrides]
        +cell -
        +cell Specific overrides, like pipeline components to disable.

    +row("foot")
        +cell returns
        +cell #[code Language]
        +cell #[code Language] class with the loaded model.

+h(3, "util.load_model_from_path") util.load_model_from_path
    +tag function
    +tag-new(2)

p
    | Load a model from a data directory path. Creates the
    | #[+api("language") #[code Language]] class and pipeline based on the
    | directory's meta.json and then calls
    | #[+api("language#from_disk") #[code from_disk()]] with the path. This
    | function also makes it easy to test a new model that you haven't packaged
    | yet.

+aside-code("Example").
    nlp = load_model_from_path('/path/to/data')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code model_path]
        +cell unicode
        +cell Path to model data directory.

    +row
        +cell #[code meta]
        +cell dict
        +cell
            | Model meta data. If #[code False], spaCy will try to load the
            | meta from a meta.json in the same directory.

    +row
        +cell #[code **overrides]
        +cell -
        +cell Specific overrides, like pipeline components to disable.

    +row("foot")
        +cell returns
        +cell #[code Language]
        +cell #[code Language] class with the loaded model.

+h(3, "util.load_model_from_init_py") util.load_model_from_init_py
    +tag function
    +tag-new(2)

p
    | A helper function to use in the #[code load()] method of a model package's
    | #[+src(gh("spacy-models", "template/model/xx_model_name/__init__.py")) #[code __init__.py]].

+aside-code("Example").
    from spacy.util import load_model_from_init_py

    def load(**overrides):
        return load_model_from_init_py(__file__, **overrides)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code init_file]
        +cell unicode
        +cell Path to model's __init__.py, i.e. #[code __file__].

    +row
        +cell #[code **overrides]
        +cell -
        +cell Specific overrides, like pipeline components to disable.

    +row("foot")
        +cell returns
        +cell #[code Language]
        +cell #[code Language] class with the loaded model.

+h(3, "util.get_model_meta") util.get_model_meta
    +tag function
    +tag-new(2)

p
    | Get a model's meta.json from a directory path and validate its contents.

+aside-code("Example").
    meta = util.get_model_meta('/path/to/model')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code path]
        +cell unicode or #[code Path]
        +cell Path to model directory.

    +row("foot")
        +cell returns
        +cell dict
        +cell The model's meta data.

+h(3, "util.is_package") util.is_package
    +tag function

p
    | Check if string maps to a package installed via pip. Mainly used to
    | validate #[+a("/usage/models") model packages].

+aside-code("Example").
    util.is_package('en_core_web_sm') # True
    util.is_package('xyz') # False

+table(["Name", "Type", "Description"])
    +row
        +cell #[code name]
        +cell unicode
        +cell Name of package.

    +row("foot")
        +cell returns
        +cell #[code bool]
        +cell #[code True] if installed package, #[code False] if not.

+h(3, "util.get_package_path") util.get_package_path
    +tag function
    +tag-new(2)

p
    | Get path to an installed package. Mainly used to resolve the location of
    | #[+a("/usage/models") model packages]. Currently imports the package
    | to find its path.

+aside-code("Example").
    util.get_package_path('en_core_web_sm')
    # /usr/lib/python3.6/site-packages/en_core_web_sm

+table(["Name", "Type", "Description"])
    +row
        +cell #[code package_name]
        +cell unicode
        +cell Name of installed package.

    +row("foot")
        +cell returns
        +cell #[code Path]
        +cell Path to model package directory.

+h(3, "util.is_in_jupyter") util.is_in_jupyter
    +tag function
    +tag-new(2)

p
    | Check if user is running spaCy from a #[+a("https://jupyter.org") Jupyter]
    | notebook by detecting the IPython kernel. Mainly used for the
    | #[+api("top-level#displacy") #[code displacy]] visualizer.

+aside-code("Example").
    html = '<h1>Hello world!</h1>'
    if util.is_in_jupyter():
        from IPython.core.display import display, HTML
        display(HTML(html))

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell bool
        +cell #[code True] if in Jupyter, #[code False] if not.

+h(3, "util.update_exc") util.update_exc
    +tag function

p
    | Update, validate and overwrite
    | #[+a("/usage/adding-languages#tokenizer-exceptions") tokenizer exceptions].
    | Used to combine global exceptions with custom, language-specific
    | exceptions. Will raise an error if key doesn't match #[code ORTH] values.

+aside-code("Example").
    BASE = {"a.": [{ORTH: "a."}], ":)": [{ORTH: ":)"}]}
    NEW = {"a.": [{ORTH: "a.", LEMMA: "all"}]}
    exceptions = util.update_exc(BASE, NEW)
    # {"a.": [{ORTH: "a.", LEMMA: "all"}], ":)": [{ORTH: ":)"}]}

+table(["Name", "Type", "Description"])
    +row
        +cell #[code base_exceptions]
        +cell dict
        +cell Base tokenizer exceptions.

    +row
        +cell #[code *addition_dicts]
        +cell dicts
        +cell Exception dictionaries to add to the base exceptions, in order.

    +row("foot")
        +cell returns
        +cell dict
        +cell Combined tokenizer exceptions.

+h(3, "util.minibatch") util.minibatch
    +tag function
    +tag-new(2)

p
    | Iterate over batches of items. #[code size] may be an iterator, so that
    | batch-size can vary on each step.

+aside-code("Example").
    batches = minibatch(train_data)
    for batch in batches:
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code items]
        +cell iterable
        +cell The items to batch up.

    +row
        +cell #[code size]
        +cell int / iterable
        +cell
            | The batch size(s). Use
            | #[+api("top-level#util.compounding") #[code util.compounding]] or
            | #[+api("top-level#util.decaying") #[code util.decaying]] for an
            | infinite series of compounding or decaying values.

    +row("foot")
        +cell yields
        +cell list
        +cell The batches.
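
p
    | For illustration, a common training pattern is to pass a compounding
    | batch size to #[code minibatch] (a minimal sketch; #[code train_data]
    | is assumed to be a list of #[code (text, annotations)] pairs):

+aside-code("Example").
    from spacy.util import minibatch, compounding

    # Sketch: batch size grows from 4 towards 32 as training proceeds
    batches = minibatch(train_data, size=compounding(4., 32., 1.001))
    for batch in batches:
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations)
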
+h(3, "util.compounding") util.compounding
    +tag function
    +tag-new(2)

p
    | Yield an infinite series of compounding values. Each time the generator
    | is called, a value is produced by multiplying the previous value by the
    | compound rate.

+aside-code("Example").
    sizes = compounding(1., 10., 1.5)
    assert next(sizes) == 1.
    assert next(sizes) == 1. * 1.5
    assert next(sizes) == 1.5 * 1.5

+table(["Name", "Type", "Description"])
    +row
        +cell #[code start]
        +cell int / float
        +cell The first value.

    +row
        +cell #[code stop]
        +cell int / float
        +cell The maximum value.

    +row
        +cell #[code compound]
        +cell int / float
        +cell The compounding factor.

    +row("foot")
        +cell yields
        +cell int
        +cell Compounding values.

+h(3, "util.decaying") util.decaying
    +tag function
    +tag-new(2)

p
    | Yield an infinite series of linearly decaying values.

+aside-code("Example").
    sizes = decaying(1., 10., 0.001)
    assert next(sizes) == 1.
    assert next(sizes) == 1. - 0.001
    assert next(sizes) == 0.999 - 0.001

+table(["Name", "Type", "Description"])
    +row
        +cell #[code start]
        +cell int / float
        +cell The first value.

    +row
        +cell #[code end]
        +cell int / float
        +cell The maximum value.

    +row
        +cell #[code decay]
        +cell int / float
        +cell The decaying factor.

    +row("foot")
        +cell yields
        +cell int
        +cell The decaying values.

+h(3, "util.itershuffle") util.itershuffle
    +tag function
    +tag-new(2)

p
    | Shuffle an iterator. This works by holding #[code bufsize] items back
    | and yielding them sometime later. Obviously, this is not unbiased – but
    | should be good enough for batching. Larger bufsize means less bias.

+aside-code("Example").
    values = range(1000)
    shuffled = itershuffle(values)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code iterable]
        +cell iterable
        +cell Iterator to shuffle.

    +row
        +cell #[code bufsize]
        +cell int
        +cell Items to hold back.

    +row("foot")
        +cell yields
        +cell iterable
        +cell The shuffled iterator.

@@ -1,46 +0,0 @@
//- 💫 DOCS > API > ANNOTATION SPECS

include ../_includes/_mixins

+section("text-processing")
    +h(2, "text-processing") Text Processing
    include _annotation/_text-processing

+section("pos-tagging")
    +h(2, "pos-tagging") Part-of-speech Tagging

    +aside("Tip: Understanding tags")
        | You can also use #[code spacy.explain()] to get the description for the
        | string representation of a tag. For example,
        | #[code spacy.explain("RB")] will return "adverb".

    include _annotation/_pos-tags

+section("dependency-parsing")
    +h(2, "dependency-parsing") Syntactic Dependency Parsing

    +aside("Tip: Understanding labels")
        | You can also use #[code spacy.explain()] to get the description for the
        | string representation of a label. For example,
        | #[code spacy.explain("prt")] will return "particle".

    include _annotation/_dep-labels

+section("named-entities")
    +h(2, "named-entities") Named Entity Recognition

    +aside("Tip: Understanding entity types")
        | You can also use #[code spacy.explain()] to get the description for the
        | string representation of an entity label. For example,
        | #[code spacy.explain("LANGUAGE")] will return "any named language".

    include _annotation/_named-entities

    +h(3, "biluo") BILUO Scheme

    include _annotation/_biluo

+section("training")
    +h(2, "training") Models and training data

    include _annotation/_training

@@ -1,738 +0,0 @@
//- 💫 DOCS > API > COMMAND LINE INTERFACE

include ../_includes/_mixins

p
    | As of v1.7.0, spaCy comes with new command line helpers to download and
    | link models and show useful debugging information. For a list of available
    | commands, type #[code spacy --help].

+h(3, "download") Download

p
    | Download #[+a("/usage/models") models] for spaCy. The downloader finds the
    | best-matching compatible version, uses pip to download the model as a
    | package and automatically creates a
    | #[+a("/usage/models#usage") shortcut link] to load the model by name.
    | Direct downloads don't perform any compatibility checks and require the
    | model name to be specified with its version (e.g.
    | #[code en_core_web_sm-2.0.0]).

+aside("Downloading best practices")
    | The #[code download] command is mostly intended as a convenient,
    | interactive wrapper – it performs compatibility checks and prints
    | detailed messages in case things go wrong. It's #[strong not recommended]
    | to use this command as part of an automated process. If you know which
    | model your project needs, you should consider a
    | #[+a("/usage/models#download-pip") direct download via pip], or
    | uploading the model to a local PyPI installation and fetching it straight
    | from there. This will also allow you to add it as a versioned package
    | dependency to your project.

+code(false, "bash", "$").
    python -m spacy download [model] [--direct]

+table(["Argument", "Type", "Description"])
    +row
        +cell #[code model]
        +cell positional
        +cell
            | Model name or shortcut (#[code en], #[code de],
            | #[code en_core_web_sm]).

    +row
        +cell #[code --direct], #[code -d]
        +cell flag
        +cell Force direct download of exact model version.

    +row
        +cell other
        +tag-new(2.1)
        +cell -
        +cell
            | Additional installation options to be passed to
            | #[code pip install] when installing the model package. For
            | example, #[code --user] to install to the user home directory.

    +row
        +cell #[code --help], #[code -h]
        +cell flag
        +cell Show help message and available arguments.

    +row("foot")
        +cell creates
        +cell directory, symlink
        +cell
            | The installed model package in your #[code site-packages]
            | directory and a shortcut link as a symlink in #[code spacy/data].
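p
    | For example, with illustrative model names:

+code(false, "bash", "$").
    # best-matching compatible version, plus shortcut link
    python -m spacy download en_core_web_sm
    # exact version, no compatibility checks
    python -m spacy download en_core_web_sm-2.0.0 --direct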
+h(3, "link") Link
|
||||
|
||||
p
|
||||
| Create a #[+a("/usage/models#usage") shortcut link] for a model,
|
||||
| either a Python package or a local directory. This will let you load
|
||||
| models from any location using a custom name via
|
||||
| #[+api("spacy#load") #[code spacy.load()]].
|
||||
|
||||
+infobox("Important note")
|
||||
| In spaCy v1.x, you had to use the model data directory to set up a shortcut
|
||||
| link for a local path. As of v2.0, spaCy expects all shortcut links to
|
||||
| be #[strong loadable model packages]. If you want to load a data directory,
|
||||
| call #[+api("spacy#load") #[code spacy.load()]] or
|
||||
| #[+api("language#from_disk") #[code Language.from_disk()]] with the path,
|
||||
| or use the #[+api("cli#package") #[code package]] command to create a
|
||||
| model package.
|
||||
|
||||
+code(false, "bash", "$").
|
||||
python -m spacy link [origin] [link_name] [--force]
|
||||
|
||||
+table(["Argument", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code origin]
|
||||
+cell positional
|
||||
+cell Model name if package, or path to local directory.
|
||||
|
||||
+row
|
||||
+cell #[code link_name]
|
||||
+cell positional
|
||||
+cell Name of the shortcut link to create.
|
||||
|
||||
+row
|
||||
+cell #[code --force], #[code -f]
|
||||
+cell flag
|
||||
+cell Force overwriting of existing link.
|
||||
|
||||
+row
|
||||
+cell #[code --help], #[code -h]
|
||||
+cell flag
|
||||
+cell Show help message and available arguments.
|
||||
|
||||
+row("foot")
|
||||
+cell creates
|
||||
+cell symlink
|
||||
+cell
|
||||
| A shortcut link of the given name as a symlink in
|
||||
| #[code spacy/data].
|
||||
|
||||
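p
    | For example, to make an installed package available under a custom
    | name (the names here are illustrative):

+code(false, "bash", "$").
    python -m spacy link en_core_web_sm en_default
    # "en_default" can now be passed to spacy.load()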
+h(3, "info") Info
|
||||
|
||||
p
|
||||
| Print information about your spaCy installation, models and local setup,
|
||||
| and generate #[+a("https://en.wikipedia.org/wiki/Markdown") Markdown]-formatted
|
||||
| markup to copy-paste into #[+a(gh("spacy") + "/issues") GitHub issues].
|
||||
|
||||
+code(false, "bash").
|
||||
python -m spacy info [--markdown]
|
||||
python -m spacy info [model] [--markdown]
|
||||
|
||||
+table(["Argument", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code model]
|
||||
+cell positional
|
||||
+cell A model, i.e. shortcut link, package name or path (optional).
|
||||
|
||||
+row
|
||||
+cell #[code --markdown], #[code -md]
|
||||
+cell flag
|
||||
+cell Print information as Markdown.
|
||||
|
||||
+row
|
||||
+cell #[code --silent], #[code -s]
|
||||
+tag-new("2.0.12")
|
||||
+cell flag
|
||||
+cell Don't print anything, just return the values.
|
||||
|
||||
+row
|
||||
+cell #[code --help], #[code -h]
|
||||
+cell flag
|
||||
+cell Show help message and available arguments.
|
||||
|
||||
+row("foot")
|
||||
+cell prints
|
||||
+cell #[code stdout]
|
||||
+cell Information about your spaCy installation.
|
||||
|
||||
+h(3, "validate") Validate
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Find all models installed in the current environment (both packages and
|
||||
| shortcut links) and check whether they are compatible with the currently
|
||||
| installed version of spaCy. Should be run after upgrading spaCy via
|
||||
| #[code pip install -U spacy] to ensure that all installed models are
|
||||
| can be used with the new version. The command is also useful to detect
|
||||
| out-of-sync model links resulting from links created in different virtual
|
||||
| environments. It will a list of models, the installed versions, the
|
||||
| latest compatible version (if out of date) and the commands for updating.
|
||||
|
||||
+aside("Automated validation")
|
||||
| You can also use the #[code validate] command as part of your build
|
||||
| process or test suite, to ensure all models are up to date before
|
||||
| proceeding. If incompatible models or shortcut links are found, it will
|
||||
| return #[code 1].
|
||||
|
||||
+code(false, "bash", "$").
|
||||
python -m spacy validate
|
||||
|
||||
+table(["Argument", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell prints
|
||||
+cell #[code stdout]
|
||||
+cell Details about the compatibility of your installed models.
|
||||
|
||||
+h(3, "convert") Convert
|
||||
|
||||
p
|
||||
| Convert files into spaCy's #[+a("/api/annotation#json-input") JSON format]
|
||||
| for use with the #[code train] command and other experiment management
|
||||
| functions. The converter can be specified on the command line, or
|
||||
| chosen based on the file extension of the input file.
|
||||
|
||||
+code(false, "bash", "$", false, false, true).
|
||||
python -m spacy convert [input_file] [output_dir] [--converter] [--n-sents]
|
||||
[--morphology]
|
||||
|
||||
+table(["Argument", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code input_file]
|
||||
+cell positional
|
||||
+cell Input file.
|
||||
|
||||
+row
|
||||
+cell #[code output_dir]
|
||||
+cell positional
|
||||
+cell Output directory for converted JSON file.
|
||||
|
||||
+row
|
||||
+cell #[code converter], #[code -c]
|
||||
+cell option
|
||||
+cell #[+tag-new(2)] Name of converter to use (see below).
|
||||
|
||||
+row
|
||||
+cell #[code --n-sents], #[code -n]
|
||||
+cell option
|
||||
+cell Number of sentences per document.
|
||||
|
||||
+row
|
||||
+cell #[code --morphology], #[code -m]
|
||||
+cell option
|
||||
+cell Enable appending morphology to tags.
|
||||
|
||||
+row
|
||||
+cell #[code --help], #[code -h]
|
||||
+cell flag
|
||||
+cell Show help message and available arguments.
|
||||
|
||||
+row("foot")
|
||||
+cell creates
|
||||
+cell JSON
|
||||
+cell Data in spaCy's #[+a("/api/annotation#json-input") JSON format].
|
||||
|
||||
p The following file format converters are available:
|
||||
|
||||
+table(["ID", "Description"])
|
||||
+row
|
||||
+cell #[code auto]
|
||||
+cell Automatically pick converter based on file extension (default).
|
||||
|
||||
+row
|
||||
+cell #[code conllu], #[code conll]
|
||||
+cell Universal Dependencies #[code .conllu] or #[code .conll] format.
|
||||
|
||||
+row
|
||||
+cell #[code ner]
|
||||
+cell Tab-based named entity recognition format.
|
||||
|
||||
+row
|
||||
+cell #[code iob]
|
||||
+cell IOB or IOB2 named entity recognition format.
|
||||
|
||||
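p
    | For example, to convert a Universal Dependencies corpus (the paths are
    | illustrative):

+code(false, "bash", "$").
    python -m spacy convert /data/train.conllu /data/json --converter conllu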
+h(3, "train") Train
|
||||
|
||||
p
|
||||
| Train a model. Expects data in spaCy's
|
||||
| #[+a("/api/annotation#json-input") JSON format]. On each epoch, a model
|
||||
| will be saved out to the directory. Accuracy scores and model details
|
||||
| will be added to a #[+a("/usage/training#models-generating") #[code meta.json]]
|
||||
| to allow packaging the model using the
|
||||
| #[+api("cli#package") #[code package]] command.
|
||||
|
||||
+infobox("Changed in v2.1", "⚠️")
|
||||
| As of spaCy 2.1, the #[code --no-tagger], #[code --no-parser] and
|
||||
| #[code --no-parser] flags have been replaced by a #[code --pipeline]
|
||||
| option, which lets you define comma-separated names of pipeline
|
||||
| components to train. For example, #[code --pipeline tagger,parser] will
|
||||
| only train the tagger and parser.
|
||||
|
||||
+code(false, "bash", "$", false, false, true).
|
||||
python -m spacy train [lang] [output_path] [train_path] [dev_path]
|
||||
[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-examples] [--use-gpu]
|
||||
[--version] [--meta-path] [--init-tok2vec] [--parser-multitasks]
|
||||
[--entity-multitasks] [--gold-preproc] [--noise-level] [--learn-tokens]
|
||||
[--verbose]
|
||||
|
||||
+table(["Argument", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code lang]
|
||||
+cell positional
|
||||
+cell Model language.
|
||||
|
||||
+row
|
||||
+cell #[code output_path]
|
||||
+cell positional
|
||||
+cell Directory to store model in. Will be created if it doesn't exist.
|
||||
|
||||
+row
|
||||
+cell #[code train_path]
|
||||
+cell positional
|
||||
+cell Location of JSON-formatted training data.
|
||||
|
||||
+row
|
||||
+cell #[code dev_path]
|
||||
+cell positional
|
||||
+cell Location of JSON-formatted development data for evaluation.
|
||||
|
||||
+row
|
||||
+cell #[code --base-model], #[code -b]
|
||||
+cell option
|
||||
+cell
|
||||
| Optional name of base model to update. Can be any loadable
|
||||
| spaCy model.
|
||||
|
||||
+row
|
||||
+cell #[code --pipeline], #[code -p]
|
||||
+tag-new("2.1.0")
|
||||
+cell option
|
||||
+cell
|
||||
| Comma-separated names of pipeline components to train. Defaults
|
||||
| to #[code 'tagger,parser,ner'].
|
||||
|
||||
+row
|
||||
+cell #[code --vectors], #[code -v]
|
||||
+cell option
|
||||
+cell Model to load vectors from.
|
||||
|
||||
+row
|
||||
+cell #[code --n-iter], #[code -n]
|
||||
+cell option
|
||||
+cell Number of iterations (default: #[code 30]).
|
||||
|
||||
+row
|
||||
+cell #[code --n-examples], #[code -ns]
|
||||
+cell option
|
||||
+cell Number of examples to use (defaults to #[code 0] for all examples).
|
||||
|
||||
+row
|
||||
+cell #[code --use-gpu], #[code -g]
|
||||
+cell option
|
||||
+cell
|
||||
| Whether to use GPU. Can be either #[code 0], #[code 1] or
|
||||
| #[code -1].
|
||||
|
||||
+row
|
||||
+cell #[code --version], #[code -V]
|
||||
+cell option
|
||||
+cell
|
||||
| Model version. Will be written out to the model's
|
||||
| #[code meta.json] after training.
|
||||
|
||||
+row
|
||||
+cell #[code --meta-path], #[code -m]
|
||||
+tag-new(2)
|
||||
+cell option
|
||||
+cell
|
||||
| Optional path to model
|
||||
| #[+a("/usage/training#models-generating") #[code meta.json]].
|
||||
| All relevant properties like #[code lang], #[code pipeline] and
|
||||
| #[code spacy_version] will be overwritten.
|
||||
|
||||
+row
|
||||
+cell #[code --init-tok2vec], #[code -t2v]
|
||||
+tag-new("2.1.0")
|
||||
+cell option
|
||||
+cell
|
||||
| Path to pretrained weights for the token-to-vector parts of the
|
||||
| models. See #[code spacy pretrain]. Experimental.
|
||||
|
||||
+row
|
||||
+cell #[code --parser-multitasks], #[code -pt]
|
||||
+cell option
|
||||
+cell
|
||||
| Side objectives for parser CNN, e.g. #[code 'dep'] or
|
||||
| #[code 'dep,tag']
|
||||
|
||||
+row
|
||||
+cell #[code --entity-multitasks], #[code -et]
|
||||
+cell option
|
||||
+cell
|
||||
| Side objectives for NER CNN, e.g. #[code 'dep'] or
|
||||
| #[code 'dep,tag']
|
||||
|
||||
+row
|
||||
+cell #[code --noise-level], #[code -nl]
|
||||
+cell option
|
||||
+cell Float indicating the amount of corruption for data agumentation.
|
||||
|
||||
+row
|
||||
+cell #[code --gold-preproc], #[code -G]
|
||||
+cell flag
|
||||
+cell Use gold preprocessing.
|
||||
|
||||
+row
|
||||
+cell #[code --learn-tokens], #[code -T]
|
||||
+cell flag
|
||||
+cell
|
||||
| Make parser learn gold-standard tokenization by merging
|
||||
] subtokens. Typically used for languages like Chinese.
|
||||
|
||||
+row
|
||||
+cell #[code --verbose], #[code -VV]
|
||||
+tag-new("2.0.13")
|
||||
+cell flag
|
||||
+cell Show more detailed messages during training.
|
||||
|
||||
+row
|
||||
+cell #[code --help], #[code -h]
|
||||
+cell flag
|
||||
+cell Show help message and available arguments.
|
||||
|
||||
+row("foot")
|
||||
+cell creates
|
||||
+cell model, pickle
|
||||
+cell A spaCy model on each epoch.
|
||||
|
||||
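p
    | For example, to train only the tagger and parser, with illustrative
    | paths:

+code(false, "bash", "$").
    python -m spacy train en /output /data/train.json /data/dev.json --pipeline tagger,parser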
+h(4, "train-hyperparams") Environment variables for hyperparameters
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| spaCy lets you set hyperparameters for training via environment variables.
|
||||
| This is useful, because it keeps the command simple and allows you to
|
||||
| #[+a("https://askubuntu.com/questions/17536/how-do-i-create-a-permanent-bash-alias/17537#17537") create an alias]
|
||||
| for your custom #[code train] command while still being able to easily
|
||||
| tweak the hyperparameters. For example:
|
||||
|
||||
+code(false, "bash", "$").
|
||||
parser_hidden_depth=2 parser_maxout_pieces=1 spacy train [...]
|
||||
|
||||
+code("Usage with alias", "bash", "$").
|
||||
alias train-parser="spacy train en /output /data /train /dev -n 1000"
|
||||
parser_maxout_pieces=1 train-parser
|
||||
|
||||
+table(["Name", "Description", "Default"])
|
||||
+row
|
||||
+cell #[code dropout_from]
|
||||
+cell Initial dropout rate.
|
||||
+cell #[code 0.2]
|
||||
|
||||
+row
|
||||
+cell #[code dropout_to]
|
||||
+cell Final dropout rate.
|
||||
+cell #[code 0.2]
|
||||
|
||||
+row
|
||||
+cell #[code dropout_decay]
|
||||
+cell Rate of dropout change.
|
||||
+cell #[code 0.0]
|
||||
|
||||
+row
|
||||
+cell #[code batch_from]
|
||||
+cell Initial batch size.
|
||||
+cell #[code 1]
|
||||
|
||||
+row
|
||||
+cell #[code batch_to]
|
||||
+cell Final batch size.
|
||||
+cell #[code 64]
|
||||
|
||||
+row
|
||||
+cell #[code batch_compound]
|
||||
+cell Rate of batch size acceleration.
|
||||
+cell #[code 1.001]
|
||||
|
||||
+row
|
||||
+cell #[code token_vector_width]
|
||||
+cell Width of embedding tables and convolutional layers.
|
||||
+cell #[code 128]
|
||||
|
||||
+row
|
||||
+cell #[code embed_size]
|
||||
+cell Number of rows in embedding tables.
|
||||
+cell #[code 7500]
|
||||
|
||||
//- +row
|
||||
//- +cell #[code parser_maxout_pieces]
|
||||
//- +cell Number of pieces in the parser's and NER's first maxout layer.
|
||||
//- +cell #[code 2]
|
||||
|
||||
//- +row
|
||||
//- +cell #[code parser_hidden_depth]
|
||||
//- +cell Number of hidden layers in the parser and NER.
|
||||
//- +cell #[code 1]
|
||||
|
||||
+row
|
||||
+cell #[code hidden_width]
|
||||
+cell Size of the parser's and NER's hidden layers.
|
||||
+cell #[code 128]
|
||||
|
||||
//- +row
|
||||
//- +cell #[code history_feats]
|
||||
//- +cell Number of previous action ID features for parser and NER.
|
||||
//- +cell #[code 128]
|
||||
|
||||
//- +row
|
||||
//- +cell #[code history_width]
|
||||
//- +cell Number of embedding dimensions for each action ID.
|
||||
//- +cell #[code 128]
|
||||
|
||||
+row
|
||||
+cell #[code learn_rate]
|
||||
+cell Learning rate.
|
||||
+cell #[code 0.001]
|
||||
|
||||
+row
|
||||
+cell #[code optimizer_B1]
|
||||
+cell Momentum for the Adam solver.
|
||||
+cell #[code 0.9]
|
||||
|
||||
+row
|
||||
+cell #[code optimizer_B2]
|
||||
+cell Adagrad-momentum for the Adam solver.
|
||||
+cell #[code 0.999]
|
||||
|
||||
+row
|
||||
+cell #[code optimizer_eps]
|
||||
+cell Epsilon value for the Adam solver.
|
||||
+cell #[code 1e-08]
|
||||
|
||||
+row
|
||||
+cell #[code L2_penalty]
|
||||
+cell L2 regularisation penalty.
|
||||
+cell #[code 1e-06]
|
||||
|
||||
+row
|
||||
+cell #[code grad_norm_clip]
|
||||
+cell Gradient L2 norm constraint.
|
||||
+cell #[code 1.0]
|
||||
|
||||
+h(3, "vocab") Vocab
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Compile a vocabulary from a
|
||||
| #[+a("/api/annotation#vocab-jsonl") lexicon JSONL] file and optional
|
||||
| word vectors. Will save out a valid spaCy model that you can load via
|
||||
| #[+api("spacy#load") #[code spacy.load]] or package using the
|
||||
| #[+api("cli#package") #[code package]] command.
|
||||
|
||||
+code(false, "bash", "$").
|
||||
python -m spacy vocab [lang] [output_dir] [lexemes_loc] [vectors_loc]
|
||||
|
||||
+table(["Argument", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code lang]
|
||||
+cell positional
|
||||
+cell
|
||||
| Model language
|
||||
| #[+a("https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes") ISO code],
|
||||
| e.g. #[code en].
|
||||
|
||||
+row
|
||||
+cell #[code output_dir]
|
||||
+cell positional
|
||||
+cell Model output directory. Will be created if it doesn't exist.
|
||||
|
||||
+row
|
||||
+cell #[code lexemes_loc]
|
||||
+cell positional
|
||||
+cell
|
||||
| Location of lexical data in spaCy's
|
||||
| #[+a("/api/annotation#vocab-jsonl") JSONL format].
|
||||
|
||||
+row
|
||||
+cell #[code vectors_loc]
|
||||
+cell positional
|
||||
+cell Optional location of vectors data as numpy #[code .npz] file.
|
||||
|
||||
+row("foot")
|
||||
+cell creates
|
||||
+cell model
|
||||
+cell A spaCy model containing the vocab and vectors.
|
||||
|
||||
+h(3, "init-model") Init Model
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Create a new model directory from raw data, like word frequencies, Brown
|
||||
| clusters and word vectors. This command is similar to the
|
||||
| #[code spacy model] command in v1.x.
|
||||
|
||||
+code(false, "bash", "$", false, false, true).
|
||||
python -m spacy init-model [lang] [output_dir] [freqs_loc] [--clusters-loc] [--vectors-loc] [--prune-vectors]
|
||||
|
||||
+table(["Argument", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code lang]
|
||||
+cell positional
|
||||
+cell
|
||||
| Model language
|
||||
| #[+a("https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes") ISO code],
|
||||
| e.g. #[code en].
|
||||
|
||||
+row
|
||||
+cell #[code output_dir]
|
||||
+cell positional
|
||||
+cell Model output directory. Will be created if it doesn't exist.
|
||||
|
||||
+row
|
||||
+cell #[code freqs_loc]
|
||||
+cell positional
|
||||
+cell
|
||||
| Location of word frequencies file. Should be a tab-separated
|
||||
| file with three columns: frequency, document frequency and
|
||||
| frequency count.
|
||||
|
||||
+row
|
||||
+cell #[code --clusters-loc], #[code -c]
|
||||
+cell option
|
||||
+cell
|
||||
| Optional location of clusters file. Should be a tab-separated
|
||||
| file with three columns: cluster, word and frequency.
|
||||
|
||||
+row
|
||||
+cell #[code --vectors-loc], #[code -v]
|
||||
+cell option
|
||||
+cell
|
||||
| Optional location of vectors file. Should be a tab-separated
|
||||
| file in Word2Vec format where the first column contains the word
|
||||
| and the remaining columns the values. File can be provided in
|
||||
| #[code .txt] format or as a zipped text file in #[code .zip] or
|
||||
| #[code .tar.gz] format.
|
||||
|
||||
+row
|
||||
+cell #[code --prune-vectors], #[code -V]
|
||||
+cell flag
|
||||
+cell
|
||||
| Number of vectors to prune the vocabulary to. Defaults to
|
||||
| #[code -1] for no pruning.
|
||||
|
||||
+row("foot")
|
||||
+cell creates
|
||||
+cell model
|
||||
+cell A spaCy model containing the vocab and vectors.
|
||||
|
||||
+h(3, "evaluate") Evaluate
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Evaluate a model's accuracy and speed on JSON-formatted annotated data.
|
||||
| Will print the results and optionally export
|
||||
| #[+a("/usage/visualizers") displaCy visualizations] of a sample set of
|
||||
| parses to #[code .html] files. Visualizations for the dependency parse
|
||||
| and NER will be exported as separate files if the respective component
|
||||
| is present in the model's pipeline.
|
||||
|
||||
+code(false, "bash", "$", false, false, true).
|
||||
python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-limit] [--gpu-id] [--gold-preproc]
|
||||
|
||||
+table(["Argument", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code model]
|
||||
+cell positional
|
||||
+cell
|
||||
| Model to evaluate. Can be a package or shortcut link name, or a
|
||||
| path to a model data directory.
|
||||
|
||||
+row
|
||||
+cell #[code data_path]
|
||||
+cell positional
|
||||
+cell Location of JSON-formatted evaluation data.
|
||||
|
||||
+row
|
||||
+cell #[code --displacy-path], #[code -dp]
|
||||
+cell option
|
||||
+cell
|
||||
| Directory to output rendered parses as HTML. If not set, no
|
||||
| visualizations will be generated.
|
||||
|
||||
+row
|
||||
+cell #[code --displacy-limit], #[code -dl]
|
||||
+cell option
|
||||
+cell
|
||||
| Number of parses to generate per file. Defaults to #[code 25].
|
||||
| Keep in mind that a significantly higher number might cause the
|
||||
| #[code .html] files to render slowly.
|
||||
|
||||
+row
|
||||
+cell #[code --gpu-id], #[code -g]
|
||||
+cell option
|
||||
+cell GPU to use, if any. Defaults to #[code -1] for CPU.
|
||||
|
||||
+row
|
||||
+cell #[code --gold-preproc], #[code -G]
|
||||
+cell flag
|
||||
+cell Use gold preprocessing.
|
||||
|
||||
+row("foot")
|
||||
+cell prints / creates
|
||||
+cell #[code stdout], HTML
|
||||
+cell Training results and optional displaCy visualizations.
|
||||
|
||||
|
||||
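p
    | For example, with an illustrative model name and paths:

+code(false, "bash", "$").
    python -m spacy evaluate en_core_web_sm /data/dev.json --displacy-path /output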
+h(3, "package") Package
|
||||
|
||||
p
|
||||
| Generate a #[+a("/usage/training#models-generating") model Python package]
|
||||
| from an existing model data directory. All data files are copied over.
|
||||
| If the path to a #[code meta.json] is supplied, or a #[code meta.json] is
|
||||
| found in the input directory, this file is used. Otherwise, the data can
|
||||
| be entered directly from the command line. After packaging, you can run
|
||||
| #[code python setup.py sdist] from the newly created directory to turn
|
||||
| your model into an installable archive file.
|
||||
|
||||
+code(false, "bash", "$", false, false, true).
|
||||
python -m spacy package [input_dir] [output_dir] [--meta-path] [--create-meta] [--force]
|
||||
|
||||
+aside-code("Example", "bash").
|
||||
python -m spacy package /input /output
|
||||
cd /output/en_model-0.0.0
|
||||
python setup.py sdist
|
||||
pip install dist/en_model-0.0.0.tar.gz
|
||||
|
||||
+table(["Argument", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code input_dir]
|
||||
+cell positional
|
||||
+cell Path to directory containing model data.
|
||||
|
||||
+row
|
||||
+cell #[code output_dir]
|
||||
+cell positional
|
||||
+cell Directory to create package folder in.
|
||||
|
||||
+row
|
||||
+cell #[code --meta-path], #[code -m]
|
||||
+cell option
|
||||
+cell #[+tag-new(2)] Path to #[code meta.json] file (optional).
|
||||
|
||||
+row
|
||||
+cell #[code --create-meta], #[code -c]
|
||||
+cell flag
|
||||
+cell
|
||||
| #[+tag-new(2)] Create a #[code meta.json] file on the command
|
||||
| line, even if one already exists in the directory. If an
|
||||
| existing file is found, its entries will be shown as the defaults
|
||||
| in the command line prompt.
|
||||
+row
|
||||
+cell #[code --force], #[code -f]
|
||||
+cell flag
|
||||
+cell Force overwriting of existing folder in output directory.
|
||||
|
||||
+row
|
||||
+cell #[code --help], #[code -h]
|
||||
+cell flag
|
||||
+cell Show help message and available arguments.
|
||||
|
||||
+row("foot")
|
||||
+cell creates
|
||||
+cell directory
|
||||
+cell A Python package containing the spaCy model.
|
|
@@ -1,39 +0,0 @@
//- 💫 DOCS > API > CYTHON > CLASSES

include ../_includes/_mixins

+section("doc")
    +h(2, "doc", "spacy/tokens/doc.pxd") Doc
        +tag cdef class

    include _cython/_doc

+section("token")
    +h(2, "token", "spacy/tokens/token.pxd") Token
        +tag cdef class

    include _cython/_token

+section("span")
    +h(2, "span", "spacy/tokens/span.pxd") Span
        +tag cdef class

    include _cython/_span

+section("lexeme")
    +h(2, "lexeme", "spacy/lexeme.pxd") Lexeme
        +tag cdef class

    include _cython/_lexeme

+section("vocab")
    +h(2, "vocab", "spacy/vocab.pxd") Vocab
        +tag cdef class

    include _cython/_vocab

+section("stringstore")
    +h(2, "stringstore", "spacy/strings.pxd") StringStore
        +tag cdef class

    include _cython/_stringstore

@@ -1,15 +0,0 @@
//- 💫 DOCS > API > CYTHON > STRUCTS

include ../_includes/_mixins

+section("tokenc")
    +h(2, "tokenc", "spacy/structs.pxd") TokenC
        +tag C struct

    include _cython/_tokenc

+section("lexemec")
    +h(2, "lexemec", "spacy/structs.pxd") LexemeC
        +tag C struct

    include _cython/_lexemec

@@ -1,176 +0,0 @@
//- 💫 DOCS > API > CYTHON > ARCHITECTURE

include ../_includes/_mixins

+section("overview")
    +aside("What's Cython?")
        | #[+a("http://cython.org/") Cython] is a language for writing
        | C extensions for Python. Most Python code is also valid Cython, but
        | you can add type declarations to get efficient memory-managed code
        | just like C or C++.

    p
        | This section documents spaCy's C-level data structures and
        | interfaces, intended for use from Cython. Some of the attributes are
        | primarily for internal use, and all C-level functions and methods are
        | designed for speed over safety – if you make a mistake and access an
        | array out-of-bounds, the program may crash abruptly.

    p
        | With Cython there are four ways of declaring complex data types.
        | Unfortunately we use all four in different places, as they all have
        | different utility:

    +table(["Declaration", "Description", "Example"])
        +row
            +cell #[code class]
            +cell A normal Python class.
            +cell #[+api("language") #[code Language]]

        +row
            +cell #[code cdef class]
            +cell
                | A Python extension type. Differs from a normal Python class
                | in that its attributes can be defined on the underlying
                | struct. Can have C-level objects as attributes (notably
                | structs and pointers), and can have methods which have
                | C-level objects as arguments or return types.
            +cell #[+api("cython-classes#lexeme") #[code Lexeme]]

        +row
            +cell #[code cdef struct]
            +cell
                | A struct is just a collection of variables, sort of like a
                | named tuple, except the memory is contiguous. Structs can't
                | have methods, only attributes.
            +cell #[+api("cython-structs#lexemec") #[code LexemeC]]

        +row
            +cell #[code cdef cppclass]
            +cell
                | A C++ class. Like a struct, this can be allocated on the
                | stack, but can have methods, a constructor and a destructor.
                | Differs from #[code cdef class] in that it can be created and
                | destroyed without acquiring the Python global interpreter
                | lock. This style is the most obscure.
            +cell #[+src(gh("spacy", "spacy/syntax/_state.pxd")) #[code StateC]]

    p
        | The most important classes in spaCy are defined as #[code cdef class]
        | objects. The underlying data for these objects is usually gathered
        | into a struct, which is usually named #[code c]. For instance, the
        | #[+api("cython-classes#lexeme") #[code Lexeme]] class holds a
        | #[+api("cython-structs#lexemec") #[code LexemeC]] struct, at
        | #[code Lexeme.c]. This lets you shed the Python container, and pass
        | a pointer to the underlying data into C-level functions.
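    p
        | As a minimal sketch of that pattern (illustrative only, not spaCy's
        | actual declarations):

    +code.
        cdef struct ThingC:
            int value

        cdef class Thing:
            # the underlying data lives in a struct, by convention named "c"
            cdef ThingC c

        cdef int get_value(const ThingC* ptr) nogil:
            # C-level function: works on the struct, not the Python wrapper
            return ptr.value

        cdef Thing thing = Thing()
        thing.c.value = 5
        assert get_value(&thing.c) == 5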
+section("conventions")
|
||||
+h(2, "conventions") Conventions
|
||||
|
||||
p
|
||||
| spaCy's core data structures are implemented as
|
||||
| #[+a("http://cython.org/") Cython] #[code cdef] classes. Memory is
|
||||
| managed through the #[+a(gh("cymem")) #[code cymem]]
|
||||
| #[code cymem.Pool] class, which allows you
|
||||
| to allocate memory which will be freed when the #[code Pool] object
|
||||
| is garbage collected. This means you usually don't have to worry
|
||||
| about freeing memory. You just have to decide which Python object
|
||||
| owns the memory, and make it own the #[code Pool]. When that object
|
||||
| goes out of scope, the memory will be freed. You do have to take
|
||||
| care that no pointers outlive the object that owns them — but this
|
||||
| is generally quite easy.
|
||||
|
||||
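    p
        | For example, the ownership pattern looks roughly like this (a
        | sketch along the lines of the #[code cymem] documentation, not code
        | from spaCy's source):

    +code.
        from cymem.cymem cimport Pool

        cdef class Matrix:
            cdef readonly Pool mem
            cdef float* data

            def __init__(self, int rows, int cols):
                # the Matrix owns the Pool; the Pool owns the memory
                self.mem = Pool()
                self.data = <float*>self.mem.alloc(rows * cols, sizeof(float))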
    p
        | All Cython modules should have the #[code # cython: infer_types=True]
        | compiler directive at the top of the file. This makes the code much
        | cleaner, as it avoids the need for many type declarations. If
        | possible, you should prefer to declare your functions #[code nogil],
        | even if you don't especially care about multi-threading. The reason
        | is that #[code nogil] functions help the Cython compiler reason about
        | your code quite a lot — you're telling the compiler that no Python
        | dynamics are possible. This lets many errors be raised, and ensures
        | your function will run at C speed.

    p
        | Cython gives you many choices of sequences: you could have a Python
        | list, a numpy array, a memory view, a C++ vector, or a pointer.
        | Pointers are preferred, because they are fastest, have the most
        | explicit semantics, and let the compiler check your code more
        | strictly. C++ vectors are also great — but you should only use them
        | internally in functions. It's less friendly to accept a vector as an
        | argument, because that asks the user to do much more work. Here's
        | how to get a pointer from a numpy array, memory view or vector:

    +code.
        from libcpp.vector cimport vector
        cimport numpy as np

        cdef void get_pointers(np.ndarray[int, mode='c'] numpy_array, vector[int] cpp_vector, int[::1] memory_view) nogil:
            pointer1 = <int*>numpy_array.data
            pointer2 = cpp_vector.data()
            pointer3 = &memory_view[0]

    p
        | Both C arrays and C++ vectors reassure the compiler that no Python
        | operations are possible on your variable. This is a big advantage:
        | it lets the Cython compiler raise many more errors for you.

    p
        | When getting a pointer from a numpy array or memoryview, take care
        | that the data is actually stored in C-contiguous order — otherwise
        | you'll get a pointer to nonsense. The type-declarations in the code
        | above should generate runtime errors if buffers with incorrect
        | memory layouts are passed in. To iterate over the array, the
        | following style is preferred:

    +code.
        cdef int c_total(const int* int_array, int length) nogil:
            total = 0
            for item in int_array[:length]:
                total += item
            return total

    p
        | If this is confusing, consider that the compiler couldn't deal with
        | #[code for item in int_array:] — there's no length attached to a raw
        | pointer, so how could we figure out where to stop? The length is
        | provided in the slice notation as a solution to this. Note that we
        | don't have to declare the type of #[code item] in the code above —
        | the compiler can easily infer it. This gives us tidy code that looks
        | quite like Python, but is exactly as fast as C — because we've made
        | sure the compilation to C is trivial.

    p
        | Your functions cannot be declared #[code nogil] if they need to
        | create Python objects or call Python functions. This is perfectly
        | okay — you shouldn't torture your code just to get #[code nogil]
        | functions. However, if your function isn't #[code nogil], you should
        | compile your module with #[code cython -a --cplus my_module.pyx] and
        | open the resulting #[code my_module.html] file in a browser. This
        | will let you see how Cython is compiling your code. Calls into the
        | Python run-time will be in bright yellow. This lets you easily see
        | whether Cython is able to correctly type your code, or whether there
        | are unexpected problems.

    p
        | Working in Cython is very rewarding once you're over the initial
        | learning curve. As with C and C++, the first way you write something
        | in Cython will often be the performance-optimal approach. In
        | contrast, Python optimisation generally requires a lot of
        | experimentation. Is it faster to have an #[code if item in my_dict]
        | check, or to use #[code .get()]? What about
        | #[code try]/#[code except]? Does this numpy operation create a copy?
        | There's no way to guess the answers to these questions, and you'll
        | usually be dissatisfied with your results — so there's no way to
        | know when to stop this process. In the worst case, you'll make a
        | mess that invites the next reader to try their luck too. This is
        | like one of those
        | #[+a("http://www.wemjournal.org/article/S1080-6032%2809%2970088-2/abstract") volcanic gas-traps],
        | where the rescuers keep passing out from low oxygen, causing
        | another rescuer to follow — only to succumb themselves. In short,
        | just say no to optimizing your Python. If it's not fast enough the
        | first time, just switch to Cython.

    +infobox("Resources")
        +list.o-no-block
            +item #[+a("http://docs.cython.org/en/latest/") Official Cython documentation] (cython.org)
            +item #[+a("https://explosion.ai/blog/writing-c-in-cython", true) Writing C in Cython] (explosion.ai)
            +item #[+a("https://explosion.ai/blog/multithreading-with-cython") Multi-threading spaCy’s parser and named entity recogniser] (explosion.ai)

@@ -1,6 +0,0 @@
//- 💫 DOCS > API > DEPENDENCYPARSER

include ../_includes/_mixins

//- This class inherits from Pipe, so this page uses the template in pipe.jade.
!=partial("pipe", { subclass: "DependencyParser", short: "parser", pipeline_id: "parser" })

@@ -1,827 +0,0 @@
//- 💫 DOCS > API > DOC

include ../_includes/_mixins

p
    | A #[code Doc] is a sequence of #[+api("token") #[code Token]] objects.
    | Access sentences and named entities, export annotations to numpy arrays,
    | and losslessly serialize to compressed binary strings. The #[code Doc]
    | object holds an array of #[code TokenC] structs. The Python-level
    | #[code Token] and #[+api("span") #[code Span]] objects are views of this
    | array, i.e. they don't own the data themselves.

+aside-code("Example").
    # Construction 1
    doc = nlp(u'Some text')

    # Construction 2
    from spacy.tokens import Doc
    doc = Doc(nlp.vocab, words=[u'hello', u'world', u'!'],
              spaces=[True, False, False])

+h(2, "init") Doc.__init__
    +tag method

p
    | Construct a #[code Doc] object. The most common way to get a #[code Doc]
    | object is via the #[code nlp] object.

+table(["Name", "Type", "Description"])
    +row
        +cell #[code vocab]
        +cell #[code Vocab]
        +cell A storage container for lexical types.

    +row
        +cell #[code words]
        +cell -
        +cell A list of strings to add to the container.

    +row
        +cell #[code spaces]
        +cell -
        +cell
            | A list of boolean values indicating whether each word has a
            | subsequent space. Must have the same length as #[code words], if
            | specified. Defaults to a sequence of #[code True].

    +row("foot")
        +cell returns
        +cell #[code Doc]
        +cell The newly constructed object.

+h(2, "getitem") Doc.__getitem__
    +tag method

p
    | Get a #[+api("token") #[code Token]] object at position #[code i], where
    | #[code i] is an integer. Negative indexing is supported, and follows the
    | usual Python semantics, i.e. #[code doc[-2]] is #[code doc[len(doc) - 2]].

+aside-code("Example").
    doc = nlp(u'Give it back! He pleaded.')
    assert doc[0].text == 'Give'
    assert doc[-1].text == '.'
    span = doc[1:3]
    assert span.text == 'it back'

+table(["Name", "Type", "Description"])
    +row
        +cell #[code i]
        +cell int
        +cell The index of the token.

    +row("foot")
        +cell returns
        +cell #[code Token]
        +cell The token at #[code doc[i]].

p
    | Get a #[+api("span") #[code Span]] object, starting at position
    | #[code start] (token index) and ending at position #[code end] (token
    | index).

p
    | For instance, #[code doc[2:5]] produces a span consisting of tokens 2, 3
    | and 4. Stepped slices (e.g. #[code doc[start : end : step]]) are not
    | supported, as #[code Span] objects must be contiguous (cannot have gaps).
    | You can use negative indices and open-ended ranges, which have their
    | normal Python semantics.
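p
    | For example (a quick sketch of the slicing semantics described above):

+aside-code("Example").
    doc = nlp(u'Give it back! He pleaded.')
    assert doc[2:5].text == 'back! He'
    # open-ended and negative slices follow normal Python semantics
    assert doc[-2:].text == 'pleaded.'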
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code start_end]
|
||||
+cell tuple
|
||||
+cell The slice of the document to get.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Span]
|
||||
+cell The span at #[code doc[start : end]].
|
||||
|
||||
+h(2, "iter") Doc.__iter__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Iterate over #[code Token] objects, from which the annotations can be
|
||||
| easily accessed.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back')
|
||||
assert [t.text for t in doc] == [u'Give', u'it', u'back']
|
||||
|
||||
p
|
||||
| This is the main way of accessing #[+api("token") #[code Token]] objects,
|
||||
| which are the main way annotations are accessed from Python. If
|
||||
| faster-than-Python speeds are required, you can instead access the
|
||||
| annotations as a numpy array, or access the underlying C data directly
|
||||
| from Cython.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Token]
|
||||
+cell A #[code Token] object.
|
||||
|
||||
+h(2, "len") Doc.__len__
|
||||
+tag method
|
||||
|
||||
p Get the number of tokens in the document.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
assert len(doc) == 7
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The number of tokens in the document.
|
||||
|
||||
+h(2, "set_extension") Doc.set_extension
|
||||
+tag classmethod
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Define a custom attribute on the #[code Doc] which becomes available via
|
||||
| #[code Doc._]. For details, see the documentation on
|
||||
| #[+a("/usage/processing-pipelines#custom-components-attributes") custom attributes].
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.tokens import Doc
|
||||
city_getter = lambda doc: any(city in doc.text for city in ('New York', 'Paris', 'Berlin'))
|
||||
Doc.set_extension('has_city', getter=city_getter)
|
||||
doc = nlp(u'I like New York')
|
||||
assert doc._.has_city
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell
|
||||
| Name of the attribute to set by the extension. For example,
|
||||
| #[code 'my_attr'] will be available as #[code doc._.my_attr].
|
||||
|
||||
+row
|
||||
+cell #[code default]
|
||||
+cell -
|
||||
+cell
|
||||
| Optional default value of the attribute if no getter or method
|
||||
| is defined.
|
||||
|
||||
+row
|
||||
+cell #[code method]
|
||||
+cell callable
|
||||
+cell
|
||||
| Set a custom method on the object, for example
|
||||
| #[code doc._.compare(other_doc)].
|
||||
|
||||
+row
|
||||
+cell #[code getter]
|
||||
+cell callable
|
||||
+cell
|
||||
| Getter function that takes the object and returns an attribute
|
||||
| value. Is called when the user accesses the #[code ._] attribute.
|
||||
|
||||
+row
|
||||
+cell #[code setter]
|
||||
+cell callable
|
||||
+cell
|
||||
| Setter function that takes the #[code Doc] and a value, and
|
||||
| modifies the object. Is called when the user writes to the
|
||||
| #[code Doc._] attribute.
|
||||
|
||||
+h(2, "get_extension") Doc.get_extension
|
||||
+tag classmethod
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Look up a previously registered extension by name. Returns a 4-tuple
|
||||
| #[code.u-break (default, method, getter, setter)] if the extension is
|
||||
| registered. Raises a #[code KeyError] otherwise.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.tokens import Doc
|
||||
Doc.set_extension('has_city', default=False)
|
||||
extension = Doc.get_extension('has_city')
|
||||
assert extension == (False, None, None, None)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell Name of the extension.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell tuple
|
||||
+cell
|
||||
| A #[code.u-break (default, method, getter, setter)] tuple of the
|
||||
| extension.
|
||||
|
||||
+h(2, "has_extension") Doc.has_extension
|
||||
+tag classmethod
|
||||
+tag-new(2)
|
||||
|
||||
p Check whether an extension has been registered on the #[code Doc] class.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.tokens import Doc
|
||||
Doc.set_extension('has_city', default=False)
|
||||
assert Doc.has_extension('has_city')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell Name of the extension to check.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether the extension has been registered.
|
||||
|
||||
+h(2, "remove_extension") Doc.remove_extension
|
||||
+tag classmethod
|
||||
+tag-new("2.0.12")
|
||||
|
||||
p Remove a previously registered extension.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.tokens import Doc
|
||||
Doc.set_extension('has_city', default=False)
|
||||
removed = Doc.remove_extension('has_city')
|
||||
assert not Doc.has_extension('has_city')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell Name of the extension.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell tuple
|
||||
+cell
|
||||
| A #[code.u-break (default, method, getter, setter)] tuple of the
|
||||
| removed extension.
|
||||
|
||||
+h(2, "char_span") Doc.char_span
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Create a #[code Span] object from the slice #[code doc.text[start : end]].
|
||||
| Returns #[code None] if the character indices don't map to a valid span.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'I like New York')
|
||||
span = doc.char_span(7, 15, label=u'GPE')
|
||||
assert span.text == 'New York'
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code start]
|
||||
+cell int
|
||||
+cell The index of the first character of the span.
|
||||
|
||||
+row
|
||||
+cell #[code end]
|
||||
+cell int
|
||||
+cell The index of the last character after the span.
|
||||
|
||||
+row
|
||||
+cell #[code label]
|
||||
+cell uint64 / unicode
|
||||
+cell A label to attach to the Span, e.g. for named entities.
|
||||
|
||||
+row
|
||||
+cell #[code vector]
|
||||
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
|
||||
+cell A meaning representation of the span.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Span]
|
||||
+cell The newly constructed object or #[code None].
|
||||
|
||||
+h(2, "similarity") Doc.similarity
|
||||
+tag method
|
||||
+tag-model("vectors")
|
||||
|
||||
p
|
||||
| Make a semantic similarity estimate. The default estimate is cosine
|
||||
| similarity using an average of word vectors.
|
||||
|
||||
+aside-code("Example").
|
||||
apples = nlp(u'I like apples')
|
||||
oranges = nlp(u'I like oranges')
|
||||
apples_oranges = apples.similarity(oranges)
|
||||
oranges_apples = oranges.similarity(apples)
|
||||
assert apples_oranges == oranges_apples
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code other]
|
||||
+cell -
|
||||
+cell
|
||||
| The object to compare with. By default, accepts #[code Doc],
|
||||
| #[code Span], #[code Token] and #[code Lexeme] objects.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell float
|
||||
+cell A scalar similarity score. Higher is more similar.
|
||||
|
||||
+h(2, "count_by") Doc.count_by
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Count the frequencies of a given attribute. Produces a dict of
|
||||
| #[code {attr (int): count (ints)}] frequencies, keyed by the values
|
||||
| of the given attribute ID.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.attrs import ORTH
|
||||
doc = nlp(u'apple apple orange banana')
|
||||
assert doc.count_by(ORTH) == {7024L: 1, 119552L: 1, 2087L: 2}
|
||||
doc.to_array([attrs.ORTH])
|
||||
# array([[11880], [11880], [7561], [12800]])
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code attr_id]
|
||||
+cell int
|
||||
+cell The attribute ID
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell dict
|
||||
+cell A dictionary mapping attributes to integer counts.
|
||||
|
||||
+h(2, "get_lca_matrix") Doc.get_lca_matrix
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Calculates the lowest common ancestor matrix for a given #[code Doc].
|
||||
| Returns LCA matrix containing the integer index of the ancestor, or
|
||||
| #[code -1] if no common ancestor is found, e.g. if span excludes a
|
||||
| necessary ancestor.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u"This is a test")
|
||||
matrix = doc.get_lca_matrix()
|
||||
# array([[0, 1, 1, 1], [1, 1, 1, 1], [1, 1, 2, 3], [1, 1, 3, 3]], dtype=int32)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code.u-break numpy.ndarray[ndim=2, dtype='int32']]
|
||||
+cell The lowest common ancestor matrix of the #[code Doc].
|
||||
|
||||
+h(2, "to_array") Doc.to_array
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Export given token attributes to a numpy #[code ndarray].
|
||||
| If #[code attr_ids] is a sequence of #[code M] attributes,
|
||||
| the output array will be of shape #[code (N, M)], where #[code N]
|
||||
| is the length of the #[code Doc] (in tokens). If #[code attr_ids] is
|
||||
| a single attribute, the output shape will be #[code (N,)]. You can
|
||||
| specify attributes by integer ID (e.g. #[code spacy.attrs.LEMMA])
|
||||
| or string name (e.g. 'LEMMA' or 'lemma'). The values will be 64-bit
|
||||
| integers.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
|
||||
doc = nlp(text)
|
||||
# All strings mapped to integers, for easy export to numpy
|
||||
np_array = doc.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])
|
||||
np_array = doc.to_array("POS")
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code attr_ids]
|
||||
+cell list or int or string
|
||||
+cell
|
||||
| A list of attributes (int IDs or string names) or
|
||||
| a single attribute (int ID or string name)
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell
|
||||
| #[code.u-break numpy.ndarray[ndim=2, dtype='uint64']] or
|
||||
| #[code.u-break numpy.ndarray[ndim=1, dtype='uint64']] or
|
||||
+cell
|
||||
| The exported attributes as a 2D numpy array, with one row per
|
||||
| token and one column per attribute (when #[code attr_ids] is a
|
||||
| list), or as a 1D numpy array, with one item per attribute (when
|
||||
| #[code attr_ids] is a single value).
|
||||
|
||||
+h(2, "from_array") Doc.from_array
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Load attributes from a numpy array. Write to a #[code Doc] object, from
|
||||
| an #[code (M, N)] array of attributes.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
|
||||
from spacy.tokens import Doc
|
||||
doc = nlp("Hello world!")
|
||||
np_array = doc.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])
|
||||
doc2 = Doc(doc.vocab, words=[t.text for t in doc])
|
||||
doc2.from_array([LOWER, POS, ENT_TYPE, IS_ALPHA], np_array)
|
||||
assert doc[0].pos_ == doc2[0].pos_
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code attrs]
|
||||
+cell ints
|
||||
+cell A list of attribute ID ints.
|
||||
|
||||
+row
|
||||
+cell #[code array]
|
||||
+cell #[code.u-break numpy.ndarray[ndim=2, dtype='int32']]
|
||||
+cell The attribute values to load.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Doc]
|
||||
+cell Itself.
|
||||
|
||||
+h(2, "to_disk") Doc.to_disk
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p Save the current state to a directory.
|
||||
|
||||
+aside-code("Example").
|
||||
doc.to_disk('/path/to/doc')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell unicode or #[code Path]
|
||||
+cell
|
||||
| A path to a directory, which will be created if it doesn't exist.
|
||||
| Paths may be either strings or #[code Path]-like objects.
|
||||
|
||||
+h(2, "from_disk") Doc.from_disk
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p Loads state from a directory. Modifies the object in place and returns it.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.tokens import Doc
|
||||
from spacy.vocab import Vocab
|
||||
doc = Doc(Vocab()).from_disk('/path/to/doc')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell unicode or #[code Path]
|
||||
+cell
|
||||
| A path to a directory. Paths may be either strings or
|
||||
| #[code Path]-like objects.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Doc]
|
||||
+cell The modified #[code Doc] object.
|
||||
|
||||
+h(2, "to_bytes") Doc.to_bytes
|
||||
+tag method
|
||||
|
||||
p Serialize, i.e. export the document contents to a binary string.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
doc_bytes = doc.to_bytes()
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bytes
|
||||
+cell
|
||||
| A losslessly serialized copy of the #[code Doc], including all
|
||||
| annotations.
|
||||
|
||||
+h(2, "from_bytes") Doc.from_bytes
|
||||
+tag method
|
||||
|
||||
p Deserialize, i.e. import the document contents from a binary string.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.tokens import Doc
|
||||
text = u'Give it back! He pleaded.'
|
||||
doc = nlp(text)
|
||||
bytes = doc.to_bytes()
|
||||
doc2 = Doc(doc.vocab).from_bytes(bytes)
|
||||
assert doc.text == doc2.text
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code data]
|
||||
+cell bytes
|
||||
+cell The string to load from.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Doc]
|
||||
+cell The #[code Doc] object.
|
||||
|
||||
+h(2, "merge") Doc.merge
|
||||
+tag method
|
||||
|
||||
p
|
||||
    | Retokenize the document, such that the span at
    | #[code doc.text[start_idx : end_idx]] is merged into a single token. If
    | #[code start_idx] and #[code end_idx] do not mark start and end token
    | boundaries, the document remains unchanged.

+aside-code("Example").
    doc = nlp(u'Los Angeles start.')
    doc.merge(0, len('Los Angeles'), 'NNP', 'Los Angeles', 'GPE')
    assert [t.text for t in doc] == [u'Los Angeles', u'start', u'.']

+table(["Name", "Type", "Description"])
    +row
        +cell #[code start_idx]
        +cell int
        +cell The character index of the start of the slice to merge.

    +row
        +cell #[code end_idx]
        +cell int
        +cell The character index after the end of the slice to merge.

    +row
        +cell #[code **attributes]
        +cell -
        +cell
            | Attributes to assign to the merged token. By default,
            | attributes are inherited from the syntactic root token of
            | the span.

    +row("foot")
        +cell returns
        +cell #[code Token]
        +cell
            | The newly merged token, or #[code None] if the start and end
            | indices did not fall at token boundaries.

+h(2, "print_tree") Doc.print_tree
    +tag method
    +tag-model("parse")

p
    | Returns the parse trees in JSON (dict) format. Especially useful for
    | web applications.

+aside-code("Example").
    doc = nlp(u'Alice ate the pizza.')
    trees = doc.print_tree()
    # {'modifiers': [
    # {'modifiers': [], 'NE': 'PERSON', 'word': 'Alice', 'arc': 'nsubj', 'POS_coarse': 'PROPN', 'POS_fine': 'NNP', 'lemma': 'Alice'},
    # {'modifiers': [{'modifiers': [], 'NE': '', 'word': 'the', 'arc': 'det', 'POS_coarse': 'DET', 'POS_fine': 'DT', 'lemma': 'the'}], 'NE': '', 'word': 'pizza', 'arc': 'dobj', 'POS_coarse': 'NOUN', 'POS_fine': 'NN', 'lemma': 'pizza'},
    # {'modifiers': [], 'NE': '', 'word': '.', 'arc': 'punct', 'POS_coarse': 'PUNCT', 'POS_fine': '.', 'lemma': '.'}
    # ], 'NE': '', 'word': 'ate', 'arc': 'ROOT', 'POS_coarse': 'VERB', 'POS_fine': 'VBD', 'lemma': 'eat'}

+table(["Name", "Type", "Description"])
    +row
        +cell #[code light]
        +cell bool
        +cell Don't include lemmas or entities.

    +row
        +cell #[code flat]
        +cell bool
        +cell Don't include arcs or modifiers.

    +row("foot")
        +cell returns
        +cell dict
        +cell Parse tree as dict.

+h(2, "ents") Doc.ents
    +tag property
    +tag-model("NER")

p
    | Iterate over the entities in the document. Yields named-entity
    | #[code Span] objects, if the entity recognizer has been applied to the
    | document.

+aside-code("Example").
    doc = nlp(u'Mr. Best flew to New York on Saturday morning.')
    ents = list(doc.ents)
    assert ents[0].label == 346
    assert ents[0].label_ == 'PERSON'
    assert ents[0].text == 'Mr. Best'

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell yields
        +cell #[code Span]
        +cell Entities in the document.

+h(2, "noun_chunks") Doc.noun_chunks
    +tag property
    +tag-model("parse")

p
    | Iterate over the base noun phrases in the document. Yields base
    | noun-phrase #[code Span] objects, if the document has been syntactically
    | parsed. A base noun phrase, or "NP chunk", is a noun phrase that does not
    | permit other NPs to be nested within it – so no NP-level coordination, no
    | prepositional phrases, and no relative clauses.

+aside-code("Example").
    doc = nlp(u'A phrase with another phrase occurs.')
    chunks = list(doc.noun_chunks)
    assert chunks[0].text == "A phrase"
    assert chunks[1].text == "another phrase"

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell yields
        +cell #[code Span]
        +cell Noun chunks in the document.

+h(2, "sents") Doc.sents
    +tag property
    +tag-model("parse")

p
    | Iterate over the sentences in the document. Sentence spans have no label.
    | To improve accuracy on informal texts, spaCy calculates sentence boundaries
    | from the syntactic dependency parse. If the parser is disabled,
    | the #[code sents] iterator will be unavailable.

+aside-code("Example").
    doc = nlp(u"This is a sentence. Here's another...")
    sents = list(doc.sents)
    assert len(sents) == 2
    assert [s.root.text for s in sents] == ["is", "'s"]

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell yields
        +cell #[code Span]
        +cell Sentences in the document.

+h(2, "has_vector") Doc.has_vector
    +tag property
    +tag-model("vectors")

p
    | A boolean value indicating whether a word vector is associated with the
    | object.

+aside-code("Example").
    doc = nlp(u'I like apples')
    assert doc.has_vector

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell bool
        +cell Whether the document has vector data attached.

+h(2, "vector") Doc.vector
    +tag property
    +tag-model("vectors")

p
    | A real-valued meaning representation. Defaults to an average of the
    | token vectors.

+aside-code("Example").
    doc = nlp(u'I like apples')
    assert doc.vector.dtype == 'float32'
    assert doc.vector.shape == (300,)

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
        +cell A 1D numpy array representing the document's semantics.

+h(2, "vector_norm") Doc.vector_norm
    +tag property
    +tag-model("vectors")

p
    | The L2 norm of the document's vector representation.

+aside-code("Example").
    doc1 = nlp(u'I like apples')
    doc2 = nlp(u'I like oranges')
    doc1.vector_norm  # 4.54232424414368
    doc2.vector_norm  # 3.304373298575751
    assert doc1.vector_norm != doc2.vector_norm

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell float
        +cell The L2 norm of the vector representation.

+h(2, "attributes") Attributes

+table(["Name", "Type", "Description"])
    +row
        +cell #[code text]
        +cell unicode
        +cell A unicode representation of the document text.

    +row
        +cell #[code text_with_ws]
        +cell unicode
        +cell
            | An alias of #[code Doc.text], provided for duck-type compatibility
            | with #[code Span] and #[code Token].

    +row
        +cell #[code mem]
        +cell #[code Pool]
        +cell The document's local memory heap, for all C data it owns.

    +row
        +cell #[code vocab]
        +cell #[code Vocab]
        +cell The store of lexical types.

    +row
        +cell #[code tensor] #[+tag-new(2)]
        +cell object
        +cell Container for dense vector representations.

    +row
        +cell #[code cats] #[+tag-new(2)]
        +cell dict
        +cell
            | Maps either a label to a score for categories applied to the
            | whole document, or #[code (start_char, end_char, label)] to a
            | score for categories applied to spans. #[code start_char] and
            | #[code end_char] should be character offsets, the label can be
            | either a string or an integer ID, and the score should be a
            | float (see the sketch after this table).

    +row
        +cell #[code user_data]
        +cell -
        +cell A generic storage area, for user custom data.

    +row
        +cell #[code is_tagged]
        +cell bool
        +cell
            | A flag indicating that the document has been part-of-speech
            | tagged.

    +row
        +cell #[code is_parsed]
        +cell bool
        +cell A flag indicating that the document has been syntactically parsed.

    +row
        +cell #[code is_sentenced]
        +cell bool
        +cell
            | A flag indicating that sentence boundaries have been applied to
            | the document.

    +row
        +cell #[code sentiment]
        +cell float
        +cell The document's positivity/negativity score, if available.

    +row
        +cell #[code user_hooks]
        +cell dict
        +cell
            | A dictionary that allows customisation of the #[code Doc]'s
            | properties.

    +row
        +cell #[code user_token_hooks]
        +cell dict
        +cell
            | A dictionary that allows customisation of properties of
            | #[code Token] children.

    +row
        +cell #[code user_span_hooks]
        +cell dict
        +cell
            | A dictionary that allows customisation of properties of
            | #[code Span] children.

    +row
        +cell #[code _]
        +cell #[code Underscore]
        +cell
            | User space for adding custom
            | #[+a("/usage/processing-pipelines#custom-components-attributes") attribute extensions].
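p
    | As a minimal sketch of the two #[code cats] formats described in the
    | table above (the labels and scores here are invented for illustration):

+aside-code("Example").
    doc = nlp(u'This is a text about football.')
    # label mapped to a score, applied to the whole document
    doc.cats = {u'SPORTS': 0.9}
    # (start_char, end_char, label) mapped to a score, applied to a span
    doc.cats[(21, 29, u'TOPIC')] = 0.5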
@@ -1,6 +0,0 @@
//- 💫 DOCS > API > ENTITYRECOGNIZER

include ../_includes/_mixins

//- This class inherits from Pipe, so this page uses the template in pipe.jade.
!=partial("pipe", { subclass: "EntityRecognizer", short: "ner", pipeline_id: "ner" })

@@ -1,35 +0,0 @@
//- 💫 DOCS > API > GOLDCORPUS

include ../_includes/_mixins

p
    | This class manages annotations for tagging, dependency parsing and NER.

+h(2, "init") GoldCorpus.__init__
    +tag method

p Create a #[code GoldCorpus].

+table(["Name", "Type", "Description"])
    +row
        +cell #[code train]
        +cell unicode or #[code Path] or iterable
        +cell
            | Training data, as a path (file or directory) or iterable. If an
            | iterable, each item should be a #[code (text, paragraphs)]
            | tuple, where each paragraph is a tuple
            | #[code.u-break (sentences, brackets)], and each sentence is a
            | tuple #[code.u-break (ids, words, tags, heads, ner)]. See the
            | implementation of
            | #[+src(gh("spacy", "spacy/gold.pyx")) #[code gold.read_json_file]]
            | for further details.

    +row
        +cell #[code dev]
        +cell unicode or #[code Path] or iterable
        +cell Development data, as a path (file or directory) or iterable.

    +row("foot")
        +cell returns
        +cell #[code GoldCorpus]
        +cell The newly constructed object.
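p
    | As a usage sketch, a corpus can be constructed from training and
    | development files on disk (the paths below are placeholders for your
    | own JSON-formatted data):

+aside-code("Example").
    from spacy.gold import GoldCorpus
    # both paths are placeholders
    corpus = GoldCorpus('/path/to/train.json', '/path/to/dev.json')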
@@ -1,203 +0,0 @@
//- 💫 DOCS > API > GOLDPARSE

include ../_includes/_mixins

p Collection for training annotations.

+h(2, "init") GoldParse.__init__
    +tag method

p Create a #[code GoldParse].

+table(["Name", "Type", "Description"])
    +row
        +cell #[code doc]
        +cell #[code Doc]
        +cell The document the annotations refer to.

    +row
        +cell #[code words]
        +cell iterable
        +cell A sequence of unicode word strings.

    +row
        +cell #[code tags]
        +cell iterable
        +cell A sequence of strings, representing tag annotations.

    +row
        +cell #[code heads]
        +cell iterable
        +cell A sequence of integers, representing syntactic head offsets.

    +row
        +cell #[code deps]
        +cell iterable
        +cell A sequence of strings, representing the syntactic relation types.

    +row
        +cell #[code entities]
        +cell iterable
        +cell
            | A sequence of named entity annotations, either as BILUO tag
            | strings, or as #[code (start_char, end_char, label)] tuples,
            | representing the entity positions.

    +row("foot")
        +cell returns
        +cell #[code GoldParse]
        +cell The newly constructed object.
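p
    | As a minimal sketch (the annotations here are made up for
    | illustration), a #[code GoldParse] can be created for a tokenized
    | document like this:

+aside-code("Example").
    from spacy.gold import GoldParse
    doc = nlp.make_doc(u'Facebook released React')
    gold = GoldParse(doc, entities=[(0, 8, u'ORG')])
    assert len(gold) == len(doc)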
+h(2, "len") GoldParse.__len__
|
||||
+tag method
|
||||
|
||||
p Get the number of gold-standard tokens.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The number of gold-standard tokens.
|
||||
|
||||
+h(2, "is_projective") GoldParse.is_projective
|
||||
+tag property
|
||||
|
||||
p
|
||||
| Whether the provided syntactic annotations form a projective dependency
|
||||
| tree.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether annotations form projective tree.
|
||||
|
||||
|
||||
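p
    | For example (treating the #[code heads] annotation as absolute token
    | indices, an assumption made for this sketch):

+aside-code("Example").
    from spacy.gold import GoldParse
    doc = nlp.make_doc(u'Rats chew wires')
    # 'chew' heads itself (root), 'Rats' and 'wires' attach to it
    gold = GoldParse(doc, heads=[1, 1, 1])
    assert gold.is_projective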
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code tags]
|
||||
+cell list
|
||||
+cell The part-of-speech tag annotations.
|
||||
|
||||
+row
|
||||
+cell #[code heads]
|
||||
+cell list
|
||||
+cell The syntactic head annotations.
|
||||
|
||||
+row
|
||||
+cell #[code labels]
|
||||
+cell list
|
||||
+cell The syntactic relation-type annotations.
|
||||
|
||||
+row
|
||||
+cell #[code ents]
|
||||
+cell list
|
||||
+cell The named entity annotations.
|
||||
|
||||
+row
|
||||
+cell #[code cand_to_gold]
|
||||
+cell list
|
||||
+cell The alignment from candidate tokenization to gold tokenization.
|
||||
|
||||
+row
|
||||
+cell #[code gold_to_cand]
|
||||
+cell list
|
||||
+cell The alignment from gold tokenization to candidate tokenization.
|
||||
|
||||
+row
|
||||
+cell #[code cats] #[+tag-new(2)]
|
||||
+cell list
|
||||
+cell
|
||||
| Entries in the list should be either a label, or a
|
||||
| #[code (start, end, label)] triple. The tuple form is used for
|
||||
| categories applied to spans of the document.
|
||||
|
||||
|
||||
+h(2, "util") Utilities
|
||||
|
||||
+h(3, "biluo_tags_from_offsets") gold.biluo_tags_from_offsets
|
||||
+tag function
|
||||
|
||||
p
|
||||
| Encode labelled spans into per-token tags, using the
|
||||
| #[+a("/api/annotation#biluo") BILUO scheme] (Begin/In/Last/Unit/Out).
|
||||
|
||||
p
|
||||
| Returns a list of unicode strings, describing the tags. Each tag string
|
||||
| will be of the form of either #[code ""], #[code "O"] or
|
||||
| #[code "{action}-{label}"], where action is one of #[code "B"],
|
||||
| #[code "I"], #[code "L"], #[code "U"]. The string #[code "-"]
|
||||
| is used where the entity offsets don't align with the tokenization in the
|
||||
| #[code Doc] object. The training algorithm will view these as missing
|
||||
| values. #[code O] denotes a non-entity token. #[code B] denotes the
|
||||
| beginning of a multi-token entity, #[code I] the inside of an entity
|
||||
| of three or more tokens, and #[code L] the end of an entity of two or
|
||||
| more tokens. #[code U] denotes a single-token entity.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.gold import biluo_tags_from_offsets
|
||||
|
||||
doc = nlp(u'I like London.')
|
||||
entities = [(7, 13, 'LOC')]
|
||||
tags = biluo_tags_from_offsets(doc, entities)
|
||||
assert tags == ['O', 'O', 'U-LOC', 'O']
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell
|
||||
| The document that the entity offsets refer to. The output tags
|
||||
| will refer to the token boundaries within the document.
|
||||
|
||||
+row
|
||||
+cell #[code entities]
|
||||
+cell iterable
|
||||
+cell
|
||||
| A sequence of #[code (start, end, label)] triples. #[code start]
|
||||
| and #[code end] should be character-offset integers denoting the
|
||||
| slice into the original string.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell list
|
||||
+cell
|
||||
| Unicode strings, describing the
|
||||
| #[+a("/api/annotation#biluo") BILUO] tags.
|
||||
|
||||
+h(3, "offsets_from_biluo_tags") gold.offsets_from_biluo_tags
|
||||
|
||||
p
|
||||
| Encode per-token tags following the
|
||||
| #[+a("/api/annotation#biluo") BILUO scheme] into entity offsets.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.gold import offsets_from_biluo_tags
|
||||
|
||||
doc = nlp('I like London.')
|
||||
tags = ['O', 'O', 'U-LOC', 'O']
|
||||
entities = offsets_from_biluo_tags(doc, tags)
|
||||
assert entities == [(7, 13, 'LOC')]
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The document that the BILUO tags refer to.
|
||||
|
||||
+row
|
||||
+cell #[code entities]
|
||||
+cell iterable
|
||||
+cell
|
||||
| A sequence of #[+a("/api/annotation#biluo") BILUO] tags with
|
||||
| each tag describing one token. Each tag string will be of the
|
||||
| form of either #[code ""], #[code "O"] or
|
||||
| #[code "{action}-{label}"], where action is one of #[code "B"],
|
||||
| #[code "I"], #[code "L"], #[code "U"].
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell list
|
||||
+cell
|
||||
| A sequence of #[code (start, end, label)] triples. #[code start]
|
||||
| and #[code end] will be character-offset integers denoting the
|
||||
| slice into the original string.
|
|
@@ -1,157 +0,0 @@
//- 💫 DOCS > API > ARCHITECTURE

include ../_includes/_mixins

+section("basics")
    include ../usage/_spacy-101/_architecture

+section("nn-model")
    +h(2, "nn-model") Neural network model architecture

    p
        | spaCy's statistical models have been custom-designed to give a
        | high-performance mix of speed and accuracy. The current architecture
        | hasn't been published yet, but in the meantime we prepared a video
        | that explains how the models work, with particular focus on NER.

    +youtube("sqDHBH9IjRU")

    p
        | The parsing model is a blend of recent results. The two recent
        | inspirations have been the work of Eliyahu Kiperwasser and Yoav
        | Goldberg at Bar-Ilan University#[+fn(1)], and the SyntaxNet team from
        | Google. The foundation of the parser is still based on the work of
        | Joakim Nivre#[+fn(2)], who introduced the transition-based
        | framework#[+fn(3)], the arc-eager transition system, and the
        | imitation learning objective. The model is implemented using
        | #[+a(gh("thinc")) Thinc], spaCy's machine learning library. We first
        | predict context-sensitive vectors for each word in the input:

    +code.
        (embed_lower | embed_prefix | embed_suffix | embed_shape)
        >> Maxout(token_width)
        >> convolution ** 4

    p
        | This convolutional layer is shared between the tagger, parser and
        | NER, and will also be shared by the future neural lemmatizer. Because
        | the parser shares these layers with the tagger, the parser does not
        | require tag features. I got this trick from David Weiss's
        | "stack-propagation" paper#[+fn(4)].

    p
        | To boost the representation, the tagger actually predicts a "super
        | tag" with POS, morphology and dependency label#[+fn(5)]. The tagger
        | predicts these supertags by adding a softmax layer onto the
        | convolutional layer – so, we're teaching the convolutional layer to
        | give us a representation that's one affine transform away from this
        | informative lexical information. This is obviously good for the
        | parser (which backprops to the convolutions too). The parser model
        | makes a state vector by concatenating the vector representations for
        | its context tokens. The current context tokens are:

    +table
        +row
            +cell #[code S0], #[code S1], #[code S2]
            +cell Top three words on the stack.

        +row
            +cell #[code B0], #[code B1]
            +cell First two words of the buffer.

        +row
            +cell
                | #[code S0L1], #[code S1L1], #[code S2L1], #[code B0L1],
                | #[code B1L1]#[br]
                | #[code S0L2], #[code S1L2], #[code S2L2], #[code B0L2],
                | #[code B1L2]
            +cell
                | Leftmost and second leftmost children of #[code S0],
                | #[code S1], #[code S2], #[code B0] and #[code B1].

        +row
            +cell
                | #[code S0R1], #[code S1R1], #[code S2R1], #[code B0R1],
                | #[code B1R1]#[br]
                | #[code S0R2], #[code S1R2], #[code S2R2], #[code B0R2],
                | #[code B1R2]
            +cell
                | Rightmost and second rightmost children of #[code S0],
                | #[code S1], #[code S2], #[code B0] and #[code B1].

    p
        | This makes the state vector quite long: #[code 13*T], where
        | #[code T] is the token vector width (128 is working well).
        | Fortunately, there's a way to structure the computation to save some
        | expense (and make it more GPU-friendly).

    p
        | The parser typically visits #[code 2*N] states for a sentence of
        | length #[code N] (although it may visit more, if it back-tracks with
        | a non-monotonic transition#[+fn(6)]). A naive implementation would
        | require #[code 2*N (B, 13*T) @ (13*T, H)] matrix multiplications for
        | a batch of size #[code B]. We can instead perform one
        | #[code (B*N, T) @ (T, 13*H)] multiplication, to pre-compute the
        | hidden weights for each positional feature with respect to the words
        | in the batch. (Note that our token vectors come from the CNN — so we
        | can't play this trick over the vocabulary. That's how Stanford's NN
        | parser#[+fn(7)] works — and why its model is so big.)
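    p
        | As a rough numpy sketch of this pre-computation trick (all names and
        | shapes below are illustrative, not spaCy's internals), the per-state
        | hidden layer reduces to a sum of cache lookups:

    +code.
        import numpy

        B, N, T, H, F = 32, 20, 128, 64, 13  # batch, length, token width, hidden, features
        tokvecs = numpy.random.randn(B * N, T)  # CNN output for every token in the batch
        W = numpy.random.randn(T, F * H)        # hidden weights, one block per positional feature

        # One big multiplication up front, instead of a (13*T x H) product per state:
        cached = tokvecs.dot(W).reshape((B * N, F, H))

        # At parse time, each state just sums F rows of the cache:
        state_features = numpy.random.randint(0, B * N, size=F)  # token index per feature
        hidden = cached[state_features, numpy.arange(F)].sum(axis=0)
        assert hidden.shape == (H,)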
    p
        | This pre-computation strategy allows a nice compromise between
        | GPU-friendliness and implementation simplicity. The CNN and the wide
        | lower layer are computed on the GPU, and then the precomputed hidden
        | weights are moved to the CPU, before we start the transition-based
        | parsing process. This makes a lot of things much easier. We don't
        | have to worry about variable-length batch sizes, and we don't have
        | to implement the dynamic oracle in CUDA to train.

    p
        | Currently the parser's loss function is multilabel log
        | loss#[+fn(8)], as the dynamic oracle allows multiple states to be
        | 0 cost. This is defined as follows, where #[code gZ] is the sum of
        | the scores assigned to gold classes:

    +code.
        (exp(score) / Z) - (exp(score) / gZ)
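    p
        | A small numpy sketch of the gradient this formula describes (the
        | scores and gold-class membership here are invented, and we read the
        | second term as applying only to gold classes):

    +code.
        import numpy

        scores = numpy.asarray([2.0, 1.0, -1.0, 0.5])        # model scores, one per class
        is_gold = numpy.asarray([True, False, False, True])   # zero-cost (gold) classes

        exp_scores = numpy.exp(scores)
        Z = exp_scores.sum()             # partition over all classes
        gZ = exp_scores[is_gold].sum()   # partition over gold classes only

        # (exp(score) / Z) - (exp(score) / gZ), second term for gold classes only
        d_scores = (exp_scores / Z) - numpy.where(is_gold, exp_scores / gZ, 0.0)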
    +bibliography
        +item
            | #[+a("https://www.semanticscholar.org/paper/Simple-and-Accurate-Dependency-Parsing-Using-Bidir-Kiperwasser-Goldberg/3cf31ecb2724b5088783d7c96a5fc0d5604cbf41") Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations]
            br
            | Eliyahu Kiperwasser, Yoav Goldberg (2016)

        +item
            | #[+a("https://www.semanticscholar.org/paper/A-Dynamic-Oracle-for-Arc-Eager-Dependency-Parsing-Goldberg-Nivre/22697256ec19ecc3e14fcfc63624a44cf9c22df4") A Dynamic Oracle for Arc-Eager Dependency Parsing]
            br
            | Yoav Goldberg, Joakim Nivre (2012)

        +item
            | #[+a("https://explosion.ai/blog/parsing-english-in-python") Parsing English in 500 Lines of Python]
            br
            | Matthew Honnibal (2013)

        +item
            | #[+a("https://www.semanticscholar.org/paper/Stack-propagation-Improved-Representation-Learning-Zhang-Weiss/0c133f79b23e8c680891d2e49a66f0e3d37f1466") Stack-propagation: Improved Representation Learning for Syntax]
            br
            | Yuan Zhang, David Weiss (2016)

        +item
            | #[+a("https://www.semanticscholar.org/paper/Deep-multi-task-learning-with-low-level-tasks-supe-S%C3%B8gaard-Goldberg/03ad06583c9721855ccd82c3d969a01360218d86") Deep multi-task learning with low level tasks supervised at lower layers]
            br
            | Anders Søgaard, Yoav Goldberg (2016)

        +item
            | #[+a("https://www.semanticscholar.org/paper/An-Improved-Non-monotonic-Transition-System-for-De-Honnibal-Johnson/4094cee47ade13b77b5ab4d2e6cb9dd2b8a2917c") An Improved Non-monotonic Transition System for Dependency Parsing]
            br
            | Matthew Honnibal, Mark Johnson (2015)

        +item
            | #[+a("http://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf") A Fast and Accurate Dependency Parser using Neural Networks]
            br
            | Danqi Chen, Christopher D. Manning (2014)

        +item
            | #[+a("https://www.semanticscholar.org/paper/Parsing-the-Wall-Street-Journal-using-a-Lexical-Fu-Riezler-King/0ad07862a91cd59b7eb5de38267e47725a62b8b2") Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques]
            br
            | Stefan Riezler et al. (2002)
@@ -1,702 +0,0 @@
//- 💫 DOCS > API > LANGUAGE

include ../_includes/_mixins

p
    | Usually you'll load this once per process as #[code nlp] and pass the
    | instance around your application. The #[code Language] class is created
    | when you call #[+api("spacy#load") #[code spacy.load()]] and contains
    | the shared vocabulary and #[+a("/usage/adding-languages") language data],
    | optional model data loaded from a #[+a("/models") model package] or
    | a path, and a #[+a("/usage/processing-pipelines") processing pipeline]
    | containing components like the tagger or parser that are called on a
    | document in order. You can also add your own processing pipeline
    | components that take a #[code Doc] object, modify it and return it.

+h(2, "init") Language.__init__
    +tag method

p Initialise a #[code Language] object.

+aside-code("Example").
    from spacy.vocab import Vocab
    from spacy.language import Language
    nlp = Language(Vocab())

    from spacy.lang.en import English
    nlp = English()

+table(["Name", "Type", "Description"])
    +row
        +cell #[code vocab]
        +cell #[code Vocab]
        +cell
            | A #[code Vocab] object. If #[code True], a vocab is created via
            | #[code Language.Defaults.create_vocab].

    +row
        +cell #[code make_doc]
        +cell callable
        +cell
            | A function that takes text and returns a #[code Doc] object.
            | Usually a #[code Tokenizer].

    +row
        +cell #[code meta]
        +cell dict
        +cell
            | Custom meta data for the #[code Language] class. Is written to by
            | models to add model meta data.

    +row("foot")
        +cell returns
        +cell #[code Language]
        +cell The newly constructed object.

+h(2, "call") Language.__call__
    +tag method

p
    | Apply the pipeline to some text. The text can span multiple sentences,
    | and can contain arbitrary whitespace. Alignment into the original string
    | is preserved.

+aside-code("Example").
    doc = nlp(u'An example sentence. Another sentence.')
    assert (doc[0].text, doc[0].head.tag_) == ('An', 'NN')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code text]
        +cell unicode
        +cell The text to be processed.

    +row
        +cell #[code disable]
        +cell list
        +cell
            | Names of pipeline components to
            | #[+a("/usage/processing-pipelines#disabling") disable].

    +row("foot")
        +cell returns
        +cell #[code Doc]
        +cell A container for accessing the annotations.

+infobox("Changed in v2.0", "⚠️")
    | Pipeline components to prevent from being loaded can now be added as
    | a list to #[code disable], instead of specifying one keyword argument
    | per component.

    +code-wrapper
        +code-new doc = nlp(u"I don't want parsed", disable=['parser'])
        +code-old doc = nlp(u"I don't want parsed", parse=False)

+h(2, "pipe") Language.pipe
    +tag method

p
    | Process texts as a stream, and yield #[code Doc] objects in order.
    | Supports GIL-free multi-threading.

+infobox("Important note for spaCy v2.0.x", "⚠️")
    | By default, multiple threads will be launched for matrix multiplication,
    | which may be inefficient on multi-core machines. Setting
    | #[code OPENBLAS_NUM_THREADS=1] should fix this problem. spaCy v2.1.x
    | will be switching to single-thread by default.

+aside-code("Example").
    texts = [u'One document.', u'...', u'Lots of documents']
    for doc in nlp.pipe(texts, batch_size=50, n_threads=4):
        assert doc.is_parsed

+table(["Name", "Type", "Description"])
    +row
        +cell #[code texts]
        +cell -
        +cell A sequence of unicode objects.

    +row
        +cell #[code as_tuples]
        +cell bool
        +cell
            | If set to #[code True], inputs should be a sequence of
            | #[code (text, context)] tuples. Output will then be a sequence of
            | #[code (doc, context)] tuples. Defaults to #[code False].

    +row
        +cell #[code n_threads]
        +cell int
        +cell
            | The number of worker threads to use. If #[code -1], OpenMP will
            | decide how many to use at run time. Default is #[code 2].

    +row
        +cell #[code batch_size]
        +cell int
        +cell The number of texts to buffer.

    +row
        +cell #[code disable]
        +cell list
        +cell
            | Names of pipeline components to
            | #[+a("/usage/processing-pipelines#disabling") disable].

    +row("foot")
        +cell yields
        +cell #[code Doc]
        +cell Documents in the order of the original text.

+h(2, "update") Language.update
    +tag method

p Update the models in the pipeline.

+aside-code("Example").
    for raw_text, entity_offsets in train_data:
        doc = nlp.make_doc(raw_text)
        gold = GoldParse(doc, entities=entity_offsets)
        nlp.update([doc], [gold], drop=0.5, sgd=optimizer)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code docs]
        +cell iterable
        +cell
            | A batch of #[code Doc] objects or unicode. If unicode, a
            | #[code Doc] object will be created from the text.

    +row
        +cell #[code golds]
        +cell iterable
        +cell
            | A batch of #[code GoldParse] objects or dictionaries.
            | Dictionaries will be used to create
            | #[+api("goldparse") #[code GoldParse]] objects. For the available
            | keys and their usage, see
            | #[+api("goldparse#init") #[code GoldParse.__init__]].

    +row
        +cell #[code drop]
        +cell float
        +cell The dropout rate.

    +row
        +cell #[code sgd]
        +cell callable
        +cell An optimizer.

    +row("foot")
        +cell returns
        +cell dict
        +cell Results from the update.

+h(2, "begin_training") Language.begin_training
    +tag method

p
    | Allocate models, pre-process training data and acquire an optimizer.

+aside-code("Example").
    optimizer = nlp.begin_training(gold_tuples)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code gold_tuples]
        +cell iterable
        +cell Gold-standard training data.

    +row
        +cell #[code **cfg]
        +cell -
        +cell Config parameters.

    +row("foot")
        +cell returns
        +cell callable
        +cell An optimizer.

+h(2, "use_params") Language.use_params
    +tag contextmanager
    +tag method

p
    | Replace weights of models in the pipeline with those provided in the
    | params dictionary. Can be used as a context manager, in which case,
    | models go back to their original weights after the block.

+aside-code("Example").
    with nlp.use_params(optimizer.averages):
        nlp.to_disk('/tmp/checkpoint')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code params]
        +cell dict
        +cell A dictionary of parameters keyed by model ID.

    +row
        +cell #[code **cfg]
        +cell -
        +cell Config parameters.

+h(2, "preprocess_gold") Language.preprocess_gold
    +tag method

p
    | Can be called before training to pre-process gold data. By default, it
    | handles nonprojectivity and adds missing tags to the tag map.

+table(["Name", "Type", "Description"])
    +row
        +cell #[code docs_golds]
        +cell iterable
        +cell Tuples of #[code Doc] and #[code GoldParse] objects.

    +row("foot")
        +cell yields
        +cell tuple
        +cell Tuples of #[code Doc] and #[code GoldParse] objects.
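p
    | A minimal usage sketch, assuming #[code docs_golds] already holds
    | #[code (Doc, GoldParse)] tuples:

+aside-code("Example").
    # re-assign the iterable to the pre-processed stream
    docs_golds = nlp.preprocess_gold(docs_golds)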
+h(2, "create_pipe") Language.create_pipe
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p Create a pipeline component from a factory.
|
||||
|
||||
+aside-code("Example").
|
||||
parser = nlp.create_pipe('parser')
|
||||
nlp.add_pipe(parser)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell
|
||||
| Factory name to look up in
|
||||
| #[+api("language#class-attributes") #[code Language.factories]].
|
||||
|
||||
+row
|
||||
+cell #[code config]
|
||||
+cell dict
|
||||
+cell Configuration parameters to initialise component.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell callable
|
||||
+cell The pipeline component.
|
||||
|
||||
+h(2, "add_pipe") Language.add_pipe
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Add a component to the processing pipeline. Valid components are
|
||||
| callables that take a #[code Doc] object, modify it and return it. Only
|
||||
| one of #[code before], #[code after], #[code first] or #[code last] can
|
||||
| be set. Default behaviour is #[code last=True].
|
||||
|
||||
+aside-code("Example").
|
||||
def component(doc):
|
||||
# modify Doc and return it
|
||||
return doc
|
||||
|
||||
nlp.add_pipe(component, before='ner')
|
||||
nlp.add_pipe(component, name='custom_name', last=True)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code component]
|
||||
+cell callable
|
||||
+cell The pipeline component.
|
||||
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell
|
||||
| Name of pipeline component. Overwrites existing
|
||||
| #[code component.name] attribute if available. If no #[code name]
|
||||
| is set and the component exposes no name attribute,
|
||||
| #[code component.__name__] is used. An error is raised if the
|
||||
| name already exists in the pipeline.
|
||||
|
||||
+row
|
||||
+cell #[code before]
|
||||
+cell unicode
|
||||
+cell Component name to insert component directly before.
|
||||
|
||||
+row
|
||||
+cell #[code after]
|
||||
+cell unicode
|
||||
+cell Component name to insert component directly after:
|
||||
|
||||
+row
|
||||
+cell #[code first]
|
||||
+cell bool
|
||||
+cell Insert component first / not first in the pipeline.
|
||||
|
||||
+row
|
||||
+cell #[code last]
|
||||
+cell bool
|
||||
+cell Insert component last / not last in the pipeline.
|
||||
|
||||
+h(2, "has_pipe") Language.has_pipe
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Check whether a component is present in the pipeline. Equivalent to
|
||||
| #[code name in nlp.pipe_names].
|
||||
|
||||
+aside-code("Example").
|
||||
nlp.add_pipe(lambda doc: doc, name='component')
|
||||
assert 'component' in nlp.pipe_names
|
||||
assert nlp.has_pipe('component')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell Name of the pipeline component to check.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether a component of that name exists in the pipeline.
|
||||
|
||||
+h(2, "get_pipe") Language.get_pipe
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p Get a pipeline component for a given component name.
|
||||
|
||||
+aside-code("Example").
|
||||
parser = nlp.get_pipe('parser')
|
||||
custom_component = nlp.get_pipe('custom_component')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell Name of the pipeline component to get.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell callable
|
||||
+cell The pipeline component.
|
||||
|
||||
+h(2, "replace_pipe") Language.replace_pipe
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p Replace a component in the pipeline.
|
||||
|
||||
+aside-code("Example").
|
||||
nlp.replace_pipe('parser', my_custom_parser)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell Name of the component to replace.
|
||||
|
||||
+row
|
||||
+cell #[code component]
|
||||
+cell callable
|
||||
+cell The pipeline component to inser.
|
||||
|
||||
|
||||
+h(2, "rename_pipe") Language.rename_pipe
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Rename a component in the pipeline. Useful to create custom names for
|
||||
| pre-defined and pre-loaded components. To change the default name of
|
||||
| a component added to the pipeline, you can also use the #[code name]
|
||||
| argument on #[+api("language#add_pipe") #[code add_pipe]].
|
||||
|
||||
+aside-code("Example").
|
||||
nlp.rename_pipe('parser', 'spacy_parser')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code old_name]
|
||||
+cell unicode
|
||||
+cell Name of the component to rename.
|
||||
|
||||
+row
|
||||
+cell #[code new_name]
|
||||
+cell unicode
|
||||
+cell New name of the component.
|
||||
|
||||
+h(2, "remove_pipe") Language.remove_pipe
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Remove a component from the pipeline. Returns the removed component name
|
||||
| and component function.
|
||||
|
||||
+aside-code("Example").
|
||||
name, component = nlp.remove_pipe('parser')
|
||||
assert name == 'parser'
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell Name of the component to remove.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell tuple
|
||||
+cell A #[code (name, component)] tuple of the removed component.
|
||||
|
||||
+h(2, "disable_pipes") Language.disable_pipes
|
||||
+tag contextmanager
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Disable one or more pipeline components. If used as a context manager,
|
||||
| the pipeline will be restored to the initial state at the end of the
|
||||
| block. Otherwise, a #[code DisabledPipes] object is returned, that has a
|
||||
| #[code .restore()] method you can use to undo your changes.
|
||||
|
||||
+aside-code("Example").
|
||||
with nlp.disable_pipes('tagger', 'parser'):
|
||||
optimizer = nlp.begin_training(gold_tuples)
|
||||
|
||||
disabled = nlp.disable_pipes('tagger', 'parser')
|
||||
optimizer = nlp.begin_training(gold_tuples)
|
||||
disabled.restore()
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code *disabled]
|
||||
+cell unicode
|
||||
+cell Names of pipeline components to disable.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code DisabledPipes]
|
||||
+cell
|
||||
| The disabled pipes that can be restored by calling the object's
|
||||
| #[code .restore()] method.
|
||||
|
||||
+h(2, "to_disk") Language.to_disk
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Save the current state to a directory. If a model is loaded, this will
|
||||
| #[strong include the model].
|
||||
|
||||
+aside-code("Example").
|
||||
nlp.to_disk('/path/to/models')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell unicode or #[code Path]
|
||||
+cell
|
||||
| A path to a directory, which will be created if it doesn't exist.
|
||||
| Paths may be either strings or #[code Path]-like objects.
|
||||
|
||||
+row
|
||||
+cell #[code disable]
|
||||
+cell list
|
||||
+cell
|
||||
| Names of pipeline components to
|
||||
| #[+a("/usage/processing-pipelines#disabling") disable]
|
||||
| and prevent from being saved.
|
||||
|
||||
+h(2, "from_disk") Language.from_disk
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Loads state from a directory. Modifies the object in place and returns
|
||||
| it. If the saved #[code Language] object contains a model, the
|
||||
| model will be loaded. Note that this method is commonly used via the
|
||||
| subclasses like #[code English] or #[code German] to make
|
||||
| language-specific functionality like the
|
||||
| #[+a("/usage/adding-languages#lex-attrs") lexical attribute getters]
|
||||
| available to the loaded object.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.language import Language
|
||||
nlp = Language().from_disk('/path/to/model')
|
||||
|
||||
# using language-specific subclass
|
||||
from spacy.lang.en import English
|
||||
nlp = English().from_disk('/path/to/en_model')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell unicode or #[code Path]
|
||||
+cell
|
||||
| A path to a directory. Paths may be either strings or
|
||||
| #[code Path]-like objects.
|
||||
|
||||
+row
|
||||
+cell #[code disable]
|
||||
+cell list
|
||||
+cell
|
||||
| Names of pipeline components to
|
||||
| #[+a("/usage/processing-pipelines#disabling") disable].
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Language]
|
||||
+cell The modified #[code Language] object.
|
||||
|
||||
+infobox("Changed in v2.0", "⚠️")
|
||||
| As of spaCy v2.0, the #[code save_to_directory] method has been
|
||||
| renamed to #[code to_disk], to improve consistency across classes.
|
||||
| Pipeline components to prevent from being loaded can now be added as
|
||||
| a list to #[code disable], instead of specifying one keyword argument
|
||||
| per component.
|
||||
|
||||
+code-wrapper
|
||||
+code-new nlp = English().from_disk(disable=['tagger', 'ner'])
|
||||
+code-old nlp = spacy.load('en', tagger=False, entity=False)
|
||||
|
||||
+h(2, "to_bytes") Language.to_bytes
|
||||
+tag method
|
||||
|
||||
p Serialize the current state to a binary string.
|
||||
|
||||
+aside-code("Example").
|
||||
nlp_bytes = nlp.to_bytes()
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code disable]
|
||||
+cell list
|
||||
+cell
|
||||
| Names of pipeline components to
|
||||
| #[+a("/usage/processing-pipelines#disabling") disable]
|
||||
| and prevent from being serialized.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bytes
|
||||
+cell The serialized form of the #[code Language] object.
|
||||
|
||||
+h(2, "from_bytes") Language.from_bytes
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Load state from a binary string. Note that this method is commonly used
|
||||
| via the subclasses like #[code English] or #[code German] to make
|
||||
| language-specific functionality like the
|
||||
| #[+a("/usage/adding-languages#lex-attrs") lexical attribute getters]
|
||||
| available to the loaded object.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.lang.en import English
|
||||
nlp_bytes = nlp.to_bytes()
|
||||
nlp2 = English()
|
||||
nlp2.from_bytes(nlp_bytes)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code bytes_data]
|
||||
+cell bytes
|
||||
+cell The data to load from.
|
||||
|
||||
+row
|
||||
+cell #[code disable]
|
||||
+cell list
|
||||
+cell
|
||||
| Names of pipeline components to
|
||||
| #[+a("/usage/processing-pipelines#disabling") disable].
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Language]
|
||||
+cell The #[code Language] object.
|
||||
|
||||
+infobox("Changed in v2.0", "⚠️")
|
||||
| Pipeline components to prevent from being loaded can now be added as
|
||||
| a list to #[code disable], instead of specifying one keyword argument
|
||||
| per component.
|
||||
|
||||
+code-wrapper
|
||||
+code-new nlp = English().from_bytes(bytes, disable=['tagger', 'ner'])
|
||||
+code-old nlp = English().from_bytes('en', tagger=False, entity=False)
|
||||
|
||||
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell A container for the lexical types.
|
||||
|
||||
+row
|
||||
+cell #[code tokenizer]
|
||||
+cell #[code Tokenizer]
|
||||
+cell The tokenizer.
|
||||
|
||||
+row
|
||||
+cell #[code make_doc]
|
||||
+cell #[code lambda text: Doc]
|
||||
+cell Create a #[code Doc] object from unicode text.
|
||||
|
||||
+row
|
||||
+cell #[code pipeline]
|
||||
+cell list
|
||||
+cell
|
||||
| List of #[code (name, component)] tuples describing the current
|
||||
| processing pipeline, in order.
|
||||
|
||||
+row
|
||||
+cell #[code pipe_names]
|
||||
+tag-new(2)
|
||||
+cell list
|
||||
+cell List of pipeline component names, in order.
|
||||
|
||||
+row
|
||||
+cell #[code meta]
|
||||
+cell dict
|
||||
+cell
|
||||
| Custom meta data for the Language class. If a model is loaded,
|
||||
| contains meta data of the model.
|
||||
|
||||
+row
|
||||
+cell #[code path]
|
||||
+tag-new(2)
|
||||
+cell #[code Path]
|
||||
+cell
|
||||
| Path to the model data directory, if a model is loaded. Otherwise
|
||||
| #[code None].
|
||||
|
||||
+h(2, "class-attributes") Class attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code Defaults]
|
||||
+cell class
|
||||
+cell
|
||||
| Settings, data and factory methods for creating the
|
||||
| #[code nlp] object and processing pipeline.
|
||||
|
||||
+row
|
||||
+cell #[code lang]
|
||||
+cell unicode
|
||||
+cell
|
||||
| Two-letter language ID, i.e.
|
||||
| #[+a("https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes") ISO code].
|
||||
|
||||
+row
|
||||
+cell #[code factories]
|
||||
+tag-new(2)
|
||||
+cell dict
|
||||
+cell
|
||||
| Factories that create pre-defined pipeline components, e.g. the
|
||||
| tagger, parser or entity recognizer, keyed by their component
|
||||
| name.
|
|
@@ -1,160 +0,0 @@
//- 💫 DOCS > API > LEMMATIZER

include ../_includes/_mixins

p
    | The #[code Lemmatizer] supports simple part-of-speech-sensitive suffix
    | rules and lookup tables.

+h(2, "init") Lemmatizer.__init__
    +tag method

p Create a #[code Lemmatizer].

+aside-code("Example").
    from spacy.lemmatizer import Lemmatizer
    lemmatizer = Lemmatizer()

+table(["Name", "Type", "Description"])
    +row
        +cell #[code index]
        +cell dict / #[code None]
        +cell Inventory of lemmas in the language.

    +row
        +cell #[code exceptions]
        +cell dict / #[code None]
        +cell Mapping of string forms to lemmas that bypass the #[code rules].

    +row
        +cell #[code rules]
        +cell dict / #[code None]
        +cell List of suffix rewrite rules.

    +row
        +cell #[code lookup]
        +cell dict / #[code None]
        +cell Lookup table mapping strings to their lemmas.

    +row("foot")
        +cell returns
        +cell #[code Lemmatizer]
        +cell The newly created object.

+h(2, "call") Lemmatizer.__call__
    +tag method

p Lemmatize a string.

+aside-code("Example").
    from spacy.lemmatizer import Lemmatizer
    from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES
    lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)
    lemmas = lemmatizer(u'ducks', u'NOUN')
    assert lemmas == [u'duck']

+table(["Name", "Type", "Description"])
    +row
        +cell #[code string]
        +cell unicode
        +cell The string to lemmatize, e.g. the token text.

    +row
        +cell #[code univ_pos]
        +cell unicode / int
        +cell The token's universal part-of-speech tag.

    +row
        +cell #[code morphology]
        +cell dict / #[code None]
        +cell
            | Morphological features following the
            | #[+a("http://universaldependencies.org/") Universal Dependencies]
            | scheme.

    +row("foot")
        +cell returns
        +cell list
        +cell The available lemmas for the string.

+h(2, "lookup") Lemmatizer.lookup
    +tag method
    +tag-new(2)

p
    | Look up a lemma in the lookup table, if available. If no lemma is found,
    | the original string is returned. Languages can provide a
    | #[+a("/usage/adding-languages#lemmatizer") lookup table] via the
    | #[code lemma_lookup] variable, set on the individual #[code Language]
    | class.

+aside-code("Example").
    lookup = {u'going': u'go'}
    lemmatizer = Lemmatizer(lookup=lookup)
    assert lemmatizer.lookup(u'going') == u'go'

+table(["Name", "Type", "Description"])
    +row
        +cell #[code string]
        +cell unicode
        +cell The string to look up.

    +row("foot")
        +cell returns
        +cell unicode
        +cell The lemma if the string was found, otherwise the original string.

+h(2, "is_base_form") Lemmatizer.is_base_form
    +tag method

p
    | Check whether we're dealing with an uninflected paradigm, so we can
    | avoid lemmatization entirely.

+aside-code("Example").
    pos = 'verb'
    morph = {'VerbForm': 'inf'}
    is_base_form = lemmatizer.is_base_form(pos, morph)
    assert is_base_form == True

+table(["Name", "Type", "Description"])
    +row
        +cell #[code univ_pos]
        +cell unicode / int
        +cell The token's universal part-of-speech tag.

    +row
        +cell #[code morphology]
        +cell dict
        +cell The token's morphological features.

    +row("foot")
        +cell returns
        +cell bool
        +cell
            | Whether the token's part-of-speech tag and morphological features
            | describe a base form.

+h(2, "attributes") Attributes

+table(["Name", "Type", "Description"])
    +row
        +cell #[code index]
        +cell dict / #[code None]
        +cell Inventory of lemmas in the language.

    +row
        +cell #[code exc]
        +cell dict / #[code None]
        +cell Mapping of string forms to lemmas that bypass the #[code rules].

    +row
        +cell #[code rules]
        +cell dict / #[code None]
        +cell List of suffix rewrite rules.

    +row
        +cell #[code lookup_table] #[+tag-new(2)]
        +cell dict / #[code None]
        +cell The lemma lookup table, if available.
@@ -1,384 +0,0 @@
//- 💫 DOCS > API > LEXEME

include ../_includes/_mixins

p
    | An entry in the vocabulary. A #[code Lexeme] has no string context – it's
    | a word type, as opposed to a word token. It therefore has no
    | part-of-speech tag, dependency parse, or lemma (if lemmatization depends
    | on the part-of-speech tag).

+h(2, "init") Lexeme.__init__
    +tag method

p Create a #[code Lexeme] object.

+table(["Name", "Type", "Description"])
    +row
        +cell #[code vocab]
        +cell #[code Vocab]
        +cell The parent vocabulary.

    +row
        +cell #[code orth]
        +cell int
        +cell The orth id of the lexeme.

    +row("foot")
        +cell returns
        +cell #[code Lexeme]
        +cell The newly constructed object.
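p
    | In practice, lexemes are usually retrieved from the vocabulary rather
    | than constructed directly:

+aside-code("Example").
    apple = nlp.vocab[u'apple']  # looks the lexeme up, creating it if necessary
    assert apple.orth == nlp.vocab.strings[u'apple']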
+h(2, "set_flag") Lexeme.set_flag
|
||||
+tag method
|
||||
|
||||
p Change the value of a boolean flag.
|
||||
|
||||
+aside-code("Example").
|
||||
COOL_FLAG = nlp.vocab.add_flag(lambda text: False)
|
||||
nlp.vocab[u'spaCy'].set_flag(COOL_FLAG, True)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code flag_id]
|
||||
+cell int
|
||||
+cell The attribute ID of the flag to set.
|
||||
|
||||
+row
|
||||
+cell #[code value]
|
||||
+cell bool
|
||||
+cell The new value of the flag.
|
||||
|
||||
+h(2, "check_flag") Lexeme.check_flag
|
||||
+tag method
|
||||
|
||||
p Check the value of a boolean flag.
|
||||
|
||||
+aside-code("Example").
|
||||
is_my_library = lambda text: text in ['spaCy', 'Thinc']
|
||||
MY_LIBRARY = nlp.vocab.add_flag(is_my_library)
|
||||
assert nlp.vocab[u'spaCy'].check_flag(MY_LIBRARY) == True
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code flag_id]
|
||||
+cell int
|
||||
+cell The attribute ID of the flag to query.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell The value of the flag.
|
||||
|
||||
+h(2, "similarity") Lexeme.similarity
|
||||
+tag method
|
||||
+tag-model("vectors")
|
||||
|
||||
p Compute a semantic similarity estimate. Defaults to cosine over vectors.
|
||||
|
||||
+aside-code("Example").
|
||||
apple = nlp.vocab[u'apple']
|
||||
orange = nlp.vocab[u'orange']
|
||||
apple_orange = apple.similarity(orange)
|
||||
orange_apple = orange.similarity(apple)
|
||||
assert apple_orange == orange_apple
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell other
|
||||
+cell -
|
||||
+cell
|
||||
| The object to compare with. By default, accepts #[code Doc],
|
||||
| #[code Span], #[code Token] and #[code Lexeme] objects.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell float
|
||||
+cell A scalar similarity score. Higher is more similar.
|
||||
|
||||
|
||||
+h(2, "has_vector") Lexeme.has_vector
|
||||
+tag property
|
||||
+tag-model("vectors")
|
||||
|
||||
p
|
||||
| A boolean value indicating whether a word vector is associated with the
|
||||
| lexeme.
|
||||
|
||||
+aside-code("Example").
|
||||
apple = nlp.vocab[u'apple']
|
||||
assert apple.has_vector
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether the lexeme has a vector data attached.
|
||||
|
||||
+h(2, "vector") Lexeme.vector
|
||||
+tag property
|
||||
+tag-model("vectors")
|
||||
|
||||
p A real-valued meaning representation.
|
||||
|
||||
+aside-code("Example").
|
||||
apple = nlp.vocab[u'apple']
|
||||
assert apple.vector.dtype == 'float32'
|
||||
assert apple.vector.shape == (300,)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
|
||||
+cell A 1D numpy array representing the lexeme's semantics.
|
||||
|
||||
+h(2, "vector_norm") Lexeme.vector_norm
|
||||
+tag property
|
||||
+tag-model("vectors")
|
||||
|
||||
p The L2 norm of the lexeme's vector representation.
|
||||
|
||||
+aside-code("Example").
|
||||
apple = nlp.vocab[u'apple']
|
||||
pasta = nlp.vocab[u'pasta']
|
||||
apple.vector_norm # 7.1346845626831055
|
||||
pasta.vector_norm # 7.759851932525635
|
||||
assert apple.vector_norm != pasta.vector_norm
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell float
|
||||
+cell The L2 norm of the vector representation.
|
||||
|
||||
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The lexeme's vocabulary.
|
||||
|
||||
+row
|
||||
+cell #[code text]
|
||||
+cell unicode
|
||||
+cell Verbatim text content.
|
||||
|
||||
+row
|
||||
+cell #[code orth]
|
||||
+cell int
|
||||
+cell ID of the verbatim text content.
|
||||
|
||||
+row
|
||||
+cell #[code orth_]
|
||||
+cell unicode
|
||||
+cell
|
||||
| Verbatim text content (identical to #[code Lexeme.text]). Exists
|
||||
| mostly for consistency with the other attributes.
|
||||
|
||||
+row
|
||||
+cell #[code lex_id]
|
||||
+cell int
|
||||
+cell ID of the lexeme's lexical type.
|
||||
|
||||
+row
|
||||
+cell #[code rank]
|
||||
+cell int
|
||||
+cell
|
||||
| Sequential ID of the lexemes's lexical type, used to index into
|
||||
| tables, e.g. for word vectors.
|
||||
|
||||
+row
|
||||
+cell #[code flags]
|
||||
+cell int
|
||||
+cell Container of the lexeme's binary flags.
|
||||
|
||||
+row
|
||||
+cell #[code norm]
|
||||
+cell int
|
||||
+cell The lexemes's norm, i.e. a normalised form of the lexeme text.
|
||||
|
||||
+row
|
||||
+cell #[code norm_]
|
||||
        +cell unicode
        +cell The lexeme's norm, i.e. a normalised form of the lexeme text.

    +row
        +cell #[code lower]
        +cell int
        +cell Lowercase form of the word.

    +row
        +cell #[code lower_]
        +cell unicode
        +cell Lowercase form of the word.

    +row
        +cell #[code shape]
        +cell int
        +cell Transform of the word's string, to show orthographic features.

    +row
        +cell #[code shape_]
        +cell unicode
        +cell Transform of the word's string, to show orthographic features.

    +row
        +cell #[code prefix]
        +cell int
        +cell
            | Length-N substring from the start of the word. Defaults to
            | #[code N=1].

    +row
        +cell #[code prefix_]
        +cell unicode
        +cell
            | Length-N substring from the start of the word. Defaults to
            | #[code N=1].

    +row
        +cell #[code suffix]
        +cell int
        +cell
            | Length-N substring from the end of the word. Defaults to
            | #[code N=3].

    +row
        +cell #[code suffix_]
        +cell unicode
        +cell
            | Length-N substring from the end of the word. Defaults to
            | #[code N=3].

    +row
        +cell #[code is_alpha]
        +cell bool
        +cell
            | Does the lexeme consist of alphabetic characters? Equivalent to
            | #[code lexeme.text.isalpha()].

    +row
        +cell #[code is_ascii]
        +cell bool
        +cell
            | Does the lexeme consist of ASCII characters? Equivalent to
            | #[code all(ord(c) &lt; 128 for c in lexeme.text)].

    +row
        +cell #[code is_digit]
        +cell bool
        +cell
            | Does the lexeme consist of digits? Equivalent to
            | #[code lexeme.text.isdigit()].

    +row
        +cell #[code is_lower]
        +cell bool
        +cell
            | Is the lexeme in lowercase? Equivalent to
            | #[code lexeme.text.islower()].

    +row
        +cell #[code is_upper]
        +cell bool
        +cell
            | Is the lexeme in uppercase? Equivalent to
            | #[code lexeme.text.isupper()].

    +row
        +cell #[code is_title]
        +cell bool
        +cell
            | Is the lexeme in titlecase? Equivalent to
            | #[code lexeme.text.istitle()].

    +row
        +cell #[code is_punct]
        +cell bool
        +cell Is the lexeme punctuation?

    +row
        +cell #[code is_left_punct]
        +cell bool
        +cell Is the lexeme a left punctuation mark, e.g. #[code (]?

    +row
        +cell #[code is_right_punct]
        +cell bool
        +cell Is the lexeme a right punctuation mark, e.g. #[code )]?

    +row
        +cell #[code is_space]
        +cell bool
        +cell
            | Does the lexeme consist of whitespace characters? Equivalent to
            | #[code lexeme.text.isspace()].

    +row
        +cell #[code is_bracket]
        +cell bool
        +cell Is the lexeme a bracket?

    +row
        +cell #[code is_quote]
        +cell bool
        +cell Is the lexeme a quotation mark?

    +row
        +cell #[code is_currency]
            +tag-new("2.0.8")
        +cell bool
        +cell Is the lexeme a currency symbol?

    +row
        +cell #[code like_url]
        +cell bool
        +cell Does the lexeme resemble a URL?

    +row
        +cell #[code like_num]
        +cell bool
        +cell Does the lexeme represent a number? e.g. "10.9", "10", "ten", etc.

    +row
        +cell #[code like_email]
        +cell bool
        +cell Does the lexeme resemble an email address?

    +row
        +cell #[code is_oov]
        +cell bool
        +cell Is the lexeme out-of-vocabulary?

    +row
        +cell #[code is_stop]
        +cell bool
        +cell Is the lexeme part of a "stop list"?

    +row
        +cell #[code lang]
        +cell int
        +cell Language of the parent vocabulary.

    +row
        +cell #[code lang_]
        +cell unicode
        +cell Language of the parent vocabulary.

    +row
        +cell #[code prob]
        +cell float
        +cell Smoothed log probability estimate of the lexeme's type.

    +row
        +cell #[code cluster]
        +cell int
        +cell Brown cluster ID.

    +row
        +cell #[code sentiment]
        +cell float
        +cell
            | A scalar value indicating the positivity or negativity of the
            | lexeme.
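p
    | To illustrate a few of these attributes side by side (a minimal sketch,
    | assuming a loaded English #[code nlp] object):

+aside-code("Example").
    apple = nlp.vocab[u'apple']
    assert apple.is_alpha and apple.is_lower
    assert apple.shape_ == u'xxxx'
    assert apple.prefix_ == u'a' and apple.suffix_ == u'ple'
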
@ -1,281 +0,0 @@
//- 💫 DOCS > API > MATCHER

include ../_includes/_mixins

+infobox("Changed in v2.0", "⚠️")
    | As of spaCy 2.0, #[code Matcher.add_pattern] and #[code Matcher.add_entity]
    | are deprecated and have been replaced with a simpler
    | #[+api("matcher#add") #[code Matcher.add]] that lets you add a list of
    | patterns and a callback for a given match ID. #[code Matcher.get_entity]
    | is now called #[+api("matcher#get") #[code matcher.get]].
    | #[code Matcher.load] (not useful, as it didn't allow specifying callbacks)
    | and #[code Matcher.has_entity] (now redundant) have been removed. The
    | concept of "acceptor functions" has also been retired – this logic can
    | now be handled in the callback functions.

+h(2, "init") Matcher.__init__
    +tag method

p Create the rule-based #[code Matcher].

+aside-code("Example").
    from spacy.matcher import Matcher

    patterns = {'HelloWorld': [{'LOWER': 'hello'}, {'LOWER': 'world'}]}
    matcher = Matcher(nlp.vocab)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code vocab]
        +cell #[code Vocab]
        +cell
            | The vocabulary object, which must be shared with the documents
            | the matcher will operate on.

    +row
        +cell #[code patterns]
        +cell dict
        +cell Patterns to add to the matcher, keyed by ID.

    +row("foot")
        +cell returns
        +cell #[code Matcher]
        +cell The newly constructed object.

+h(2, "call") Matcher.__call__
    +tag method

p Find all token sequences matching the supplied patterns on the #[code Doc].

+aside-code("Example").
    from spacy.matcher import Matcher

    matcher = Matcher(nlp.vocab)
    pattern = [{'LOWER': 'hello'}, {'LOWER': 'world'}]
    matcher.add('HelloWorld', None, pattern)
    doc = nlp(u'hello world!')
    matches = matcher(doc)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code doc]
        +cell #[code Doc]
        +cell The document to match over.

    +row("foot")
        +cell returns
        +cell list
        +cell
            | A list of #[code (match_id, start, end)] tuples, describing the
            | matches. A match tuple describes a span #[code doc[start:end]].
            | The #[code match_id] is the ID of the added match pattern.

+infobox("Important note")
    | By default, the matcher #[strong does not perform any action] on matches,
    | like tagging matched phrases with entity types. Instead, actions need to
    | be specified when #[strong adding patterns or entities], by
    | passing in a callback function as the #[code on_match] argument on
    | #[+api("matcher#add") #[code add]]. This allows you to define custom
    | actions per pattern within the same matcher. For example, you might only
    | want to merge some entity types, and set custom flags for other matched
    | patterns. For more details and examples, see the usage guide on
    | #[+a("/usage/linguistic-features#rule-based-matching") rule-based matching].
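p
    | For instance, a callback along these lines (a minimal sketch, assuming a
    | loaded #[code nlp] object; the #[code add_match_ent] helper is a
    | hypothetical name) could add each match to the document's entities:

+aside-code("Example").
    from spacy.matcher import Matcher
    from spacy.tokens import Span

    def add_match_ent(matcher, doc, i, matches):
        # label the matched span with the pattern's match ID hash
        match_id, start, end = matches[i]
        doc.ents += (Span(doc, start, end, label=match_id),)

    matcher = Matcher(nlp.vocab)
    matcher.add('HelloWorld', add_match_ent,
                [{'LOWER': 'hello'}, {'LOWER': 'world'}])
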
+h(2, "pipe") Matcher.pipe
|
||||
+tag method
|
||||
|
||||
p Match a stream of documents, yielding them in turn.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.matcher import Matcher
|
||||
matcher = Matcher(nlp.vocab)
|
||||
for doc in matcher.pipe(docs, batch_size=50, n_threads=4):
|
||||
pass
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code docs]
|
||||
+cell iterable
|
||||
+cell A stream of documents.
|
||||
|
||||
+row
|
||||
+cell #[code batch_size]
|
||||
+cell int
|
||||
+cell The number of documents to accumulate into a working set.
|
||||
|
||||
+row
|
||||
+cell #[code n_threads]
|
||||
+cell int
|
||||
+cell
|
||||
| The number of threads with which to work on the buffer in
|
||||
| parallel, if the #[code Matcher] implementation supports
|
||||
| multi-threading.
|
||||
|
||||
+row
|
||||
+cell #[code return_matches]
|
||||
+tag-new(2.1)
|
||||
+cell bool
|
||||
+cell
|
||||
| Yield the match lists along with the docs, making results
|
||||
| #[code (doc, matches)] tuples.
|
||||
|
||||
+row
|
||||
+cell #[code as_tuples]
|
||||
+tag-new(2.1)
|
||||
+cell bool
|
||||
+cell
|
||||
| Interpret the input stream as #[code (doc, context)] tuples, and
|
||||
| yield #[code (result, context)] tuples out. If both
|
||||
| #[code return_matches] and #[code as_tuples] are #[code True],
|
||||
| the output will be a sequence of
|
||||
| #[code ((doc, matches), context)] tuples.
|
||||
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Doc]
|
||||
+cell Documents, in order.
|
||||
|
||||
+h(2, "len") Matcher.__len__
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Get the number of rules added to the matcher. Note that this only returns
|
||||
| the number of rules (identical with the number of IDs), not the number
|
||||
| of individual patterns.
|
||||
|
||||
+aside-code("Example").
|
||||
matcher = Matcher(nlp.vocab)
|
||||
assert len(matcher) == 0
|
||||
matcher.add('Rule', None, [{'ORTH': 'test'}])
|
||||
assert len(matcher) == 1
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The number of rules.
|
||||
|
||||
+h(2, "contains") Matcher.__contains__
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p Check whether the matcher contains rules for a match ID.
|
||||
|
||||
+aside-code("Example").
|
||||
matcher = Matcher(nlp.vocab)
|
||||
assert 'Rule' not in matcher
|
||||
matcher.add('Rule', None, [{'ORTH': 'test'}])
|
||||
assert 'Rule' in matcher
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code key]
|
||||
+cell unicode
|
||||
+cell The match ID.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell Whether the matcher contains rules for this match ID.
|
||||
|
||||
+h(2, "add") Matcher.add
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Add a rule to the matcher, consisting of an ID key, one or more patterns, and
|
||||
| a callback function to act on the matches. The callback function will
|
||||
| receive the arguments #[code matcher], #[code doc], #[code i] and
|
||||
| #[code matches]. If a pattern already exists for the given ID, the
|
||||
| patterns will be extended. An #[code on_match] callback will be
|
||||
| overwritten.
|
||||
|
||||
+aside-code("Example").
|
||||
def on_match(matcher, doc, id, matches):
|
||||
print('Matched!', matches)
|
||||
|
||||
matcher = Matcher(nlp.vocab)
|
||||
matcher.add('HelloWorld', on_match, [{'LOWER': 'hello'}, {'LOWER': 'world'}])
|
||||
matcher.add('GoogleMaps', on_match, [{'ORTH': 'Google'}, {'ORTH': 'Maps'}])
|
||||
doc = nlp(u'HELLO WORLD on Google Maps.')
|
||||
matches = matcher(doc)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code match_id]
|
||||
+cell unicode
|
||||
+cell An ID for the thing you're matching.
|
||||
|
||||
+row
|
||||
+cell #[code on_match]
|
||||
+cell callable or #[code None]
|
||||
+cell
|
||||
| Callback function to act on matches. Takes the arguments
|
||||
| #[code matcher], #[code doc], #[code i] and #[code matches].
|
||||
|
||||
+row
|
||||
+cell #[code *patterns]
|
||||
+cell list
|
||||
+cell
|
||||
| Match pattern. A pattern consists of a list of dicts, where each
|
||||
| dict describes a token.
|
||||
|
||||
+infobox("Changed in v2.0", "⚠️")
|
||||
| As of spaCy 2.0, #[code Matcher.add_pattern] and #[code Matcher.add_entity]
|
||||
| are deprecated and have been replaced with a simpler
|
||||
| #[+api("matcher#add") #[code Matcher.add]] that lets you add a list of
|
||||
| patterns and a callback for a given match ID.
|
||||
|
||||
+code-wrapper
|
||||
+code-new.
|
||||
matcher.add('GoogleNow', merge_phrases, [{ORTH: 'Google'}, {ORTH: 'Now'}])
|
||||
|
||||
+code-old.
|
||||
matcher.add_entity('GoogleNow', on_match=merge_phrases)
|
||||
matcher.add_pattern('GoogleNow', [{ORTH: 'Google'}, {ORTH: 'Now'}])
|
||||
|
||||
+h(2, "remove") Matcher.remove
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Remove a rule from the matcher. A #[code KeyError] is raised if the match
|
||||
| ID does not exist.
|
||||
|
||||
+aside-code("Example").
|
||||
matcher.add('Rule', None, [{'ORTH': 'test'}])
|
||||
assert 'Rule' in matcher
|
||||
matcher.remove('Rule')
|
||||
assert 'Rule' not in matcher
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code key]
|
||||
+cell unicode
|
||||
+cell The ID of the match rule.
|
||||
|
||||
+h(2, "get") Matcher.get
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Retrieve the pattern stored for a key. Returns the rule as an
|
||||
| #[code (on_match, patterns)] tuple containing the callback and available
|
||||
| patterns.
|
||||
|
||||
+aside-code("Example").
|
||||
pattern = [{'ORTH': 'test'}]
|
||||
matcher.add('Rule', None, pattern)
|
||||
on_match, patterns = matcher.get('Rule')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code key]
|
||||
+cell unicode
|
||||
+cell The ID of the match rule.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell tuple
|
||||
+cell The rule, as an #[code (on_match, patterns)] tuple.
|
|
@ -1,181 +0,0 @@
//- 💫 DOCS > API > PHRASEMATCHER

include ../_includes/_mixins

p
    | The #[code PhraseMatcher] lets you efficiently match large terminology
    | lists. While the #[+api("matcher") #[code Matcher]] lets you match
    | sequences based on lists of token descriptions, the #[code PhraseMatcher]
    | accepts match patterns in the form of #[code Doc] objects.
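p
    | For instance (a minimal sketch, assuming a loaded #[code nlp] object and
    | a stand-in #[code terms] list), a large terminology list can be converted
    | to patterns cheaply with #[code nlp.make_doc], which only runs the
    | tokenizer:

+aside-code("Example").
    from spacy.matcher import PhraseMatcher

    terms = [u'Barack Obama', u'Angela Merkel']  # stand-in terminology list
    matcher = PhraseMatcher(nlp.vocab)
    patterns = [nlp.make_doc(term) for term in terms]
    matcher.add('PEOPLE', None, *patterns)
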
+h(2, "init") PhraseMatcher.__init__
|
||||
+tag method
|
||||
|
||||
p Create the rule-based #[code PhraseMatcher].
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.matcher import PhraseMatcher
|
||||
matcher = PhraseMatcher(nlp.vocab, max_length=6)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell
|
||||
| The vocabulary object, which must be shared with the documents
|
||||
| the matcher will operate on.
|
||||
|
||||
+row
|
||||
+cell #[code max_length]
|
||||
+cell int
|
||||
+cell Maximum length of a phrase pattern to add.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code PhraseMatcher]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "call") PhraseMatcher.__call__
|
||||
+tag method
|
||||
|
||||
p Find all token sequences matching the supplied patterns on the #[code Doc].
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.matcher import PhraseMatcher
|
||||
|
||||
matcher = PhraseMatcher(nlp.vocab)
|
||||
matcher.add('OBAMA', None, nlp(u"Barack Obama"))
|
||||
doc = nlp(u"Barack Obama lifts America one last time in emotional farewell")
|
||||
matches = matcher(doc)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The document to match over.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell list
|
||||
+cell
|
||||
| A list of #[code (match_id, start, end)] tuples, describing the
|
||||
| matches. A match tuple describes a span #[code doc[start:end]].
|
||||
| The #[code match_id] is the ID of the added match pattern.
|
||||
|
||||
+h(2, "pipe") PhraseMatcher.pipe
|
||||
+tag method
|
||||
|
||||
p Match a stream of documents, yielding them in turn.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.matcher import PhraseMatcher
|
||||
matcher = PhraseMatcher(nlp.vocab)
|
||||
for doc in matcher.pipe(texts, batch_size=50, n_threads=4):
|
||||
pass
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code docs]
|
||||
+cell iterable
|
||||
+cell A stream of documents.
|
||||
|
||||
+row
|
||||
+cell #[code batch_size]
|
||||
+cell int
|
||||
+cell The number of documents to accumulate into a working set.
|
||||
|
||||
+row
|
||||
+cell #[code n_threads]
|
||||
+cell int
|
||||
+cell
|
||||
| The number of threads with which to work on the buffer in
|
||||
| parallel, if the #[code PhraseMatcher] implementation supports
|
||||
| multi-threading.
|
||||
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Doc]
|
||||
+cell Documents, in order.
|
||||
|
||||
+h(2, "len") PhraseMatcher.__len__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Get the number of rules added to the matcher. Note that this only returns
|
||||
| the number of rules (identical with the number of IDs), not the number
|
||||
| of individual patterns.
|
||||
|
||||
+aside-code("Example").
|
||||
matcher = PhraseMatcher(nlp.vocab)
|
||||
assert len(matcher) == 0
|
||||
matcher.add('OBAMA', None, nlp(u"Barack Obama"))
|
||||
assert len(matcher) == 1
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The number of rules.
|
||||
|
||||
+h(2, "contains") PhraseMatcher.__contains__
|
||||
+tag method
|
||||
|
||||
p Check whether the matcher contains rules for a match ID.
|
||||
|
||||
+aside-code("Example").
|
||||
matcher = PhraseMatcher(nlp.vocab)
|
||||
assert 'OBAMA' not in matcher
|
||||
matcher.add('OBAMA', None, nlp(u"Barack Obama"))
|
||||
assert 'OBAMA' in matcher
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code key]
|
||||
+cell unicode
|
||||
+cell The match ID.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell Whether the matcher contains rules for this match ID.
|
||||
|
||||
+h(2, "add") PhraseMatcher.add
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Add a rule to the matcher, consisting of an ID key, one or more patterns, and
|
||||
| a callback function to act on the matches. The callback function will
|
||||
| receive the arguments #[code matcher], #[code doc], #[code i] and
|
||||
| #[code matches]. If a pattern already exists for the given ID, the
|
||||
| patterns will be extended. An #[code on_match] callback will be
|
||||
| overwritten.
|
||||
|
||||
+aside-code("Example").
|
||||
def on_match(matcher, doc, id, matches):
|
||||
print('Matched!', matches)
|
||||
|
||||
matcher = PhraseMatcher(nlp.vocab)
|
||||
matcher.add('OBAMA', on_match, nlp(u"Barack Obama"))
|
||||
matcher.add('HEALTH', on_match, nlp(u"health care reform"),
|
||||
nlp(u"healthcare reform"))
|
||||
doc = nlp(u"Barack Obama urges Congress to find courage to defend his healthcare reforms")
|
||||
matches = matcher(doc)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code match_id]
|
||||
+cell unicode
|
||||
+cell An ID for the thing you're matching.
|
||||
|
||||
+row
|
||||
+cell #[code on_match]
|
||||
+cell callable or #[code None]
|
||||
+cell
|
||||
| Callback function to act on matches. Takes the arguments
|
||||
| #[code matcher], #[code doc], #[code i] and #[code matches].
|
||||
|
||||
+row
|
||||
+cell #[code *docs]
|
||||
+cell list
|
||||
+cell
|
||||
| #[code Doc] objects of the phrases to match.
|
|
@ -1,449 +0,0 @@
//- 💫 DOCS > API > PIPE

include ../_includes/_mixins

//- This page can be used as a template for all other classes that inherit
//- from `Pipe`.

if subclass
    +infobox
        | This class is a subclass of #[+api("pipe") #[code Pipe]] and
        | follows the same API. The pipeline component is available in the
        | #[+a("/usage/processing-pipelines") processing pipeline] via the ID
        | #[code "#{pipeline_id}"].

else
    p
        | This class is not instantiated directly. Components inherit from it,
        | and it defines the interface that components should follow to
        | function as components in a spaCy analysis pipeline.

- CLASSNAME = subclass || 'Pipe'
- VARNAME = short || CLASSNAME.toLowerCase()


+h(2, "model") #{CLASSNAME}.Model
    +tag classmethod

p
    | Initialise a model for the pipe. The model should implement the
    | #[code thinc.neural.Model] API. Wrappers are under development for
    | most major machine learning libraries.
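p
    | A minimal, hypothetical sketch – the accepted keyword arguments depend
    | on the concrete subclass and its default model:

+aside-code("Example").
    # 'config' is a hypothetical dict of parameters; valid keys
    # depend on the subclass
    model = #{CLASSNAME}.Model(**config)
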
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code **kwargs]
|
||||
+cell -
|
||||
+cell Parameters for initialising the model
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell object
|
||||
+cell The initialised model.
|
||||
|
||||
+h(2, "init") #{CLASSNAME}.__init__
|
||||
+tag method
|
||||
|
||||
p Create a new pipeline instance.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.pipeline import #{CLASSNAME}
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
#{VARNAME}.from_disk('/path/to/model')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The shared vocabulary.
|
||||
|
||||
+row
|
||||
+cell #[code model]
|
||||
+cell #[code thinc.neural.Model] or #[code True]
|
||||
+cell
|
||||
| The model powering the pipeline component. If no model is
|
||||
| supplied, the model is created when you call
|
||||
| #[code begin_training], #[code from_disk] or #[code from_bytes].
|
||||
|
||||
+row
|
||||
+cell #[code **cfg]
|
||||
+cell -
|
||||
+cell Configuration parameters.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code=CLASSNAME]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "call") #{CLASSNAME}.__call__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Apply the pipe to one document. The document is modified in place, and
|
||||
| returned. Both #[code #{CLASSNAME}.__call__] and
|
||||
| #[code #{CLASSNAME}.pipe] should delegate to the
|
||||
| #[code #{CLASSNAME}.predict] and #[code #{CLASSNAME}.set_annotations]
|
||||
| methods.
|
||||
|
||||
+aside-code("Example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
doc = nlp(u"This is a sentence.")
|
||||
processed = #{VARNAME}(doc)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The document to process.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Doc]
|
||||
+cell The processed document.
|
||||
|
||||
+h(2, "pipe") #{CLASSNAME}.pipe
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Apply the pipe to a stream of documents. Both
|
||||
| #[code #{CLASSNAME}.__call__] and #[code #{CLASSNAME}.pipe] should
|
||||
| delegate to the #[code #{CLASSNAME}.predict] and
|
||||
| #[code #{CLASSNAME}.set_annotations] methods.
|
||||
|
||||
+aside-code("Example").
|
||||
texts = [u'One doc', u'...', u'Lots of docs']
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
for doc in #{VARNAME}.pipe(texts, batch_size=50):
|
||||
pass
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code stream]
|
||||
+cell iterable
|
||||
+cell A stream of documents.
|
||||
|
||||
+row
|
||||
+cell #[code batch_size]
|
||||
+cell int
|
||||
+cell The number of texts to buffer. Defaults to #[code 128].
|
||||
|
||||
+row
|
||||
+cell #[code n_threads]
|
||||
+cell int
|
||||
+cell
|
||||
| The number of worker threads to use. If #[code -1], OpenMP will
|
||||
| decide how many to use at run time. Default is #[code -1].
|
||||
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Doc]
|
||||
+cell Processed documents in the order of the original text.
|
||||
|
||||
+h(2, "predict") #{CLASSNAME}.predict
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Apply the pipeline's model to a batch of docs, without modifying them.
|
||||
|
||||
+aside-code("Example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
scores = #{VARNAME}.predict([doc1, doc2])
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code docs]
|
||||
+cell iterable
|
||||
+cell The documents to predict.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell -
|
||||
+cell Scores from the model.
|
||||
|
||||
+h(2, "set_annotations") #{CLASSNAME}.set_annotations
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Modify a batch of documents, using pre-computed scores.
|
||||
|
||||
+aside-code("Example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
scores = #{VARNAME}.predict([doc1, doc2])
|
||||
#{VARNAME}.set_annotations([doc1, doc2], scores)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code docs]
|
||||
+cell iterable
|
||||
+cell The documents to modify.
|
||||
|
||||
+row
|
||||
+cell #[code scores]
|
||||
+cell -
|
||||
+cell The scores to set, produced by #[code #{CLASSNAME}.predict].
|
||||
|
||||
+h(2, "update") #{CLASSNAME}.update
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Learn from a batch of documents and gold-standard information, updating
|
||||
| the pipe's model. Delegates to #[code #{CLASSNAME}.predict] and
|
||||
| #[code #{CLASSNAME}.get_loss].
|
||||
|
||||
+aside-code("Example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
losses = {}
|
||||
optimizer = nlp.begin_training()
|
||||
#{VARNAME}.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code docs]
|
||||
+cell iterable
|
||||
+cell A batch of documents to learn from.
|
||||
|
||||
+row
|
||||
+cell #[code golds]
|
||||
+cell iterable
|
||||
+cell The gold-standard data. Must have the same length as #[code docs].
|
||||
|
||||
+row
|
||||
+cell #[code drop]
|
||||
+cell float
|
||||
+cell The dropout rate.
|
||||
|
||||
+row
|
||||
+cell #[code sgd]
|
||||
+cell callable
|
||||
+cell
|
||||
| The optimizer. Should take two arguments #[code weights] and
|
||||
| #[code gradient], and an optional ID.
|
||||
|
||||
+row
|
||||
+cell #[code losses]
|
||||
+cell dict
|
||||
+cell
|
||||
| Optional record of the loss during training. The value keyed by
|
||||
| the model's name is updated.
|
||||
|
||||
+h(2, "get_loss") #{CLASSNAME}.get_loss
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Find the loss and gradient of loss for the batch of documents and their
|
||||
| predicted scores.
|
||||
|
||||
+aside-code("Example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
scores = #{VARNAME}.predict([doc1, doc2])
|
||||
loss, d_loss = #{VARNAME}.get_loss([doc1, doc2], [gold1, gold2], scores)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code docs]
|
||||
+cell iterable
|
||||
+cell The batch of documents.
|
||||
|
||||
+row
|
||||
+cell #[code golds]
|
||||
+cell iterable
|
||||
+cell The gold-standard data. Must have the same length as #[code docs].
|
||||
|
||||
+row
|
||||
+cell #[code scores]
|
||||
+cell -
|
||||
+cell Scores representing the model's predictions.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell tuple
|
||||
+cell The loss and the gradient, i.e. #[code (loss, gradient)].
|
||||
|
||||
+h(2, "begin_training") #{CLASSNAME}.begin_training
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Initialise the pipe for training, using data exampes if available. If no
|
||||
| model has been initialised yet, the model is added.
|
||||
|
||||
+aside-code("Example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
nlp.pipeline.append(#{VARNAME})
|
||||
optimizer = #{VARNAME}.begin_training(pipeline=nlp.pipeline)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code gold_tuples]
|
||||
+cell iterable
|
||||
+cell
|
||||
| Optional gold-standard annotations from which to construct
|
||||
| #[+api("goldparse") #[code GoldParse]] objects.
|
||||
|
||||
+row
|
||||
+cell #[code pipeline]
|
||||
+cell list
|
||||
+cell
|
||||
| Optional list of #[+api("pipe") #[code Pipe]] components that
|
||||
| this component is part of.
|
||||
|
||||
+row
|
||||
+cell #[code sgd]
|
||||
+cell callable
|
||||
+cell
|
||||
| An optional optimizer. Should take two arguments #[code weights]
|
||||
| and #[code gradient], and an optional ID. Will be created via
|
||||
| #[+api(CLASSNAME.toLowerCase() + "#create_optimizer") #[code create_optimizer]]
|
||||
| if not set.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell callable
|
||||
+cell An optimizer.
|
||||
|
||||
+h(2, "create_optimizer") #{CLASSNAME}.create_optimizer
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Create an optmizer for the pipeline component.
|
||||
|
||||
+aside-code("Example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
optimizer = #{VARNAME}.create_optimizer()
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell callable
|
||||
+cell The optimizer.
|
||||
|
||||
+h(2, "use_params") #{CLASSNAME}.use_params
|
||||
+tag method
|
||||
+tag contextmanager
|
||||
|
||||
p Modify the pipe's model, to use the given parameter values.
|
||||
|
||||
+aside-code("Example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
with #{VARNAME}.use_params():
|
||||
#{VARNAME}.to_disk('/best_model')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code params]
|
||||
+cell -
|
||||
+cell
|
||||
| The parameter values to use in the model. At the end of the
|
||||
| context, the original parameters are restored.
|
||||
|
||||
+h(2, "add_label") #{CLASSNAME}.add_label
|
||||
+tag method
|
||||
|
||||
p Add a new label to the pipe.
|
||||
|
||||
if CLASSNAME == "Tagger"
|
||||
+aside-code("Example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
#{VARNAME}.add_label('MY_LABEL', {POS: 'NOUN'})
|
||||
else
|
||||
+aside-code("Example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
#{VARNAME}.add_label('MY_LABEL')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code label]
|
||||
+cell unicode
|
||||
+cell The label to add.
|
||||
|
||||
if CLASSNAME == "Tagger"
|
||||
+row
|
||||
+cell #[code values]
|
||||
+cell dict
|
||||
+cell
|
||||
| Optional values to map to the label, e.g. a tag map
|
||||
| dictionary.
|
||||
|
||||
+h(2, "to_disk") #{CLASSNAME}.to_disk
|
||||
+tag method
|
||||
|
||||
p Serialize the pipe to disk.
|
||||
|
||||
+aside-code("Example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
#{VARNAME}.to_disk('/path/to/#{VARNAME}')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell unicode or #[code Path]
|
||||
+cell
|
||||
| A path to a directory, which will be created if it doesn't exist.
|
||||
| Paths may be either strings or #[code Path]-like objects.
|
||||
|
||||
+h(2, "from_disk") #{CLASSNAME}.from_disk
|
||||
+tag method
|
||||
|
||||
p Load the pipe from disk. Modifies the object in place and returns it.
|
||||
|
||||
+aside-code("Example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
#{VARNAME}.from_disk('/path/to/#{VARNAME}')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell unicode or #[code Path]
|
||||
+cell
|
||||
| A path to a directory. Paths may be either strings or
|
||||
| #[code Path]-like objects.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code=CLASSNAME]
|
||||
+cell The modified #[code=CLASSNAME] object.
|
||||
|
||||
+h(2, "to_bytes") #{CLASSNAME}.to_bytes
|
||||
+tag method
|
||||
|
||||
+aside-code("example").
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
#{VARNAME}_bytes = #{VARNAME}.to_bytes()
|
||||
|
||||
p Serialize the pipe to a bytestring.
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code **exclude]
|
||||
+cell -
|
||||
+cell Named attributes to prevent from being serialized.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bytes
|
||||
+cell The serialized form of the #[code=CLASSNAME] object.
|
||||
|
||||
+h(2, "from_bytes") #{CLASSNAME}.from_bytes
|
||||
+tag method
|
||||
|
||||
p Load the pipe from a bytestring. Modifies the object in place and returns it.
|
||||
|
||||
+aside-code("Example").
|
||||
#{VARNAME}_bytes = #{VARNAME}.to_bytes()
|
||||
#{VARNAME} = #{CLASSNAME}(nlp.vocab)
|
||||
#{VARNAME}.from_bytes(#{VARNAME}_bytes)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code bytes_data]
|
||||
+cell bytes
|
||||
+cell The data to load from.
|
||||
|
||||
+row
|
||||
+cell #[code **exclude]
|
||||
+cell -
|
||||
+cell Named attributes to prevent from being loaded.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code=CLASSNAME]
|
||||
+cell The #[code=CLASSNAME] object.
|
|
@ -1,655 +0,0 @@
//- 💫 DOCS > API > SPAN

include ../_includes/_mixins

p A slice from a #[+api("doc") #[code Doc]] object.

+h(2, "init") Span.__init__
    +tag method

p Create a #[code Span] object from the slice #[code doc[start : end]].

+aside-code("Example").
    doc = nlp(u'Give it back! He pleaded.')
    span = doc[1:4]
    assert [t.text for t in span] == [u'it', u'back', u'!']

+table(["Name", "Type", "Description"])
    +row
        +cell #[code doc]
        +cell #[code Doc]
        +cell The parent document.

    +row
        +cell #[code start]
        +cell int
        +cell The index of the first token of the span.

    +row
        +cell #[code end]
        +cell int
        +cell The index of the first token after the span.

    +row
        +cell #[code label]
        +cell int
        +cell A label to attach to the span, e.g. for named entities.

    +row
        +cell #[code vector]
        +cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
        +cell A meaning representation of the span.

    +row("foot")
        +cell returns
        +cell #[code Span]
        +cell The newly constructed object.

+h(2, "getitem") Span.__getitem__
    +tag method

p Get a #[code Token] object.

+aside-code("Example").
    doc = nlp(u'Give it back! He pleaded.')
    span = doc[1:4]
    assert span[1].text == 'back'

+table(["Name", "Type", "Description"])
    +row
        +cell #[code i]
        +cell int
        +cell The index of the token within the span.

    +row("foot")
        +cell returns
        +cell #[code Token]
        +cell The token at #[code span[i]].

p Get a #[code Span] object.

+aside-code("Example").
    doc = nlp(u'Give it back! He pleaded.')
    span = doc[1:4]
    assert span[1:3].text == 'back!'

+table(["Name", "Type", "Description"])
    +row
        +cell #[code start_end]
        +cell tuple
        +cell The slice of the span to get.

    +row("foot")
        +cell returns
        +cell #[code Span]
        +cell The span at #[code span[start : end]].

+h(2, "iter") Span.__iter__
    +tag method

p Iterate over #[code Token] objects.

+aside-code("Example").
    doc = nlp(u'Give it back! He pleaded.')
    span = doc[1:4]
    assert [t.text for t in span] == ['it', 'back', '!']

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell yields
        +cell #[code Token]
        +cell A #[code Token] object.

+h(2, "len") Span.__len__
    +tag method

p Get the number of tokens in the span.

+aside-code("Example").
    doc = nlp(u'Give it back! He pleaded.')
    span = doc[1:4]
    assert len(span) == 3

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell int
        +cell The number of tokens in the span.

+h(2, "set_extension") Span.set_extension
    +tag classmethod
    +tag-new(2)

p
    | Define a custom attribute on the #[code Span] which becomes available via
    | #[code Span._]. For details, see the documentation on
    | #[+a("/usage/processing-pipelines#custom-components-attributes") custom attributes].

+aside-code("Example").
    from spacy.tokens import Span
    city_getter = lambda span: any(city in span.text for city in ('New York', 'Paris', 'Berlin'))
    Span.set_extension('has_city', getter=city_getter)
    doc = nlp(u'I like New York in Autumn')
    assert doc[1:4]._.has_city

+table(["Name", "Type", "Description"])
    +row
        +cell #[code name]
        +cell unicode
        +cell
            | Name of the attribute to set by the extension. For example,
            | #[code 'my_attr'] will be available as #[code span._.my_attr].

    +row
        +cell #[code default]
        +cell -
        +cell
            | Optional default value of the attribute if no getter or method
            | is defined.

    +row
        +cell #[code method]
        +cell callable
        +cell
            | Set a custom method on the object, for example
            | #[code span._.compare(other_span)].

    +row
        +cell #[code getter]
        +cell callable
        +cell
            | Getter function that takes the object and returns an attribute
            | value. Is called when the user accesses the #[code ._] attribute.

    +row
        +cell #[code setter]
        +cell callable
        +cell
            | Setter function that takes the #[code Span] and a value, and
            | modifies the object. Is called when the user writes to the
            | #[code Span._] attribute.

+h(2, "get_extension") Span.get_extension
    +tag classmethod
    +tag-new(2)

p
    | Look up a previously registered extension by name. Returns a 4-tuple
    | #[code.u-break (default, method, getter, setter)] if the extension is
    | registered. Raises a #[code KeyError] otherwise.

+aside-code("Example").
    from spacy.tokens import Span
    Span.set_extension('is_city', default=False)
    extension = Span.get_extension('is_city')
    assert extension == (False, None, None, None)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code name]
        +cell unicode
        +cell Name of the extension.

    +row("foot")
        +cell returns
        +cell tuple
        +cell
            | A #[code.u-break (default, method, getter, setter)] tuple of the
            | extension.

+h(2, "has_extension") Span.has_extension
    +tag classmethod
    +tag-new(2)

p Check whether an extension has been registered on the #[code Span] class.

+aside-code("Example").
    from spacy.tokens import Span
    Span.set_extension('is_city', default=False)
    assert Span.has_extension('is_city')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code name]
        +cell unicode
        +cell Name of the extension to check.

    +row("foot")
        +cell returns
        +cell bool
        +cell Whether the extension has been registered.

+h(2, "remove_extension") Span.remove_extension
    +tag classmethod
    +tag-new("2.0.12")

p Remove a previously registered extension.

+aside-code("Example").
    from spacy.tokens import Span
    Span.set_extension('is_city', default=False)
    removed = Span.remove_extension('is_city')
    assert not Span.has_extension('is_city')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code name]
        +cell unicode
        +cell Name of the extension.

    +row("foot")
        +cell returns
        +cell tuple
        +cell
            | A #[code.u-break (default, method, getter, setter)] tuple of the
            | removed extension.

+h(2, "similarity") Span.similarity
    +tag method
    +tag-model("vectors")

p
    | Make a semantic similarity estimate. The default estimate is cosine
    | similarity using an average of word vectors.

+aside-code("Example").
    doc = nlp(u'green apples and red oranges')
    green_apples = doc[:2]
    red_oranges = doc[3:]
    apples_oranges = green_apples.similarity(red_oranges)
    oranges_apples = red_oranges.similarity(green_apples)
    assert apples_oranges == oranges_apples

+table(["Name", "Type", "Description"])
    +row
        +cell #[code other]
        +cell -
        +cell
            | The object to compare with. By default, accepts #[code Doc],
            | #[code Span], #[code Token] and #[code Lexeme] objects.

    +row("foot")
        +cell returns
        +cell float
        +cell A scalar similarity score. Higher is more similar.

+h(2, "get_lca_matrix") Span.get_lca_matrix
    +tag method

p
    | Calculates the lowest common ancestor matrix for a given #[code Span].
    | Returns an LCA matrix containing the integer index of the ancestor, or
    | #[code -1] if no common ancestor is found, e.g. if the span excludes a
    | necessary ancestor.

+aside-code("Example").
    doc = nlp(u'I like New York in Autumn')
    span = doc[1:4]
    matrix = span.get_lca_matrix()
    # array([[0, 0, 0], [0, 1, 2], [0, 2, 2]], dtype=int32)

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell #[code.u-break numpy.ndarray[ndim=2, dtype='int32']]
        +cell The lowest common ancestor matrix of the #[code Span].

+h(2, "to_array") Span.to_array
    +tag method
    +tag-new(2)

p
    | Given a list of #[code M] attribute IDs, export the tokens to a numpy
    | #[code ndarray] of shape #[code (N, M)], where #[code N] is the length of
    | the document. The values will be 32-bit integers.

+aside-code("Example").
    from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
    doc = nlp(u'I like New York in Autumn.')
    span = doc[2:3]
    # All strings mapped to integers, for easy export to numpy
    np_array = span.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])

+table(["Name", "Type", "Description"])
    +row
        +cell #[code attr_ids]
        +cell list
        +cell A list of attribute ID ints.

    +row("foot")
        +cell returns
        +cell #[code.u-break numpy.ndarray[long, ndim=2]]
        +cell
            | A feature matrix, with one row per word, and one column per
            | attribute indicated in the input #[code attr_ids].

+h(2, "merge") Span.merge
    +tag method

p Retokenize the document, such that the span is merged into a single token.

+aside-code("Example").
    doc = nlp(u'I like New York in Autumn.')
    span = doc[2:4]
    span.merge()
    assert len(doc) == 6
    assert doc[2].text == 'New York'

+table(["Name", "Type", "Description"])
    +row
        +cell #[code **attributes]
        +cell -
        +cell
            | Attributes to assign to the merged token. By default, attributes
            | are inherited from the syntactic root token of the span.

    +row("foot")
        +cell returns
        +cell #[code Token]
        +cell The newly merged token.

+h(2, "ents") Span.ents
    +tag property
    +tag-model("NER")
    +tag-new("2.0.12")

p
    | Iterate over the entities in the span. Yields named-entity
    | #[code Span] objects, if the entity recognizer has been applied to the
    | parent document.

+aside-code("Example").
    doc = nlp(u'Mr. Best flew to New York on Saturday morning.')
    span = doc[0:6]
    ents = list(span.ents)
    assert ents[0].label == 346
    assert ents[0].label_ == 'PERSON'
    assert ents[0].text == 'Mr. Best'

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell yields
        +cell #[code Span]
        +cell Entities in the document.

+h(2, "as_doc") Span.as_doc

p
    | Create a new #[code Doc] object corresponding to the #[code Span], with
    | a copy of the data.

+aside-code("Example").
    doc = nlp(u'I like New York in Autumn.')
    span = doc[2:4]
    doc2 = span.as_doc()
    assert doc2.text == 'New York'

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell #[code Doc]
        +cell A #[code Doc] object of the #[code Span]'s content.

+h(2, "root") Span.root
    +tag property
    +tag-model("parse")

p
    | The token within the span that's highest in the parse tree. If there's a
    | tie, the earliest is preferred.

+aside-code("Example").
    doc = nlp(u'I like New York in Autumn.')
    i, like, new, york, in_, autumn, dot = range(len(doc))
    assert doc[new].head.text == 'York'
    assert doc[york].head.text == 'like'
    new_york = doc[new:york+1]
    assert new_york.root.text == 'York'

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell #[code Token]
        +cell The root token.

+h(2, "lefts") Span.lefts
    +tag property
    +tag-model("parse")

p Tokens that are to the left of the span, whose heads are within the span.

+aside-code("Example").
    doc = nlp(u'I like New York in Autumn.')
    lefts = [t.text for t in doc[3:7].lefts]
    assert lefts == [u'New']

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell yields
        +cell #[code Token]
        +cell A left-child of a token of the span.

+h(2, "rights") Span.rights
    +tag property
    +tag-model("parse")

p Tokens that are to the right of the span, whose heads are within the span.

+aside-code("Example").
    doc = nlp(u'I like New York in Autumn.')
    rights = [t.text for t in doc[2:4].rights]
    assert rights == [u'in']

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell yields
        +cell #[code Token]
        +cell A right-child of a token of the span.

+h(2, "n_lefts") Span.n_lefts
    +tag property
    +tag-model("parse")

p
    | The number of tokens that are to the left of the span, whose heads are
    | within the span.

+aside-code("Example").
    doc = nlp(u'I like New York in Autumn.')
    assert doc[3:7].n_lefts == 1

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell int
        +cell The number of left-child tokens.

+h(2, "n_rights") Span.n_rights
    +tag property
    +tag-model("parse")

p
    | The number of tokens that are to the right of the span, whose heads are
    | within the span.

+aside-code("Example").
    doc = nlp(u'I like New York in Autumn.')
    assert doc[2:4].n_rights == 1

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell int
        +cell The number of right-child tokens.

+h(2, "subtree") Span.subtree
    +tag property
    +tag-model("parse")

p Tokens within the span and tokens which descend from them.

+aside-code("Example").
    doc = nlp(u'Give it back! He pleaded.')
    subtree = [t.text for t in doc[:3].subtree]
    assert subtree == [u'Give', u'it', u'back', u'!']

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell yields
        +cell #[code Token]
        +cell A token within the span, or a descendant from it.

+h(2, "has_vector") Span.has_vector
    +tag property
    +tag-model("vectors")

p
    | A boolean value indicating whether a word vector is associated with the
    | object.

+aside-code("Example").
    doc = nlp(u'I like apples')
    assert doc[1:].has_vector

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell bool
        +cell Whether the span has a vector data attached.

+h(2, "vector") Span.vector
    +tag property
    +tag-model("vectors")

p
    | A real-valued meaning representation. Defaults to an average of the
    | token vectors.

+aside-code("Example").
    doc = nlp(u'I like apples')
    assert doc[1:].vector.dtype == 'float32'
    assert doc[1:].vector.shape == (300,)

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
        +cell A 1D numpy array representing the span's semantics.

+h(2, "vector_norm") Span.vector_norm
    +tag property
    +tag-model("vectors")

p
    | The L2 norm of the span's vector representation.

+aside-code("Example").
    doc = nlp(u'I like apples')
    doc[1:].vector_norm # 4.800883928527915
    doc[2:].vector_norm # 6.895897646384268
    assert doc[1:].vector_norm != doc[2:].vector_norm

+table(["Name", "Type", "Description"])
    +row("foot")
        +cell returns
        +cell float
        +cell The L2 norm of the vector representation.

+h(2, "attributes") Attributes

+table(["Name", "Type", "Description"])
    +row
        +cell #[code doc]
        +cell #[code Doc]
        +cell The parent document.

    +row
        +cell #[code sent]
        +cell #[code Span]
        +cell The sentence span that this span is a part of.

    +row
        +cell #[code start]
        +cell int
        +cell The token offset for the start of the span.

    +row
        +cell #[code end]
        +cell int
        +cell The token offset for the end of the span.

    +row
        +cell #[code start_char]
        +cell int
        +cell The character offset for the start of the span.

    +row
        +cell #[code end_char]
        +cell int
        +cell The character offset for the end of the span.

    +row
        +cell #[code text]
        +cell unicode
        +cell A unicode representation of the span text.

    +row
        +cell #[code text_with_ws]
        +cell unicode
        +cell
            | The text content of the span with a trailing whitespace character
            | if the last token has one.

    +row
        +cell #[code orth]
        +cell int
        +cell ID of the verbatim text content.

    +row
        +cell #[code orth_]
        +cell unicode
        +cell
            | Verbatim text content (identical to #[code Span.text]). Exists
            | mostly for consistency with the other attributes.

    +row
        +cell #[code label]
        +cell int
        +cell The span's label.

    +row
        +cell #[code label_]
        +cell unicode
        +cell The span's label.

    +row
        +cell #[code lemma_]
        +cell unicode
        +cell The span's lemma.

    +row
        +cell #[code ent_id]
        +cell int
        +cell The hash value of the named entity the token is an instance of.

    +row
        +cell #[code ent_id_]
        +cell unicode
        +cell The string ID of the named entity the token is an instance of.

    +row
        +cell #[code sentiment]
        +cell float
        +cell
            | A scalar value indicating the positivity or negativity of the
            | span.

    +row
        +cell #[code _]
        +cell #[code Underscore]
        +cell
            | User space for adding custom
            | #[+a("/usage/processing-pipelines#custom-components-attributes") attribute extensions].
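p
    | A quick illustration of the offset attributes above (a minimal sketch,
    | assuming a loaded #[code nlp] object):

+aside-code("Example").
    doc = nlp(u'I like New York in Autumn.')
    span = doc[2:4]
    assert (span.start, span.end) == (2, 4)             # token offsets
    assert (span.start_char, span.end_char) == (7, 15)  # character offsets
    assert span.text == u'New York'
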
@ -1,239 +0,0 @@
//- 💫 DOCS > API > STRINGSTORE

include ../_includes/_mixins

p
    | Look up strings by 64-bit hashes. As of v2.0, spaCy uses hash values
    | instead of integer IDs. This ensures that strings always map to the
    | same ID, even from different #[code StringStores].
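p
    | For instance (a minimal sketch), because the mapping is a deterministic
    | hash rather than a store-specific ID, two independent stores agree on
    | the value for a given string:

+aside-code("Example").
    from spacy.strings import StringStore
    store_one = StringStore([u'apple'])
    store_two = StringStore([u'orange'])
    assert store_one.add(u'coffee') == store_two.add(u'coffee')
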
+h(2, "init") StringStore.__init__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Create the #[code StringStore].
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.strings import StringStore
|
||||
stringstore = StringStore([u'apple', u'orange'])
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code strings]
|
||||
+cell iterable
|
||||
+cell A sequence of unicode strings to add to the store.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code StringStore]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "len") StringStore.__len__
|
||||
+tag method
|
||||
|
||||
p Get the number of strings in the store.
|
||||
|
||||
+aside-code("Example").
|
||||
stringstore = StringStore([u'apple', u'orange'])
|
||||
assert len(stringstore) == 2
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The number of strings in the store.
|
||||
|
||||
+h(2, "getitem") StringStore.__getitem__
|
||||
+tag method
|
||||
|
||||
p Retrieve a string from a given hash, or vice versa.
|
||||
|
||||
+aside-code("Example").
|
||||
stringstore = StringStore([u'apple', u'orange'])
|
||||
apple_hash = stringstore[u'apple']
|
||||
assert apple_hash == 8566208034543834098
|
||||
assert stringstore[apple_hash] == u'apple'
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string_or_id]
|
||||
+cell bytes, unicode or uint64
|
||||
+cell The value to encode.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell unicode or int
|
||||
+cell The value to be retrieved.
|
||||
|
||||
+h(2, "contains") StringStore.__contains__
|
||||
+tag method
|
||||
|
||||
p Check whether a string is in the store.
|
||||
|
||||
+aside-code("Example").
|
||||
stringstore = StringStore([u'apple', u'orange'])
|
||||
assert u'apple' in stringstore
|
||||
assert not u'cherry' in stringstore
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string]
|
||||
+cell unicode
|
||||
+cell The string to check.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether the store contains the string.
|
||||
|
||||
+h(2, "iter") StringStore.__iter__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Iterate over the strings in the store, in order. Note that a newly
|
||||
| initialised store will always include an empty string #[code ''] at
|
||||
| position #[code 0].
|
||||
|
||||
+aside-code("Example").
|
||||
stringstore = StringStore([u'apple', u'orange'])
|
||||
all_strings = [s for s in stringstore]
|
||||
assert all_strings == [u'apple', u'orange']
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell unicode
|
||||
+cell A string in the store.
|
||||
|
||||
+h(2, "add") StringStore.add
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p Add a string to the #[code StringStore].
|
||||
|
||||
+aside-code("Example").
|
||||
stringstore = StringStore([u'apple', u'orange'])
|
||||
banana_hash = stringstore.add(u'banana')
|
||||
assert len(stringstore) == 3
|
||||
assert banana_hash == 2525716904149915114
|
||||
assert stringstore[banana_hash] == u'banana'
|
||||
assert stringstore[u'banana'] == banana_hash
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string]
|
||||
+cell unicode
|
||||
+cell The string to add.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell uint64
|
||||
+cell The string's hash value.
|
||||
|
||||
|
||||
+h(2, "to_disk") StringStore.to_disk
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p Save the current state to a directory.
|
||||
|
||||
+aside-code("Example").
|
||||
stringstore.to_disk('/path/to/strings')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell unicode or #[code Path]
|
||||
+cell
|
||||
| A path to a directory, which will be created if it doesn't exist.
|
||||
| Paths may be either strings or #[code Path]-like objects.
|
||||
|
||||
+h(2, "from_disk") StringStore.from_disk
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p Loads state from a directory. Modifies the object in place and returns it.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.strings import StringStore
|
||||
stringstore = StringStore().from_disk('/path/to/strings')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell unicode or #[code Path]
|
||||
+cell
|
||||
| A path to a directory. Paths may be either strings or
|
||||
| #[code Path]-like objects.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code StringStore]
|
||||
+cell The modified #[code StringStore] object.
|
||||
|
||||
+h(2, "to_bytes") StringStore.to_bytes
|
||||
+tag method
|
||||
|
||||
p Serialize the current state to a binary string.
|
||||
|
||||
+aside-code("Example").
|
||||
store_bytes = stringstore.to_bytes()
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code **exclude]
|
||||
+cell -
|
||||
+cell Named attributes to prevent from being serialized.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bytes
|
||||
+cell The serialized form of the #[code StringStore] object.
|
||||
|
||||
+h(2, "from_bytes") StringStore.from_bytes
|
||||
+tag method
|
||||
|
||||
p Load state from a binary string.
|
||||
|
||||
+aside-code("Example").
|
||||
fron spacy.strings import StringStore
|
||||
store_bytes = stringstore.to_bytes()
|
||||
new_store = StringStore().from_bytes(store_bytes)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code bytes_data]
|
||||
+cell bytes
|
||||
+cell The data to load from.
|
||||
|
||||
+row
|
||||
+cell #[code **exclude]
|
||||
+cell -
|
||||
+cell Named attributes to prevent from being loaded.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code StringStore]
|
||||
+cell The #[code StringStore] object.
|
||||
|
||||
+h(2, "util") Utilities
|
||||
|
||||
+h(3, "hash_string") strings.hash_string
|
||||
+tag function
|
||||
|
||||
p Get a 64-bit hash for a given string.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.strings import hash_string
|
||||
assert hash_string(u'apple') == 8566208034543834098
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string]
|
||||
+cell unicode
|
||||
+cell The string to hash.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell uint64
|
||||
+cell The hash.
|
|
@ -1,6 +0,0 @@
//- 💫 DOCS > API > TAGGER

include ../_includes/_mixins

//- This class inherits from Pipe, so this page uses the template in pipe.jade.
!=partial("pipe", { subclass: "Tagger", pipeline_id: "tagger" })
@ -1,19 +0,0 @@
|
|||
//- 💫 DOCS > API > TEXTCATEGORIZER
|
||||
|
||||
include ../_includes/_mixins
|
||||
|
||||
p
|
||||
| The model supports classification with multiple, non-mutually exclusive
|
||||
| labels. You can change the model architecture rather easily, but by
|
||||
| default, the #[code TextCategorizer] class uses a convolutional
|
||||
| neural network to assign position-sensitive vectors to each word in the
|
||||
| document. The #[code TextCategorizer] uses its own CNN model, to
|
||||
| avoid sharing weights with the other pipeline components. The document
|
||||
| tensor is then summarized by concatenating max and mean pooling, and a
|
||||
| multilayer perceptron is used to predict an output vector of length
|
||||
| #[code nr_class], before a logistic activation is applied elementwise.
|
||||
| The value of each output neuron is the probability that some class is
|
||||
| present.
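//- A hedged usage sketch (added here; not part of the original page),
//- assuming a model has already been trained for the label shown. The calls
//- below (create_pipe, add_pipe, add_label, doc.cats) are the v2 API.

+aside-code("Example").
nlp.add_pipe(nlp.create_pipe('textcat'))
textcat = nlp.get_pipe('textcat')
textcat.add_label('POSITIVE')
# after training, each label maps to an independent probability
doc = nlp(u'This is a text')
print(doc.cats)  # e.g. {'POSITIVE': 0.92}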
|
||||
|
||||
//- This class inherits from Pipe, so this page uses the template in pipe.jade.
|
||||
!=partial("pipe", { subclass: "TextCategorizer", short: "textcat", pipeline_id: "textcat" })
|
|
@ -1,890 +0,0 @@
|
|||
|
||||
//- 💫 DOCS > API > TOKEN
|
||||
|
||||
include ../_includes/_mixins
|
||||
|
||||
p An individual token — i.e. a word, punctuation symbol, whitespace, etc.
|
||||
|
||||
+h(2, "init") Token.__init__
|
||||
+tag method
|
||||
|
||||
p Construct a #[code Token] object.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
token = doc[0]
|
||||
assert token.text == u'Give'
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell A storage container for lexical types.
|
||||
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The parent document.
|
||||
|
||||
+row
|
||||
+cell #[code offset]
|
||||
+cell int
|
||||
+cell The index of the token within the document.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Token]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "len") Token.__len__
|
||||
+tag method
|
||||
|
||||
p The number of unicode characters in the token, i.e. #[code token.text].
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
token = doc[0]
|
||||
assert len(token) == 4
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The number of unicode characters in the token.
|
||||
|
||||
+h(2, "set_extension") Token.set_extension
|
||||
+tag classmethod
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Define a custom attribute on the #[code Token] which becomes available
|
||||
| via #[code Token._]. For details, see the documentation on
|
||||
| #[+a("/usage/processing-pipelines#custom-components-attributes") custom attributes].
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.tokens import Token
|
||||
fruit_getter = lambda token: token.text in ('apple', 'pear', 'banana')
|
||||
Token.set_extension('is_fruit', getter=fruit_getter)
|
||||
doc = nlp(u'I have an apple')
|
||||
assert doc[3]._.is_fruit
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell
|
||||
| Name of the attribute to set by the extension. For example,
|
||||
| #[code 'my_attr'] will be available as #[code token._.my_attr].
|
||||
|
||||
+row
|
||||
+cell #[code default]
|
||||
+cell -
|
||||
+cell
|
||||
| Optional default value of the attribute if no getter or method
|
||||
| is defined.
|
||||
|
||||
+row
|
||||
+cell #[code method]
|
||||
+cell callable
|
||||
+cell
|
||||
| Set a custom method on the object, for example
|
||||
| #[code token._.compare(other_token)].
|
||||
|
||||
+row
|
||||
+cell #[code getter]
|
||||
+cell callable
|
||||
+cell
|
||||
| Getter function that takes the object and returns an attribute
|
||||
| value. Is called when the user accesses the #[code ._] attribute.
|
||||
|
||||
+row
|
||||
+cell #[code setter]
|
||||
+cell callable
|
||||
+cell
|
||||
| Setter function that takes the #[code Token] and a value, and
|
||||
| modifies the object. Is called when the user writes to the
|
||||
| #[code Token._] attribute.
|
||||
|
||||
+h(2, "get_extension") Token.get_extension
|
||||
+tag classmethod
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Look up a previously registered extension by name. Returns a 4-tuple
|
||||
| #[code.u-break (default, method, getter, setter)] if the extension is
|
||||
| registered. Raises a #[code KeyError] otherwise.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.tokens import Token
|
||||
Token.set_extension('is_fruit', default=False)
|
||||
extension = Token.get_extension('is_fruit')
|
||||
assert extension == (False, None, None, None)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell Name of the extension.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell tuple
|
||||
+cell
|
||||
| A #[code.u-break (default, method, getter, setter)] tuple of the
|
||||
| extension.
|
||||
|
||||
+h(2, "has_extension") Token.has_extension
|
||||
+tag classmethod
|
||||
+tag-new(2)
|
||||
|
||||
p Check whether an extension has been registered on the #[code Token] class.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.tokens import Token
|
||||
Token.set_extension('is_fruit', default=False)
|
||||
assert Token.has_extension('is_fruit')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell Name of the extension to check.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether the extension has been registered.
|
||||
|
||||
+h(2, "remove_extension") Token.remove_extension
|
||||
+tag classmethod
|
||||
+tag-new("2.0.11")
|
||||
|
||||
p Remove a previously registered extension.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.tokens import Token
|
||||
Token.set_extension('is_fruit', default=False)
|
||||
removed = Token.remove_extension('is_fruit')
|
||||
assert not Token.has_extension('is_fruit')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code name]
|
||||
+cell unicode
|
||||
+cell Name of the extension.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell tuple
|
||||
+cell
|
||||
| A #[code.u-break (default, method, getter, setter)] tuple of the
|
||||
| removed extension.
|
||||
|
||||
+h(2, "check_flag") Token.check_flag
|
||||
+tag method
|
||||
|
||||
p Check the value of a boolean flag.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.attrs import IS_TITLE
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
token = doc[0]
|
||||
assert token.check_flag(IS_TITLE) == True
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code flag_id]
|
||||
+cell int
|
||||
+cell The attribute ID of the flag to check.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether the flag is set.
|
||||
|
||||
+h(2, "similarity") Token.similarity
|
||||
+tag method
|
||||
+tag-model("vectors")
|
||||
|
||||
p Compute a semantic similarity estimate. Defaults to cosine over vectors.
|
||||
|
||||
+aside-code("Example").
|
||||
apples, _, oranges = nlp(u'apples and oranges')
|
||||
apples_oranges = apples.similarity(oranges)
|
||||
oranges_apples = oranges.similarity(apples)
|
||||
assert apples_oranges == oranges_apples
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code other]
|
||||
+cell -
|
||||
+cell
|
||||
| The object to compare with. By default, accepts #[code Doc],
|
||||
| #[code Span], #[code Token] and #[code Lexeme] objects.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell float
|
||||
+cell A scalar similarity score. Higher is more similar.
|
||||
|
||||
+h(2, "nbor") Token.nbor
|
||||
+tag method
|
||||
|
||||
p Get a neighboring token.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
give_nbor = doc[0].nbor()
|
||||
assert give_nbor.text == u'it'
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code i]
|
||||
+cell int
|
||||
+cell The relative position of the token to get. Defaults to #[code 1].
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Token]
|
||||
+cell The token at position #[code self.doc[self.i+i]].
|
||||
|
||||
+h(2, "is_ancestor") Token.is_ancestor
|
||||
+tag method
|
||||
+tag-model("parse")
|
||||
|
||||
p
|
||||
| Check whether this token is a parent, grandparent, etc. of another
|
||||
| in the dependency tree.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
give = doc[0]
|
||||
it = doc[1]
|
||||
assert give.is_ancestor(it)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code descendant]
|
||||
+cell #[code Token]
|
||||
+cell Another token.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether this token is the ancestor of the descendant.
|
||||
|
||||
+h(2, "ancestors") Token.ancestors
|
||||
+tag property
|
||||
+tag-model("parse")
|
||||
|
||||
p A sequence of this token's syntactic ancestors (parents, grandparents, etc.).
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
it_ancestors = doc[1].ancestors
|
||||
assert [t.text for t in it_ancestors] == [u'Give']
|
||||
he_ancestors = doc[4].ancestors
|
||||
assert [t.text for t in he_ancestors] == [u'pleaded']
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Token]
|
||||
+cell
|
||||
| A sequence of ancestor tokens such that
|
||||
| #[code ancestor.is_ancestor(self)].
|
||||
|
||||
+h(2, "conjuncts") Token.conjuncts
|
||||
+tag property
|
||||
+tag-model("parse")
|
||||
|
||||
p A sequence of coordinated tokens, not including the token itself.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'I like apples and oranges')
|
||||
apples_conjuncts = doc[2].conjuncts
|
||||
assert [t.text for t in apples_conjuncts] == [u'oranges']
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Token]
|
||||
+cell A coordinated token.
|
||||
|
||||
+h(2, "children") Token.children
|
||||
+tag property
|
||||
+tag-model("parse")
|
||||
|
||||
p A sequence of the token's immediate syntactic children.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
give_children = doc[0].children
|
||||
assert [t.text for t in give_children] == [u'it', u'back', u'!']
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Token]
|
||||
+cell A child token such that #[code child.head==self].
|
||||
|
||||
+h(2, "lefts") Token.lefts
|
||||
+tag property
|
||||
+tag-model("parse")
|
||||
|
||||
p
|
||||
| The leftward immediate children of the word, in the syntactic dependency
|
||||
| parse.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'I like New York in Autumn.')
|
||||
lefts = [t.text for t in doc[3].lefts]
|
||||
assert lefts == [u'New']
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Token]
|
||||
+cell A left-child of the token.
|
||||
|
||||
+h(2, "rights") Token.rights
|
||||
+tag property
|
||||
+tag-model("parse")
|
||||
|
||||
p
|
||||
| The rightward immediate children of the word, in the syntactic
|
||||
| dependency parse.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'I like New York in Autumn.')
|
||||
rights = [t.text for t in doc[3].rights]
|
||||
assert rights == [u'in']
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Token]
|
||||
+cell A right-child of the token.
|
||||
|
||||
+h(2, "n_lefts") Token.n_lefts
|
||||
+tag property
|
||||
+tag-model("parse")
|
||||
|
||||
p
|
||||
| The number of leftward immediate children of the word, in the syntactic
|
||||
| dependency parse.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'I like New York in Autumn.')
|
||||
assert doc[3].n_lefts == 1
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The number of left-child tokens.
|
||||
|
||||
+h(2, "n_rights") Token.n_rights
|
||||
+tag property
|
||||
+tag-model("parse")
|
||||
|
||||
p
|
||||
| The number of rightward immediate children of the word, in the syntactic
|
||||
| dependency parse.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'I like New York in Autumn.')
|
||||
assert doc[3].n_rights == 1
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The number of right-child tokens.
|
||||
|
||||
+h(2, "subtree") Token.subtree
|
||||
+tag property
|
||||
+tag-model("parse")
|
||||
|
||||
p A sequence containing the token and all the token's syntactic descendants.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
give_subtree = doc[0].subtree
|
||||
assert [t.text for t in give_subtree] == [u'Give', u'it', u'back', u'!']
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Token]
|
||||
+cell A descendant token such that #[code self.is_ancestor(token) or token == self].
|
||||
|
||||
+h(2, "is_sent_start") Token.is_sent_start
|
||||
+tag property
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| A boolean value indicating whether the token starts a sentence.
|
||||
| #[code None] if unknown.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
assert doc[4].is_sent_start
|
||||
assert not doc[5].is_sent_start
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether the token starts a sentence.
|
||||
|
||||
+infobox("Changed in v2.0", "⚠️")
|
||||
| As of spaCy v2.0, the #[code Token.sent_start] property is deprecated and
|
||||
| has been replaced with #[code Token.is_sent_start], which returns a
|
||||
| boolean value instead of a misleading #[code 0] for #[code False] and
|
||||
| #[code 1] for #[code True]. It also now returns #[code None] if the
|
||||
| answer is unknown, and fixes a quirk in the old logic that would always
|
||||
| set the property to #[code 0] for the first word of the document.
|
||||
|
||||
+code-wrapper
|
||||
+code-new assert doc[4].is_sent_start == True
|
||||
+code-old assert doc[4].sent_start == 1
|
||||
|
||||
+h(2, "has_vector") Token.has_vector
|
||||
+tag property
|
||||
+tag-model("vectors")
|
||||
|
||||
p
|
||||
| A boolean value indicating whether a word vector is associated with the
|
||||
| token.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'I like apples')
|
||||
apples = doc[2]
|
||||
assert apples.has_vector
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether the token has a vector data attached.
|
||||
|
||||
+h(2, "vector") Token.vector
|
||||
+tag property
|
||||
+tag-model("vectors")
|
||||
|
||||
p A real-valued meaning representation.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'I like apples')
|
||||
apples = doc[2]
|
||||
assert apples.vector.dtype == 'float32'
|
||||
assert apples.vector.shape == (300,)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
|
||||
+cell A 1D numpy array representing the token's semantics.
|
||||
|
||||
+h(2, "vector_norm") Token.vector_norm
|
||||
+tag property
|
||||
+tag-model("vectors")
|
||||
|
||||
p The L2 norm of the token's vector representation.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'I like apples and pasta')
|
||||
apples = doc[2]
|
||||
pasta = doc[4]
|
||||
apples.vector_norm # 6.89589786529541
|
||||
pasta.vector_norm # 7.759851932525635
|
||||
assert apples.vector_norm != pasta.vector_norm
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell float
|
||||
+cell The L2 norm of the vector representation.
|
||||
|
||||
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code doc]
|
||||
+cell #[code Doc]
|
||||
+cell The parent document.
|
||||
|
||||
+row
|
||||
+cell #[code sent]
|
||||
+tag-new("2.0.12")
|
||||
+cell #[code Span]
|
||||
+cell The sentence span that this token is a part of.
|
||||
|
||||
+row
|
||||
+cell #[code text]
|
||||
+cell unicode
|
||||
+cell Verbatim text content.
|
||||
|
||||
+row
|
||||
+cell #[code text_with_ws]
|
||||
+cell unicode
|
||||
+cell Text content, with trailing space character if present.
|
||||
|
||||
+row
|
||||
+cell #[code whitespace_]
|
||||
+cell unicode
|
||||
+cell Trailing space character if present.
|
||||
|
||||
+row
|
||||
+cell #[code orth]
|
||||
+cell int
|
||||
+cell ID of the verbatim text content.
|
||||
|
||||
+row
|
||||
+cell #[code orth_]
|
||||
+cell unicode
|
||||
+cell
|
||||
| Verbatim text content (identical to #[code Token.text]). Exists
|
||||
| mostly for consistency with the other attributes.
|
||||
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The vocab object of the parent #[code Doc].
|
||||
|
||||
+row
|
||||
+cell #[code head]
|
||||
+cell #[code Token]
|
||||
+cell The syntactic parent, or "governor", of this token.
|
||||
|
||||
+row
|
||||
+cell #[code left_edge]
|
||||
+cell #[code Token]
|
||||
+cell The leftmost token of this token's syntactic descendants.
|
||||
|
||||
+row
|
||||
+cell #[code right_edge]
|
||||
+cell #[code Token]
|
||||
+cell The rightmost token of this token's syntactic descendants.
|
||||
|
||||
+row
|
||||
+cell #[code i]
|
||||
+cell int
|
||||
+cell The index of the token within the parent document.
|
||||
|
||||
+row
|
||||
+cell #[code ent_type]
|
||||
+cell int
|
||||
+cell Named entity type.
|
||||
|
||||
+row
|
||||
+cell #[code ent_type_]
|
||||
+cell unicode
|
||||
+cell Named entity type.
|
||||
|
||||
+row
|
||||
+cell #[code ent_iob]
|
||||
+cell int
|
||||
+cell
|
||||
| IOB code of named entity tag. #[code 3]
|
||||
| means the token begins an entity, #[code 1] means it is inside
|
||||
| an entity, #[code 2] means it is outside an entity, and
|
||||
| #[code 0] means no entity tag is set.
|
||||
|
||||
+row
|
||||
+cell #[code ent_iob_]
|
||||
+cell unicode
|
||||
+cell
|
||||
| IOB code of named entity tag. #[code "B"]
|
||||
| means the token begins an entity, #[code "I"] means it is inside
|
||||
| an entity, #[code "O"] means it is outside an entity, and
|
||||
| #[code ""] means no entity tag is set.
|
||||
|
||||
+row
|
||||
+cell #[code ent_id]
|
||||
+cell int
|
||||
+cell
|
||||
| ID of the entity the token is an instance of, if any. Currently
|
||||
| not used, but potentially for coreference resolution.
|
||||
|
||||
+row
|
||||
+cell #[code ent_id_]
|
||||
+cell unicode
|
||||
+cell
|
||||
| ID of the entity the token is an instance of, if any. Currently
|
||||
| not used, but potentially for coreference resolution.
|
||||
|
||||
+row
|
||||
+cell #[code lemma]
|
||||
+cell int
|
||||
+cell
|
||||
| Base form of the token, with no inflectional suffixes.
|
||||
|
||||
+row
|
||||
+cell #[code lemma_]
|
||||
+cell unicode
|
||||
+cell Base form of the token, with no inflectional suffixes.
|
||||
|
||||
+row
|
||||
+cell #[code norm]
|
||||
+cell int
|
||||
+cell
|
||||
| The token's norm, i.e. a normalised form of the token text.
|
||||
| Usually set in the language's
|
||||
| #[+a("/usage/adding-languages#tokenizer-exceptions") tokenizer exceptions] or
|
||||
| #[+a("/usage/adding-languages#norm-exceptions") norm exceptions].
|
||||
|
||||
+row
|
||||
+cell #[code norm_]
|
||||
+cell unicode
|
||||
+cell
|
||||
| The token's norm, i.e. a normalised form of the token text.
|
||||
| Usually set in the language's
|
||||
| #[+a("/usage/adding-languages#tokenizer-exceptions") tokenizer exceptions] or
|
||||
| #[+a("/usage/adding-languages#norm-exceptions") norm exceptions].
|
||||
|
||||
+row
|
||||
+cell #[code lower]
|
||||
+cell int
|
||||
+cell Lowercase form of the token.
|
||||
|
||||
+row
|
||||
+cell #[code lower_]
|
||||
+cell unicode
|
||||
+cell
|
||||
| Lowercase form of the token text. Equivalent to
|
||||
| #[code Token.text.lower()].
|
||||
|
||||
+row
|
||||
+cell #[code shape]
|
||||
+cell int
|
||||
+cell
|
||||
| Transform of the token's string, to show orthographic features.
|
||||
| For example, "Xxxx" or "dd".
|
||||
|
||||
+row
|
||||
+cell #[code shape_]
|
||||
+cell unicode
|
||||
+cell
|
||||
| Transform of the token's string, to show orthographic features.
|
||||
| For example, "Xxxx" or "dd".
|
||||
|
||||
+row
|
||||
+cell #[code prefix]
|
||||
+cell int
|
||||
+cell
|
||||
| Hash value of a length-N substring from the start of the
|
||||
| token. Defaults to #[code N=1].
|
||||
|
||||
+row
|
||||
+cell #[code prefix_]
|
||||
+cell unicode
|
||||
+cell
|
||||
| A length-N substring from the start of the token. Defaults to
|
||||
| #[code N=1].
|
||||
|
||||
+row
|
||||
+cell #[code suffix]
|
||||
+cell int
|
||||
+cell
|
||||
| Hash value of a length-N substring from the end of the token.
|
||||
| Defaults to #[code N=3].
|
||||
|
||||
+row
|
||||
+cell #[code suffix_]
|
||||
+cell unicode
|
||||
+cell
|
||||
| Length-N substring from the end of the token. Defaults to
|
||||
| #[code N=3].
|
||||
|
||||
+row
|
||||
+cell #[code is_alpha]
|
||||
+cell bool
|
||||
+cell
|
||||
| Does the token consist of alphabetic characters? Equivalent to
|
||||
| #[code token.text.isalpha()].
|
||||
|
||||
+row
|
||||
+cell #[code is_ascii]
|
||||
+cell bool
|
||||
+cell
|
||||
| Does the token consist of ASCII characters? Equivalent to
|
||||
| #[code all(ord(c) < 128 for c in token.text)].
|
||||
|
||||
+row
|
||||
+cell #[code is_digit]
|
||||
+cell bool
|
||||
+cell
|
||||
| Does the token consist of digits? Equivalent to
|
||||
| #[code token.text.isdigit()].
|
||||
|
||||
+row
|
||||
+cell #[code is_lower]
|
||||
+cell bool
|
||||
+cell
|
||||
| Is the token in lowercase? Equivalent to
|
||||
| #[code token.text.islower()].
|
||||
|
||||
+row
|
||||
+cell #[code is_upper]
|
||||
+cell bool
|
||||
+cell
|
||||
| Is the token in uppercase? Equivalent to
|
||||
| #[code token.text.isupper()].
|
||||
|
||||
+row
|
||||
+cell #[code is_title]
|
||||
+cell bool
|
||||
+cell
|
||||
| Is the token in titlecase? Equivalent to
|
||||
| #[code token.text.istitle()].
|
||||
|
||||
+row
|
||||
+cell #[code is_punct]
|
||||
+cell bool
|
||||
+cell Is the token punctuation?
|
||||
|
||||
+row
|
||||
+cell #[code is_left_punct]
|
||||
+cell bool
|
||||
+cell Is the token a left punctuation mark, e.g. #[code (]?
|
||||
|
||||
+row
|
||||
+cell #[code is_right_punct]
|
||||
+cell bool
|
||||
+cell Is the token a right punctuation mark, e.g. #[code )]?
|
||||
|
||||
+row
|
||||
+cell #[code is_space]
|
||||
+cell bool
|
||||
+cell
|
||||
| Does the token consist of whitespace characters? Equivalent to
|
||||
| #[code token.text.isspace()].
|
||||
|
||||
+row
|
||||
+cell #[code is_bracket]
|
||||
+cell bool
|
||||
+cell Is the token a bracket?
|
||||
|
||||
+row
|
||||
+cell #[code is_quote]
|
||||
+cell bool
|
||||
+cell Is the token a quotation mark?
|
||||
|
||||
+row
|
||||
+cell #[code is_currency]
|
||||
+tag-new("2.0.8")
|
||||
+cell bool
|
||||
+cell Is the token a currency symbol?
|
||||
|
||||
+row
|
||||
+cell #[code like_url]
|
||||
+cell bool
|
||||
+cell Does the token resemble a URL?
|
||||
|
||||
+row
|
||||
+cell #[code like_num]
|
||||
+cell bool
|
||||
+cell Does the token represent a number? e.g. "10.9", "10", "ten", etc.
|
||||
|
||||
+row
|
||||
+cell #[code like_email]
|
||||
+cell bool
|
||||
+cell Does the token resemble an email address?
|
||||
|
||||
+row
|
||||
+cell #[code is_oov]
|
||||
+cell bool
|
||||
+cell Is the token out-of-vocabulary?
|
||||
|
||||
+row
|
||||
+cell #[code is_stop]
|
||||
+cell bool
|
||||
+cell Is the token part of a "stop list"?
|
||||
|
||||
+row
|
||||
+cell #[code pos]
|
||||
+cell int
|
||||
+cell Coarse-grained part-of-speech.
|
||||
|
||||
+row
|
||||
+cell #[code pos_]
|
||||
+cell unicode
|
||||
+cell Coarse-grained part-of-speech.
|
||||
|
||||
+row
|
||||
+cell #[code tag]
|
||||
+cell int
|
||||
+cell Fine-grained part-of-speech.
|
||||
|
||||
+row
|
||||
+cell #[code tag_]
|
||||
+cell unicode
|
||||
+cell Fine-grained part-of-speech.
|
||||
|
||||
+row
|
||||
+cell #[code dep]
|
||||
+cell int
|
||||
+cell Syntactic dependency relation.
|
||||
|
||||
+row
|
||||
+cell #[code dep_]
|
||||
+cell unicode
|
||||
+cell Syntactic dependency relation.
|
||||
|
||||
+row
|
||||
+cell #[code lang]
|
||||
+cell int
|
||||
+cell Language of the parent document's vocabulary.
|
||||
|
||||
+row
|
||||
+cell #[code lang_]
|
||||
+cell unicode
|
||||
+cell Language of the parent document's vocabulary.
|
||||
|
||||
+row
|
||||
+cell #[code prob]
|
||||
+cell float
|
||||
+cell Smoothed log probability estimate of token's type.
|
||||
|
||||
+row
|
||||
+cell #[code idx]
|
||||
+cell int
|
||||
+cell The character offset of the token within the parent document.
|
||||
|
||||
+row
|
||||
+cell #[code sentiment]
|
||||
+cell float
|
||||
+cell
|
||||
| A scalar value indicating the positivity or negativity of the
|
||||
| token.
|
||||
|
||||
+row
|
||||
+cell #[code lex_id]
|
||||
+cell int
|
||||
+cell Sequential ID of the token's lexical type.
|
||||
|
||||
+row
|
||||
+cell #[code rank]
|
||||
+cell int
|
||||
+cell
|
||||
| Sequential ID of the token's lexical type, used to index into
|
||||
| tables, e.g. for word vectors.
|
||||
|
||||
+row
|
||||
+cell #[code cluster]
|
||||
+cell int
|
||||
+cell Brown cluster ID.
|
||||
|
||||
+row
|
||||
+cell #[code _]
|
||||
+cell #[code Underscore]
|
||||
+cell
|
||||
| User space for adding custom
|
||||
| #[+a("/usage/processing-pipelines#custom-components-attributes") attribute extensions].
|
|
@ -1,229 +0,0 @@
|
|||
//- 💫 DOCS > API > TOKENIZER
|
||||
|
||||
include ../_includes/_mixins
|
||||
|
||||
p
|
||||
| Segment text, and create #[code Doc] objects with the discovered segment
|
||||
| boundaries.
|
||||
|
||||
+h(2, "init") Tokenizer.__init__
|
||||
+tag method
|
||||
|
||||
p Create a #[code Tokenizer] to create #[code Doc] objects given unicode text.
|
||||
|
||||
+aside-code("Example").
|
||||
# Construction 1
|
||||
from spacy.tokenizer import Tokenizer
|
||||
tokenizer = Tokenizer(nlp.vocab)
|
||||
|
||||
# Construction 2
|
||||
from spacy.lang.en import English
|
||||
tokenizer = English().Defaults.create_tokenizer(nlp)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell A storage container for lexical types.
|
||||
|
||||
+row
|
||||
+cell #[code rules]
|
||||
+cell dict
|
||||
+cell Exceptions and special-cases for the tokenizer.
|
||||
|
||||
+row
|
||||
+cell #[code prefix_search]
|
||||
+cell callable
|
||||
+cell
|
||||
| A function matching the signature of
|
||||
| #[code re.compile(string).search] to match prefixes.
|
||||
|
||||
+row
|
||||
+cell #[code suffix_search]
|
||||
+cell callable
|
||||
+cell
|
||||
| A function matching the signature of
|
||||
| #[code re.compile(string).search] to match suffixes.
|
||||
|
||||
+row
|
||||
+cell #[code infix_finditer]
|
||||
+cell callable
|
||||
+cell
|
||||
| A function matching the signature of
|
||||
| #[code re.compile(string).finditer] to find infixes.
|
||||
|
||||
+row
|
||||
+cell #[code token_match]
|
||||
+cell callable
|
||||
+cell A boolean function matching strings to be recognised as tokens.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Tokenizer]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "call") Tokenizer.__call__
|
||||
+tag method
|
||||
|
||||
p Tokenize a string.
|
||||
|
||||
+aside-code("Example").
|
||||
tokens = tokenizer(u'This is a sentence')
|
||||
assert len(tokens) == 4
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string]
|
||||
+cell unicode
|
||||
+cell The string to tokenize.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Doc]
|
||||
+cell A container for linguistic annotations.
|
||||
|
||||
+h(2, "pipe") Tokenizer.pipe
|
||||
+tag method
|
||||
|
||||
p Tokenize a stream of texts.
|
||||
|
||||
+aside-code("Example").
|
||||
texts = [u'One document.', u'...', u'Lots of documents']
|
||||
for doc in tokenizer.pipe(texts, batch_size=50):
|
||||
pass
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code texts]
|
||||
+cell -
|
||||
+cell A sequence of unicode texts.
|
||||
|
||||
+row
|
||||
+cell #[code batch_size]
|
||||
+cell int
|
||||
+cell The number of texts to accumulate in an internal buffer.
|
||||
|
||||
+row
|
||||
+cell #[code n_threads]
|
||||
+cell int
|
||||
+cell
|
||||
| The number of threads to use, if the implementation supports
|
||||
| multi-threading. The default tokenizer is single-threaded.
|
||||
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Doc]
|
||||
+cell A sequence of Doc objects, in order.
|
||||
|
||||
+h(2, "find_infix") Tokenizer.find_infix
|
||||
+tag method
|
||||
|
||||
p Find internal split points of the string.
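//- A hedged sketch (added; not in the original page). Assumes the default
//- English infix rules, which treat a hyphen between letters as an
//- internal split point.

+aside-code("Example").
matches = tokenizer.find_infix(u'well-known')
for match in matches:
    print(match.start(), match.end())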
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string]
|
||||
+cell unicode
|
||||
+cell The string to split.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell list
|
||||
+cell
|
||||
| A list of #[code re.MatchObject] objects that have #[code .start()]
|
||||
| and #[code .end()] methods, denoting the placement of internal
|
||||
| segment separators, e.g. hyphens.
|
||||
|
||||
+h(2, "find_prefix") Tokenizer.find_prefix
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Find the length of a prefix that should be segmented from the string, or
|
||||
| #[code None] if no prefix rules match.
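//- A hedged sketch (added; not in the original page). Assumes the default
//- English punctuation rules, under which an opening quote is a
//- one-character prefix.

+aside-code("Example").
length = tokenizer.find_prefix(u'"Hello')
assert length == 1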
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string]
|
||||
+cell unicode
|
||||
+cell The string to segment.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int / #[code None]
|
||||
+cell The length of the prefix if present, otherwise #[code None].
|
||||
|
||||
+h(2, "find_suffix") Tokenizer.find_suffix
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Find the length of a suffix that should be segmented from the string, or
|
||||
| #[code None] if no suffix rules match.
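//- A hedged sketch (added; not in the original page). Assumes the default
//- English punctuation rules, under which a trailing exclamation mark is a
//- one-character suffix.

+aside-code("Example").
length = tokenizer.find_suffix(u'Hello!')
assert length == 1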
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string]
|
||||
+cell unicode
|
||||
+cell The string to segment.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int / #[code None]
|
||||
+cell The length of the suffix if present, otherwise #[code None].
|
||||
|
||||
+h(2, "add_special_case") Tokenizer.add_special_case
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Add a special-case tokenization rule. This mechanism is also used to add
|
||||
| custom tokenizer exceptions to the language data. See the usage guide
|
||||
| on #[+a("/usage/adding-languages#tokenizer-exceptions") adding languages]
|
||||
| for more details and examples.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.attrs import ORTH, LEMMA
|
||||
case = [{"don't": [{ORTH: "do"}, {ORTH: "n't", LEMMA: "not"}]}]
|
||||
tokenizer.add_special_case(case)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string]
|
||||
+cell unicode
|
||||
+cell The string to specially tokenize.
|
||||
|
||||
+row
|
||||
+cell #[code token_attrs]
|
||||
+cell iterable
|
||||
+cell
|
||||
| A sequence of dicts, where each dict describes a token and its
|
||||
| attributes. The #[code ORTH] fields of the attributes must
|
||||
| exactly match the string when they are concatenated.
|
||||
|
||||
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code vocab]
|
||||
+cell #[code Vocab]
|
||||
+cell The vocab object of the parent #[code Doc].
|
||||
|
||||
+row
|
||||
+cell #[code prefix_search]
|
||||
+cell -
|
||||
+cell
|
||||
| A function to find segment boundaries from the start of a
|
||||
| string. Returns the length of the segment, or #[code None].
|
||||
|
||||
+row
|
||||
+cell #[code suffix_search]
|
||||
+cell -
|
||||
+cell
|
||||
| A function to find segment boundaries from the end of a string.
|
||||
| Returns the length of the segment, or #[code None].
|
||||
|
||||
+row
|
||||
+cell #[code infix_finditer]
|
||||
+cell -
|
||||
+cell
|
||||
| A function to find internal segment separators, e.g. hyphens.
|
||||
| Returns a (possibly empty) list of #[code re.MatchObject]
|
||||
| objects.
|
|
@ -1,20 +0,0 @@
|
|||
//- 💫 DOCS > API > TOP-LEVEL
|
||||
|
||||
include ../_includes/_mixins
|
||||
|
||||
+section("spacy")
|
||||
//-+h(2, "spacy") spaCy
|
||||
//- spacy/__init__.py
|
||||
include _top-level/_spacy
|
||||
|
||||
+section("displacy")
|
||||
+h(2, "displacy", "spacy/displacy") displaCy
|
||||
include _top-level/_displacy
|
||||
|
||||
+section("util")
|
||||
+h(2, "util", "spacy/util.py") Utility functions
|
||||
include _top-level/_util
|
||||
|
||||
+section("compat")
|
||||
+h(2, "compat", "spacy/compaty.py") Compatibility functions
|
||||
include _top-level/_compat
|
|
@ -1,476 +0,0 @@
|
|||
//- 💫 DOCS > API > VECTORS
|
||||
|
||||
include ../_includes/_mixins
|
||||
|
||||
p
|
||||
| Vectors data is kept in the #[code Vectors.data] attribute, which should
|
||||
| be an instance of #[code numpy.ndarray] (for CPU vectors) or
|
||||
| #[code cupy.ndarray] (for GPU vectors). Multiple keys can be mapped to
|
||||
| the same vector, and not all of the rows in the table need to be
|
||||
| assigned – so #[code vectors.n_keys] may be greater or smaller than
|
||||
| #[code vectors.shape[0]].
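//- A small sketch of the point above (added; not in the original page):
//- two keys mapped to the same row leave the remaining rows unassigned,
//- so #[code n_keys] and #[code shape[0]] need not agree.

+aside-code("Example").
import numpy
from spacy.vectors import Vectors

vectors = Vectors(shape=(3, 300))
vectors.add(u'cat', vector=numpy.random.uniform(-1, 1, (300,)))
vectors.add(u'kitten', row=0)  # second key, same row
assert vectors.n_keys == 2     # keys counted, not rows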
|
||||
|
||||
+h(2, "init") Vectors.__init__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Create a new vector store. You can set the vector values and keys
|
||||
| directly on initialisation, or supply a #[code shape] keyword argument
|
||||
| to create an empty table you can add vectors to later.
|
||||
|
||||
+aside-code("Example").
|
||||
import numpy
from spacy.vectors import Vectors
|
||||
|
||||
empty_vectors = Vectors(shape=(10000, 300))
|
||||
|
||||
data = numpy.zeros((3, 300), dtype='f')
|
||||
keys = [u'cat', u'dog', u'rat']
|
||||
vectors = Vectors(data=data, keys=keys)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code data]
|
||||
+cell #[code.u-break ndarray[ndim=2, dtype='float32']]
|
||||
+cell The vector data.
|
||||
|
||||
+row
|
||||
+cell #[code keys]
|
||||
+cell iterable
|
||||
+cell A sequence of keys aligned with the data.
|
||||
|
||||
+row
|
||||
+cell #[code shape]
|
||||
+cell tuple
|
||||
+cell
|
||||
| Size of the table as #[code (n_entries, n_columns)], the number
|
||||
| of entries and number of columns. Not required if you're
|
||||
| initialising the object with #[code data] and #[code keys].
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Vectors]
|
||||
+cell The newly created object.
|
||||
|
||||
+h(2, "getitem") Vectors.__getitem__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Get a vector by key. If the key is not found in the table, a
|
||||
| #[code KeyError] is raised.
|
||||
|
||||
+aside-code("Example").
|
||||
cat_id = nlp.vocab.strings[u'cat']
|
||||
cat_vector = nlp.vocab.vectors[cat_id]
|
||||
assert (cat_vector == nlp.vocab[u'cat'].vector).all()
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code key]
|
||||
+cell int
|
||||
+cell The key to get the vector for.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code.u-break ndarray[ndim=1, dtype='float32']]
|
||||
+cell The vector for the key.
|
||||
|
||||
+h(2, "setitem") Vectors.__setitem__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Set a vector for the given key.
|
||||
|
||||
+aside-code("Example").
|
||||
cat_id = nlp.vocab.strings[u'cat']
|
||||
vector = numpy.random.uniform(-1, 1, (300,))
|
||||
nlp.vocab.vectors[cat_id] = vector
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code key]
|
||||
+cell int
|
||||
+cell The key to set the vector for.
|
||||
|
||||
+row
|
||||
+cell #[code vector]
|
||||
+cell #[code.u-break ndarray[ndim=1, dtype='float32']]
|
||||
+cell The vector to set.
|
||||
|
||||
+h(2, "iter") Vectors.__iter__
|
||||
+tag method
|
||||
|
||||
p Iterate over the keys in the table.
|
||||
|
||||
+aside-code("Example").
|
||||
for key in nlp.vocab.vectors:
|
||||
print(key, nlp.vocab.strings[key])
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell int
|
||||
+cell A key in the table.
|
||||
|
||||
+h(2, "len") Vectors.__len__
|
||||
+tag method
|
||||
|
||||
p Return the number of vectors in the table.
|
||||
|
||||
+aside-code("Example").
|
||||
vectors = Vectors(shape=(3, 300))
|
||||
assert len(vectors) == 3
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The number of vectors in the table.
|
||||
|
||||
+h(2, "contains") Vectors.__contains__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Check whether a key has been mapped to a vector entry in the table.
|
||||
|
||||
+aside-code("Example").
|
||||
cat_id = nlp.vocab.strings[u'cat']
|
||||
nlp.vocab.vectors.add(cat_id, vector=numpy.random.uniform(-1, 1, (300,)))
|
||||
assert cat_id in nlp.vocab.vectors
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code key]
|
||||
+cell int
|
||||
+cell The key to check.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether the key has a vector entry.
|
||||
|
||||
+h(2, "add") Vectors.add
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Add a key to the table, optionally setting a vector value as well. Keys
|
||||
| can be mapped to an existing vector by setting #[code row], or a new
|
||||
| vector can be added. When adding unicode keys, keep in mind that the
|
||||
| #[code Vectors] class itself has no
|
||||
| #[+api("stringstore") #[code StringStore]], so you have to store the
|
||||
| hash-to-string mapping separately. If you need to manage the strings,
|
||||
| you should use the #[code Vectors] via the
|
||||
| #[+api("vocab") #[code Vocab]] class, e.g. #[code vocab.vectors].
|
||||
|
||||
+aside-code("Example").
|
||||
vector = numpy.random.uniform(-1, 1, (300,))
|
||||
cat_id = nlp.vocab.strings[u'cat']
|
||||
nlp.vocab.vectors.add(cat_id, vector=vector)
|
||||
nlp.vocab.vectors.add(u'dog', row=0)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code key]
|
||||
+cell unicode / int
|
||||
+cell The key to add.
|
||||
|
||||
+row
|
||||
+cell #[code vector]
|
||||
+cell #[code.u-break ndarray[ndim=1, dtype='float32']]
|
||||
+cell An optional vector to add for the key.
|
||||
|
||||
+row
|
||||
+cell #[code row]
|
||||
+cell int
|
||||
+cell An optional row number of a vector to map the key to.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The row the vector was added to.
|
||||
|
||||
+h(2, "resize") Vectors.resize
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Resize the underlying vectors array. If #[code inplace=True], the memory
|
||||
| is reallocated. This may cause other references to the data to become
|
||||
| invalid, so only use #[code inplace=True] if you're sure that's what you
|
||||
| want. If the number of vectors is reduced, keys mapped to rows that have
|
||||
| been deleted are removed. These removed items are returned as a list of
|
||||
| #[code (key, row)] tuples.
|
||||
|
||||
+aside-code("Example").
|
||||
removed = nlp.vocab.vectors.resize((10000, 300))
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code shape]
|
||||
+cell tuple
|
||||
+cell
|
||||
| A #[code (rows, dims)] tuple describing the number of rows and
|
||||
| dimensions.
|
||||
|
||||
+row
|
||||
+cell #[code inplace]
|
||||
+cell bool
|
||||
+cell Reallocate the memory.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell list
|
||||
+cell The removed items as a list of #[code (key, row)] tuples.
|
||||
|
||||
+h(2, "keys") Vectors.keys
|
||||
+tag method
|
||||
|
||||
p A sequence of the keys in the table.
|
||||
|
||||
+aside-code("Example").
|
||||
for key in nlp.vocab.vectors.keys():
|
||||
print(key, nlp.vocab.strings[key])
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell iterable
|
||||
+cell The keys.
|
||||
|
||||
+h(2, "values") Vectors.values
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Iterate over vectors that have been assigned to at least one key. Note
|
||||
| that some vectors may be unassigned, so the number of vectors returned
|
||||
| may be less than the length of the vectors table.
|
||||
|
||||
+aside-code("Example").
|
||||
for vector in nlp.vocab.vectors.values():
|
||||
print(vector)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code.u-break ndarray[ndim=1, dtype='float32']]
|
||||
+cell A vector in the table.
|
||||
|
||||
+h(2, "items") Vectors.items
|
||||
+tag method
|
||||
|
||||
p Iterate over #[code (key, vector)] pairs, in order.
|
||||
|
||||
+aside-code("Example").
|
||||
for key, vector in nlp.vocab.vectors.items():
|
||||
print(key, nlp.vocab.strings[key], vector)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell tuple
|
||||
+cell #[code (key, vector)] pairs, in order.
|
||||
|
||||
+h(2, "shape") Vectors.shape
|
||||
+tag property
|
||||
|
||||
p
|
||||
| Get a #[code (rows, dims)] tuple with the number of rows and number of
|
||||
| dimensions in the vector table.
|
||||
|
||||
+aside-code("Example").
|
||||
vectors = Vectors(shape=(1, 300))
|
||||
vectors.add(u'cat', vector=numpy.random.uniform(-1, 1, (300,)))
|
||||
rows, dims = vectors.shape
|
||||
assert rows == 1
|
||||
assert dims == 300
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell tuple
|
||||
+cell A #[code (rows, dims)] pair.
|
||||
|
||||
+h(2, "size") Vectors.size
|
||||
+tag property
|
||||
|
||||
p The vector size, i.e. #[code rows * dims].
|
||||
|
||||
+aside-code("Example").
|
||||
vectors = Vectors(shape=(500, 300))
|
||||
assert vectors.size == 150000
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The vector size.
|
||||
|
||||
+h(2, "is_full") Vectors.is_full
|
||||
+tag property
|
||||
|
||||
p
|
||||
| Whether the vectors table is full and no slots are available for new
|
||||
| keys. If a table is full, it can be resized using
|
||||
| #[+api("vectors#resize") #[code Vectors.resize]].
|
||||
|
||||
+aside-code("Example").
|
||||
vectors = Vectors(shape=(1, 300))
|
||||
vectors.add(u'cat', vector=numpy.random.uniform(-1, 1, (300,)))
|
||||
assert vectors.is_full
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether the vectors table is full.
|
||||
|
||||
+h(2, "n_keys") Vectors.n_keys
|
||||
+tag property
|
||||
|
||||
p
|
||||
| Get the number of keys in the table. Note that this is the number of
|
||||
| #[em all] keys, not just unique vectors. If several keys are mapped
|
||||
| to the same vector, they will be counted individually.
|
||||
|
||||
+aside-code("Example").
|
||||
vectors = Vectors(shape=(10, 300))
|
||||
assert len(vectors) == 10
|
||||
assert vectors.n_keys == 0
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The number of all keys in the table.
|
||||
|
||||
+h(2, "from_glove") Vectors.from_glove
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Load #[+a("https://nlp.stanford.edu/projects/glove/") GloVe] vectors from
|
||||
| a directory. Assumes binary format, that the vocab is in a
|
||||
| #[code vocab.txt], and that vectors are named
|
||||
| #[code vectors.{size}.[fd].bin], e.g. #[code vectors.128.f.bin] for 128d
|
||||
| float32 vectors, #[code vectors.300.d.bin] for 300d float64 (double)
|
||||
| vectors, etc. By default GloVe outputs 64-bit vectors.
|
||||
|
||||
+aside-code("Example").
|
||||
vectors = Vectors()
|
||||
vectors.from_glove('/path/to/glove_vectors')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell unicode / #[code Path]
|
||||
+cell The path to load the GloVe vectors from.
|
||||
|
||||
+h(2, "to_disk") Vectors.to_disk
|
||||
+tag method
|
||||
|
||||
p Save the current state to a directory.
|
||||
|
||||
+aside-code("Example").
|
||||
vectors.to_disk('/path/to/vectors')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell unicode / #[code Path]
|
||||
+cell
|
||||
| A path to a directory, which will be created if it doesn't exist.
|
||||
| Paths may be either strings or #[code Path]-like objects.
|
||||
|
||||
+row
|
||||
+cell #[code **exclude]
|
||||
+cell -
|
||||
+cell Named attributes to prevent from being saved.
|
||||
|
||||
+h(2, "from_disk") Vectors.from_disk
|
||||
+tag method
|
||||
|
||||
p Loads state from a directory. Modifies the object in place and returns it.
|
||||
|
||||
+aside-code("Example").
|
||||
vectors = Vectors()
|
||||
vectors.from_disk('/path/to/vectors')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code path]
|
||||
+cell unicode / #[code Path]
|
||||
+cell
|
||||
| A path to a directory. Paths may be either strings or
|
||||
| #[code Path]-like objects.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Vectors]
|
||||
+cell The modified #[code Vectors] object.
|
||||
|
||||
+h(2, "to_bytes") Vectors.to_bytes
|
||||
+tag method
|
||||
|
||||
p Serialize the current state to a binary string.
|
||||
|
||||
+aside-code("Example").
|
||||
vectors_bytes = vectors.to_bytes()
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code **exclude]
|
||||
+cell -
|
||||
+cell Named attributes to prevent from being serialized.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bytes
|
||||
+cell The serialized form of the #[code Vectors] object.
|
||||
|
||||
+h(2, "from_bytes") Vectors.from_bytes
|
||||
+tag method
|
||||
|
||||
p Load state from a binary string.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.vectors import Vectors
|
||||
vectors_bytes = vectors.to_bytes()
|
||||
new_vectors = Vectors()
|
||||
new_vectors.from_bytes(vectors_bytes)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code data]
|
||||
+cell bytes
|
||||
+cell The data to load from.
|
||||
|
||||
+row
|
||||
+cell #[code **exclude]
|
||||
+cell -
|
||||
+cell Named attributes to prevent from being loaded.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Vectors]
|
||||
+cell The #[code Vectors] object.
|
||||
|
||||
+h(2, "attributes") Attributes
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code data]
|
||||
+cell #[code.u-break ndarray[ndim=2, dtype='float32']]
|
||||
+cell
|
||||
| Stored vectors data. #[code numpy] is used for CPU vectors,
|
||||
| #[code cupy] for GPU vectors.
|
||||
|
||||
+row
|
||||
+cell #[code key2row]
|
||||
+cell dict
|
||||
+cell
|
||||
| Dictionary mapping word hashes to rows in the
|
||||
| #[code Vectors.data] table.
|
||||
|
||||
+row
|
||||
+cell #[code keys]
|
||||
+cell #[code.u-break ndarray[ndim=1, dtype='uint64']]
|
||||
+cell
|
||||
| Array keeping the keys in order, such that
|
||||
| #[code keys[vectors.key2row[key]] == key]
|
|
@ -1,411 +0,0 @@
|
|||
//- 💫 DOCS > API > VOCAB
|
||||
|
||||
include ../_includes/_mixins
|
||||
|
||||
p
|
||||
| The #[code Vocab] object provides a lookup table that allows you to
|
||||
| access #[+api("lexeme") #[code Lexeme]] objects, as well as the
|
||||
| #[+api("stringstore") #[code StringStore]]. It also owns underlying
|
||||
| C-data that is shared between #[code Doc] objects.
|
||||
|
||||
+h(2, "init") Vocab.__init__
|
||||
+tag method
|
||||
|
||||
p Create the vocabulary.
|
||||
|
||||
+aside-code("Example").
|
||||
from spacy.vocab import Vocab
|
||||
vocab = Vocab(strings=[u'hello', u'world'])
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code lex_attr_getters]
|
||||
+cell dict
|
||||
+cell
|
||||
| A dictionary mapping attribute IDs to functions to compute them.
|
||||
| Defaults to #[code None].
|
||||
|
||||
+row
|
||||
+cell #[code tag_map]
|
||||
+cell dict
|
||||
+cell
|
||||
| A dictionary mapping fine-grained tags to coarse-grained
|
||||
| parts-of-speech, and optionally morphological attributes.
|
||||
|
||||
+row
|
||||
+cell #[code lemmatizer]
|
||||
+cell object
|
||||
+cell A lemmatizer. Defaults to #[code None].
|
||||
|
||||
+row
|
||||
+cell #[code strings]
|
||||
+cell #[code StringStore] or list
|
||||
+cell
|
||||
| A #[+api("stringstore") #[code StringStore]] that maps
|
||||
| strings to hash values, and vice versa, or a list of strings.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Vocab]
|
||||
+cell The newly constructed object.
|
||||
|
||||
+h(2, "len") Vocab.__len__
|
||||
+tag method
|
||||
|
||||
p Get the current number of lexemes in the vocabulary.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'This is a sentence.')
|
||||
assert len(nlp.vocab) > 0
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The number of lexemes in the vocabulary.
|
||||
|
||||
+h(2, "getitem") Vocab.__getitem__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Retrieve a lexeme, given an int ID or a unicode string. If a previously
|
||||
| unseen unicode string is given, a new lexeme is created and stored.
|
||||
|
||||
+aside-code("Example").
|
||||
apple = nlp.vocab.strings['apple']
|
||||
assert nlp.vocab[apple] == nlp.vocab[u'apple']
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code id_or_string]
|
||||
+cell int / unicode
|
||||
+cell The hash value of a word, or its unicode string.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code Lexeme]
|
||||
+cell The lexeme indicated by the given ID.
|
||||
|
||||
+h(2, "iter") Vocab.__iter__
|
||||
+tag method
|
||||
|
||||
p Iterate over the lexemes in the vocabulary.
|
||||
|
||||
+aside-code("Example").
|
||||
stop_words = (lex for lex in nlp.vocab if lex.is_stop)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Lexeme]
|
||||
+cell An entry in the vocabulary.
|
||||
|
||||
+h(2, "contains") Vocab.__contains__
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Check whether the string has an entry in the vocabulary. To get the ID
|
||||
| for a given string, you need to look it up in
|
||||
| #[+api("vocab#attributes") #[code vocab.strings]].
|
||||
|
||||
+aside-code("Example").
|
||||
apple = nlp.vocab.strings['apple']
|
||||
oov = nlp.vocab.strings['dskfodkfos']
|
||||
assert apple in nlp.vocab
|
||||
assert oov not in nlp.vocab
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code string]
|
||||
+cell unicode
|
||||
+cell The ID string.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell bool
|
||||
+cell Whether the string has an entry in the vocabulary.
|
||||
|
||||
+h(2, "add_flag") Vocab.add_flag
|
||||
+tag method
|
||||
|
||||
p
|
||||
| Set a new boolean flag to words in the vocabulary. The #[code flag_getter]
|
||||
| function will be called over the words currently in the vocab, and then
|
||||
| applied to new words as they occur. You'll then be able to access the flag
|
||||
| value on each token, using #[code token.check_flag(flag_id)].
|
||||
|
||||
+aside-code("Example").
|
||||
def is_my_product(text):
|
||||
products = [u'spaCy', u'Thinc', u'displaCy']
|
||||
return text in products
|
||||
|
||||
MY_PRODUCT = nlp.vocab.add_flag(is_my_product)
|
||||
doc = nlp(u'I like spaCy')
|
||||
assert doc[2].check_flag(MY_PRODUCT) == True
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code flag_getter]
|
||||
+cell callable
|
||||
+cell A function #[code f(unicode) -> bool], to get the flag value.
|
||||
|
||||
+row
|
||||
+cell #[code flag_id]
|
||||
+cell int
|
||||
+cell
|
||||
| An integer between 1 and 63 (inclusive), specifying the bit at
|
||||
| which the flag will be stored. If #[code -1], the lowest
|
||||
| available bit will be chosen.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell int
|
||||
+cell The integer ID by which the flag value can be checked.
|
||||
|
||||
+h(2, "reset_vectors") Vocab.reset_vectors
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Drop the current vector table. Because all vectors must be the same
|
||||
| width, you have to call this to change the size of the vectors. Only
|
||||
| one of the #[code width] and #[code shape] keyword arguments can be
|
||||
| specified.
|
||||
|
||||
+aside-code("Example").
|
||||
nlp.vocab.reset_vectors(width=300)
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code width]
|
||||
+cell int
|
||||
+cell The new width (keyword argument only).
|
||||
|
||||
+row
|
||||
+cell #[code shape]
|
||||
+cell tuple
|
||||
+cell The new shape (keyword argument only).
|
||||
|
||||
+h(2, "prune_vectors") Vocab.prune_vectors
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Reduce the current vector table to #[code nr_row] unique entries. Words
|
||||
| mapped to the discarded vectors will be remapped to the closest vector
|
||||
| among those remaining. For example, suppose the original table had
|
||||
| vectors for the words:
|
||||
| #[code.u-break ['sat', 'cat', 'feline', 'reclined']]. If we prune the
|
||||
| vector table to two rows, we would discard the vectors for "feline"
|
||||
| and "reclined". These words would then be remapped to the closest
|
||||
| remaining vector – so "feline" would have the same vector as "cat",
|
||||
| and "reclined" would have the same vector as "sat". The similarities are
|
||||
| judged by cosine. The original vectors may be large, so the cosines are
|
||||
| calculated in minibatches, to reduce memory usage.
|
||||
|
||||
+aside-code("Example").
|
||||
nlp.vocab.prune_vectors(10000)
|
||||
assert len(nlp.vocab.vectors) <= 10000
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code nr_row]
|
||||
+cell int
|
||||
+cell The number of rows to keep in the vector table.
|
||||
|
||||
+row
|
||||
+cell #[code batch_size]
|
||||
+cell int
|
||||
+cell
|
||||
| Batch of vectors for calculating the similarities. Larger batch
|
||||
| sizes might be faster, while temporarily requiring more memory.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell dict
|
||||
+cell
|
||||
| A dictionary keyed by removed words mapped to
|
||||
| #[code (string, score)] tuples, where #[code string] is the entry
|
||||
| the removed word was mapped to, and #[code score] the similarity
|
||||
| score between the two words.
|
||||
|
||||
+h(2, "get_vector") Vocab.get_vector
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Retrieve a vector for a word in the vocabulary. Words can be looked up
|
||||
| by string or hash value. If no vectors data is loaded, a
|
||||
| #[code ValueError] is raised.
|
||||
|
||||
+aside-code("Example").
|
||||
nlp.vocab.get_vector(u'apple')
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code orth]
|
||||
+cell int / unicode
|
||||
+cell The hash value of a word, or its unicode string.
|
||||
|
||||
+row("foot")
|
||||
+cell returns
|
||||
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
|
||||
+cell
|
||||
| A word vector. Size and shape are determined by the
|
||||
| #[code Vocab.vectors] instance.
|
||||
|
||||
+h(2, "set_vector") Vocab.set_vector
|
||||
+tag method
|
||||
+tag-new(2)
|
||||
|
||||
p
|
||||
| Set a vector for a word in the vocabulary. Words can be referenced by
|
||||
| string or hash value.
|
||||
|
||||
+aside-code("Example").
|
||||
nlp.vocab.set_vector(u'apple', array([...]))
|
||||
|
||||
+table(["Name", "Type", "Description"])
|
||||
+row
|
||||
+cell #[code orth]
|
||||
+cell int / unicode
|
||||
+cell The hash value of a word, or its unicode string.
|
||||
|
||||
+row
|
||||
+cell #[code vector]
|
||||
+cell #[code.u-break numpy.ndarray[ndim=1, dtype='float32']]
|
||||
+cell The vector to set.

+h(2, "has_vector") Vocab.has_vector
    +tag method
    +tag-new(2)

p
    |  Check whether a word has a vector. Returns #[code False] if no vectors
    |  are loaded. Words can be looked up by string or hash value.

+aside-code("Example").
    if nlp.vocab.has_vector(u'apple'):
        vector = nlp.vocab.get_vector(u'apple')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code orth]
        +cell int / unicode
        +cell The hash value of a word, or its unicode string.

    +row("foot")
        +cell returns
        +cell bool
        +cell Whether the word has a vector.

+h(2, "to_disk") Vocab.to_disk
    +tag method
    +tag-new(2)

p Save the current state to a directory.

+aside-code("Example").
    nlp.vocab.to_disk('/path/to/vocab')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code path]
        +cell unicode or #[code Path]
        +cell
            |  A path to a directory, which will be created if it doesn't
            |  exist. Paths may be either strings or #[code Path]-like
            |  objects.

+h(2, "from_disk") Vocab.from_disk
    +tag method
    +tag-new(2)

p Loads state from a directory. Modifies the object in place and returns it.

+aside-code("Example").
    from spacy.vocab import Vocab
    vocab = Vocab().from_disk('/path/to/vocab')

+table(["Name", "Type", "Description"])
    +row
        +cell #[code path]
        +cell unicode or #[code Path]
        +cell
            |  A path to a directory. Paths may be either strings or
            |  #[code Path]-like objects.

    +row("foot")
        +cell returns
        +cell #[code Vocab]
        +cell The modified #[code Vocab] object.

+h(2, "to_bytes") Vocab.to_bytes
    +tag method

p Serialize the current state to a binary string.

+aside-code("Example").
    vocab_bytes = nlp.vocab.to_bytes()

+table(["Name", "Type", "Description"])
    +row
        +cell #[code **exclude]
        +cell -
        +cell Named attributes to prevent from being serialized.

    +row("foot")
        +cell returns
        +cell bytes
        +cell The serialized form of the #[code Vocab] object.

+h(2, "from_bytes") Vocab.from_bytes
    +tag method

p Load state from a binary string.

+aside-code("Example").
    from spacy.vocab import Vocab
    vocab_bytes = nlp.vocab.to_bytes()
    vocab = Vocab()
    vocab.from_bytes(vocab_bytes)

+table(["Name", "Type", "Description"])
    +row
        +cell #[code bytes_data]
        +cell bytes
        +cell The data to load from.

    +row
        +cell #[code **exclude]
        +cell -
        +cell Named attributes to prevent from being loaded.

    +row("foot")
        +cell returns
        +cell #[code Vocab]
        +cell The #[code Vocab] object.

+h(2, "attributes") Attributes

+aside-code("Example").
    apple_id = nlp.vocab.strings['apple']
    assert type(apple_id) == int
    PERSON = nlp.vocab.strings['PERSON']
    assert type(PERSON) == int

+table(["Name", "Type", "Description"])
    +row
        +cell #[code strings]
        +cell #[code StringStore]
        +cell A table managing the string-to-int mapping.

    +row
        +cell #[code vectors]
            +tag-new(2)
        +cell #[code Vectors]
        +cell A table associating word IDs to word vectors.

    +row
        +cell #[code vectors_length]
        +cell int
        +cell Number of dimensions for each word vector.

@ -1,28 +0,0 @@
//- 💫 CSS > BASE > ANIMATIONS

//- Fade in

@keyframes fadeIn
    from
        opacity: 0

    to
        opacity: 1


//- Element slides in from the top

@keyframes slideInDown
    from
        transform: translate3d(0, -100%, 0)
        visibility: visible

    to
        transform: translate3d(0, 0, 0)


//- Element rotates

@keyframes rotate
    to
        transform: rotate(360deg)

@ -1,27 +0,0 @@
//- 💫 CSS > BASE > FONTS

// HK Grotesk

@font-face
    font-family: "HK Grotesk"
    font-style: normal
    font-weight: 500
    src: url("/assets/fonts/hkgrotesk-semibold.woff2") format("woff2"), url("/assets/fonts/hkgrotesk-semibold.woff") format("woff")

@font-face
    font-family: "HK Grotesk"
    font-style: italic
    font-weight: 500
    src: url("/assets/fonts/hkgrotesk-semibolditalic.woff2") format("woff2"), url("/assets/fonts/hkgrotesk-semibolditalic.woff") format("woff")

@font-face
    font-family: "HK Grotesk"
    font-style: normal
    font-weight: 600
    src: url("/assets/fonts/hkgrotesk-bold.woff2") format("woff2"), url("/assets/fonts/hkgrotesk-bold.woff") format("woff")

@font-face
    font-family: "HK Grotesk"
    font-style: italic
    font-weight: 600
    src: url("/assets/fonts/hkgrotesk-bolditalic.woff2") format("woff2"), url("/assets/fonts/hkgrotesk-bolditalic.woff") format("woff")

@ -1,59 +0,0 @@
//- 💫 CSS > BASE > GRID

//- Grid container

.o-grid
    display: flex
    flex-wrap: wrap

    @include breakpoint(min, sm)
        flex-direction: row
        align-items: stretch
        justify-content: space-between

    &.o-grid--center
        align-items: center
        justify-content: center

    &.o-grid--vcenter
        align-items: center

    &.o-grid--space
        justify-content: space-between

    &.o-grid--nowrap
        flex-wrap: nowrap


//- Grid column

.o-grid__col
    $grid-gutter: 2rem

    margin-top: $grid-gutter
    min-width: 0 // hack to prevent overflow

    @include breakpoint(min, lg)
        display: flex
        flex: 0 0 100%
        flex-direction: column
        flex-wrap: wrap

        @each $mode, $count in $grid
            &.o-grid__col--#{$mode}
                $percentage: calc(#{100% / $count} - #{$grid-gutter})
                flex: 0 0 $percentage
                max-width: $percentage

    @include breakpoint(max, md)
        flex: 0 0 100%
        flex-flow: column wrap

    &.o-grid__col--no-gutter
        margin-top: 0

    // Fix overflow issue in old browsers

    & > *
        flex-shrink: 1
        max-width: 100%

@ -1,43 +0,0 @@
//- 💫 CSS > BASE > LAYOUT

//- HTML

html
    font-size: $type-base


//- Body

body
    animation: fadeIn 0.25s ease
    background: $color-back
    color: $color-front


//- Paragraphs

p
    @extend .o-block, .u-text

p:empty
    margin-bottom: 0


//- Links

main p a,
main table a,
main > *:not(footer) li a,
main aside a
    @extend .u-link

a:focus
    outline: 1px dotted $color-theme


//- Selection

::selection
    background: $color-theme
    color: $color-back
    text-shadow: none

@ -1,249 +0,0 @@
//- 💫 CSS > BASE > OBJECTS

//- Main container

.o-main
    padding: $nav-height 0 0 0
    max-width: 100%
    min-height: 100vh

    @include breakpoint(min, md)
        &.o-main--sidebar
            margin-left: $sidebar-width

        &.o-main--aside
            margin-right: $aside-width
            position: relative

            &:after
                @include position(absolute, top, left, 0, 100%)
                @include size($aside-width, 100%)
                content: ""
                display: block
                background: $pattern
                z-index: -1
                min-height: 100vh


//- Content container

.o-content
    padding: 3rem 7.5rem
    margin: 0 auto
    width: $content-width
    max-width: 100%

    @include breakpoint(max, sm)
        padding: 3rem


//- Footer

.o-footer
    position: relative
    padding: 2.5rem 0
    overflow: auto
    background: $color-subtle-light

    .o-main &
        border-top-left-radius: $border-radius


//- Blocks

.o-section
    width: 100%
    max-width: 100%

    &:not(:last-child)
        margin-bottom: 7rem
        padding-bottom: 4rem
        border-bottom: 1px dotted $color-subtle

    &.o-section--small
        overflow: auto

        &:not(:last-child)
            margin-bottom: 3.5rem
            padding-bottom: 2rem

.o-block
    margin-bottom: 4rem

.o-block-small
    margin-bottom: 2rem

.o-no-block.o-no-block
    margin-bottom: 0

.o-card
    background: $color-back
    border-radius: $border-radius
    box-shadow: $box-shadow


//- Accordion

.o-accordion
    &:not(:last-child)
        margin-bottom: 2rem

.o-accordion__content
    margin-top: 3rem

.o-accordion__button
    font: inherit
    border-radius: $border-radius
    width: 100%
    padding: 1.5rem 2rem
    background: $color-subtle-light

    &[aria-expanded="true"]
        border-bottom: 3px solid $color-subtle
        border-bottom-left-radius: 0
        border-bottom-right-radius: 0

        .o-accordion__hide
            display: none

    &:focus:not([aria-expanded="true"])
        background: $color-subtle

.o-accordion__icon
    @include size(2.5rem)
    background: $color-theme
    color: $color-back
    border-radius: 50%
    padding: 0.35rem
    pointer-events: none

//- Box

.o-box
    background: $color-subtle-light
    padding: 2rem
    border-radius: $border-radius

.o-box__logos
    padding-bottom: 1rem


//- Icons

.o-icon
    vertical-align: middle

    &.o-icon--inline
        margin: 0 0.5rem 0 0.1rem

    &.o-icon--tag
        vertical-align: bottom
        height: 100%
        position: relative
        top: 1px

.o-emoji
    margin-right: 0.75rem
    vertical-align: text-bottom

.o-badge
    border-radius: 1em

.o-thumb
    @include size(100px)
    overflow: hidden
    border-radius: 50%

    &.o-thumb--small
        @include size(35px)


//- SVG

.o-svg
    height: auto


//- Inline List

.o-inline-list > *
    display: inline

    &:not(:last-child)
        margin-right: 3rem


//- Logo

.o-logo
    @include size($logo-width, $logo-height)
    fill: currentColor
    vertical-align: middle
    margin: 0 0.5rem


//- Embeds

.o-chart
    max-width: 100%

.cp_embed_iframe
    border: 1px solid $color-subtle
    border-radius: $border-radius


//- Responsive Video embeds

.o-video
    position: relative
    height: 0

    @each $ratio1, $ratio2 in (16, 9), (4, 3)
        &.o-video--#{$ratio1}x#{$ratio2}
            padding-bottom: (100% * $ratio2 / $ratio1)

.o-video__iframe
    @include position(absolute, top, left, 0, 0)
    @include size(100%)
    border-radius: var(--border-radius)


//- Form fields

.o-field
    background: $color-back
    padding: 0 0.25em
    border-radius: 2em
    border: 1px solid $color-subtle
    margin-bottom: 0.25rem

.o-field__input,
.o-field__button
    padding: 0 0.35em

.o-field__input
    width: 100%

.o-field__select
    background: transparent
    color: $color-dark
    height: 1.4em
    border: none
    text-align-last: center
    width: 100%

//- Abbreviations

.o-abbr
    +breakpoint(min, md)
        cursor: help
        border-bottom: 2px dotted $color-theme
        padding-bottom: 3px

    +breakpoint(max, sm)
        &[data-tooltip]:before
            content: none

        &:after
            content: " (" attr(aria-label) ")"
            color: $color-subtle-dark

@ -1,103 +0,0 @@
//- 💫 CSS > BASE > RESET

*, *:before, *:after
    box-sizing: border-box
    padding: 0
    margin: 0
    border: 0
    outline: 0

html
    font-family: sans-serif
    text-rendering: optimizeSpeed
    -ms-text-size-adjust: 100%
    -webkit-text-size-adjust: 100%
    -webkit-font-smoothing: antialiased
    -moz-osx-font-smoothing: grayscale

body
    margin: 0

article, aside, details, figcaption, figure, footer, header, main, menu, nav,
section, summary, progress
    display: block

a
    background-color: transparent
    color: inherit
    text-decoration: none

    &:active,
    &:hover
        outline: 0

abbr[title]
    border-bottom: none
    text-decoration: underline
    text-decoration: underline dotted

b, strong
    font-weight: inherit
    font-weight: bolder

small
    font-size: 80%

sub, sup
    position: relative
    font-size: 65%
    line-height: 0
    vertical-align: baseline

sup
    top: -0.5em

sub
    bottom: -0.15em

img
    border: 0
    height: auto
    max-width: 100%

svg
    max-width: 100%
    color-interpolation-filters: sRGB
    fill: currentColor

    &:not(:root)
        overflow: hidden

hr
    box-sizing: content-box
    overflow: visible
    height: 0

pre
    overflow: auto

code, pre
    font-family: monospace, monospace
    font-size: 1em

table
    text-align: left
    width: 100%
    max-width: 100%
    border-collapse: collapse

td, th
    vertical-align: top

ul, ol
    list-style: none

input, button
    appearance: none
    background: transparent

button
    cursor: pointer

progress
    appearance: none

@ -1,267 +0,0 @@
//- 💫 CSS > BASE > UTILITIES

//- Text

.u-text,
.u-text-small,
.u-text-tiny
    font-family: $font-primary

.u-text
    font-size: 1.35rem
    line-height: 1.5

.u-text-small
    font-size: 1.3rem
    line-height: 1.375

.u-text-tiny
    font-size: 1.1rem
    line-height: 1.375

//- Labels & Tags

.u-text-label
    font: normal 600 1.4rem/#{1.5} $font-secondary
    text-transform: uppercase

    &.u-text-label--light,
    &.u-text-label--dark
        display: inline-block
        border-radius: 1em
        padding: 0 1rem 0.15rem

    &.u-text-label--dark
        background: $color-dark
        box-shadow: inset 1px 1px 1px rgba($color-front, 0.25)
        color: $color-back
        margin: 1.5rem 0 0 2rem

    &.u-text-label--light
        background: $color-back
        color: $color-theme
        margin-bottom: 1rem

.u-text-tag
    display: inline-block
    font: 600 1.1rem/#{1} $font-secondary
    background: $color-theme
    color: $color-back
    padding: 2px 6px 4px
    border-radius: 1em
    text-transform: uppercase
    vertical-align: middle

    &.u-text-tag--spaced
        margin-left: 0.75em
        margin-right: 0.5em


//- Headings

.u-heading
    margin-bottom: 1em

    @include breakpoint(max, md)
        word-wrap: break-word

    &:not(:first-child)
        padding-top: 3.5rem

    &.u-heading--title:after
        content: ""
        display: block
        width: 10%
        min-width: 6rem
        height: 6px
        background: $color-theme
        margin-top: 3rem

.u-heading-0
    font: normal 600 7rem/#{1} $font-secondary

    @include breakpoint(max, sm)
        font-size: 6rem


@each $level, $size in $headings
    .u-heading-#{$level}
        font: normal 500 #{$size}rem/#{1.1} $font-secondary

.u-heading__teaser
    margin-top: 2rem
    font-weight: normal


//- Links

.u-link
    color: $color-theme
    border-bottom: 1px solid
    transition: color 0.2s ease

    &:hover
        color: $color-theme-dark

.u-hand
    cursor: pointer

.u-hide-link.u-hide-link
    border: none
    color: inherit

    &:hover
        color: inherit

.u-permalink
    position: relative

    &:before
        content: "\00b6"
        font-size: 0.9em
        font-weight: normal
        color: $color-subtle
        @include position(absolute, top, left, 0.15em, -2.85rem)
        opacity: 0
        transition: opacity 0.2s ease

    &:hover:before
        opacity: 1

    &:active:before
        color: $color-theme

    &:target
        display: inline-block

        &:before
            bottom: 0.15em
            top: initial


[id]:target
    padding-top: $nav-height * 1.25


//- Layout

.u-width-full
    width: 100%

.u-float-left
    float: left
    margin-right: 1rem

.u-float-right
    float: right
    margin-left: 1rem

.u-text-center
    text-align: center

.u-text-right
    text-align: right

.u-padding
    padding: 5rem

.u-padding-small
    padding: 0.5em 0.75em

.u-padding-medium
    padding: 1.8rem

.u-padding-top
    padding-top: 2rem

.u-inline-block
    display: inline-block

.u-flex-full
    flex: 1

.u-nowrap
    white-space: nowrap

.u-wrap
    white-space: pre-wrap

.u-break.u-break
    word-wrap: break-word
    white-space: initial

    &.u-break--all
        word-break: break-all

.u-no-border
    border: none

.u-border
    border: 1px solid $color-subtle
    border-radius: 2px

.u-border-dotted
    border-bottom: 1px dotted $color-subtle

@each $name, $color in (theme: $color-theme, dark: $color-dark, subtle: $color-subtle-dark, light: $color-back, red: $color-red, green: $color-green, yellow: $color-yellow)
    .u-color-#{$name}
        color: $color

.u-grayscale
    filter: grayscale(100%)
    transition: filter 0.15s ease
    user-select: none

    &:hover
        filter: none

.u-pattern
    background: $pattern


//- Loaders

.u-loading,
[data-loading]
    $spinner-size: 75px
    $spinner-bar: 8px

    min-height: $spinner-size * 2
    position: relative

    & > *
        opacity: 0.35

    &:before
        @include position(absolute, top, left, 0, 0)
        @include size($spinner-size)
        right: 0
        bottom: 0
        margin: auto
        content: ""
        border: $spinner-bar solid $color-subtle
        border-right: $spinner-bar solid $color-theme
        border-radius: 50%
        animation: rotate 1s linear infinite
        z-index: 10


//- Hidden elements

.u-hidden,
[v-cloak]
    display: none !important

@each $breakpoint in (xs, sm, md)
    .u-hidden-#{$breakpoint}.u-hidden-#{$breakpoint}
        @include breakpoint(max, $breakpoint)
            display: none

//- Transitions

.u-fade-enter-active
    transition: opacity 0.5s

.u-fade-enter
    opacity: 0

@ -1,43 +0,0 @@
//- 💫 CSS > COMPONENTS > ASIDES

//- Aside container

.c-aside
    position: relative


//- Aside content

.c-aside__content
    background: $color-front
    border-top-left-radius: $border-radius
    border-bottom-left-radius: $border-radius
    z-index: 10

    @include breakpoint(min, md)
        @include position(absolute, top, left, -3rem, calc(100% + 5.5rem))
        width: calc(#{$aside-width} + 2rem)

        // Banner effect

        &:after
            $triangle-size: 2rem

            @include position(absolute, bottom, left, -$triangle-size / 2, $border-radius / 2)
            @include size(0)
            border-color: transparent
            border-style: solid
            border-top-color: $color-dark
            border-width: $triangle-size / 2 0 0 calc(#{$triangle-size} - #{$border-radius / 2})
            content: ""

    @include breakpoint(max, sm)
        display: block
        margin: 2rem 0


//- Aside text

.c-aside__text
    color: $color-back
    padding: 1.5rem 2.5rem 3rem 2rem

@ -1,52 +0,0 @@
//- 💫 CSS > COMPONENTS > BUTTONS

.c-button
    display: inline-block
    font-weight: bold
    padding: 0.8em 1.1em 1em
    margin-bottom: 1px
    border: 2px solid $color-theme
    border-radius: 2em
    text-align: center
    transition: background-color, color 0.25s ease

    &:hover
        border-color: $color-theme-dark

    &.c-button--small
        font-size: 1.1rem
        padding: 0.65rem 1.1rem 0.825rem

    &.c-button--primary
        background: $color-theme
        color: $color-back

        &:hover
            background: $color-theme-dark

    &.c-button--secondary
        background: $color-back
        color: $color-theme

        &:hover
            color: $color-theme-dark

    &.c-button--secondary-light
        background: transparent
        color: $color-back
        border-color: $color-back

.c-icon-button
    @include size(35px)
    background: $color-subtle-light
    color: $color-subtle-dark
    border-radius: 50%
    padding: 0.5rem
    transition: color 0.2s ease

    &:hover
        color: $color-theme

    &.c-icon-button--right
        float: right
        margin-left: 3rem

@ -1,105 +0,0 @@
//- 💫 CSS > COMPONENTS > CHAT

.c-chat
    @include position(fixed, top, left, 0, 60%)
    bottom: 0
    right: 0
    display: flex
    flex-flow: column nowrap
    background: $color-back
    transition: transform 0.3s cubic-bezier(0.16, 0.22, 0.22, 1.7)
    box-shadow: -0.25rem 0 1rem 0 rgba($color-front, 0.25)
    z-index: 100

    @include breakpoint(min, md)
        left: calc(100% - #{$aside-width} - #{$aside-padding})

    @include breakpoint(max, sm)
        left: 50%

    @include breakpoint(max, xs)
        left: 0

    &.is-collapsed:not(.is-loading)
        transform: translateX(110%)

    &:before
        @include position(absolute, top, left, 1.25rem, 2rem)
        content: attr(data-title)
        font: bold 1.4rem $font-secondary
        text-transform: uppercase
        color: $color-back

    &:after
        @include position(absolute, top, left, 0, 100%)
        content: ""
        z-index: -1
        bottom: 0
        right: -100%
        background: $color-back

    & > iframe
        width: 100%
        flex: 1 1 calc(100% - #{$nav-height})
        border: 0

.gitter-chat-embed-loading-wrapper
    @include position(absolute, top, left, 0, 0)
    right: 0
    bottom: 0
    display: none
    justify-content: center
    align-items: center

    .is-loading &
        display: flex

.gitter-chat-embed-action-bar,
.gitter-chat-embed-action-bar-item
    display: flex

.gitter-chat-embed-action-bar
    align-items: center
    justify-content: flex-end
    background: $color-theme
    padding: 0 1rem 0 2rem
    flex: 0 0 $nav-height

.gitter-chat-embed-action-bar-item
    @include size(40px)
    padding: 0
    opacity: 0.75
    background-position: 50%
    background-repeat: no-repeat
    background-size: 22px 22px
    border: 0
    cursor: pointer
    transition: all 0.2s ease

    &:focus,
    &:hover
        opacity: 1

    &.gitter-chat-embed-action-bar-item-pop-out
        background-image: url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyMCIgaGVpZ2h0PSIyMCIgdmlld0JveD0iMCAwIDIwIDIwIj48cGF0aCBmaWxsPSIjZmZmIiBkPSJNMTYgMmgtOC4wMjFjLTEuMDk5IDAtMS45NzkgMC44OC0xLjk3OSAxLjk4djguMDIwYzAgMS4xIDAuOSAyIDIgMmg4YzEuMSAwIDItMC45IDItMnYtOGMwLTEuMS0wLjktMi0yLTJ6TTE2IDEyaC04di04aDh2OHpNNCAxMGgtMnY2YzAgMS4xIDAuOSAyIDIgMmg2di0yaC02di02eiI+PC9wYXRoPjwvc3ZnPg==)
        margin-right: -4px

    &.gitter-chat-embed-action-bar-item-collapse-chat
        background-image: url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0Ij48cGF0aCBmaWxsPSIjZmZmIiBkPSJNMTguOTg0IDYuNDIybC01LjU3OCA1LjU3OCA1LjU3OCA1LjU3OC0xLjQwNiAxLjQwNi01LjU3OC01LjU3OC01LjU3OCA1LjU3OC0xLjQwNi0xLjQwNiA1LjU3OC01LjU3OC01LjU3OC01LjU3OCAxLjQwNi0xLjQwNiA1LjU3OCA1LjU3OCA1LjU3OC01LjU3OHoiPjwvcGF0aD48L3N2Zz4=)

.c-chat__button
    @include position(fixed, bottom, right, 1.5rem, 1.5rem)
    z-index: 5
    color: $color-back
    background: $color-front
    border-radius: 1em
    padding: 0.5rem 1.15rem 0.35rem
    opacity: 0.7
    transition: opacity 0.2s ease

    &:hover
        opacity: 1


.gitter-open-chat-button
    display: none

@ -1,202 +0,0 @@
//- 💫 CSS > COMPONENTS > CODE

//- Code block

.c-code-block,
.juniper-cell
    background: $color-front
    color: darken($color-back, 20)
    padding: 0.75em 0
    border-radius: $border-radius
    overflow: auto
    width: 100%
    max-width: 100%
    white-space: pre
    direction: ltr

.c-code-block--has-icon
    padding: 0
    display: flex
    border-top-left-radius: 0
    border-bottom-left-radius: 0

.c-code-block__icon
    padding: 0 0 0 1rem
    display: flex
    justify-content: center
    align-items: center

    &.c-code-block__icon--border
        border-left: 6px solid

//- Code block content

.c-code-block__content,
.juniper-input,
.jp-OutputArea
    display: block
    font: normal normal 1.1rem/#{1.9} $font-code
    padding: 1em 2em

.c-code-block__content[data-prompt]:before
    content: attr(data-prompt)
    margin-right: 0.65em
    display: inline-block
    vertical-align: middle
    opacity: 0.5

//- Juniper

[data-executable]
    margin-bottom: 0

.juniper-cell
    border: 0

.juniper-input
    padding: 0

.juniper-output
    color: inherit
    background: inherit
    padding: 0

.jp-OutputArea
    &:not(:empty)
        padding: 2rem 2rem 1rem
        border-top: 1px solid $color-dark
        margin-top: 2rem

    .entities, svg
        white-space: initial
        font-family: inherit

    .entities
        font-size: 1.35rem

.jp-OutputArea pre
    font: inherit

.jp-OutputPrompt.jp-OutputArea-prompt
    padding-top: 0.5em
    margin-right: 1rem
    font-family: inherit
    font-weight: bold

.juniper-button
    @extend .u-text-label, .u-text-label--dark
    position: static

.juniper-wrapper
    position: relative

.juniper-wrapper__text
    @include position(absolute, top, right, 1.25rem, 1.25rem)
    color: $color-subtle-dark
    z-index: 10

//- Code

code, .CodeMirror, .jp-RenderedText, .jp-OutputArea
    -webkit-font-smoothing: subpixel-antialiased
    -moz-osx-font-smoothing: auto


//- Inline code

*:not(a):not(.c-code-block) > code
    color: $color-dark

*:not(.c-code-block) > code
    font-size: 90%
    background-color: $color-subtle-light
    padding: 0.2rem 0.4rem
    border-radius: 0.25rem
    font-family: $font-code
    margin: 0
    box-decoration-break: clone
    white-space: nowrap

    .c-aside__content &
        background: lighten($color-front, 10)
        color: $color-back
        text-shadow: none


//- Syntax Highlighting (Prism)

[class*="language-"] .token
    &.comment, &.prolog, &.doctype, &.cdata, &.punctuation
        color: map-get($syntax-highlighting, comment)

    &.property, &.tag, &.constant, &.symbol, &.deleted
        color: map-get($syntax-highlighting, tag)

    &.boolean, &.number
        color: map-get($syntax-highlighting, number)

    &.selector, &.attr-name, &.string, &.char, &.builtin, &.inserted
        color: map-get($syntax-highlighting, selector)

    @at-root .language-css .token.string,
    &.operator, &.entity, &.url, &.variable
        color: map-get($syntax-highlighting, operator)

    &.atrule, &.attr-value, &.function
        color: map-get($syntax-highlighting, function)

    &.regex, &.important
        color: map-get($syntax-highlighting, regex)

    &.keyword
        color: map-get($syntax-highlighting, keyword)

    &.italic
        font-style: italic

//- Syntax Highlighting (CodeMirror)

.CodeMirror.cm-s-default
    background: $color-front
    color: darken($color-back, 20)

    .CodeMirror-selected
        background: $color-theme
        color: $color-back

    .CodeMirror-cursor
        border-left-color: currentColor

    .cm-variable-2
        color: inherit
        font-style: italic

    .cm-comment
        color: map-get($syntax-highlighting, comment)

    .cm-keyword, .cm-builtin
        color: map-get($syntax-highlighting, keyword)

    .cm-operator
        color: map-get($syntax-highlighting, operator)

    .cm-string
        color: map-get($syntax-highlighting, selector)

    .cm-number
        color: map-get($syntax-highlighting, number)

    .cm-def
        color: map-get($syntax-highlighting, function)

//- Syntax highlighting (Jupyter)

.jp-RenderedText pre
    .ansi-cyan-fg
        color: map-get($syntax-highlighting, function)

    .ansi-green-fg
        color: $color-green

    .ansi-red-fg
        color: map-get($syntax-highlighting, operator)

@ -1,63 +0,0 @@
//- 💫 CSS > COMPONENTS > LANDING

.c-landing
    background: $color-theme
    padding-top: $nav-height * 1.5
    width: 100%

.c-landing__wrapper
    background: $pattern
    width: 100%

.c-landing__content
    background: $pattern-overlay
    width: 100%
    min-height: 573px

.c-landing__headlines
    position: relative
    top: -1.5rem
    left: 1rem

.c-landing__title
    color: $color-back
    text-align: center
    margin-bottom: 0.75rem

.c-landing__blocks
    @include breakpoint(min, sm)
        position: relative
        top: -25rem
        margin-bottom: -25rem

.c-landing__card
    padding: 3rem 2.5rem

.c-landing__banner
    background: $color-theme

.c-landing__banner__content
    @include breakpoint(min, md)
        border: 4px solid
        padding: 1rem 6.5rem 2rem 4rem


.c-landing__banner__text
    font-weight: 500

    strong
        font-weight: 800

    p
        font-size: 1.5rem

    @include breakpoint(min, md)
        padding-top: 7rem

.c-landing__badge
    transform: rotate(7deg)
    display: block
    text-align: center

    @include breakpoint(min, md)
        @include position(absolute, top, right, 16rem, 6rem)

@ -1,39 +0,0 @@
//- 💫 CSS > COMPONENTS > LISTS

//- List Container

.c-list
    @each $type, $counter in (numbers: decimal, letters: upper-latin, roman: lower-roman)
        &.c-list--#{$type}
            counter-reset: li

            .c-list__item:before
                content: counter(li, #{$counter}) '.'
                font-size: 1em
                padding-right: 1rem


//- List Item

.c-list__item
    padding-left: 2rem
    margin-bottom: 0.5em
    margin-left: 1.25rem

    &:before
        content: '\25CF'
        display: inline-block
        font-size: 0.6em
        font-weight: bold
        padding-right: 1em
        margin-left: -3.75rem
        text-align: right
        width: 2.5rem
        counter-increment: li
        box-sizing: content-box


//- List icon

.c-list__icon
    margin-right: 1rem

@ -1,68 +0,0 @@
//- 💫 CSS > COMPONENTS > MISC

.x-terminal
    background: $color-subtle-light
    color: $color-front
    padding: $border-radius
    border-radius: 1em
    width: 100%
    position: relative

    &.x-terminal--small
        background: $color-dark
        color: $color-subtle
        border-radius: 4px
        margin-bottom: 4rem

.x-terminal__icons
    display: none
    position: absolute
    padding: 10px

    @include breakpoint(min, sm)
        display: block

    &:before,
    &:after,
    span
        @include size(15px)
        display: inline-block
        float: left
        border-radius: 50%
        margin-right: 10px

    &:before
        content: ""
        background: $color-red

    span
        background: $color-green

    &:after
        content: ""
        background: $color-yellow

    &.x-terminal__icons--small
        &:before,
        &:after,
        span
            @include size(10px)

.x-terminal__code
    margin: 0
    border: none
    border-bottom-left-radius: 5px
    border-bottom-right-radius: 5px
    width: 100%
    max-width: 100%
    white-space: pre-wrap


.x-terminal__button.x-terminal__button
    @include position(absolute, bottom, right, 2.65rem, 2.6rem)
    background: $color-dark
    border-color: $color-dark

    &:hover
        background: darken($color-dark, 5)
        border-color: darken($color-dark, 5)

@ -1,61 +0,0 @@
//- 💫 CSS > COMPONENTS > NAVIGATION

.c-nav
    @include position(fixed, top, left, 0, 0)
    @include size(100%, $nav-height)
    background: $color-back
    color: $color-theme
    align-items: center
    display: flex
    justify-content: space-between
    flex-flow: row nowrap
    padding: 0 0 0 1rem
    z-index: 30
    width: 100%
    box-shadow: $box-shadow

    &.is-fixed
        animation: slideInDown 0.5s ease-in-out
        position: fixed

.c-nav__menu
    @include size(100%)
    display: flex
    flex-flow: row nowrap
    border-color: inherit
    flex: 1

    @include breakpoint(max, sm)
        @include scroll-shadow-base($color-front)
        overflow-x: auto
        overflow-y: hidden
        -webkit-overflow-scrolling: touch

    @include breakpoint(min, md)
        justify-content: flex-end

.c-nav__menu__item
    display: flex
    align-items: center
    height: 100%
    text-transform: uppercase
    font-family: $font-secondary
    font-size: 1.6rem
    font-weight: bold
    color: $color-theme

    &:not(:first-child)
        margin-left: 2em

    &:last-child
        @include scroll-shadow-cover(right, $color-back)
        padding-right: 2rem

    &:first-child
        @include scroll-shadow-cover(left, $color-back)
        padding-left: 2rem

    &.is-active
        color: $color-dark
        pointer-events: none

@ -1,100 +0,0 @@
//- 💫 CSS > COMPONENTS > QUICKSTART

.c-quickstart
    border-radius: $border-radius
    display: none
    background: $color-subtle-light

    &:not([style]) + .c-quickstart__info
        display: none

    .c-code-block
        border-top-left-radius: 0
        border-top-right-radius: 0

.c-quickstart__content
    padding: 2rem 3rem

.c-quickstart__input
    @include size(0)
    opacity: 0
    position: absolute
    left: -9999px

.c-quickstart__label
    cursor: pointer
    background: $color-back
    border: 1px solid $color-subtle
    border-radius: 2px
    display: inline-block
    padding: 0.75rem 1.25rem
    margin: 0 0.5rem 0.5rem 0
    font-weight: bold

    &:hover
        background: lighten($color-theme-light, 5)

    .c-quickstart__input:focus + &
        border: 1px solid $color-theme

    .c-quickstart__input--radio:checked + &
        color: $color-back
        border-color: $color-theme
        background: $color-theme

    .c-quickstart__input--check + &:before
        content: ""
        background: $color-back
        display: inline-block
        width: 20px
        height: 20px
        border: 1px solid $color-subtle
        vertical-align: middle
        margin-right: 1rem
        cursor: pointer
        border-radius: 2px

    .c-quickstart__input--check:checked + &:before
        background: $color-theme url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0Ij4gICAgPHBhdGggZmlsbD0iI2ZmZiIgZD0iTTkgMTYuMTcybDEwLjU5NC0xMC41OTQgMS40MDYgMS40MDYtMTIgMTItNS41NzgtNS41NzggMS40MDYtMS40MDZ6Ii8+PC9zdmc+)
        background-size: contain
        border-color: $color-theme

.c-quickstart__label__meta
    font-weight: normal
    color: $color-subtle-dark

.c-quickstart__group
    @include breakpoint(min, md)
        display: flex
        flex-flow: row nowrap

    &:not(:last-child)
        margin-bottom: 1rem

.c-quickstart__fields
    flex: 100%

.c-quickstart__legend
    margin-right: 2rem
    padding-top: 0.75rem
    flex: 1 1 35%
    font-weight: bold

.c-quickstart__line
    display: block

    &:before
        color: $color-theme
        margin-right: 1em

    &.c-quickstart__line--bash:before
        content: "$"

    &.c-quickstart__line--python:before
        content: ">>>"

    &.c-quickstart__line--divider
        padding: 1.5rem 0

.c-quickstart__code
    font-size: 1.4rem

@ -1,95 +0,0 @@
//- 💫 CSS > COMPONENTS > SIDEBAR

//- Sidebar container

.c-sidebar
    overflow-y: auto

    @include breakpoint(min, md)
        @include position(fixed, top, left, 0, 0)
        @include size($sidebar-width, calc(100vh - 3px))
        @include scroll-shadow($color-back, $color-front, $nav-height)
        flex: 0 0 $sidebar-width
        padding: calc(#{$nav-height} + 1.5rem) 0 0
        z-index: 10

    @include breakpoint(max, sm)
        flex: 100%
        width: 100%
        margin-top: $nav-height
        display: flex
        flex-flow: row wrap
        width: 100%


//- Sidebar section

.c-sidebar__section
    & > *
        padding: 0 2rem 0.35rem

    @include breakpoint(max, sm)
        flex: 1 1 0
        padding: 1.25rem 0
        border-bottom: 1px solid $color-subtle
        margin: 0

        &:not(:last-child)
            border-right: 1px solid $color-subtle

.c-sidebar__item
    color: $color-theme

    &:hover
        color: $color-theme-dark

    & > .is-active
        font-weight: bold
        color: $color-dark
        margin-top: 1rem


//- Sidebar subsections

$crumb-bullet: 14px
$crumb-bar: 2px

.c-sidebar__crumb
    display: block
    padding-top: 1rem
    padding-left: 1rem
    position: relative

.c-sidebar__crumb__item
    margin-bottom: $crumb-bullet / 2
    position: relative
    padding-left: 2rem
    color: $color-theme
    font-size: 1.2rem

    &:hover
        color: $color-theme-dark

    &:after
        @include size($crumb-bullet)
        @include position(absolute, top, left, $crumb-bullet / 4, 0)
        content: ""
        border-radius: 50%
        background: $color-theme
        z-index: 10

    &:not(:last-child):before
        @include size($crumb-bar, 100%)
        @include position(absolute, top, left, $crumb-bullet, ($crumb-bullet - $crumb-bar) / 2)
        content: ""
        background: $color-subtle

    &:first-child:before
        height: calc(100% + #{$crumb-bullet * 2})
        top: -$crumb-bullet / 2

    &.is-active
        color: $color-dark

        &:after
            background: $color-dark

@ -1,86 +0,0 @@
//- 💫 CSS > COMPONENTS > TABLES

//- Table container

.c-table
    vertical-align: top


//- Table row

.c-table__row
    &:nth-child(odd):not(.c-table__row--head)
        background: rgba($color-subtle-light, 0.35)

    &.c-table__row--foot
        background: $color-theme-light
        border-top: 2px solid $color-theme

        .c-table__cell:first-child
            @extend .u-text-label
            color: $color-theme

    &.c-table__row--divider
        border-top: 2px solid $color-theme


//- Table cell

.c-table__cell
    padding: 1rem

    &:not(:last-child)
        border-right: 1px solid $color-subtle

    &.c-table__cell--num
        text-align: right
        font-feature-settings: "tnum"
        font-variant-numeric: tabular-nums

        & > strong
            font-feature-settings: initial
            font-variant-numeric: initial


//- Table head cell

.c-table__head-cell
    font-weight: bold
    color: $color-theme
    padding: 1rem 0.5rem
    border-bottom: 2px solid $color-theme


//- Responsive table
//- Shadows adapted from "CSS only Responsive Tables" by David Bushell
//- http://codepen.io/dbushell/pen/wGaamR

@include breakpoint(max, md)
    .c-table
        @include scroll-shadow-base($color-front)
        display: inline-block
        overflow-x: auto
        overflow-y: hidden
        width: auto
        -webkit-overflow-scrolling: touch

    .c-table__cell,
    .c-table__head-cell
        &:first-child
            @include scroll-shadow-cover(left, $color-back)

        &:last-child
            @include scroll-shadow-cover(right, $color-back)

        &:first-child:last-child
            @include scroll-shadow-cover(both, $color-back)

    .c-table__row--foot .c-table__cell
        &:first-child
            @include scroll-shadow-cover(left, lighten($color-subtle-light, 2))

        &:last-child
            @include scroll-shadow-cover(right, lighten($color-subtle-light, 2))

        &:first-child:last-child
            @include scroll-shadow-cover(both, lighten($color-subtle-light, 2))

@ -1,39 +0,0 @@
//- 💫 CSS > COMPONENTS > TOOLTIPS

[data-tooltip]
    position: relative

    @include breakpoint(min, sm)
        &[data-tooltip-style="code"]:before
            -webkit-font-smoothing: subpixel-antialiased
            -moz-osx-font-smoothing: auto
            padding: 0.35em 0.85em 0.45em
            font: normal 1rem/#{1.25} $font-code
            white-space: nowrap
            min-width: auto

        &:before
            @include position(absolute, top, left, 125%, 50%)
            display: inline-block
            content: attr(data-tooltip)
            background: $color-front
            border-radius: $border-radius
            border: 1px solid rgba($color-subtle-dark, 0.5)
            color: $color-back
            font: normal 1.2rem/#{1.25} $font-primary
            text-transform: none
            text-align: left
            opacity: 0
            transform: translateX(-50%) translateY(-2px)
            transition: opacity 0.1s ease-out, transform 0.1s ease-out
            visibility: hidden
            max-width: 300px
            min-width: 200px
            padding: 0.75em 1em 1em
            z-index: 200
            white-space: pre-wrap

        &:hover:before
            opacity: 1
            transform: translateX(-50%) translateY(0)
            visibility: visible

@ -1,80 +0,0 @@
//- 💫 CSS > MIXINS

// Helper for position
// $position - valid position value (static, absolute, fixed, relative)
// $pos-y - position direction Y (top, bottom)
// $pos-x - position direction X (left, right)
// $pos-y-value - value of position Y direction
// $pos-x-value - value of position X direction

@mixin position($position, $pos-y, $pos-x, $pos-y-value, $pos-x-value)
    position: $position
    #{$pos-y}: $pos-y-value
    #{$pos-x}: $pos-x-value
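
// Illustrative usage sketch (the selector is hypothetical, not from the
// original file): +position(absolute, top, left, 0, 0) compiles to
// "position: absolute; top: 0; left: 0".
//
//     .example-overlay
//         +position(absolute, top, left, 0, 0)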

// Helper for width and height
// $width - width of element
// $height - height of element (default: $width)

@mixin size($width, $height: $width)
    width: $width
    height: $height


//- Responsive Breakpoint utility

@mixin breakpoint($limit, $size)
    $breakpoints-max: ( xs: map-get($breakpoints, sm) - 1, sm: map-get($breakpoints, md) - 1, md: map-get($breakpoints, lg) - 1 )

    @if $limit == "min"
        @media(min-width: #{map-get($breakpoints, $size)})
            @content

    @else if $limit == "max"
        @media(max-width: #{map-get($breakpoints-max, $size)})
            @content
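
// Illustrative usage sketch (hypothetical selector; assumes the $breakpoints
// map from the variables file): "min" uses the breakpoint itself as
// min-width, while "max" uses the next breakpoint minus 1px as max-width.
//
//     .example-nav
//         +breakpoint(max, sm)
//             display: none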

// Scroll shadows for responsive tables
// adapted from David Bushell, http://codepen.io/dbushell/pen/wGaamR
// $scroll-shadow-color - color of shadow
// $scroll-shadow-side - side to cover shadow (left or right)
// $scroll-shadow-background - original background color to match

@function scroll-shadow-gradient($scroll-gradient-direction, $scroll-shadow-background)
    @return linear-gradient(to #{$scroll-gradient-direction}, rgba($scroll-shadow-background, 1) 50%, rgba($scroll-shadow-background, 0) 100%)

@mixin scroll-shadow-base($scroll-shadow-color, $scroll-shadow-intensity: 0.2)
    background: radial-gradient(ellipse at 0 50%, rgba($scroll-shadow-color, $scroll-shadow-intensity) 0%, rgba(0,0,0,0) 75%) 0 center, radial-gradient(ellipse at 100% 50%, rgba($scroll-shadow-color, $scroll-shadow-intensity) 0%, transparent 75%) 100% center
    background-attachment: scroll, scroll
    background-repeat: no-repeat
    background-size: 10px 100%, 10px 100%

@mixin scroll-shadow-cover($scroll-shadow-side, $scroll-shadow-background)
    $scroll-gradient-direction: right !default
    background-repeat: no-repeat

    @if $scroll-shadow-side == right
        $scroll-gradient-direction: left
        background-position: 100% 0

    @if $scroll-shadow-side == both
        background-image: scroll-shadow-gradient(left, $scroll-shadow-background), scroll-shadow-gradient(right, $scroll-shadow-background)
        background-position: 100% 0, 0 0
        background-size: 20px 100%, 20px 100%
    @else
        background-image: scroll-shadow-gradient($scroll-gradient-direction, $scroll-shadow-background)
        background-size: 20px 100%

// Full vertical scroll shadows
// adapted from: https://codepen.io/laustdeleuran/pen/DBaAu

@mixin scroll-shadow($background-color, $shadow-color, $shadow-offset: 0, $shadow-intensity: 0.4, $cover-size: 40px, $shadow-size: 15px)
    background: linear-gradient($background-color 30%, rgba($background-color,0)) 0 $shadow-offset, linear-gradient(rgba($background-color,0), $background-color 70%) 0 100%, radial-gradient(50% 0, farthest-side, rgba($shadow-color,$shadow-intensity), rgba($shadow-color,0)) 0 $shadow-offset, radial-gradient(50% 100%,farthest-side, rgba($shadow-color,$shadow-intensity), rgba($shadow-color,0)) 0 100%

    background: linear-gradient($background-color 30%, rgba($background-color,0)) 0 $shadow-offset, linear-gradient(rgba($background-color,0), $background-color 70%) 0 100%, radial-gradient(farthest-side at 50% 0, rgba($shadow-color,$shadow-intensity), rgba($shadow-color,0)) -20px $shadow-offset, radial-gradient(farthest-side at 50% 100%, rgba($shadow-color, $shadow-intensity), rgba($shadow-color,0)) 0 100%
    background-repeat: no-repeat
    background-color: $background-color
    background-size: 100% $cover-size, 100% $cover-size, 100% $shadow-size, 100% $shadow-size
    background-attachment: local, local, scroll, scroll

@ -1,51 +0,0 @@
//- 💫 CSS > VARIABLES

// Settings and Sizes

$type-base: 11px

$nav-height: 55px
$content-width: 1250px
$sidebar-width: 235px
$aside-width: 27.5vw
$aside-padding: 25px
$border-radius: 6px

$logo-width: 85px
$logo-height: 27px

$grid: ( quarter: 4, third: 3, half: 2, two-thirds: 1.5, three-quarters: 1.33 )
$breakpoints: ( sm: 768px, md: 992px, lg: 1200px )
$headings: (1: 4.4, 2: 3.4, 3: 2.6, 4: 2.2, 5: 1.8)

// Fonts

$font-primary: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol" !default
$font-secondary: "HK Grotesk", Roboto, Helvetica, Arial, sans-serif !default
$font-code: Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace !default

// Colors

$colors: ( blue: #09a3d5, green: #05b083, purple: #6542d1 )

$color-back: #fff !default
$color-front: #1a1e23 !default
$color-dark: lighten($color-front, 20) !default

$color-theme: map-get($colors, $theme)
$color-theme-dark: darken(map-get($colors, $theme), 10)
$color-theme-light: rgba($color-theme, 0.05)

$color-subtle: #ddd !default
$color-subtle-light: #f6f6f6 !default
$color-subtle-dark: #949e9b !default

$color-red: #ef476f
$color-green: #7ddf64
$color-yellow: #f4c025

$syntax-highlighting: ( comment: #949e9b, tag: #b084eb, number: #b084eb, selector: #ffb86c, operator: #ff2c6d, function: #35b3dc, keyword: #ff2c6d, regex: #f4c025 )

$pattern: $color-theme url("/assets/img/pattern_#{$theme}.jpg") center top repeat
$pattern-overlay: transparent url("/assets/img/pattern_landing.jpg") center -138px no-repeat
$box-shadow: 0 1px 5px rgba(0, 0, 0, 0.2)

@ -1,37 +0,0 @@
//- 💫 STYLESHEET

$theme: blue !default


// Variables

@import variables
@import mixins


// Base

@import _base/reset
@import _base/fonts
@import _base/animations
@import _base/grid
@import _base/layout
@import _base/objects
@import _base/utilities


// Components

@import _components/asides
@import _components/buttons
@import _components/chat
@import _components/code
@import _components/landing
@import _components/lists
@import _components/misc
@import _components/navigation
@import _components/progress
@import _components/sidebar
@import _components/tables
@import _components/quickstart
@import _components/tooltips

@ -1,4 +0,0 @@
//- 💫 STYLESHEET (GREEN)

$theme: green
@import style

@ -1,4 +0,0 @@
//- 💫 STYLESHEET (PURPLE)

$theme: purple
@import style

Before Width: | Height: | Size: 1.1 KiB |
@ -1 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 308.5 595.3 213"><path fill="#09a3d5" d="M73.7 395.2c-13.5-1.6-14.5-19.7-31.8-18.1-8.4 0-16.2 3.5-16.2 11.2 0 11.6 17.9 12.7 28.7 15.6 18.4 5.6 36.2 9.4 36.2 29.4 0 25.4-19.9 34.2-46.2 34.2-22 0-44.3-7.8-44.3-28 0-5.6 5.4-10 10.6-10 6.6 0 8.9 2.8 11.2 7.4 5.1 9 10.8 13.8 25 13.8 9 0 18.2-3.4 18.2-11.2 0-11.1-11.3-13.5-23-16.2-20.7-5.8-38.5-8.8-40.6-31.8-2.2-39.2 79.5-40.7 84.2-6.3-.1 6.2-5.9 10-12 10zm97.2-34.4c28.7 0 45 24 45 53.6 0 29.7-15.8 53.6-45 53.6-16.2 0-26.3-6.9-33.6-17.5v39.2c0 11.8-3.8 17.5-12.4 17.5-10.5 0-12.4-6.7-12.4-17.5v-114c0-9.3 3.9-15 12.4-15 8 0 12.4 6.3 12.4 15v3.2c8.1-10.2 17.4-18.1 33.6-18.1zm-6.8 86.8c16.8 0 24.3-15.5 24.3-33.6 0-17.7-7.6-33.6-24.3-33.6-17.5 0-25.6 14.4-25.6 33.6 0 18.7 8.2 33.6 25.6 33.6zm71.3-58.8c0-20.6 23.7-28 46.7-28 32.3 0 45.6 9.4 45.6 40.6v30c0 7.1 4.4 21.3 4.4 25.6 0 6.5-6 10.6-12.4 10.6-7.1 0-12.4-8.4-16.2-14.4-10.5 8.4-21.6 14.4-38.6 14.4-18.8 0-33.6-11.1-33.6-29.4 0-16.2 11.6-25.5 25.6-28.7 0 .1 45-10.6 45-10.7 0-13.8-4.9-19.9-19.4-19.9-12.8 0-19.3 3.5-24.3 11.2-4 5.8-3.5 9.3-11.2 9.3-6.2-.1-11.6-4.3-11.6-10.6zm38.4 61.9c19.7 0 28-10.4 28-31.1v-4.4c-5.3 1.8-26.7 7.1-32.5 8-6.2 1.2-12.4 5.8-12.4 13.1.2 8 8.4 14.4 16.9 14.4zm144.7-129c27.8 0 57.9 16.6 57.9 43 0 6.8-5.1 12.4-11.8 12.4-9.1 0-10.4-4.9-14.4-11.8-6.7-12.3-14.6-20.5-31.8-20.5-26.6-.2-38.5 22.6-38.5 51 0 28.6 9.9 49.2 37.4 49.2 18.3 0 28.4-10.6 33.6-24.3 2.1-6.3 5.9-12.4 13.8-12.4 6.2 0 12.4 6.3 12.4 13.1 0 28-28.6 47.4-58 47.4-32.2 0-50.4-13.6-60.4-36.2-4.9-10.8-8-22-8-37.4-.2-43.4 25.1-73.5 67.8-73.5zm159 39.1c7.1 0 11.2 4.6 11.2 11.8 0 2.9-2.3 8.7-3.2 11.8l-34.2 89.9c-7.6 19.5-13.3 33-39.2 33-12.3 0-23-1.1-23-11.8 0-6.2 4.7-9.3 11.2-9.3 1.2 0 3.2.6 4.4.6 1.9 0 3.2.6 4.4.6 13 0 14.8-13.3 19.4-22.5l-33-81.7c-1.9-4.4-3.2-7.4-3.2-10 0-7.2 5.6-12.4 13.1-12.4 8.4 0 11.7 6.6 13.8 13.8l21.8 64.8 21.8-59.9c3.3-9.3 3.6-18.7 14.7-18.7z"/></svg>
Before Width: | Height: | Size: 1.9 KiB |
Before Width: | Height: | Size: 225 KiB |
Before Width: | Height: | Size: 227 KiB |
Before Width: | Height: | Size: 182 KiB |
Before Width: | Height: | Size: 204 KiB |
Before Width: | Height: | Size: 180 KiB |
Before Width: | Height: | Size: 108 KiB |
Before Width: | Height: | Size: 32 KiB |
Before Width: | Height: | Size: 16 KiB |
Before Width: | Height: | Size: 31 KiB |
Before Width: | Height: | Size: 32 KiB |
Before Width: | Height: | Size: 8.5 KiB |
Before Width: | Height: | Size: 374 KiB |