spaCy/website/docs/api/top-level.md

---
title: Top-level Functions
menu:
  - ['spacy', 'spacy']
  - ['displacy', 'displacy']
  - ['Utility Functions', 'util']
  - ['Compatibility', 'compat']
---

## spaCy {#spacy hidden="true"}

### spacy.load {#spacy.load tag="function" model="any"}

Load a model via its [shortcut link](/usage/models#usage), the name of an
installed [model package](/usage/training#models-generating), a unicode path or
a `Path`-like object. spaCy will try resolving the load argument in this order.
If a model is loaded from a shortcut link or package name, spaCy will assume
it's a Python package, import it and call the model's own `load()` method. If
a model is loaded from a path, spaCy will assume it's a data directory, read the
language and pipeline settings from the `meta.json` and initialize the
`Language` class. The data will then be loaded via
[`Language.from_disk`](/api/language#from_disk).

> #### Example
>
> ```python
> nlp = spacy.load("en") # shortcut link
> nlp = spacy.load("en_core_web_sm") # package
> nlp = spacy.load("/path/to/en") # unicode path
> nlp = spacy.load(Path("/path/to/en")) # pathlib Path
>
> nlp = spacy.load("en_core_web_sm", disable=["parser", "tagger"])
> ```

| Name        | Type             | Description                                                                       |
| ----------- | ---------------- | --------------------------------------------------------------------------------- |
| `name`      | unicode / `Path` | Model to load, i.e. shortcut link, package name or path.                          |
| `disable`   | list             | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
| **RETURNS** | `Language`       | A `Language` object with the loaded model.                                        |

Essentially, `spacy.load()` is a convenience wrapper that reads the language ID
and pipeline components from a model's `meta.json`, initializes the `Language`
class, loads in the model data and returns it.

```python
### Abstract example
cls = util.get_lang_class(lang)         #  get language for ID, e.g. 'en'
nlp = cls()                             #  initialise the language
for name in pipeline:
    component = nlp.create_pipe(name)   #  create each pipeline component
    nlp.add_pipe(component)             #  add component to pipeline
nlp.from_disk(model_data_path)          #  load in model data
```

<Infobox title="Changed in v2.0" variant="warning">

As of spaCy 2.0, the `path` keyword argument is deprecated. spaCy will also
raise an error if no model could be loaded and never just return an empty
`Language` object. If you need a blank language, you can use the new function
[`spacy.blank()`](/api/top-level#spacy.blank) or import the class explicitly,
e.g. `from spacy.lang.en import English`.

```diff
- nlp = spacy.load("en", path="/model")
+ nlp = spacy.load("/model")
```

</Infobox>

### spacy.blank {#spacy.blank tag="function" new="2"}

Create a blank model of a given language class. This function is the twin of
`spacy.load()`.

> #### Example
>
> ```python
> nlp_en = spacy.blank("en")
> nlp_de = spacy.blank("de")
> ```

| Name        | Type       | Description                                                                                      |
| ----------- | ---------- | ------------------------------------------------------------------------------------------------ |
| `name`      | unicode    | [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) of the language class to load. |
| `disable`   | list       | Names of pipeline components to [disable](/usage/processing-pipelines#disabling).                |
| **RETURNS** | `Language` | An empty `Language` object of the appropriate subclass.                                          |

### spacy.info {#spacy.info tag="function"}

The same as the [`info` command](/api/cli#info). Pretty-print information about
your installation, models and local setup from within spaCy. To get the model
meta data as a dictionary instead, you can use the `meta` attribute on your
`nlp` object with a loaded model, e.g. `nlp.meta`.

> #### Example
>
> ```python
> spacy.info()
> spacy.info("en")
> spacy.info("de", markdown=True)
> ```

| Name       | Type    | Description                                                   |
| ---------- | ------- | ------------------------------------------------------------- |
| `model`    | unicode | A model, i.e. shortcut link, package name or path (optional). |
| `markdown` | bool    | Print information as Markdown.                                |

### spacy.explain {#spacy.explain tag="function"}

Get a description for a given POS tag, dependency label or entity type. For a
list of available terms, see
[`glossary.py`](https://github.com/explosion/spaCy/tree/master/spacy/glossary.py).

> #### Example
>
> ```python
> spacy.explain("NORP")
> # Nationalities or religious or political groups
>
> doc = nlp("Hello world")
> for word in doc:
>    print(word.text, word.tag_, spacy.explain(word.tag_))
> # Hello UH interjection
> # world NN noun, singular or mass
> ```

| Name        | Type    | Description                                              |
| ----------- | ------- | -------------------------------------------------------- |
| `term`      | unicode | Term to explain.                                         |
| **RETURNS** | unicode | The explanation, or `None` if not found in the glossary. |
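
The lookup behavior described above can be sketched without spaCy installed. This is only an illustration: `GLOSSARY_SUBSET` and `explain_sketch` are hypothetical names, and the three entries are copied from the example on this page rather than from the full glossary.

```python
# Hypothetical three-entry subset; the real glossary lives in spacy/glossary.py.
GLOSSARY_SUBSET = {
    "NORP": "Nationalities or religious or political groups",
    "UH": "interjection",
    "NN": "noun, singular or mass",
}


def explain_sketch(term):
    """Return the description for a term, or None if it is not in the glossary."""
    return GLOSSARY_SUBSET.get(term)


known = explain_sketch("NORP")
unknown = explain_sketch("NOT_A_LABEL")

# Because the result may be None, guard before using it as a string.
label = unknown if unknown is not None else "(no description)"
```

Since unknown terms produce `None` instead of raising, code that formats the result should handle that case explicitly, as the last line does.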

### spacy.prefer_gpu {#spacy.prefer_gpu tag="function" new="2.0.14"}

Allocate data and perform operations on [GPU](/usage/#gpu), if available. If
data has already been allocated on CPU, it will not be moved. Ideally, this
function should be called right after importing spaCy and _before_ loading any
models.

> #### Example
>
> ```python
> import spacy
> activated = spacy.prefer_gpu()
> nlp = spacy.load("en_core_web_sm")
> ```

| Name        | Type | Description                    |
| ----------- | ---- | ------------------------------ |
| **RETURNS** | bool | Whether the GPU was activated. |

### spacy.require_gpu {#spacy.require_gpu tag="function" new="2.0.14"}

Allocate data and perform operations on [GPU](/usage/#gpu). Will raise an error
if no GPU is available. If data has already been allocated on CPU, it will not
be moved. Ideally, this function should be called right after importing spaCy
and _before_ loading any models.

> #### Example
>
> ```python
> import spacy
> spacy.require_gpu()
> nlp = spacy.load("en_core_web_sm")
> ```

| Name        | Type | Description |
| ----------- | ---- | ----------- |
| **RETURNS** | bool | `True`      |

## displaCy {#displacy source="spacy/displacy"}

As of v2.0, spaCy comes with a built-in visualization suite. For more info and
examples, see the usage guide on [visualizing spaCy](/usage/visualizers).

### displacy.serve {#displacy.serve tag="method" new="2"}

Serve a dependency parse tree or named entity visualization to view it in your
browser. Will run a simple web server.

> #### Example
>
> ```python
> import spacy
> from spacy import displacy
> nlp = spacy.load("en_core_web_sm")
> doc1 = nlp("This is a sentence.")
> doc2 = nlp("This is another sentence.")
> displacy.serve([doc1, doc2], style="dep")
> ```

| Name      | Type                | Description                                                                                                                          | Default     |
| --------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ----------- |
| `docs`    | list, `Doc`, `Span` | Document(s) to visualize.                                                                                                            |
| `style`   | unicode             | Visualization style, `'dep'` or `'ent'`.                                                                                             | `'dep'`     |
| `page`    | bool                | Render markup as full HTML page.                                                                                                     | `True`      |
| `minify`  | bool                | Minify HTML markup.                                                                                                                  | `False`     |
| `options` | dict                | [Visualizer-specific options](#displacy_options), e.g. colors.                                                                       | `{}`        |
| `manual`  | bool                | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. | `False`     |
| `port`    | int                 | Port to serve visualization.                                                                                                         | `5000`      |
| `host`    | unicode             | Host to serve visualization.                                                                                                         | `'0.0.0.0'` |

### displacy.render {#displacy.render tag="method" new="2"}

Render a dependency parse tree or named entity visualization.

> #### Example
>
> ```python
> import spacy
> from spacy import displacy
> nlp = spacy.load("en_core_web_sm")
> doc = nlp("This is a sentence.")
> html = displacy.render(doc, style="dep")
> ```

| Name        | Type                | Description                                                                                                                                               | Default |
| ----------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
| `docs`      | list, `Doc`, `Span` | Document(s) to visualize.                                                                                                                                 |
| `style`     | unicode             | Visualization style, `'dep'` or `'ent'`.                                                                                                                  | `'dep'` |
| `page`      | bool                | Render markup as full HTML page.                                                                                                                          | `False` |
| `minify`    | bool                | Minify HTML markup.                                                                                                                                       | `False` |
| `jupyter`   | bool                | Explicitly enable or disable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. Detected automatically if `None`. | `None`  |
| `options`   | dict                | [Visualizer-specific options](#displacy_options), e.g. colors.                                                                                            | `{}`    |
| `manual`    | bool                | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples.                      | `False` |
| **RETURNS** | unicode             | Rendered HTML markup.                                                                                                                                     |

### Visualizer options {#displacy_options}

The `options` argument lets you specify additional settings for each visualizer.
If a setting is not present in the options, the default value will be used.

#### Dependency Visualizer options {#options-dep}

> #### Example
>
> ```python
> options = {"compact": True, "color": "blue"}
> displacy.serve(doc, style="dep", options=options)
> ```

| Name                                       | Type    | Description                                                                                                     | Default                 |
| ------------------------------------------ | ------- | --------------------------------------------------------------------------------------------------------------- | ----------------------- |
| `fine_grained`                             | bool    | Use fine-grained part-of-speech tags (`Token.tag_`) instead of coarse-grained tags (`Token.pos_`).              | `False`                 |
| `add_lemma` <Tag variant="new">2.2.4</Tag> | bool    | Print the lemmas in a separate row below the token texts.                                                       | `False`                 |
| `collapse_punct`                           | bool    | Attach punctuation to tokens. Can make the parse more readable, as it prevents long arcs to attach punctuation. | `True`                  |
| `collapse_phrases`                         | bool    | Merge noun phrases into one token.                                                                              | `False`                 |
| `compact`                                  | bool    | "Compact mode" with square arrows that takes up less space.                                                     | `False`                 |
| `color`                                    | unicode | Text color (HEX, RGB or color names).                                                                           | `'#000000'`             |
| `bg`                                       | unicode | Background color (HEX, RGB or color names).                                                                     | `'#ffffff'`             |
| `font`                                     | unicode | Font name or font family for all text.                                                                          | `'Arial'`               |
| `offset_x`                                 | int     | Spacing on left side of the SVG in px.                                                                          | `50`                    |
| `arrow_stroke`                             | int     | Width of arrow path in px.                                                                                      | `2`                     |
| `arrow_width`                              | int     | Width of arrow head in px.                                                                                      | `10` / `8` (compact)    |
| `arrow_spacing`                            | int     | Spacing between arrows in px to avoid overlaps.                                                                 | `20` / `12` (compact)   |
| `word_spacing`                             | int     | Vertical spacing between words and arcs in px.                                                                  | `45`                    |
| `distance`                                 | int     | Distance between words in px.                                                                                   | `175` / `150` (compact) |

#### Named Entity Visualizer options {#displacy_options-ent}

> #### Example
>
> ```python
> options = {"ents": ["PERSON", "ORG", "PRODUCT"],
>            "colors": {"ORG": "yellow"}}
> displacy.serve(doc, style="ent", options=options)
> ```

| Name                                    | Type    | Description                                                                                                                                | Default                                                                                          |
| --------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------ |
| `ents`                                  | list    | Entity types to highlight (`None` for all types).                                                                                          | `None`                                                                                           |
| `colors`                                | dict    | Color overrides. Entity types in uppercase should be mapped to color names or values.                                                      | `{}`                                                                                             |
| `template` <Tag variant="new">2.2</Tag> | unicode | Optional template to overwrite the HTML used to render entity spans. Should be a format string and can use `{bg}`, `{text}` and `{label}`. | see [`templates.py`](https://github.com/explosion/spaCy/blob/master/spacy/displacy/templates.py) |

By default, displaCy comes with colors for all
[entity types supported by spaCy](/api/annotation#named-entities). If you're
using custom entity types, you can use the `colors` setting to add your own
colors for them. Your application or model package can also expose a
[`spacy_displacy_colors` entry point](/usage/saving-loading#entry-points-displacy)
to add custom labels and their colors automatically.
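
To illustrate the `template` option from the table above, here is a minimal sketch of how a format string with the `{bg}`, `{text}` and `{label}` placeholders gets filled in. The markup in `ENT_TEMPLATE` and the helper `render_entity` are made up for this example and are not displaCy's defaults. The sketch uses plain `str.format`, so it shows the placeholder mechanics only.

```python
# Hypothetical template; displaCy's own default lives in spacy/displacy/templates.py.
ENT_TEMPLATE = (
    '<mark style="background: {bg}; padding: 0.2em">'
    "{text} <small>{label}</small></mark>"
)


def render_entity(text, label, bg):
    """Fill the three placeholders the `template` option may use."""
    return ENT_TEMPLATE.format(bg=bg, text=text, label=label)


markup = render_entity("Apple", "ORG", "yellow")

# The same string could then be passed alongside custom colors:
options = {"colors": {"ORG": "yellow"}, "template": ENT_TEMPLATE}
```

Note that any literal curly braces in a format string (for example in inline CSS rules) would need to be escaped as `{{` and `}}`.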

## Utility functions {#util source="spacy/util.py"}

spaCy comes with a small collection of utility functions located in
[`spacy/util.py`](https://github.com/explosion/spaCy/tree/master/spacy/util.py).
Because utility functions are mostly intended for **internal use within spaCy**,
their behavior may change with future releases. The functions documented on this
page should be safe to use and we'll try to ensure backwards compatibility.
However, we recommend having additional tests in place if your application
depends on any of spaCy's utilities.

### util.get_data_path {#util.get_data_path tag="function"}

Get path to the data directory where spaCy looks for models. Defaults to
`spacy/data`.

| Name             | Type            | Description                                             |
| ---------------- | --------------- | ------------------------------------------------------- |
| `require_exists` | bool            | Only return path if it exists, otherwise return `None`. |
| **RETURNS**      | `Path` / `None` | Data path or `None`.                                    |

### util.set_data_path {#util.set_data_path tag="function"}

Set custom path to the data directory where spaCy looks for models.

> #### Example
>
> ```python
> util.set_data_path("/custom/path")
> util.get_data_path()
> # PosixPath('/custom/path')
> ```

| Name   | Type             | Description                 |
| ------ | ---------------- | --------------------------- |
| `path` | unicode / `Path` | Path to new data directory. |

### util.get_lang_class {#util.get_lang_class tag="function"}

Import and load a `Language` class. Allows lazy-loading
[language data](/usage/adding-languages) and importing languages using the
two-letter language code. To add a language code for a custom language class,
you can use the [`set_lang_class`](/api/top-level#util.set_lang_class) helper.

> #### Example
>
> ```python
> for lang_id in ["en", "de"]:
>     lang_class = util.get_lang_class(lang_id)
>     lang = lang_class()
>     tokenizer = lang.Defaults.create_tokenizer()
> ```

| Name        | Type       | Description                            |
| ----------- | ---------- | -------------------------------------- |
| `lang`      | unicode    | Two-letter language code, e.g. `'en'`. |
| **RETURNS** | `Language` | Language class.                        |

### util.set_lang_class {#util.set_lang_class tag="function"}

Set a custom `Language` class name that can be loaded via
[`get_lang_class`](/api/top-level#util.get_lang_class). If your model uses a
custom language, this is required so that spaCy can load the correct class from
the two-letter language code.

> #### Example
>
> ```python
> from spacy.lang.xy import CustomLanguage
>
> util.set_lang_class('xy', CustomLanguage)
> lang_class = util.get_lang_class('xy')
> nlp = lang_class()
> ```

| Name   | Type       | Description                            |
| ------ | ---------- | -------------------------------------- |
| `name` | unicode    | Two-letter language code, e.g. `'en'`. |
| `cls`  | `Language` | The language class, e.g. `English`.    |

### util.lang_class_is_loaded {#util.lang_class_is_loaded tag="function" new="2.1"}

Check whether a `Language` class is already loaded. `Language` classes are
loaded lazily, to avoid expensive setup code associated with the language data.

> #### Example
>
> ```python
> lang_cls = util.get_lang_class("en")
> assert util.lang_class_is_loaded("en") is True
> assert util.lang_class_is_loaded("de") is False
> ```

| Name        | Type    | Description                            |
| ----------- | ------- | -------------------------------------- |
| `name`      | unicode | Two-letter language code, e.g. `'en'`. |
| **RETURNS** | bool    | Whether the class has been loaded.     |
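
The interplay of `get_lang_class`, `set_lang_class` and `lang_class_is_loaded` can be pictured as a small lazy registry. The sketch below is an assumption about the general pattern, not spaCy's actual code: all names ending in `_sketch`, the `_LANG_CLASSES` dict and the `loader` argument are hypothetical.

```python
_LANG_CLASSES = {}  # language code -> class, filled in on first request


def set_lang_class_sketch(name, cls):
    """Register a class under a language code."""
    _LANG_CLASSES[name] = cls


def lang_class_is_loaded_sketch(name):
    """Check the registry without triggering a load."""
    return name in _LANG_CLASSES


def get_lang_class_sketch(name, loader):
    """Return the class for `name`, calling the (expensive) loader only once."""
    if name not in _LANG_CLASSES:
        set_lang_class_sketch(name, loader(name))
    return _LANG_CLASSES[name]


calls = []


def fake_loader(name):
    # Stand-in for importing the language data; records each invocation.
    calls.append(name)
    return type(name.upper() + "Language", (), {})


first = get_lang_class_sketch("en", fake_loader)
second = get_lang_class_sketch("en", fake_loader)
```

The second lookup returns the cached class without calling the loader again, which is the point of loading lazily.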

### util.load_model {#util.load_model tag="function" new="2"}

Load a model from a shortcut link, package or data path. If called with a
shortcut link or package name, spaCy will assume the model is a Python package
and import and call its `load()` method. If called with a path, spaCy will
assume it's a data directory, read the language and pipeline settings from the
meta.json and initialize a `Language` class. The model data will then be loaded
in via [`Language.from_disk()`](/api/language#from_disk).

> #### Example
>
> ```python
> nlp = util.load_model("en")
> nlp = util.load_model("en_core_web_sm", disable=["ner"])
> nlp = util.load_model("/path/to/data")
> ```

| Name          | Type       | Description                                              |
| ------------- | ---------- | -------------------------------------------------------- |
| `name`        | unicode    | Package name, shortcut link or model path.               |
| `**overrides` | -          | Specific overrides, like pipeline components to disable. |
| **RETURNS**   | `Language` | `Language` class with the loaded model.                  |

### util.load_model_from_path {#util.load_model_from_path tag="function" new="2"}

Load a model from a data directory path. Creates the [`Language`](/api/language)
class and pipeline based on the directory's meta.json and then calls
[`from_disk()`](/api/language#from_disk) with the path. This function also makes
it easy to test a new model that you haven't packaged yet.

> #### Example
>
> ```python
> nlp = util.load_model_from_path("/path/to/data")
> ```

| Name          | Type       | Description                                                                                          |
| ------------- | ---------- | ---------------------------------------------------------------------------------------------------- |
| `model_path`  | unicode    | Path to model data directory.                                                                        |
| `meta`        | dict       | Model meta data. If `False`, spaCy will try to load the meta from a meta.json in the same directory. |
| `**overrides` | -          | Specific overrides, like pipeline components to disable.                                             |
| **RETURNS**   | `Language` | `Language` class with the loaded model.                                                              |

### util.load_model_from_init_py {#util.load_model_from_init_py tag="function" new="2"}

A helper function to use in the `load()` method of a model package's
[`__init__.py`](https://github.com/explosion/spacy-models/tree/master/template/model/xx_model_name/__init__.py).

> #### Example
>
> ```python
> from spacy.util import load_model_from_init_py
>
> def load(**overrides):
>     return load_model_from_init_py(__file__, **overrides)
> ```

| Name          | Type       | Description                                              |
| ------------- | ---------- | -------------------------------------------------------- |
| `init_file`   | unicode    | Path to model's `__init__.py`, i.e. `__file__`.          |
| `**overrides` | -          | Specific overrides, like pipeline components to disable. |
| **RETURNS**   | `Language` | `Language` class with the loaded model.                  |

### util.get_model_meta {#util.get_model_meta tag="function" new="2"}

Get a model's meta.json from a directory path and validate its contents.

> #### Example
>
> ```python
> meta = util.get_model_meta("/path/to/model")
> ```

| Name        | Type             | Description              |
| ----------- | ---------------- | ------------------------ |
| `path`      | unicode / `Path` | Path to model directory. |
| **RETURNS** | dict             | The model's meta data.   |
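
As a rough sketch of what "read and validate" could involve, the following standalone function loads a `meta.json` and checks a few settings. The set of required keys (`lang`, `name`, `version`), the error messages and the name `get_model_meta_sketch` are assumptions made for this illustration.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("lang", "name", "version")  # assumed for this sketch


def get_model_meta_sketch(path):
    """Load meta.json from a model directory and check the assumed required keys."""
    model_path = Path(path)
    if not model_path.exists():
        raise IOError("Can't find model directory: {}".format(model_path))
    meta_path = model_path / "meta.json"
    if not meta_path.is_file():
        raise IOError("Can't find meta.json in: {}".format(model_path))
    with meta_path.open(encoding="utf8") as f:
        meta = json.load(f)
    for key in REQUIRED_KEYS:
        if not meta.get(key):
            raise ValueError("Missing or empty setting in meta.json: {}".format(key))
    return meta


# Try it on a throwaway directory containing a minimal meta.json.
with tempfile.TemporaryDirectory() as tmp:
    demo = {"lang": "en", "name": "demo", "version": "0.0.1"}
    (Path(tmp) / "meta.json").write_text(json.dumps(demo), encoding="utf8")
    meta = get_model_meta_sketch(tmp)
```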

### util.is_package {#util.is_package tag="function"}

Check if string maps to a package installed via pip. Mainly used to validate
[model packages](/usage/models).

> #### Example
>
> ```python
> util.is_package("en_core_web_sm") # True
> util.is_package("xyz") # False
> ```

| Name        | Type    | Description                                  |
| ----------- | ------- | -------------------------------------------- |
| `name`      | unicode | Name of package.                             |
| **RETURNS** | `bool`  | `True` if installed package, `False` if not. |

### util.get_package_path {#util.get_package_path tag="function" new="2"}

Get path to an installed package. Mainly used to resolve the location of
[model packages](/usage/models). Currently imports the package to find its path.

> #### Example
>
> ```python
> util.get_package_path("en_core_web_sm")
> # /usr/lib/python3.6/site-packages/en_core_web_sm
> ```

| Name           | Type    | Description                      |
| -------------- | ------- | -------------------------------- |
| `package_name` | unicode | Name of installed package.       |
| **RETURNS**    | `Path`  | Path to model package directory. |

### util.is_in_jupyter {#util.is_in_jupyter tag="function" new="2"}

Check if user is running spaCy from a [Jupyter](https://jupyter.org) notebook by
detecting the IPython kernel. Mainly used for the
[`displacy`](/api/top-level#displacy) visualizer.

> #### Example
>
> ```python
> html = "<h1>Hello world!</h1>"
> if util.is_in_jupyter():
>     from IPython.core.display import display, HTML
>     display(HTML(html))
> ```

| Name        | Type | Description                           |
| ----------- | ---- | ------------------------------------- |
| **RETURNS** | bool | `True` if in Jupyter, `False` if not. |
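
One common way to detect the IPython kernel is to look at the class name of the active shell. The sketch below assumes that notebook kernels expose a shell class called `ZMQInteractiveShell` and that `get_ipython` is only defined inside IPython sessions. `is_in_jupyter_sketch` is a hypothetical name, and this is not necessarily how spaCy performs the check.

```python
def is_in_jupyter_sketch():
    """Return True only when running under a notebook-style IPython kernel."""
    try:
        # get_ipython is injected into the namespace by IPython; plain Python lacks it.
        shell_name = get_ipython().__class__.__name__  # noqa: F821
    except NameError:
        return False
    return shell_name == "ZMQInteractiveShell"


in_notebook = is_in_jupyter_sketch()
```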

### util.update_exc {#util.update_exc tag="function"}

Update, validate and overwrite
[tokenizer exceptions](/usage/adding-languages#tokenizer-exceptions). Used to
combine global exceptions with custom, language-specific exceptions. Will raise
an error if key doesn't match `ORTH` values.

> #### Example
>
> ```python
> from spacy.symbols import ORTH, NORM
>
> BASE = {"a.": [{ORTH: "a."}], ":)": [{ORTH: ":)"}]}
> NEW = {"a.": [{ORTH: "a.", NORM: "all"}]}
> exceptions = util.update_exc(BASE, NEW)
> # {"a.": [{ORTH: "a.", NORM: "all"}], ":)": [{ORTH: ":)"}]}
> ```

| Name              | Type  | Description                                                     |
| ----------------- | ----- | --------------------------------------------------------------- |
| `base_exceptions` | dict  | Base tokenizer exceptions.                                      |
| `*addition_dicts` | dicts | Exception dictionaries to add to the base exceptions, in order. |
| **RETURNS**       | dict  | Combined tokenizer exceptions.                                  |
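
The merge-and-validate behavior can be sketched in plain Python. Here `ORTH` and `NORM` are string stand-ins for spaCy's symbols, `update_exc_sketch` is a hypothetical name, and the validation rule (the key must equal the concatenated `ORTH` values) is this sketch's reading of the description above.

```python
ORTH, NORM = "ORTH", "NORM"  # plain-string stand-ins for spaCy's symbol IDs


def update_exc_sketch(base_exceptions, *addition_dicts):
    """Merge exception dicts in order; later dicts overwrite earlier entries."""
    exc = dict(base_exceptions)  # copy so the base dict is left untouched
    for additions in addition_dicts:
        for orth, token_attrs in additions.items():
            described = "".join(attrs[ORTH] for attrs in token_attrs)
            if described != orth:
                raise ValueError(
                    "Key {!r} doesn't match ORTH values {!r}".format(orth, described)
                )
        exc.update(additions)
    return exc


BASE = {"a.": [{ORTH: "a."}], ":)": [{ORTH: ":)"}]}
NEW = {"a.": [{ORTH: "a.", NORM: "all"}]}
merged = update_exc_sketch(BASE, NEW)

# An entry whose ORTH values join to "donot" cannot describe the key "don't".
try:
    update_exc_sketch(BASE, {"don't": [{ORTH: "do"}, {ORTH: "not"}]})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```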

### util.compile_prefix_regex {#util.compile_prefix_regex tag="function"}

Compile a sequence of prefix rules into a regex object.

> #### Example
>
> ```python
> prefixes = ("§", "%", "=", r"\+")
> prefix_regex = util.compile_prefix_regex(prefixes)
> nlp.tokenizer.prefix_search = prefix_regex.search
> ```

| Name        | Type                                                          | Description                                                                                                                               |
| ----------- | ------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `entries`   | tuple                                                         | The prefix rules, e.g. [`lang.punctuation.TOKENIZER_PREFIXES`](https://github.com/explosion/spaCy/tree/master/spacy/lang/punctuation.py). |
| **RETURNS** | [regex](https://docs.python.org/3/library/re.html#re-objects) | The regex object to be used for [`Tokenizer.prefix_search`](/api/tokenizer#attributes).                                                   |

### util.compile_suffix_regex {#util.compile_suffix_regex tag="function"}

Compile a sequence of suffix rules into a regex object.

> #### Example
>
> ```python
> suffixes = ("'s", "'S", r"(?<=[0-9])\+")
> suffix_regex = util.compile_suffix_regex(suffixes)
> nlp.tokenizer.suffix_search = suffix_regex.search
> ```

| Name        | Type                                                          | Description                                                                                                                               |
| ----------- | ------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `entries`   | tuple                                                         | The suffix rules, e.g. [`lang.punctuation.TOKENIZER_SUFFIXES`](https://github.com/explosion/spaCy/tree/master/spacy/lang/punctuation.py). |
| **RETURNS** | [regex](https://docs.python.org/3/library/re.html#re-objects) | The regex object to be used for [`Tokenizer.suffix_search`](/api/tokenizer#attributes).                                                   |

### util.compile_infix_regex {#util.compile_infix_regex tag="function"}

Compile a sequence of infix rules into a regex object.

> #### Example
>
> ```python
> infixes = ("…", "-", "—", r"(?<=[0-9])[+\-\*^](?=[0-9-])")
> infix_regex = util.compile_infix_regex(infixes)
> nlp.tokenizer.infix_finditer = infix_regex.finditer
> ```

| Name        | Type                                                          | Description                                                                                                                             |
| ----------- | ------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| `entries`   | tuple                                                         | The infix rules, e.g. [`lang.punctuation.TOKENIZER_INFIXES`](https://github.com/explosion/spaCy/tree/master/spacy/lang/punctuation.py). |
| **RETURNS** | [regex](https://docs.python.org/3/library/re.html#re-objects) | The regex object to be used for [`Tokenizer.infix_finditer`](/api/tokenizer#attributes).                                                |
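
To make the difference between the three compile helpers concrete, here is one plausible way such rules could be combined with the standard `re` module. Prefix rules are anchored to the start of the string, suffix rules to the end, and infix rules are left unanchored. The `*_sketch` functions are hypothetical, and spaCy's real implementations may differ in detail.

```python
import re


def compile_prefix_sketch(entries):
    """Join rules with '|' and anchor each one to the start of the string."""
    return re.compile("|".join("^" + piece for piece in entries if piece.strip()))


def compile_suffix_sketch(entries):
    """Join rules with '|' and anchor each one to the end of the string."""
    return re.compile("|".join(piece + "$" for piece in entries if piece.strip()))


def compile_infix_sketch(entries):
    """Join rules with '|' without anchoring, so they can match anywhere."""
    return re.compile("|".join(piece for piece in entries if piece.strip()))


prefix_re = compile_prefix_sketch(("§", "%", "=", r"\+"))
suffix_re = compile_suffix_sketch(("'s", "'S", r"(?<=[0-9])\+"))
infix_re = compile_infix_sketch(("…", "-", "—"))

prefix_hit = prefix_re.search("§12")                              # matches the leading "§"
suffix_hit = suffix_re.search("John's")                           # matches the trailing "'s"
infix_hits = [m.group() for m in infix_re.finditer("well-known…")]
```

The anchoring is what lets the tokenizer use `search` for prefixes and suffixes but `finditer` for infixes, as in the examples above.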

### util.minibatch {#util.minibatch tag="function" new="2"}

Iterate over batches of items. `size` may be an iterator, so that batch-size can
vary on each step.

> #### Example
>
> ```python
> batches = minibatch(train_data)
> for batch in batches:
>     texts, annotations = zip(*batch)
>     nlp.update(texts, annotations)
> ```

| Name       | Type           | Description                                                                                                                                                                                  |
| ---------- | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `items`    | iterable       | The items to batch up.                                                                                                                                                                       |
| `size`     | int / iterable | The batch size(s). Use [`util.compounding`](/api/top-level#util.compounding) or [`util.decaying`](/api/top-level#util.decaying) for an infinite series of compounding or decaying values.    |
| **YIELDS** | list           | The batches.                                                                                                                                                                                 |
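
The batching behavior can be sketched in plain Python. This is a minimal illustration under stated assumptions, not spaCy's actual implementation: the helper name `minibatch_sketch` is hypothetical, and batching is assumed to stop as soon as the items run out.

```python
import itertools


def minibatch_sketch(items, size=8):
    """Yield lists of items. `size` is an int or an iterable of sizes.

    Hypothetical helper for illustration only.
    """
    # A fixed int becomes an endless stream of the same batch size.
    sizes = itertools.repeat(size) if isinstance(size, int) else iter(size)
    items = iter(items)
    for batch_size in sizes:
        batch = list(itertools.islice(items, int(batch_size)))
        if not batch:
            return  # no items left
        yield batch


fixed = list(minibatch_sketch(range(7), size=3))
growing = list(minibatch_sketch(range(7), size=iter([1, 2, 4, 8])))
```

With a fixed size, the last batch is simply shorter. With an iterator of sizes, each step draws the next size, so the batches grow as the series grows.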

### util.compounding {#util.compounding tag="function" new="2"}

Yield an infinite series of compounding values. Each time the generator is
advanced, it produces the previous value multiplied by the compound rate.

> #### Example
>
> ```python
> sizes = compounding(1., 10., 1.5)
> assert next(sizes) == 1.
> assert next(sizes) == 1. * 1.5
> assert next(sizes) == 1.5 * 1.5
> ```

| Name       | Type        | Description             |
| ---------- | ----------- | ----------------------- |
| `start`    | int / float | The first value.        |
| `stop`     | int / float | The maximum value.      |
| `compound` | int / float | The compounding factor. |
| **YIELDS** | int         | Compounding values.     |
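
A compounding series of this kind can be sketched as a small generator. The sketch assumes that `stop` acts as a cap on the yielded values, as the "maximum value" description suggests, and that `compound` is greater than 1. The name `compounding_sketch` is hypothetical.

```python
def compounding_sketch(start, stop, compound):
    """Yield start, start * compound, ... and never exceed stop.

    Hypothetical helper for illustration only; assumes compound > 1.
    """
    curr = float(start)
    while True:
        yield min(curr, stop)  # cap the value at the maximum
        curr *= compound


sizes = compounding_sketch(1.0, 10.0, 1.5)
first_eight = [next(sizes) for _ in range(8)]
```

Once the cap is reached, the generator keeps yielding `stop`, so the series stays infinite.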

### util.decaying {#util.decaying tag="function" new="2"}

Yield an infinite series of linearly decaying values.

> #### Example
>
> ```python
> sizes = decaying(10., 1., 0.001)
> assert next(sizes) == 10.
> assert next(sizes) == 10. - 0.001
> assert next(sizes) == 9.999 - 0.001
> ```

| Name       | Type        | Description          |
| ---------- | ----------- | -------------------- |
| `start`    | int / float | The first value.     |
| `end`      | int / float | The minimum value.   |
| `decay`    | int / float | The decaying factor. |
| **YIELDS** | int         | The decaying values. |
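
A linearly decaying series can be sketched in the same way. The sketch assumes that `end` is a floor the values never drop below and that `decay` is subtracted on every step. The name `decaying_sketch` is hypothetical.

```python
def decaying_sketch(start, end, decay):
    """Yield start, start - decay, ... and never drop below end.

    Hypothetical helper for illustration only; assumes decay > 0.
    """
    curr = float(start)
    while True:
        yield max(curr, end)  # clip the value at the floor
        curr -= decay


sizes = decaying_sketch(10.0, 1.0, 2.5)
values = [next(sizes) for _ in range(6)]
```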

### util.itershuffle {#util.itershuffle tag="function" new="2"}

Shuffle an iterator. This works by holding `bufsize` items back and yielding
them sometime later. The result is not an unbiased shuffle, but it should be
good enough for batching. A larger `bufsize` means less bias.

> #### Example
>
> ```python
> values = range(1000)
> shuffled = itershuffle(values)
> ```

| Name       | Type     | Description                         |
| ---------- | -------- | ----------------------------------- |
| `iterable` | iterable | Iterator to shuffle.                |
| `bufsize`  | int      | Items to hold back (default: 1000). |
| **YIELDS** | iterable | The shuffled iterator.              |
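
One way to hold items back in a buffer is sketched below. This is a simplified variant for illustration, not spaCy's implementation: the name `itershuffle_sketch` and the `seed` parameter are hypothetical, and `bufsize` is assumed to be at least 1.

```python
import random


def itershuffle_sketch(iterable, bufsize=1000, seed=None):
    """Approximately shuffle a stream using a fixed-size buffer.

    Hypothetical helper for illustration only; assumes bufsize >= 1.
    """
    rng = random.Random(seed)
    buf = []
    for item in iterable:
        if len(buf) < bufsize:
            buf.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random held-back item, keep the new one.
        idx = rng.randrange(bufsize)
        yield buf[idx]
        buf[idx] = item
    # Input exhausted: flush whatever is still held back, in random order.
    rng.shuffle(buf)
    yield from buf


shuffled = list(itershuffle_sketch(range(1000), bufsize=100, seed=0))
```

The bias is visible in the sketch: the first emitted item always comes from the first `bufsize` inputs, so an item can never move arbitrarily far towards the front.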

### util.filter_spans {#util.filter_spans tag="function" new="2.1.4"}

Filter a sequence of [`Span`](/api/span) objects and remove duplicates or
overlaps. Useful for creating named entities (where one token can only be part
of one entity) or when merging spans with
[`Retokenizer.merge`](/api/doc#retokenizer.merge). When spans overlap, the
(first) longest span is preferred over shorter spans.

> #### Example
>
> ```python
> doc = nlp("This is a sentence.")
> spans = [doc[0:2], doc[0:2], doc[0:4]]
> filtered = filter_spans(spans)
> ```

| Name        | Type     | Description          |
| ----------- | -------- | -------------------- |
| `spans`     | iterable | The spans to filter. |
| **RETURNS** | list     | The filtered spans.  |
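
The "longest span wins" rule can be illustrated without a loaded model by using `(start, end)` token offsets in place of `Span` objects. This is a sketch of the idea only; the name `filter_spans_sketch` and the tuple representation are assumptions made for the example.

```python
def filter_spans_sketch(spans):
    """Remove duplicate and overlapping (start, end) spans, end exclusive.

    Hypothetical helper for illustration only.
    """
    # Longest spans first; among equal lengths, the earliest start first.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    result = []
    seen_tokens = set()
    for start, end in ordered:
        # Keep the span only if none of its tokens is already covered.
        if all(i not in seen_tokens for i in range(start, end)):
            result.append((start, end))
            seen_tokens.update(range(start, end))
    return sorted(result)


filtered = filter_spans_sketch([(0, 2), (0, 2), (0, 4), (5, 6)])
```

Here the two copies of `(0, 2)` are dropped because `(0, 4)` is longer and covers the same tokens, while `(5, 6)` survives because it overlaps nothing.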

## Compatibility functions {#compat source="spacy/compat.py"}

All Python code is written in an **intersection of Python 2 and Python 3**. This
is easy in Cython, but somewhat ugly in Python. Logic that deals with Python or
platform compatibility lives only in `spacy.compat`. To distinguish them from
the builtin functions, replacement functions are suffixed with an underscore,
e.g. `unicode_`.

> #### Example
>
> ```python
> from spacy.compat import unicode_
>
> compatible_unicode = unicode_("hello world")
> ```

| Name                 | Python 2                           | Python 3    |
| -------------------- | ---------------------------------- | ----------- |
| `compat.bytes_`      | `str`                              | `bytes`     |
| `compat.unicode_`    | `unicode`                          | `str`       |
| `compat.basestring_` | `basestring`                       | `str`       |
| `compat.input_`      | `raw_input`                        | `input`     |
| `compat.path2str`    | `str(path)` with `.decode('utf8')` | `str(path)` |
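
The mapping in the table can be read as a set of version-dependent aliases. The following sketch shows how such aliases might be defined; it is derived from the table rather than copied from `spacy/compat.py`, and the variable `is_python2` is an assumption of the example.

```python
import sys

is_python2 = sys.version_info[0] == 2  # assumed flag for this sketch

if is_python2:
    # These builtins only exist on Python 2.
    bytes_ = str
    unicode_ = unicode  # noqa: F821
    basestring_ = basestring  # noqa: F821
    input_ = raw_input  # noqa: F821
else:
    bytes_ = bytes
    unicode_ = str
    basestring_ = str
    input_ = input

text = unicode_("hello world")
```

Code that only uses the underscore-suffixed names behaves the same on both Python versions.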

### compat.is_config {#compat.is_config tag="function"}

Check if a specific configuration of Python version and operating system matches
the user's setup. Mostly used to display targeted error messages.

> #### Example
>
> ```python
> from spacy.compat import is_config
>
> if is_config(python2=True, windows=True):
>     print("You are using Python 2 on Windows.")
> ```

| Name        | Type | Description                                                      |
| ----------- | ---- | ---------------------------------------------------------------- |
| `python2`   | bool | spaCy is executed with Python 2.x.                               |
| `python3`   | bool | spaCy is executed with Python 3.x.                               |
| `windows`   | bool | spaCy is executed on Windows.                                    |
| `linux`     | bool | spaCy is executed on Linux.                                      |
| `osx`       | bool | spaCy is executed on OS X or macOS.                              |
| **RETURNS** | bool | Whether the specified configuration matches the user's platform. |
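
The matching logic can be sketched with the standard library alone. In this sketch, an argument left as `None` is assumed to mean "don't care", and the platform checks via `sys.platform` are assumptions of the example. The name `is_config_sketch` is hypothetical.

```python
import sys

# Assumed platform flags for this sketch.
is_python2 = sys.version_info[0] == 2
is_python3 = sys.version_info[0] == 3
is_windows = sys.platform.startswith("win")
is_linux = sys.platform.startswith("linux")
is_osx = sys.platform == "darwin"


def is_config_sketch(python2=None, python3=None, windows=None, linux=None, osx=None):
    """Return True if every argument that was set matches the current setup.

    Hypothetical helper for illustration only.
    """
    return (
        python2 in (None, is_python2)
        and python3 in (None, is_python3)
        and windows in (None, is_windows)
        and linux in (None, is_linux)
        and osx in (None, is_osx)
    )


matches_py3 = is_config_sketch(python3=True)
```

Because unset arguments are ignored, a call with no arguments always returns `True`, and a contradictory call such as `python2=True, python3=True` always returns `False`.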
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
+								---
 								title: Top-level Functions
 								menu:
 								  - ['spacy', 'spacy']
 								  - ['displacy', 'displacy']
 								  - ['Utility Functions', 'util']
 								  - ['Compatibility', 'compat']
 								---
 								## spaCy {#spacy hidden="true"}
 								### spacy.load {#spacy.load tag="function" model="any"}
 								Load a model via its [shortcut link](/usage/models#usage), the name of an
 								installed [model package](/usage/training#models-generating), a unicode path or
 								a `Path`-like object. spaCy will try resolving the load argument in this order.
 								If a model is loaded from a shortcut link or package name, spaCy will assume
 								it's a Python package and import it and call the model's own `load()` method. If
 								a model is loaded from a path, spaCy will assume it's a data directory, read the
 								language and pipeline settings off the meta.json and initialize the `Language`
 								class. The data will be loaded in via
 								[`Language.from_disk`](/api/language#from_disk).
 								> #### Example
 								>
 								> ```python
 								> nlp = spacy.load("en") # shortcut link
 								> nlp = spacy.load("en_core_web_sm") # package
 								> nlp = spacy.load("/path/to/en") # unicode path
 								> nlp = spacy.load(Path("/path/to/en")) # pathlib Path
 								>
-												Improve consistency of docs examples [ci skip]

											
										
										
											2019-07-25 12:24:56 +00:00
+								> nlp = spacy.load("en_core_web_sm", disable=["parser", "tagger"])
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
+								> ```
 								| Name        | Type             | Description                                                                       |
 								| ----------- | ---------------- | --------------------------------------------------------------------------------- |
 								| `name`      | unicode / `Path` | Model to load, i.e. shortcut link, package name or path.                          |
 								| `disable`   | list             | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
 								| **RETURNS** | `Language`       | A `Language` object with the loaded model.                                        |
 								Essentially, `spacy.load()` is a convenience wrapper that reads the language ID
 								and pipeline components from a model's `meta.json`, initializes the `Language`
 								class, loads in the model data and returns it.
 								```python
 								### Abstract example
 								cls = util.get_lang_class(lang)         #  get language for ID, e.g. 'en'
 								nlp = cls()                             #  initialise the language
 								for name in pipeline: component = nlp.create_pipe(name)   #  create each pipeline component nlp.add_pipe(component)             #  add component to pipeline
 								nlp.from_disk(model_data_path)          #  load in model data
 								```
 								<Infobox title="Changed in v2.0" variant="warning">
 								As of spaCy 2.0, the `path` keyword argument is deprecated. spaCy will also
 								raise an error if no model could be loaded and never just return an empty
 								`Language` object. If you need a blank language, you can use the new function
 								[`spacy.blank()`](/api/top-level#spacy.blank) or import the class explicitly,
 								e.g. `from spacy.lang.en import English`.
 								```diff
 								- nlp = spacy.load("en", path="/model")
 								+ nlp = spacy.load("/model")
 								```
 								</Infobox>
 								### spacy.blank {#spacy.blank tag="function" new="2"}
 								Create a blank model of a given language class. This function is the twin of
 								`spacy.load()`.
 								> #### Example
 								>
 								> ```python
 								> nlp_en = spacy.blank("en")
 								> nlp_de = spacy.blank("de")
 								> ```
 								| Name        | Type       | Description                                                                                      |
 								| ----------- | ---------- | ------------------------------------------------------------------------------------------------ |
 								| `name`      | unicode    | [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) of the language class to load. |
 								| `disable`   | list       | Names of pipeline components to [disable](/usage/processing-pipelines#disabling).                |
 								| **RETURNS** | `Language` | An empty `Language` object of the appropriate subclass.                                          |
 								#### spacy.info {#spacy.info tag="function"}
 								The same as the [`info` command](/api/cli#info). Pretty-print information about
 								your installation, models and local setup from within spaCy. To get the model
 								meta data as a dictionary instead, you can use the `meta` attribute on your
 								`nlp` object with a loaded model, e.g. `nlp.meta`.
 								> #### Example
 								>
 								> ```python
 								> spacy.info()
 								> spacy.info("en")
 								> spacy.info("de", markdown=True)
 								> ```
 								| Name       | Type    | Description                                                   |
 								| ---------- | ------- | ------------------------------------------------------------- |
 								| `model`    | unicode | A model, i.e. shortcut link, package name or path (optional). |
 								| `markdown` | bool    | Print information as Markdown.                                |
 								### spacy.explain {#spacy.explain tag="function"}
 								Get a description for a given POS tag, dependency label or entity type. For a
 								list of available terms, see
 								[`glossary.py`](https://github.com/explosion/spaCy/tree/master/spacy/glossary.py).
 								> #### Example
 								>
 								> ```python
-												Remove u-strings and fix formatting [ci skip]

											
										
										
											2019-09-12 14:11:15 +00:00
+								> spacy.explain("NORP")
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
+								> # Nationalities or religious or political groups
 								>
-												Remove u-strings and fix formatting [ci skip]

											
										
										
											2019-09-12 14:11:15 +00:00
+								> doc = nlp("Hello world")
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
+								> for word in doc:
 								>    print(word.text, word.tag_, spacy.explain(word.tag_))
 								> # Hello UH interjection
 								> # world NN noun, singular or mass
 								> ```
 								| Name        | Type    | Description                                              |
 								| ----------- | ------- | -------------------------------------------------------- |
 								| `term`      | unicode | Term to explain.                                         |
 								| **RETURNS** | unicode | The explanation, or `None` if not found in the glossary. |
 								### spacy.prefer_gpu {#spacy.prefer_gpu tag="function" new="2.0.14"}
 								Allocate data and perform operations on [GPU](/usage/#gpu), if available. If
 								data has already been allocated on CPU, it will not be moved. Ideally, this
 								function should be called right after importing spaCy and _before_ loading any
 								models.
 								> #### Example
 								>
 								> ```python
 								> import spacy
 								> activated = spacy.prefer_gpu()
 								> nlp = spacy.load("en_core_web_sm")
 								> ```
 								| Name        | Type | Description                    |
 								| ----------- | ---- | ------------------------------ |
 								| **RETURNS** | bool | Whether the GPU was activated. |
 								### spacy.require_gpu {#spacy.require_gpu tag="function" new="2.0.14"}
 								Allocate data and perform operations on [GPU](/usage/#gpu). Will raise an error
 								if no GPU is available. If data has already been allocated on CPU, it will not
 								be moved. Ideally, this function should be called right after importing spaCy
 								and _before_ loading any models.
 								> #### Example
 								>
 								> ```python
 								> import spacy
 								> spacy.require_gpu()
 								> nlp = spacy.load("en_core_web_sm")
 								> ```
 								| Name        | Type | Description |
 								| ----------- | ---- | ----------- |
 								| **RETURNS** | bool | `True`      |
 								## displaCy {#displacy source="spacy/displacy"}
 								As of v2.0, spaCy comes with a built-in visualization suite. For more info and
 								examples, see the usage guide on [visualizing spaCy](/usage/visualizers).
 								### displacy.serve {#displacy.serve tag="method" new="2"}
 								Serve a dependency parse tree or named entity visualization to view it in your
 								browser. Will run a simple web server.
 								> #### Example
 								>
 								> ```python
 								> import spacy
 								> from spacy import displacy
 								> nlp = spacy.load("en_core_web_sm")
-												Remove u-strings and fix formatting [ci skip]

											
										
										
											2019-09-12 14:11:15 +00:00
+								> doc1 = nlp("This is a sentence.")
 								> doc2 = nlp("This is another sentence.")
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
+								> displacy.serve([doc1, doc2], style="dep")
 								> ```
 								| Name      | Type                | Description                                                                                                                          | Default     |
 								| --------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ----------- |
 								| `docs`    | list, `Doc`, `Span` | Document(s) to visualize.                                                                                                            |
 								| `style`   | unicode             | Visualization style, `'dep'` or `'ent'`.                                                                                             | `'dep'`     |
 								| `page`    | bool                | Render markup as full HTML page.                                                                                                     | `True`      |
 								| `minify`  | bool                | Minify HTML markup.                                                                                                                  | `False`     |
-												Remove u-strings and fix formatting [ci skip]

											
										
										
											2019-09-12 14:11:15 +00:00
+								| `options` | dict                | [Visualizer-specific options](#displacy_options), e.g. colors.                                                                       | `{}`        |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
+								| `manual`  | bool                | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. | `False`     |
 								| `port`    | int                 | Port to serve visualization.                                                                                                         | `5000`      |
 								| `host`    | unicode             | Host to serve visualization.                                                                                                         | `'0.0.0.0'` |
 								### displacy.render {#displacy.render tag="method" new="2"}
 								Render a dependency parse tree or named entity visualization.
 								> #### Example
 								>
 								> ```python
 								> import spacy
 								> from spacy import displacy
 								> nlp = spacy.load("en_core_web_sm")
-												Remove u-strings and fix formatting [ci skip]

											
										
										
											2019-09-12 14:11:15 +00:00
+								> doc = nlp("This is a sentence.")
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
+								> html = displacy.render(doc, style="dep")
 								> ```
-												Allow jupyter=False to override Jupyter mode (closes #3598)

											
										
										
											2019-04-22 12:18:32 +00:00
+								| Name        | Type                | Description                                                                                                                                               | Default |
 								| ----------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
 								| `docs`      | list, `Doc`, `Span` | Document(s) to visualize.                                                                                                                                 |
 								| `style`     | unicode             | Visualization style, `'dep'` or `'ent'`.                                                                                                                  | `'dep'` |
 								| `page`      | bool                | Render markup as full HTML page.                                                                                                                          | `False` |
 								| `minify`    | bool                | Minify HTML markup.                                                                                                                                       | `False` |
 								| `jupyter`   | bool                | Explicitly enable or disable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. Detected automatically if `None`. | `None`  |
-												Remove u-strings and fix formatting [ci skip]

											
										
										
											2019-09-12 14:11:15 +00:00
+								| `options`   | dict                | [Visualizer-specific options](#displacy_options), e.g. colors.                                                                                            | `{}`    |
-												Allow jupyter=False to override Jupyter mode (closes #3598)

											
										
										
											2019-04-22 12:18:32 +00:00
+								| `manual`    | bool                | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples.                      | `False` |
 								| **RETURNS** | unicode             | Rendered HTML markup.                                                                                                                                     |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
 								### Visualizer options {#displacy_options}
 								The `options` argument lets you specify additional settings for each visualizer.
 								If a setting is not present in the options, the default value will be used.
 								#### Dependency Visualizer options {#options-dep}
 								> #### Example
 								>
 								> ```python
 								> options = {"compact": True, "color": "blue"}
 								> displacy.serve(doc, style="dep", options=options)
 								> ```
 								| Name                                       | Type    | Description                                                                                                     | Default                 |
 								| ------------------------------------------ | ------- | --------------------------------------------------------------------------------------------------------------- | ----------------------- |
 								| `fine_grained`                             | bool    | Use fine-grained part-of-speech tags (`Token.tag_`) instead of coarse-grained tags (`Token.pos_`).              | `False`                 |
 								| `add_lemma` <Tag variant="new">2.2.4</Tag> | bool    | Print the lemmas in a separate row below the token texts.                                                       | `False`                 |
 								| `collapse_punct`                           | bool    | Attach punctuation to tokens. Can make the parse more readable, as it prevents long arcs to attach punctuation. | `True`                  |
 								| `collapse_phrases`                         | bool    | Merge noun phrases into one token.                                                                              | `False`                 |
 								| `compact`                                  | bool    | "Compact mode" with square arrows that takes up less space.                                                     | `False`                 |
 								| `color`                                    | unicode | Text color (HEX, RGB or color names).                                                                           | `'#000000'`             |
 								| `bg`                                       | unicode | Background color (HEX, RGB or color names).                                                                     | `'#ffffff'`             |
 								| `font`                                     | unicode | Font name or font family for all text.                                                                          | `'Arial'`               |
 								| `offset_x`                                 | int     | Spacing on left side of the SVG in px.                                                                          | `50`                    |
 								| `arrow_stroke`                             | int     | Width of arrow path in px.                                                                                      | `2`                     |
 								| `arrow_width`                              | int     | Width of arrow head in px.                                                                                      | `10` / `8` (compact)    |
 								| `arrow_spacing`                            | int     | Spacing between arrows in px to avoid overlaps.                                                                 | `20` / `12` (compact)   |
 								| `word_spacing`                             | int     | Vertical spacing between words and arcs in px.                                                                  | `45`                    |
 								| `distance`                                 | int     | Distance between words in px.                                                                                   | `175` / `150` (compact) |
 								#### Named Entity Visualizer options {#displacy_options-ent}
 								> #### Example
 								>
 								> ```python
 								> options = {"ents": ["PERSON", "ORG", "PRODUCT"],
 								>            "colors": {"ORG": "yellow"}}
 								> displacy.serve(doc, style="ent", options=options)
 								> ```
-												Remove u-strings and fix formatting [ci skip]

											
										
										
											2019-09-12 14:11:15 +00:00
+								| Name                                    | Type    | Description                                                                                                                                | Default                                                                                          |
 								| --------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------ |
 								| `ents`                                  | list    | Entity types to highlight (`None` for all types).                                                                                          | `None`                                                                                           |
 								| `colors`                                | dict    | Color overrides. Entity types in uppercase should be mapped to color names or values.                                                      | `{}`                                                                                             |
 								| `template` <Tag variant="new">2.2</Tag> | unicode | Optional template to overwrite the HTML used to render entity spans. Should be a format string and can use `{bg}`, `{text}` and `{label}`. | see [`templates.py`](https://github.com/explosion/spaCy/blob/master/spacy/displacy/templates.py) |
 								By default, displaCy comes with colors for all
 								[entity types supported by spaCy](/api/annotation#named-entities). If you're
 								using custom entity types, you can use the `colors` setting to add your own
 								colors for them. Your application or model package can also expose a
 								[`spacy_displacy_colors` entry point](/usage/saving-loading#entry-points-displacy)
 								to add custom labels and their colors automatically.
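The `template` and `colors` options can be illustrated with plain string formatting. The following is a minimal sketch, not displaCy's own rendering code: `CUSTOM_ENT_TEMPLATE` and `fill_template` are hypothetical names, and the fallback color is an assumption made for the example. It only shows how a format string using `{bg}`, `{text}` and `{label}` is filled for one entity span.

```python
from html import escape

# Hypothetical custom template. spaCy's default lives in spacy/displacy/templates.py.
CUSTOM_ENT_TEMPLATE = (
    '<mark class="entity" style="background: {bg}; padding: 0.2em 0.4em">'
    "{text} <small>{label}</small>"
    "</mark>"
)


def fill_template(text, label, colors, default_bg="#ddd"):
    """Fill the template for a single entity span (illustration only)."""
    # Entity types are looked up in uppercase, as described for the `colors` option.
    bg = colors.get(label.upper(), default_bg)
    return CUSTOM_ENT_TEMPLATE.format(bg=bg, text=escape(text), label=label)


markup = fill_template("Explosion", "ORG", {"ORG": "yellow"})
print(markup)
```

A template like this would be passed to displaCy as `options = {"template": CUSTOM_ENT_TEMPLATE}`.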
 								## Utility functions {#util source="spacy/util.py"}
 								spaCy comes with a small collection of utility functions located in
 								[`spacy/util.py`](https://github.com/explosion/spaCy/tree/master/spacy/util.py).
 								Because utility functions are mostly intended for **internal use within spaCy**,
 								their behavior may change with future releases. The functions documented on this
 								page should be safe to use and we'll try to ensure backwards compatibility.
 								However, we recommend having additional tests in place if your application
 								depends on any of spaCy's utilities.
 								### util.get_data_path {#util.get_data_path tag="function"}
 								Get path to the data directory where spaCy looks for models. Defaults to
 								`spacy/data`.
 								| Name             | Type            | Description                                             |
 								| ---------------- | --------------- | ------------------------------------------------------- |
 								| `require_exists` | bool            | Only return path if it exists, otherwise return `None`. |
 								| **RETURNS**      | `Path` / `None` | Data path or `None`.                                    |
 								### util.set_data_path {#util.set_data_path tag="function"}
 								Set custom path to the data directory where spaCy looks for models.
 								> #### Example
 								>
 								> ```python
 								> util.set_data_path("/custom/path")
 								> util.get_data_path()
 								> # PosixPath('/custom/path')
 								> ```
 								| Name   | Type             | Description                 |
 								| ------ | ---------------- | --------------------------- |
 								| `path` | unicode / `Path` | Path to new data directory. |
 								### util.get_lang_class {#util.get_lang_class tag="function"}
 								Import and load a `Language` class. Allows lazy-loading
 								[language data](/usage/adding-languages) and importing languages using the
 								two-letter language code. To add a language code for a custom language class,
 								you can use the [`set_lang_class`](/api/top-level#util.set_lang_class) helper.
 								> #### Example
 								>
 								> ```python
 								> for lang_id in ["en", "de"]:
 								>     lang_class = util.get_lang_class(lang_id)
 								>     lang = lang_class()
 								>     tokenizer = lang.Defaults.create_tokenizer()
 								> ```
 								| Name        | Type       | Description                            |
 								| ----------- | ---------- | -------------------------------------- |
 								| `lang`      | unicode    | Two-letter language code, e.g. `'en'`. |
 								| **RETURNS** | `Language` | Language class.                        |
 								### util.set_lang_class {#util.set_lang_class tag="function"}
 								Set a custom `Language` class name that can be loaded via
 								[`get_lang_class`](/api/top-level#util.get_lang_class). If your model uses a
 								custom language, this is required so that spaCy can load the correct class from
 								the two-letter language code.
 								> #### Example
 								>
 								> ```python
 								> from spacy.lang.xy import CustomLanguage
 								>
 								> util.set_lang_class('xy', CustomLanguage)
 								> lang_class = util.get_lang_class('xy')
 								> nlp = lang_class()
 								> ```
 								| Name   | Type       | Description                            |
 								| ------ | ---------- | -------------------------------------- |
 								| `name` | unicode    | Two-letter language code, e.g. `'en'`. |
 								| `cls`  | `Language` | The language class, e.g. `English`.    |
 								### util.lang_class_is_loaded {#util.lang_class_is_loaded tag="function" new="2.1"}
 								Check whether a `Language` class is already loaded. `Language` classes are
 								loaded lazily, to avoid expensive setup code associated with the language data.
 								> #### Example
 								>
 								> ```python
 								> lang_cls = util.get_lang_class("en")
 								> assert util.lang_class_is_loaded("en") is True
 								> assert util.lang_class_is_loaded("de") is False
 								> ```
 								| Name        | Type    | Description                            |
 								| ----------- | ------- | -------------------------------------- |
 								| `name`      | unicode | Two-letter language code, e.g. `'en'`. |
 								| **RETURNS** | bool    | Whether the class has been loaded.     |
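The interplay between lazy loading and the "is loaded" check can be sketched with a small registry. This is a simplified stand-in, not spaCy's implementation: the `importers` mapping and the `FakeEnglish` and `FakeGerman` classes are hypothetical and replace the real import of a language module.

```python
# Minimal sketch of a lazy class registry, assuming a dict-based cache.
_loaded = {}  # language code -> class, filled on first request


def set_lang_class(name, cls):
    """Register a class under a language code."""
    _loaded[name] = cls


def lang_class_is_loaded(name):
    """Return True only if the class has already been requested or set."""
    return name in _loaded


def get_lang_class(name, importers):
    """Load the class on first use; `importers` stands in for the real import."""
    if name not in _loaded:
        _loaded[name] = importers[name]()
    return _loaded[name]


class FakeEnglish:
    lang = "en"


class FakeGerman:
    lang = "de"


importers = {"en": lambda: FakeEnglish, "de": lambda: FakeGerman}
get_lang_class("en", importers)
print(lang_class_is_loaded("en"), lang_class_is_loaded("de"))
```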
 								### util.load_model {#util.load_model tag="function" new="2"}
 								Load a model from a shortcut link, package or data path. If called with a
 								shortcut link or package name, spaCy will assume the model is a Python package
 								and import and call its `load()` method. If called with a path, spaCy will
 								assume it's a data directory, read the language and pipeline settings from the
 								meta.json and initialize a `Language` class. The model data will then be loaded
 								in via [`Language.from_disk()`](/api/language#from_disk).
 								> #### Example
 								>
 								> ```python
 								> nlp = util.load_model("en")
 								> nlp = util.load_model("en_core_web_sm", disable=["ner"])
 								> nlp = util.load_model("/path/to/data")
 								> ```
 								| Name          | Type       | Description                                              |
 								| ------------- | ---------- | -------------------------------------------------------- |
 								| `name`        | unicode    | Package name, shortcut link or model path.               |
 								| `**overrides` | -          | Specific overrides, like pipeline components to disable. |
 								| **RETURNS**   | `Language` | `Language` class with the loaded model.                  |
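The resolution order (shortcut link, then package, then path) can be sketched as a small dispatch function. This is an illustration under assumptions, not the library's code: `resolve_model_source`, `link_names` and `package_names` are hypothetical, and the function only reports which loading route would be taken instead of loading anything.

```python
from pathlib import Path


def resolve_model_source(name, link_names, package_names):
    """Report how a model name would be resolved (illustration only).

    `link_names` and `package_names` are hypothetical sets standing in for
    the shortcut links and installed model packages on a real system.
    """
    if isinstance(name, str):
        if name in link_names:  # 1. shortcut link
            return "link"
        if name in package_names:  # 2. installed package
            return "package"
        if Path(name).exists():  # 3. path given as a string
            return "path"
    elif hasattr(name, "exists"):  # Path-like object
        return "path"
    raise IOError(f"Can't find model {name!r}")


links = {"en"}
packages = {"en_core_web_sm"}
print(resolve_model_source("en", links, packages))
print(resolve_model_source("en_core_web_sm", links, packages))
print(resolve_model_source(Path("."), links, packages))
```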
 								### util.load_model_from_path {#util.load_model_from_path tag="function" new="2"}
 								Load a model from a data directory path. Creates the [`Language`](/api/language)
 								class and pipeline based on the directory's meta.json and then calls
 								[`from_disk()`](/api/language#from_disk) with the path. This function also makes
 								it easy to test a new model that you haven't packaged yet.
 								> #### Example
 								>
 								> ```python
 								> nlp = util.load_model_from_path("/path/to/data")
 								> ```
 								| Name          | Type       | Description                                                                                          |
 								| ------------- | ---------- | ---------------------------------------------------------------------------------------------------- |
 								| `model_path`  | unicode    | Path to model data directory.                                                                        |
 								| `meta`        | dict       | Model meta data. If `False`, spaCy will try to load the meta from a meta.json in the same directory. |
 								| `**overrides` | -          | Specific overrides, like pipeline components to disable.                                             |
 								| **RETURNS**   | `Language` | `Language` class with the loaded model.                                                              |
 								### util.load_model_from_init_py {#util.load_model_from_init_py tag="function" new="2"}
 								A helper function to use in the `load()` method of a model package's
 								[`__init__.py`](https://github.com/explosion/spacy-models/tree/master/template/model/xx_model_name/__init__.py).
 								> #### Example
 								>
 								> ```python
 								> from spacy.util import load_model_from_init_py
 								>
 								> def load(**overrides):
 								>     return load_model_from_init_py(__file__, **overrides)
 								> ```
 								| Name          | Type       | Description                                              |
 								| ------------- | ---------- | -------------------------------------------------------- |
 								| `init_file`   | unicode    | Path to model's `__init__.py`, i.e. `__file__`.          |
 								| `**overrides` | -          | Specific overrides, like pipeline components to disable. |
 								| **RETURNS**   | `Language` | `Language` class with the loaded model.                  |
 								### util.get_model_meta {#util.get_model_meta tag="function" new="2"}
 								Get a model's meta.json from a directory path and validate its contents.
 								> #### Example
 								>
 								> ```python
 								> meta = util.get_model_meta("/path/to/model")
 								> ```
 								| Name        | Type             | Description              |
 								| ----------- | ---------------- | ------------------------ |
 								| `path`      | unicode / `Path` | Path to model directory. |
 								| **RETURNS** | dict             | The model's meta data.   |
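Reading and validating a `meta.json` can be sketched with the standard library. The set of required keys below (`lang`, `name`, `version`) is an assumption made for this example, and `read_model_meta` is a hypothetical name, so treat it as an approximation of the kind of check involved and not as the exact validation spaCy performs.

```python
import json
from pathlib import Path

# Assumed minimal set of required settings, for illustration only.
REQUIRED_SETTINGS = ("lang", "name", "version")


def read_model_meta(path):
    """Load meta.json from a model directory and check required settings."""
    model_path = Path(path)
    if not model_path.exists():
        raise IOError(f"Can't find model directory: {model_path}")
    meta_path = model_path / "meta.json"
    if not meta_path.is_file():
        raise IOError(f"Could not read meta.json from {meta_path}")
    with meta_path.open(encoding="utf8") as file_:
        meta = json.load(file_)
    for setting in REQUIRED_SETTINGS:
        # Missing and empty values are both treated as invalid.
        if not meta.get(setting):
            raise ValueError(f"No valid '{setting}' setting found in meta.json")
    return meta
```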
 								### util.is_package {#util.is_package tag="function"}
 								Check if string maps to a package installed via pip. Mainly used to validate
 								[model packages](/usage/models).
 								> #### Example
 								>
 								> ```python
 								> util.is_package("en_core_web_sm") # True
 								> util.is_package("xyz") # False
 								> ```
 								| Name        | Type    | Description                                  |
 								| ----------- | ------- | -------------------------------------------- |
 								| `name`      | unicode | Name of package.                             |
 								| **RETURNS** | bool    | `True` if installed package, `False` if not. |
 								### util.get_package_path {#util.get_package_path tag="function" new="2"}
 								Get path to an installed package. Mainly used to resolve the location of
 								[model packages](/usage/models). Currently imports the package to find its path.
 								> #### Example
 								>
 								> ```python
 								> util.get_package_path("en_core_web_sm")
 								> # /usr/lib/python3.6/site-packages/en_core_web_sm
 								> ```
 								| Name           | Type    | Description                      |
 								| -------------- | ------- | -------------------------------- |
 								| `package_name` | unicode | Name of installed package.       |
 								| **RETURNS**    | `Path`  | Path to model package directory. |
 								### util.is_in_jupyter {#util.is_in_jupyter tag="function" new="2"}
 								Check if user is running spaCy from a [Jupyter](https://jupyter.org) notebook by
 								detecting the IPython kernel. Mainly used for the
 								[`displacy`](/api/top-level#displacy) visualizer.
 								> #### Example
 								>
 								> ```python
 								> html = "<h1>Hello world!</h1>"
 								> if util.is_in_jupyter():
 								>     from IPython.core.display import display, HTML
 								>     display(HTML(html))
 								> ```
 								| Name        | Type | Description                           |
 								| ----------- | ---- | ------------------------------------- |
 								| **RETURNS** | bool | `True` if in Jupyter, `False` if not. |
 								### util.update_exc {#util.update_exc tag="function"}
 								Update, validate and overwrite
 								[tokenizer exceptions](/usage/adding-languages#tokenizer-exceptions). Used to
 								combine global exceptions with custom, language-specific exceptions. Will raise
 								an error if key doesn't match `ORTH` values.
 								> #### Example
 								>
 								> ```python
 								> BASE = {"a.": [{ORTH: "a."}], ":)": [{ORTH: ":)"}]}
 								> NEW = {"a.": [{ORTH: "a.", NORM: "all"}]}
 								> exceptions = util.update_exc(BASE, NEW)
 								> # {"a.": [{ORTH: "a.", NORM: "all"}], ":)": [{ORTH: ":)"}]}
 								> ```
 								| Name              | Type  | Description                                                     |
 								| ----------------- | ----- | --------------------------------------------------------------- |
 								| `base_exceptions` | dict  | Base tokenizer exceptions.                                      |
 								| `*addition_dicts` | dicts | Exception dictionaries to add to the base exceptions, in order. |
 								| **RETURNS**       | dict  | Combined tokenizer exceptions.                                  |
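The merge-and-validate behavior described above can be approximated in plain Python. In this sketch, `ORTH` and `NORM` are string stand-ins for the symbols from `spacy.symbols`, and `merge_exceptions` is a hypothetical name. It shows the two ideas from the description: later dictionaries overwrite earlier entries, and a key must equal the concatenation of its tokens' `ORTH` values.

```python
# String stand-ins for the real symbols in spacy.symbols (assumption).
ORTH, NORM = "ORTH", "NORM"


def merge_exceptions(base_exceptions, *addition_dicts):
    """Combine exception dicts in order, validating each addition."""
    merged = dict(base_exceptions)
    for additions in addition_dicts:
        for key, token_attrs in additions.items():
            described = "".join(attrs[ORTH] for attrs in token_attrs)
            if described != key:
                raise ValueError(
                    f"Key {key!r} doesn't match ORTH values {described!r}"
                )
        # Later dictionaries overwrite earlier entries with the same key.
        merged.update(additions)
    return merged


BASE = {"a.": [{ORTH: "a."}], ":)": [{ORTH: ":)"}]}
NEW = {"a.": [{ORTH: "a.", NORM: "all"}]}
combined = merge_exceptions(BASE, NEW)
print(combined)
```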
 								### util.compile_prefix_regex {#util.compile_prefix_regex tag="function"}
 								Compile a sequence of prefix rules into a regex object.
 								> #### Example
 								>
 								> ```python
 								> prefixes = ("§", "%", "=", r"\+")
 								> prefix_regex = util.compile_prefix_regex(prefixes)
 								> nlp.tokenizer.prefix_search = prefix_regex.search
 								> ```
 								| Name        | Type                                                          | Description                                                                                                                               |
 								| ----------- | ------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
 								| `entries`   | tuple                                                         | The prefix rules, e.g. [`lang.punctuation.TOKENIZER_PREFIXES`](https://github.com/explosion/spaCy/tree/master/spacy/lang/punctuation.py). |
 								| **RETURNS** | [regex](https://docs.python.org/3/library/re.html#re-objects) | The regex object to be used for [`Tokenizer.prefix_search`](/api/tokenizer#attributes).                                                   |
 								### util.compile_suffix_regex {#util.compile_suffix_regex tag="function"}
 								Compile a sequence of suffix rules into a regex object.
 								> #### Example
 								>
 								> ```python
 								> suffixes = ("'s", "'S", r"(?<=[0-9])\+")
 								> suffix_regex = util.compile_suffix_regex(suffixes)
 								> nlp.tokenizer.suffix_search = suffix_regex.search
 								> ```
 								| Name        | Type                                                          | Description                                                                                                                               |
 								| ----------- | ------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
 								| `entries`   | tuple                                                         | The suffix rules, e.g. [`lang.punctuation.TOKENIZER_SUFFIXES`](https://github.com/explosion/spaCy/tree/master/spacy/lang/punctuation.py). |
 								| **RETURNS** | [regex](https://docs.python.org/3/library/re.html#re-objects) | The regex object to be used for [`Tokenizer.suffix_search`](/api/tokenizer#attributes).                                                   |
 								### util.compile_infix_regex {#util.compile_infix_regex tag="function"}
 								Compile a sequence of infix rules into a regex object.
 								> #### Example
 								>
 								> ```python
 								> infixes = ("…", "-", "—", r"(?<=[0-9])[+\-\*^](?=[0-9-])")
 								> infix_regex = util.compile_infix_regex(infixes)
 								> nlp.tokenizer.infix_finditer = infix_regex.finditer
 								> ```
 								| Name        | Type                                                          | Description                                                                                                                             |
 								| ----------- | ------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
 								| `entries`   | tuple                                                         | The infix rules, e.g. [`lang.punctuation.TOKENIZER_INFIXES`](https://github.com/explosion/spaCy/tree/master/spacy/lang/punctuation.py). |
 								| **RETURNS** | [regex](https://docs.python.org/3/library/re.html#re-objects) | The regex object to be used for [`Tokenizer.infix_finditer`](/api/tokenizer#attributes).                                                |
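To make the difference between the three helpers concrete, here is a rough approximation using only Python's `re` module. It assumes that prefix rules are anchored at the start of the string, suffix rules at the end, and infix rules are left unanchored, and that the rules are joined into a single alternation. The function names are hypothetical and the real helpers may differ in detail.

```python
import re


def sketch_prefix_regex(entries):
    # Anchor every rule at the start of the string.
    return re.compile("|".join("^" + piece for piece in entries if piece.strip()))


def sketch_suffix_regex(entries):
    # Anchor every rule at the end of the string.
    return re.compile("|".join(piece + "$" for piece in entries if piece.strip()))


def sketch_infix_regex(entries):
    # Infix rules can match anywhere, so no anchors are added.
    return re.compile("|".join(piece for piece in entries if piece.strip()))


prefix_re = sketch_prefix_regex(("§", "%", "=", r"\+"))
suffix_re = sketch_suffix_regex(("'s", "'S", r"(?<=[0-9])\+"))
infix_re = sketch_infix_regex(("…", "-", "—", r"(?<=[0-9])[+\-\*^](?=[0-9-])"))

print(prefix_re.search("+1").group())                   # leading "+"
print(suffix_re.search("John's").group())               # trailing "'s"
print([m.group() for m in infix_re.finditer("10-20")])  # hyphen between digits
```

The `search` and `finditer` methods of the compiled objects are what get assigned to the tokenizer attributes in the examples above.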
### util.minibatch {#util.minibatch tag="function" new="2"}
Iterate over batches of items. `size` may be an iterator, so that the batch size
can vary on each step.
 								> #### Example
 								>
 								> ```python
> from spacy.util import minibatch
>
> batches = minibatch(train_data)
 								> for batch in batches:
 								>     texts, annotations = zip(*batch)
 								>     nlp.update(texts, annotations)
 								> ```
 								| Name       | Type           | Description                                                                                                                                                                                  |
 								| ---------- | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `items`    | iterable       | The items to batch up.                                                                                                                                                                       |
| `size`     | int / iterable | The batch size(s). Use [`util.compounding`](/api/top-level#util.compounding) or [`util.decaying`](/api/top-level#util.decaying) for an infinite series of compounding or decaying values. |
 								| **YIELDS** | list           | The batches.                                                                                                                                                                                 |
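
The effect of passing an iterator as `size` is easiest to see on a small input. The following is a minimal standard-library sketch of the batching behavior described above; `minibatch_sketch` is a hypothetical name and the code is an illustration under that assumption, not spaCy's implementation.

```python
import itertools


def minibatch_sketch(items, size=8):
    """Yield lists of items; `size` is an int or an iterator of sizes."""
    # A fixed int is turned into an endless stream of the same size.
    sizes = itertools.repeat(size) if isinstance(size, int) else iter(size)
    items = iter(items)
    while True:
        batch = list(itertools.islice(items, int(next(sizes))))
        if not batch:
            return
        yield batch


fixed = list(minibatch_sketch(range(7), size=3))
growing = list(minibatch_sketch(range(7), size=itertools.count(1)))
print(fixed)
print(growing)
```

In this sketch the size iterator is expected to be infinite, which matches the infinite series mentioned in the table.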
 								### util.compounding {#util.compounding tag="function" new="2"}
Yield an infinite series of compounding values. Each time the generator is
advanced, a value is produced by multiplying the previous value by the compound
rate. Values are capped at `stop`.
 								> #### Example
 								>
 								> ```python
> from spacy.util import compounding
>
> sizes = compounding(1., 10., 1.5)
 								> assert next(sizes) == 1.
 								> assert next(sizes) == 1. * 1.5
 								> assert next(sizes) == 1.5 * 1.5
 								> ```
 								| Name       | Type        | Description             |
 								| ---------- | ----------- | ----------------------- |
 								| `start`    | int / float | The first value.        |
 								| `stop`     | int / float | The maximum value.      |
 								| `compound` | int / float | The compounding factor. |
| **YIELDS** | int / float | Compounding values.     |
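
The interaction between the compounding factor and the maximum value can be sketched in a few lines. This is an illustrative generator under the assumption that values are clipped at `stop`; `compounding_sketch` is a hypothetical name, not the library function.

```python
import itertools


def compounding_sketch(start, stop, compound):
    """Yield start, start * compound, ... without ever passing `stop`."""
    def clip(value):
        # Works for growing and shrinking series alike.
        return max(value, stop) if start > stop else min(value, stop)

    curr = float(start)
    while True:
        yield clip(curr)
        curr *= compound


# The series is infinite, so take only the first few values.
first_five = list(itertools.islice(compounding_sketch(1.0, 4.0, 2.0), 5))
print(first_five)
```

Once the series reaches `stop`, every further value stays at that bound.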
 								### util.decaying {#util.decaying tag="function" new="2"}
 								Yield an infinite series of linearly decaying values.
 								> #### Example
 								>
> ```python
> from spacy.util import decaying
>
> sizes = decaying(10., 1., 0.001)
 								> assert next(sizes) == 10.
 								> assert next(sizes) == 10. - 0.001
 								> assert next(sizes) == 9.999 - 0.001
> ```
 								| Name       | Type        | Description          |
 								| ---------- | ----------- | -------------------- |
 								| `start`    | int / float | The first value.     |
| `stop`     | int / float | The minimum value.   |
| `decay`    | int / float | The amount subtracted at each step. |
| **YIELDS** | int / float | The decaying values. |
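
A linear decay with a lower bound can be sketched as follows. The generator subtracts a fixed amount on each step and never drops below the bound. `decaying_sketch` and its parameter names are assumptions made for illustration.

```python
import itertools


def decaying_sketch(start, stop, decay):
    """Yield start, start - decay, ... and never go below `stop`."""
    curr = float(start)
    while True:
        yield max(curr, stop)
        curr -= decay


# The series is infinite, so take only the first few values.
first_five = list(itertools.islice(decaying_sketch(3.0, 1.0, 1.0), 5))
print(first_five)
```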
 								### util.itershuffle {#util.itershuffle tag="function" new="2"}
 								Shuffle an iterator. This works by holding `bufsize` items back and yielding
 								them sometime later. Obviously, this is not unbiased – but should be good enough
for batching. Larger `bufsize` means less bias.
 								> #### Example
 								>
 								> ```python
> from spacy.util import itershuffle
>
> values = range(1000)
 								> shuffled = itershuffle(values)
 								> ```
| Name       | Type     | Description                         |
 								| ---------- | -------- | ----------------------------------- |
 								| `iterable` | iterable | Iterator to shuffle.                |
 								| `bufsize`  | int      | Items to hold back (default: 1000). |
 								| **YIELDS** | iterable | The shuffled iterator.              |
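
The idea of holding items back in a buffer can be illustrated with a simplified variant. The sketch below is not spaCy's implementation: it keeps up to `bufsize` items and, once the buffer is full, releases one at random for every new item read. The `seed` parameter and the name `itershuffle_sketch` are additions for reproducibility and illustration only.

```python
import random


def itershuffle_sketch(iterable, bufsize=1000, seed=None):
    """Approximately shuffle a stream using a bounded buffer."""
    rng = random.Random(seed)
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) >= bufsize:
            # Buffer is full: release one randomly chosen item.
            idx = rng.randrange(len(buf))
            buf[idx], buf[-1] = buf[-1], buf[idx]
            yield buf.pop()
    # Input exhausted: flush whatever is left in random order.
    rng.shuffle(buf)
    while buf:
        yield buf.pop()


shuffled = list(itershuffle_sketch(range(100), bufsize=10, seed=0))
print(shuffled[:10])
```

Because an item can only be released after it has been read, early positions in the output can only contain early items from the input. That is the bias mentioned above, and a larger buffer reduces it.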
### util.filter_spans {#util.filter_spans tag="function" new="2.1.4"}
 								Filter a sequence of [`Span`](/api/span) objects and remove duplicates or
 								overlaps. Useful for creating named entities (where one token can only be part
 								of one entity) or when merging spans with
 								[`Retokenizer.merge`](/api/doc#retokenizer.merge). When spans overlap, the
 								(first) longest span is preferred over shorter spans.
 								> #### Example
 								>
 								> ```python
> from spacy.util import filter_spans
>
> doc = nlp("This is a sentence.")
 								> spans = [doc[0:2], doc[0:2], doc[0:4]]
 								> filtered = filter_spans(spans)
 								> ```
 								| Name        | Type     | Description          |
 								| ----------- | -------- | -------------------- |
 								| `spans`     | iterable | The spans to filter. |
 								| **RETURNS** | list     | The filtered spans.  |
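
The "longest span first" rule can be sketched without loading a model by representing each span as a `(start, end)` pair of token offsets with an exclusive end. `filter_spans_sketch` is a hypothetical helper written for illustration; it mirrors the behavior described above rather than reproducing the library code.

```python
def filter_spans_sketch(spans):
    """Keep the longest non-overlapping (start, end) spans, end exclusive."""
    # Longest first; among equal lengths, the earliest start wins.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    result = []
    seen_tokens = set()
    for start, end in ordered:
        # Spans are visited longest first, so checking both ends is enough.
        if start not in seen_tokens and end - 1 not in seen_tokens:
            result.append((start, end))
            seen_tokens.update(range(start, end))
    return sorted(result)


duplicates = filter_spans_sketch([(0, 2), (0, 2), (0, 4)])
overlaps = filter_spans_sketch([(0, 2), (1, 3), (5, 6)])
print(duplicates)
print(overlaps)
```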
## Compatibility functions {#compat source="spacy/compat.py"}
 								All Python code is written in an **intersection of Python 2 and Python 3**. This
 								is easy in Cython, but somewhat ugly in Python. Logic that deals with Python or
platform compatibility lives only in `spacy.compat`. To distinguish them from
 								the builtin functions, replacement functions are suffixed with an underscore,
e.g. `unicode_`.
 								> #### Example
 								>
 								> ```python
 								> from spacy.compat import unicode_
 								>
 								> compatible_unicode = unicode_("hello world")
 								> ```
 								| Name                 | Python 2                           | Python 3    |
 								| -------------------- | ---------------------------------- | ----------- |
 								| `compat.bytes_`      | `str`                              | `bytes`     |
 								| `compat.unicode_`    | `unicode`                          | `str`       |
 								| `compat.basestring_` | `basestring`                       | `str`       |
 								| `compat.input_`      | `raw_input`                        | `input`     |
 								| `compat.path2str`    | `str(path)` with `.decode('utf8')` | `str(path)` |
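
One way such aliases could be defined is a version check at import time. The snippet below is a sketch of that pattern using only `sys`. It is an assumption about how the mapping in the table might be set up, not a copy of `spacy/compat.py`.

```python
import sys

is_python2 = sys.version_info[0] == 2

if is_python2:
    # These names only exist on Python 2, so this branch never runs on Python 3.
    bytes_ = str
    unicode_ = unicode  # noqa: F821
    basestring_ = basestring  # noqa: F821
    input_ = raw_input  # noqa: F821
else:
    bytes_ = bytes
    unicode_ = str
    basestring_ = str
    input_ = input

compatible_unicode = unicode_("hello world")
print(type(compatible_unicode).__name__)
```

Calling code can then use `unicode_` or `basestring_` without branching on the interpreter version itself.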
### compat.is_config {#compat.is_config tag="function"}
 								Check if a specific configuration of Python version and operating system matches
 								the user's setup. Mostly used to display targeted error messages.
 								> #### Example
 								>
 								> ```python
 								> from spacy.compat import is_config
 								>
 								> if is_config(python2=True, windows=True):
 								>     print("You are using Python 2 on Windows.")
 								> ```
 								| Name        | Type | Description                                                      |
 								| ----------- | ---- | ---------------------------------------------------------------- |
 								| `python2`   | bool | spaCy is executed with Python 2.x.                               |
 								| `python3`   | bool | spaCy is executed with Python 3.x.                               |
 								| `windows`   | bool | spaCy is executed on Windows.                                    |
 								| `linux`     | bool | spaCy is executed on Linux.                                      |
 								| `osx`       | bool | spaCy is executed on OS X or macOS.                              |
 								| **RETURNS** | bool | Whether the specified configuration matches the user's platform. |
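
A check like this can be sketched by comparing each flag that was passed against a value detected from `sys`. The detection logic and the name `is_config_sketch` are assumptions for illustration. Flags left at `None` are treated as "don't care".

```python
import sys

is_python2 = sys.version_info[0] == 2
is_python3 = sys.version_info[0] == 3
is_windows = sys.platform.startswith("win")
is_linux = sys.platform.startswith("linux")
is_osx = sys.platform == "darwin"


def is_config_sketch(python2=None, python3=None, windows=None, linux=None, osx=None):
    """Return True if every flag that was passed matches the running setup."""
    return (
        python2 in (None, is_python2)
        and python3 in (None, is_python3)
        and windows in (None, is_windows)
        and linux in (None, is_linux)
        and osx in (None, is_osx)
    )


if is_config_sketch(python3=True, linux=True):
    print("You are using Python 3 on Linux.")
```

Because unspecified flags are ignored, calling the function with no arguments always returns `True`, and contradictory flags such as `python2=True, python3=True` always return `False`.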