spaCy/website/docs/api/sentencesegmenter.md

---
title: SentenceSegmenter
tag: class
source: spacy/pipeline/hooks.py
---

A simple spaCy hook, to allow custom sentence boundary detection logic that
doesn't require the dependency parse. By default, sentence segmentation is
performed by the [`DependencyParser`](/api/dependencyparser), so the
`SentenceSegmenter` lets you implement a simpler, rule-based strategy that
doesn't require a statistical model to be loaded. The component is also
available via the string name `"sentencizer"`. After initialization, it is
typically added to the processing pipeline using
[`nlp.add_pipe`](/api/language#add_pipe).

## SentenceSegmenter.\_\_init\_\_ {#init tag="method"}

Initialize the sentence segmenter. To change the sentence boundary detection
strategy, pass a generator function `strategy` on initialization, or assign a
new strategy to the `.strategy` attribute. Sentence detection strategies should
be generators that take `Doc` objects and yield `Span` objects for each
sentence.

> #### Example
>
> ```python
> # Construction via create_pipe
> sentencizer = nlp.create_pipe("sentencizer")
>
> # Construction from class
> from spacy.pipeline import SentenceSegmenter
> sentencizer = SentenceSegmenter(nlp.vocab)
> ```

| Name        | Type                | Description                                                 |
| ----------- | ------------------- | ----------------------------------------------------------- |
| `vocab`     | `Vocab`             | The shared vocabulary.                                      |
| `strategy`  | unicode / callable  | The segmentation strategy to use. Defaults to `"on_punct"`. |
| **RETURNS** | `SentenceSegmenter` | The newly constructed object.                               |

## SentenceSegmenter.\_\_call\_\_ {#call tag="method"}

Apply the sentence segmenter on a `Doc`. Typically, this happens automatically
after the component has been added to the pipeline using
[`nlp.add_pipe`](/api/language#add_pipe).

> #### Example
>
> ```python
> from spacy.lang.en import English
>
> nlp = English()
> sentencizer = nlp.create_pipe("sentencizer")
> nlp.add_pipe(sentencizer)
> doc = nlp(u"This is a sentence. This is another sentence.")
> assert list(doc.sents) == 2
> ```

| Name        | Type  | Description                                                  |
| ----------- | ----- | ------------------------------------------------------------ |
| `doc`       | `Doc` | The `Doc` object to process, e.g. the `Doc` in the pipeline. |
| **RETURNS** | `Doc` | The modified `Doc` with added sentence boundaries.           |

## SentenceSegmenter.split_on_punct {#split_on_punct tag="staticmethod"}

Split the `Doc` on punctuation characters `.`, `!` and `?`. This is the default
strategy used by the `SentenceSegmenter.`

| Name       | Type   | Description                    |
| ---------- | ------ | ------------------------------ |
| `doc`      | `Doc`  | The `Doc` object to process.   |
| **YIELDS** | `Span` | The sentences in the document. |

## Attributes {#attributes}

| Name       | Type     | Description                                                         |
| ---------- | -------- | ------------------------------------------------------------------- |
| `strategy` | callable | The segmentation strategy. Can be overwritten after initialization. |