A named entity is a "real-world object" that's assigned a name – for example, a
person, a country, a product or a book title. spaCy can **recognize various
types of named entities in a document, by asking the model for a prediction**.
Because models are statistical and strongly depend on the examples they were
trained on, this doesn't always work _perfectly_ and might need some tuning
later, depending on your use case.

Named entities are available as the `ents` property of a `Doc`:

```python {executable="true"}
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# Print each entity's text, character offsets and label
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
```

> - **Text:** The original entity text.
> - **Start:** Character offset of the start of the entity in the `Doc`.
> - **End:** Character offset of the end of the entity in the `Doc`
>   (exclusive).
> - **Label:** Entity label, i.e. type.

| Text        | Start | End | Label   | Description                                          |
| ----------- | :---: | :-: | ------- | ---------------------------------------------------- |
| Apple       |   0   |  5  | `ORG`   | Companies, agencies, institutions.                   |
| U.K.        |  27   | 31  | `GPE`   | Geopolitical entity, i.e. countries, cities, states. |
| \$1 billion |  44   | 54  | `MONEY` | Monetary values, including unit.                     |

Using spaCy's built-in [displaCy visualizer](/usage/visualizers), here's what
our example sentence and its named entities look like:

<Standalone height={120}>
<div style={{lineHeight: 2.5, fontFamily: "-apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol'", fontSize: 18}}><mark style={{ background: '#7aecec', padding: '0.45em 0.6em', margin: '0 0.25em', lineHeight: 1, borderRadius: '0.35em'}}>Apple <span style={{ fontSize: '0.8em', fontWeight: 'bold', lineHeight: 1, borderRadius: '0.35em', marginLeft: '0.5rem'}}>ORG</span></mark> is looking at buying <mark style={{ background: '#feca74', padding: '0.45em 0.6em', margin: '0 0.25em', lineHeight: 1, borderRadius: '0.35em'}}>U.K. <span style={{ fontSize: '0.8em', fontWeight: 'bold', lineHeight: 1, borderRadius: '0.35em', marginLeft: '0.5rem'}}>GPE</span></mark> startup for <mark style={{ background: '#e4e7d2', padding: '0.45em 0.6em', margin: '0 0.25em', lineHeight: 1, borderRadius: '0.35em'}}>$1 billion <span style={{ fontSize: '0.8em', fontWeight: 'bold', lineHeight: 1, borderRadius: '0.35em', marginLeft: '0.5rem'}}>MONEY</span></mark></div>
</Standalone>
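The Start and End values in the table above are character offsets into the original text, and the end offset is exclusive, so they can be used directly as a Python slice. The following is a minimal sketch in plain Python that does not call spaCy at all: the sentence and the offsets are copied from the table, and the names `text`, `entities` and `spans` are illustrative assumptions rather than part of spaCy's API.

```python
# The sentence and the (start, end, label) triples are copied from the
# table above; all variable names here are illustrative.
text = "Apple is looking at buying U.K. startup for $1 billion"
entities = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]

# Slicing with [start:end] recovers the entity text because the end
# offset points one character past the entity.
spans = [(text[start:end], label) for start, end, label in entities]

for span_text, label in spans:
    print(span_text, label)
```

With a real `Doc`, the same relationship should hold between `doc.text[ent.start_char:ent.end_char]` and `ent.text`, which is handy when entity offsets need to be mapped back onto the raw string, for example to highlight it as in the visualization above.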