mirror of https://github.com/explosion/spaCy.git
title | teaser | next | menu
---|---|---|---
Facts & Figures | The hard numbers for spaCy and how it compares to other tools | /usage/spacy-101 |
Comparison
When should I use spaCy?
Use Cases |
---|
✅ I'm a beginner and just getting started with NLP. spaCy makes it easy to get started and comes with extensive documentation, including a beginner-friendly 101 guide and a free interactive online course. |
✅ I want to build an end-to-end production application. |
✅ I want my application to be efficient on CPU. While spaCy lets you train modern NLP models that are best run on GPU, it also offers CPU-optimized pipelines, which may be less accurate but much cheaper to run. |
✅ I want to try out different neural network architectures for NLP. |
❌ I want to build a language generation application. spaCy's focus is natural language processing and extracting information from large volumes of text. While you can use it to help you rewrite existing text, it doesn't include any specific functionality for language generation tasks. |
❌ I want to research machine learning algorithms. |
Benchmarks
spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy up to the current state of the art. You can also use a CPU-optimized pipeline, which is less accurate but much cheaper to run.
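The trade-off between the two pipeline types can be sketched as a small selection helper. This is a minimal illustration, not part of spaCy: the helper names `pick_pipeline` and `load_pipeline` and the `"accuracy"` / `"efficiency"` keys are hypothetical, and it assumes the English packages `en_core_web_trf` (transformer-based) and `en_core_web_sm` (CPU-optimized) have been installed separately.

```python
# Hypothetical helpers for choosing between a transformer-based and a
# CPU-optimized English pipeline. Only spacy.load() is spaCy's own API;
# everything else here is an illustrative assumption.

PIPELINES = {
    "accuracy": "en_core_web_trf",   # transformer-based, best run on GPU
    "efficiency": "en_core_web_sm",  # CPU-optimized, less accurate but cheaper
}


def pick_pipeline(optimize_for: str = "efficiency") -> str:
    """Return the pipeline package name for the given priority."""
    if optimize_for not in PIPELINES:
        raise ValueError(
            f"optimize_for must be one of {sorted(PIPELINES)}, got {optimize_for!r}"
        )
    return PIPELINES[optimize_for]


def load_pipeline(optimize_for: str = "efficiency"):
    """Load the selected pipeline (requires spaCy and the package installed)."""
    import spacy  # imported lazily so pick_pipeline works without spaCy

    return spacy.load(pick_pipeline(optimize_for))


print(pick_pipeline("efficiency"))
print(pick_pipeline("accuracy"))
```

With spaCy and the chosen package installed, `nlp = load_pipeline("efficiency")` returns a pipeline object that can be called on text as usual, for example `doc = nlp("This is a sentence.")`.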
Evaluation details
- OntoNotes 5.0: spaCy's English models are trained on this corpus, as it's several times larger than other English treebanks. However, most systems do not report accuracies on it.
- Penn Treebank: The "classic" parsing evaluation for research. However, it's quite far removed from actual usage: it uses sentences with gold-standard segmentation and tokenization, from a pretty specific type of text (articles from a single newspaper, 1984-1989).
The easiest way to reproduce spaCy's benchmarks on the Penn Treebank is to clone our project template.
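Cloning a project template is done with the `spacy project clone` command. The sketch below only builds and prints that command from Python. The template name `benchmarks/parsing_penn_treebank` is an assumption, since the text above does not name the template, so check it against the project repository before running anything. The helper name `clone_command` is hypothetical.

```python
import shlex
import sys
from typing import List

# Assumption: name of the Penn Treebank benchmark template.
# The surrounding text does not state it, so verify before use.
TEMPLATE = "benchmarks/parsing_penn_treebank"


def clone_command(template: str = TEMPLATE) -> List[str]:
    """Build the argument list for `python -m spacy project clone <template>`."""
    return [sys.executable, "-m", "spacy", "project", "clone", template]


# Print the command instead of running it, so nothing is downloaded here.
print(shlex.join(clone_command()))
```

To actually clone the template, the returned list can be passed to `subprocess.run(clone_command(), check=True)` in an environment where spaCy v3 is installed.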