//- 💫 DOCS > USAGE > FACTS & FIGURES > OTHER LIBRARIES
p
    | Data scientists, researchers and machine learning engineers have
    | converged on Python as the language for AI. This gives developers a rich
    | ecosystem of NLP libraries to work with. Here's how we think the pieces
    | fit together.
//- +aside("Using spaCy with other libraries")
    | For details on how to use spaCy together with popular machine learning
    | libraries like TensorFlow, Keras or PyTorch, see the
    | #[+a("/usage/deep-learning") usage guide on deep learning].
+infobox
    +infobox-logos(["nltk", 80, 25, "http://nltk.org"])
    | #[+label-inline NLTK] offers some of the same functionality as spaCy.
    | Although originally developed for teaching and research, its longevity
    | and stability have resulted in a large number of industrial users. It's
    | the main alternative to spaCy for tokenization and sentence segmentation.
    | In comparison to spaCy, NLTK takes a much more "broad church" approach,
    | so it has some functions that spaCy doesn't provide, at the expense of a
    | bit more clutter to sift through. spaCy is also much more
    | performance-focussed than NLTK: where the two libraries provide the same
    | functionality, spaCy's implementation will usually be faster and more
    | accurate.
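
p
    | For illustration, here's a minimal sketch of sentence segmentation in
    | both libraries, assuming NLTK's punkt data and the
    | #[code en_core_web_sm] model are installed.

+code.
    import nltk
    import spacy

    nltk.download('punkt')  # one-time download of NLTK's sentence tokenizer data

    text = u"This is a sentence. This is another one."

    # NLTK: rule-based segmentation with the punkt tokenizer
    nltk_sents = nltk.sent_tokenize(text)

    # spaCy: sentence boundaries are derived from the dependency parse
    nlp = spacy.load('en_core_web_sm')
    spacy_sents = [sent.text for sent in nlp(text).sents]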
+infobox
    +infobox-logos(["gensim", 40, 40, "https://radimrehurek.com/gensim/"])
    | #[+label-inline Gensim] provides unsupervised text modelling algorithms.
    | Although Gensim isn't a runtime dependency of spaCy, we use it to train
    | word vectors. There's almost no overlap between the libraries, so the
    | two work well together.
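
p
    | As a sketch of that workflow, here's how you might train vectors with
    | Gensim on a toy corpus and copy them into a spaCy vocab. The corpus and
    | vector size are placeholders, and the keyword arguments assume Gensim 3.x
    | (in Gensim 4+, #[code size] is #[code vector_size] and
    | #[code index2word] is #[code index_to_key]).

+code.
    import spacy
    from gensim.models import Word2Vec

    # placeholder corpus: one tokenized sentence per item
    sentences = [['dogs', 'chase', 'cats'], ['cats', 'chase', 'mice']]

    # train word vectors with Gensim
    model = Word2Vec(sentences, size=50, min_count=1)

    # copy the trained vectors into a blank spaCy pipeline's vocab
    nlp = spacy.blank('en')
    for word in model.wv.index2word:
        nlp.vocab.set_vector(word, model.wv[word])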
+infobox
    +infobox-logos(["tensorflow", 35, 42, "https://www.tensorflow.org"], ["keras", 45, 45, "https://www.keras.io"])
    | #[+label-inline TensorFlow / Keras] is the most popular deep learning
    | library. spaCy provides efficient and powerful feature extraction
    | functionality that can be used as a preprocessing step for any deep
    | learning library. You can also use TensorFlow and Keras to create spaCy
    | pipeline components to add annotations to the #[code Doc] object.
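
p
    | For example, here's a minimal sketch of that preprocessing pattern:
    | spaCy's document vectors become the input features for a small Keras
    | classifier. The texts, labels and layer sizes are placeholders, and the
    | #[code en_core_web_md] model is assumed because it ships with word
    | vectors.

+code.
    import numpy
    import spacy
    from keras.models import Sequential
    from keras.layers import Dense

    texts = [u'I loved it', u'Total waste of time']
    labels = numpy.asarray([1, 0])

    # use spaCy's averaged word vectors as fixed-size document features
    nlp = spacy.load('en_core_web_md')
    features = numpy.asarray([doc.vector for doc in nlp.pipe(texts)])

    # a tiny Keras classifier over the 300-dimensional document vectors
    model = Sequential()
    model.add(Dense(16, activation='relu', input_shape=(features.shape[1],)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy')
    model.fit(features, labels, epochs=5)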
+infobox
    +infobox-logos(["scikitlearn", 90, 44, "http://scikit-learn.org"])
    | #[+label-inline scikit-learn] features a number of useful NLP functions,
    | especially for solving text classification problems using linear models
    | with bag-of-words features. If you know you need exactly that, it might
    | be better to use scikit-learn's built-in pipeline directly. However, if
    | you want to extract more detailed features, using part-of-speech tags,
    | named entity labels or string transformations, you can use spaCy as a
    | preprocessing step in your classification system. scikit-learn also
    | provides a lot of experiment management and evaluation utilities that
    | people use alongside spaCy.
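
p
    | One way to wire the two together is to plug a spaCy tokenizer into
    | scikit-learn's vectorizer. The #[code spacy_lemmas] helper and the
    | example data below are hypothetical placeholders.

+code.
    import spacy
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    nlp = spacy.load('en_core_web_sm')

    def spacy_lemmas(text):
        # tokenize with spaCy, feeding lemmas to the bag-of-words model
        return [token.lemma_ for token in nlp(text) if not token.is_stop]

    pipeline = Pipeline([
        ('vectorizer', TfidfVectorizer(tokenizer=spacy_lemmas)),
        ('classifier', LogisticRegression())
    ])

    pipeline.fit([u'I loved it', u'Total waste of time'], [1, 0])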
+infobox
    +infobox-logos(["pytorch", 100, 48, "http://pytorch.org"], ["dynet", 80, 34, "http://dynet.readthedocs.io/"], ["chainer", 80, 43, "http://chainer.org"])
    | #[+label-inline PyTorch, DyNet and Chainer] are dynamic neural network
    | libraries, which can be much easier to work with for NLP. Outside of
    | Google, there's a general shift among NLP researchers towards both DyNet
    | and PyTorch. spaCy is the front-end of choice for PyTorch's
    | #[code torchtext] extension. You can use any of these libraries to
    | create spaCy pipeline components to add annotations to the #[code Doc]
    | object.
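
p
    | As a sketch of that pattern, here's a pipeline component that runs a
    | PyTorch model over the document vector and stores the result in a custom
    | attribute, using spaCy v2's #[code add_pipe] API. The #[code sentiment]
    | attribute and the untrained linear layer are placeholders for a real
    | trained model.

+code.
    import torch
    import spacy
    from spacy.tokens import Doc

    # placeholder for a real trained PyTorch model
    torch_model = torch.nn.Linear(300, 1)

    Doc.set_extension('sentiment', default=None)

    def pytorch_component(doc):
        # score the document vector and attach the result to the Doc
        with torch.no_grad():
            score = torch_model(torch.tensor(doc.vector))
        doc._.sentiment = float(score)
        return doc

    nlp = spacy.load('en_core_web_md')  # en_core_web_md ships 300-d vectors
    nlp.add_pipe(pytorch_component, last=True)
    doc = nlp(u'This is a test.')
    print(doc._.sentiment)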
+infobox
    +infobox-logos(["allennlp", 124, 22, "http://allennlp.org"])
    | #[+label-inline AllenNLP] is a new library designed to accelerate NLP
    | research by providing a framework that supports modern deep learning
    | workflows for cutting-edge language understanding problems. AllenNLP
    | uses spaCy as a preprocessing component. You can also use AllenNLP to
    | develop spaCy pipeline components to add annotations to the
    | #[code Doc] object.