//- 💫 DOCS > USAGE > FACTS & FIGURES > OTHER LIBRARIES
p
| Data scientists, researchers and machine learning engineers have
| converged on Python as the language for AI. This gives developers a rich
| ecosystem of NLP libraries to work with. Here's how we think the pieces
| fit together.
+aside("Using spaCy with other libraries")
| For details on how to use spaCy together with popular machine learning
| libraries like TensorFlow, Keras or PyTorch, see the
| #[+a("/usage/deep-learning") usage guide on deep learning].
+infobox
+infobox-logos(["nltk", 80, 25, "http://nltk.org"])
| #[+label-inline NLTK] offers some of the same functionality as spaCy.
| Although originally developed for teaching and research, its longevity
| and stability has resulted in a large number of industrial users. It's
| the main alternative to spaCy for tokenization and sentence segmentation.
| In comparison to spaCy, NLTK takes a much more "broad church" approach,
| so it has some functions that spaCy doesn't provide, at the expense of a
| bit more clutter to sift through. spaCy is also much more
| performance-focussed than NLTK: where the two libraries provide the same
| functionality, spaCy's implementation will usually be faster and more
| accurate.
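p
| For a quick feel for the difference, here's a minimal sketch of
| tokenization and sentence segmentation in both libraries. It assumes
| NLTK's "punkt" data and a spaCy English model are already installed.
+code("Tokenization in NLTK vs. spaCy").
    import nltk
    import spacy

    text = u"This is a sentence. This is another one."

    # NLTK: standalone functions over plain strings
    nltk_words = nltk.word_tokenize(text)
    nltk_sentences = nltk.sent_tokenize(text)

    # spaCy: a single Doc object holds tokens and sentence boundaries
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    spacy_words = [token.text for token in doc]
    spacy_sentences = [sent.text for sent in doc.sents]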
+infobox
+infobox-logos(["gensim", 40, 40, "https://radimrehurek.com/gensim/"])
| #[+label-inline Gensim] provides unsupervised text modelling algorithms.
| Although Gensim isn't a runtime dependency of spaCy, we use it to train
| word vectors. There's almost no overlap between the libraries, so the
| two work well together.
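p
| As a rough sketch, you can train vectors with Gensim and add them to a
| spaCy vocab. This assumes the Gensim 3.x API; the corpus and parameters
| below are just placeholders.
+code("Training word vectors with Gensim").
    import spacy
    from gensim.models import Word2Vec

    # placeholder corpus: any iterable of tokenized sentences will do
    sentences = [[u'hello', u'world'], [u'another', u'sentence']]
    model = Word2Vec(sentences, size=100, min_count=1)

    # start from a blank pipeline so the vector table is empty
    nlp = spacy.blank('en')
    for word in model.wv.index2word:
        nlp.vocab.set_vector(word, model.wv[word])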
+infobox
+infobox-logos(["tensorflow", 35, 42, "https://www.tensorflow.org"], ["keras", 45, 45, "https://www.keras.io"])
| #[+label-inline TensorFlow / Keras] is the most popular deep learning library.
| spaCy provides efficient and powerful feature extraction functionality
| that can be used as a preprocessing step for any deep learning library.
| You can also use TensorFlow and Keras to create spaCy pipeline
| components that add annotations to the #[code Doc] object.
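p
| As a rough sketch, a custom pipeline component can wrap a Keras model
| and write its prediction to a custom attribute on the #[code Doc]. The
| untrained toy model and the attribute name below are placeholders.
+code("Wrapping a Keras model as a pipeline component").
    import spacy
    from spacy.tokens import Doc
    from keras.models import Sequential
    from keras.layers import Dense

    Doc.set_extension('sentiment_score', default=None)

    # placeholder model: 300-dim document vector -> single score
    keras_model = Sequential([Dense(1, activation='sigmoid', input_shape=(300,))])

    class KerasSentiment(object):
        def __init__(self, model):
            self.model = model

        def __call__(self, doc):
            # turn the Doc into the features the model expects
            features = doc.vector.reshape(1, -1)
            doc._.sentiment_score = float(self.model.predict(features)[0, 0])
            return doc

    nlp = spacy.load('en_core_web_md')
    nlp.add_pipe(KerasSentiment(keras_model), last=True)
    doc = nlp(u"This was great!")
    print(doc._.sentiment_score)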
+infobox
+infobox-logos(["scikitlearn", 90, 44, "http://scikit-learn.org"])
| #[+label-inline scikit-learn] features a number of useful NLP functions,
| especially for solving text classification problems using linear models
| with bag-of-words features. If you know you need exactly that, it might
| be better to use scikit-learn's built-in pipeline directly. However, if
| you want to extract more detailed features using part-of-speech tags,
| named entity labels, or string transformations, you can use spaCy as a
| preprocessing step in your classification system. scikit-learn also provides a
| lot of experiment management and evaluation utilities that people use
| alongside spaCy.
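p
| For example, here's a sketch of a scikit-learn text classification
| pipeline that uses spaCy lemmas as its tokenizer. The training data is
| a placeholder.
+code("spaCy preprocessing in a scikit-learn pipeline").
    import spacy
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    nlp = spacy.load('en_core_web_sm')

    def spacy_tokenizer(text):
        # use lemmas and drop punctuation instead of splitting on whitespace
        return [token.lemma_ for token in nlp(text) if not token.is_punct]

    classifier = Pipeline([
        ('vectorizer', TfidfVectorizer(tokenizer=spacy_tokenizer)),
        ('svm', LinearSVC())
    ])

    train_texts = [u'I loved it', u'I hated it']
    train_labels = ['pos', 'neg']
    classifier.fit(train_texts, train_labels)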
+infobox
+infobox-logos(["pytorch", 100, 48, "http://pytorch.org"], ["dynet", 80, 34, "http://dynet.readthedocs.io/"], ["chainer", 80, 43, "http://chainer.org"])
| #[+label-inline PyTorch, DyNet and Chainer] are dynamic neural network
| libraries, which can be much easier to work with for NLP. Outside of
| Google, there's a general shift among NLP researchers towards both DyNet
| and PyTorch. spaCy is the front-end of choice for PyTorch's
| #[code torch.text] extension. You can use any of these libraries to
| create spaCy pipeline components that add annotations to the #[code Doc]
| object.
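p
| As with Keras above, a rough sketch of a pipeline component backed by a
| toy PyTorch model, writing its output to a custom attribute. The model
| and attribute name are placeholders.
+code("A PyTorch-backed pipeline component").
    import spacy
    import torch
    from spacy.tokens import Doc

    Doc.set_extension('torch_score', default=None)

    # untrained placeholder model: 300-dim document vector -> single score
    torch_model = torch.nn.Linear(300, 1)

    def torch_component(doc):
        with torch.no_grad():
            vector = torch.from_numpy(doc.vector).float().unsqueeze(0)
            doc._.torch_score = torch_model(vector).item()
        return doc

    nlp = spacy.load('en_core_web_md')
    nlp.add_pipe(torch_component, last=True)
    doc = nlp(u'This is a test.')
    print(doc._.torch_score)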
+infobox
+infobox-logos(["allennlp", 124, 22, "http://allennlp.org"])
| #[+label-inline AllenNLP] is a new library designed to accelerate NLP
| research by providing a framework that supports modern deep learning
| workflows for cutting-edge language understanding problems. AllenNLP uses
| spaCy as a preprocessing component. You can also use AllenNLP to develop
| spaCy pipeline components that add annotations to the #[code Doc] object.