
A decomposable attention model for Natural Language Inference

by Matthew Honnibal, @honnibal

⚠️ IMPORTANT NOTE: This example is currently only compatible with spaCy v1.x. We're working on porting the example over to Keras v2.x and spaCy v2.x. See #1445 for details. Contributions welcome!

This directory contains an implementation of the entailment prediction model described by Parikh et al. (2016). The model is notable for its competitive performance with very few parameters.

The model is implemented using Keras and spaCy. Keras is used to build and train the network. spaCy is used to load the GloVe vectors, perform the feature extraction, and help you apply the model at run-time. The following demo code shows how the entailment model can be used at run-time, once the hook is installed to customise the .similarity() method of spaCy's Doc and Span objects:

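import spacy
# Assumption: create_similarity_pipeline is importable from this example's spacy_hook.py
from spacy_hook import create_similarity_pipeline

def demo(model_dir):
    # spaCy v1.x API: load English with a pipeline that installs the similarity hook
    nlp = spacy.load('en', path=model_dir,
            create_pipeline=create_similarity_pipeline)
    doc1 = nlp(u'Worst fries ever! Greasy and horrible...')
    doc2 = nlp(u'The milkshakes are good. The fries are bad.')
    # Doc vs. Doc
    print(doc1.similarity(doc2))
    # doc1 has exactly two sentences, so this unpacking works
    sent1a, sent1b = doc1.sents
    # Span vs. Span, then Span vs. Doc
    print(sent1a.similarity(sent1b))
    print(sent1a.similarity(doc2))
    print(sent1b.similarity(doc2))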

I'm working on a blog post to explain Parikh et al.'s model in more detail. I think it is a very interesting example of the attention mechanism, which I didn't understand very well before working through this paper. There are lots of ways to extend the model.
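Until the post is ready, here is a minimal sketch of the soft-alignment ("attend") step at the heart of the model, in plain Python. It is not the code from keras_decomposable_attention.py: the function names are made up for illustration, and raw dot products stand in for the learned feed-forward projection the paper applies before scoring.

```python
import math


def softmax(scores):
    """Normalise a list of scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def weighted_sum(weights, vectors):
    """Sum the vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors))
            for k in range(dim)]


def attend(a, b):
    """Soft-align two sentences given as lists of word vectors.

    Returns (beta, alpha): beta[i] summarises the words of b that
    align with a[i], and alpha[j] summarises the words of a that
    align with b[j].
    """
    # Simplification: unlearned dot-product scores (hypothetical stand-in)
    scores = [[dot(a_i, b_j) for b_j in b] for a_i in a]
    beta = [weighted_sum(softmax(row), b) for row in scores]
    columns = [list(col) for col in zip(*scores)]
    alpha = [weighted_sum(softmax(col), a) for col in columns]
    return beta, alpha


premise = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hypothesis = [[1.0, 0.0], [0.0, 2.0]]
beta, alpha = attend(premise, hypothesis)
print(len(beta), len(alpha))
```

In the full model, each word vector is then compared with its aligned summary, and the comparison vectors are summed and passed to a classifier. Those steps are left out here.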

What's where

| File | Description |
| --- | --- |
| __main__.py | The script that will be executed. Defines the CLI, the data reading, etc. All the boring stuff. |
| spacy_hook.py | Provides a class SimilarityShim that lets you use an arbitrary function to customize spaCy's doc.similarity() method. Instead of the default average-of-vectors algorithm, when you call doc1.similarity(doc2), you'll get the result of your_model(doc1, doc2). |
| keras_decomposable_attention.py | Defines the neural network model. |
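To make the hook idea concrete, here is a toy sketch of the pattern that runs without spaCy. ToyDoc, HookInstaller and word_overlap are hypothetical names invented for this illustration; the real SimilarityShim in spacy_hook.py may be organised differently. The only thing borrowed from spaCy is the idea of a user_hooks table that the similarity method consults.

```python
class ToyDoc(object):
    """Hypothetical stand-in for a spaCy Doc: words plus a hook table."""

    def __init__(self, words):
        self.words = list(words)
        self.user_hooks = {}

    def similarity(self, other):
        # Use the installed hook if there is one, else a fixed default.
        hook = self.user_hooks.get('similarity')
        if hook is not None:
            return hook(self, other)
        return 0.0


class HookInstaller(object):
    """Pipeline-style component that routes .similarity() to any function."""

    def __init__(self, score_fn):
        self.score_fn = score_fn

    def __call__(self, doc):
        doc.user_hooks['similarity'] = self.score_fn
        return doc


def word_overlap(doc1, doc2):
    """Toy scoring function: Jaccard overlap of the two word sets."""
    set1, set2 = set(doc1.words), set(doc2.words)
    return len(set1 & set2) / float(len(set1 | set2))


install = HookInstaller(word_overlap)
doc1 = install(ToyDoc(['the', 'fries', 'are', 'bad']))
doc2 = install(ToyDoc(['the', 'fries', 'are', 'good']))
print(doc1.similarity(doc2))
```

Swapping word_overlap for a function that calls a trained entailment model gives the behaviour described above: the caller keeps using .similarity() and never needs to know which model is behind it.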

Setting up

First, install Keras, spaCy and the spaCy English models (about 1GB of data):

pip install https://github.com/fchollet/keras/archive/1.2.2.zip
pip install spacy
python -m spacy.en.download

⚠️ Important: In order for the example to run, you'll need to install Keras from the 1.2.2 release (and not via pip install keras). For more info on this, see #727.

You'll also want to get Keras working on your GPU. This will depend on your setup, so you're mostly on your own for this step. If you're using AWS, try the NVIDIA AMI. It made things pretty easy.

Once you've installed the dependencies, you can run a small preliminary test of the Keras model:

py.test keras_parikh_entailment/keras_decomposable_attention.py

This compiles the model and fits it with some dummy data. You should see that both tests passed.

Finally, download the Stanford Natural Language Inference corpus.
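If you want to poke at the data before training, here is a small sketch of a reader for the corpus's JSON-lines files. It assumes each line is a JSON object with gold_label, sentence1 and sentence2 fields, and that a gold_label of "-" marks pairs without annotator consensus. The read_snli function and the label-to-id mapping are illustrative and not necessarily what __main__.py uses.

```python
import json

# Assumption: illustrative label ids, not necessarily those in __main__.py
LABELS = {'entailment': 0, 'contradiction': 1, 'neutral': 2}


def read_snli(lines):
    """Yield (premise, hypothesis, label_id) from SNLI JSON-lines text."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        label = record['gold_label']
        if label not in LABELS:
            # Skip pairs with no consensus label ('-')
            continue
        yield record['sentence1'], record['sentence2'], LABELS[label]


# Tiny in-memory sample in the same shape as the corpus files
sample = [
    json.dumps({'gold_label': 'entailment',
                'sentence1': 'A man is eating fries.',
                'sentence2': 'Someone is eating.'}),
    json.dumps({'gold_label': '-',
                'sentence1': 'A dog runs.',
                'sentence2': 'A dog is tired.'}),
    json.dumps({'gold_label': 'contradiction',
                'sentence1': 'The fries are bad.',
                'sentence2': 'The fries are great.'}),
]
for premise, hypothesis, label in read_snli(sample):
    print(label, premise, '=>', hypothesis)
```

With the real corpus, pass an open file handle (for example the train .jsonl file) instead of the sample list.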

Running the example

You can run the keras_parikh_entailment/ directory as a script, which executes the file keras_parikh_entailment/__main__.py. The first thing you'll want to do is train the model:

python keras_parikh_entailment/ train <train_directory> <dev_directory>

Training takes about 300 epochs for full accuracy, and I haven't rerun the full experiment since refactoring things to publish this example — please let me know if I've broken something. You should get to at least 85% on the development data.

The other two modes demonstrate run-time usage. I never like relying on the accuracy printed by .fit() methods. I never really feel confident until I've run a new process that loads the model and starts making predictions, without access to the gold labels. I've therefore included an evaluate mode. Finally, there's also a little demo, which mostly exists to show you how run-time usage will eventually look.
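The idea behind the evaluate mode can be sketched in a few lines: score a prediction function against held-out labels it never sees. This is only an outline under assumptions, not the code in __main__.py. The evaluate function, the predict callable and the toy rule below are all hypothetical.

```python
def evaluate(predict, examples):
    """Return the accuracy of predict(premise, hypothesis) on gold labels."""
    correct = 0
    total = 0
    for premise, hypothesis, gold in examples:
        guess = predict(premise, hypothesis)
        correct += int(guess == gold)
        total += 1
    return correct / float(total) if total else 0.0


def toy_predict(premise, hypothesis):
    """Hypothetical stand-in for a loaded model: a crude keyword rule."""
    return 'contradiction' if 'not' in hypothesis.split() else 'entailment'


dev_examples = [
    ('The fries are bad.', 'The fries are not good.', 'entailment'),
    ('A man eats.', 'A man is not eating.', 'contradiction'),
    ('A dog runs.', 'An animal moves.', 'entailment'),
    ('A dog runs.', 'The dog is brown.', 'neutral'),
]
print(evaluate(toy_predict, dev_examples))
```

The point of running this in a fresh process is that predict only ever receives the two texts, so the score cannot leak in from the training loop.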

Getting updates

We should have the blog post explaining the model ready before the end of the week. To get notified when it's published, you can either follow me on Twitter or subscribe to our mailing list.