GenieNLP: A versatile codebase for any NLP task
Genie NLP library

This library contains the NLP models for the Genie toolkit for virtual assistants. It is derived from the decaNLP library by Salesforce, but has diverged significantly.

The library is suitable for all NLP tasks that can be framed as Contextual Question Answering, that is, tasks with three parts:

  • text or structured input as context
  • text input as question
  • text or structured output as answer

As the decaNLP paper shows, many different NLP tasks can be framed in this way. Genie primarily uses the library for semantic parsing, dialogue state tracking, and natural language generation given a formal dialogue state, and this is what the models work best for.
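As an illustration, the three-part framing above can be sketched as a plain data record. This is a hypothetical example for a semantic parsing task; the field names and the ThingTalk-style target are illustrative and do not reflect the library's actual data format.

```python
# Hypothetical contextual-QA example for semantic parsing.
# All strings below are made up for illustration.
example = {
    # text or structured input as context (e.g. a schema or dialogue state)
    "context": "schema: restaurant(name, cuisine, rating)",
    # text input as question (the task instruction)
    "question": "translate from english to thingtalk: show me restaurants",
    # text or structured output as answer (the target logical form)
    "answer": "now => @restaurant() => notify;",
}

for field in ("context", "question", "answer"):
    print(field, "=>", example[field])
```

Other tasks plug into the same mold: for dialogue state tracking the context is the dialogue history, the question asks for the updated state, and the answer is the state itself.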

Installation

genienlp is available on PyPI. You can install it with:

pip3 install genienlp

After installation, a genienlp command becomes available.

You will likely also want to download word embeddings ahead of time:

genienlp cache-embeddings --embeddings glove+char -d <embeddingdir>

Usage

Train a model:

genienlp train --tasks almond --train_iterations 50000 --embeddings <embeddingdir> --data <datadir> --save <modeldir>

Generate predictions:

genienlp predict --tasks almond --data <datadir> --path <modeldir>

See genienlp --help for details.

Citation

If you use the multitask question answering model (MQAN) in your work, please cite The Natural Language Decathlon: Multitask Learning as Question Answering.

@article{McCann2018decaNLP,
  title={The Natural Language Decathlon: Multitask Learning as Question Answering},
  author={Bryan McCann and Nitish Shirish Keskar and Caiming Xiong and Richard Socher},
  journal={arXiv preprint arXiv:1806.08730},
  year={2018}
}

If you use the BERT-LSTM model (identity encoder + MQAN decoder), please cite Schema2QA: Answering Complex Queries on the Structured Web with a Neural Model.

@article{Xu2020Schema2QA,
  title={Schema2QA: Answering Complex Queries on the Structured Web with a Neural Model},
  author={Silei Xu and Giovanni Campagna and Jian Li and Monica S. Lam},
  journal={arXiv preprint arXiv:2001.05609},
  year={2020}
}