337 Commits

Author SHA1 Message Date
mehrad a5b885ecd4 fix 2019-11-06 01:47:28 -08:00
mehrad 3addf7da94 fix 2019-11-06 01:18:08 -08:00
mehrad 1339767da3 update FastText embedding 2019-11-06 01:09:55 -08:00
Giovanni Campagna 4e10b5240b Fix downloading fasttext vectors 2019-11-05 21:54:06 -08:00
Giovanni Campagna 06eca199d7
Merge pull request #19 from stanford-oval/wip/dockerupdate
docker: Switch to ubi8 (RHEL 8)
2019-11-04 16:22:35 -08:00
Giovanni Campagna 9cdb85abbb
Merge pull request #20 from stanford-oval/wip/i18n
Support word embeddings for arbitrary languages
2019-11-03 22:10:16 -08:00
Giovanni Campagna 2973165de0 util: load "locale" parameter from config.json
And if not present, set it to "en"
2019-11-03 21:25:57 -08:00
Giovanni Campagna ba4c963176 fasttext: use common-crawl not wiki vectors
They are trained on a larger corpus and thus better
2019-11-01 17:38:38 -07:00
Giovanni Campagna d5aacba674 embeddings: allow passing full locale tags as --locale
This way, we don't need to do anything too special in Genie,
and we can call decanlp with --locale zh-tw or --locale zh-cn
if needed to distinguish
2019-11-01 17:36:36 -07:00
Giovanni Campagna d46353d352 Add fast-text support to cache-embeddings command 2019-11-01 17:30:59 -07:00
Giovanni Campagna 026e481df9 Add support for arbitrary word embeddings
FastText has all the word embeddings we care about anyway
2019-11-01 17:27:35 -07:00
Giovanni Campagna 668cc6f917 docker: Switch to ubi8 (RHEL 8)
ubi7 was updated to rhel 7.7, which should contain python36 natively,
but python3 is not part of the ubi repos. Luckily, it is part of the
ubi8 repos instead.
2019-11-01 17:08:06 -07:00
Giovanni Campagna a546c7b62c Fix cuda image name 2019-08-31 02:22:04 +02:00
Giovanni Campagna 28e8296032
Merge pull request #17 from stanford-oval/wip/docker
Remove obsolete Dockerfiles, and replace with a new one
2019-08-31 00:36:44 +02:00
Giovanni Campagna dcb4f020ae docker: fix hooks
They are executed from the directory containing the Dockerfile
2019-08-30 23:13:28 +02:00
Giovanni Campagna 86a51de115 MQAN: fix decoding with out of vocabulary words
If the vocabulary was limited due to max_generative_words and we
encounter an OOV word, we need to map it back to the full index
in GloVe, using the limited_idx_to_full_idx map set by the server
dynamically.
2019-08-30 21:39:17 +02:00
Giovanni Campagna 1c0a71aa0b Add debugging to docker hooks
To see exactly how they get called...
2019-08-30 21:37:42 +02:00
Giovanni Campagna fb72c84f73 docker: move embeddings to a shared directory
This way, the image can be used as a base image by almond-cloud
(which runs as a different user)
2019-08-30 21:25:54 +02:00
Giovanni Campagna ff311bd35b Add docker hub hooks to build both CPU and GPU images 2019-08-30 21:01:47 +02:00
Giovanni Campagna a9264a699d Remove obsolete Dockerfiles, and replace with a new one
A new one that wraps the new decanlp command and installs all
the dependencies correctly.
2019-08-30 18:55:28 +02:00
Giovanni Campagna 48b88317b7 Refuse to run with pytorch 1.2.0
Because pytorch 1.2.0 changed the behavior of Bools vs Uint8 and
that broke us...
2019-08-08 16:07:48 -07:00
Giovanni Campagna f5ea63ecb0
Merge pull request #15 from stanford-oval/wip/mehrad/multi-language
Wip/mehrad/multi language
2019-05-30 08:37:37 -07:00
mehrad 21db011ad2 fixes 2019-05-29 20:57:09 -07:00
mehrad 4919b28627 fixes 2019-05-29 19:38:24 -07:00
mehrad 672aa14117 merge branch wip/mehrad/multi_lingual 2019-05-29 18:43:10 -07:00
mehrad 4004664259 fixes 2019-05-29 16:42:36 -07:00
mehrad a27c3a8d8b bug fixes 2019-05-29 13:37:18 -07:00
mehrad d52d862310 fixing bugs 2019-05-29 12:18:57 -07:00
mehrad eb10a788b0 updates 2019-05-28 18:12:31 -07:00
mehrad bb90a35bc0 updates 2019-05-28 18:11:32 -07:00
mehrad 612e3bdd4d output context sentences as well for predict.py 2019-05-28 17:43:10 -07:00
mehrad 85c7a99ec2 bunch of updates 2019-05-22 14:04:16 -07:00
mehrad 90308a84e3 minor fix 2019-05-20 17:50:50 -07:00
mehrad acfcbf88c4 updating prediction scripts 2019-05-20 17:43:45 -07:00
mehrad 7db36a90af fixes 2019-05-20 14:29:25 -07:00
mehrad ec77faacca Glueing the models together
end-to-end finetuning works now
2019-05-20 13:41:59 -07:00
mehrad 9bf2217324 adding arguments 2019-05-20 11:02:42 -07:00
Giovanni Campagna 25db020107 Reduce memory usage while loading almond datasets
Don't load all lines in memory
2019-05-15 09:27:00 -07:00
Giovanni Campagna 488a4feb64 Fix contextual almond 2019-05-15 09:26:54 -07:00
mehrad df94f7dd3a ignore malformed sentences 2019-05-13 14:50:08 -07:00
mehrad 0495d4d0eb Updates
1) use FastText for encoding Persian text
2) let the user choose the question for almond task
3) bug fixes
2019-05-13 13:03:51 -07:00
Giovanni Campagna 27a3a8b173
Merge pull request #14 from stanford-oval/wip/contextual
Contextual Almond
2019-05-10 09:54:51 -07:00
mehrad 925d839e15 adding an end-to-end combined model 2019-04-29 13:56:39 -07:00
Giovanni Campagna 46eaae8ba8 fix almond dataset name 2019-04-23 09:35:57 -07:00
Giovanni Campagna 64020a497f Add ContextualAlmond task
Its training files have four columns: <id> <context> <sentence> <target_program>
2019-04-23 09:30:49 -07:00
Giovanni Campagna 6837cc7e7b Add missing dependency
Probably before it was coming from somewhere else, like allennlp
2019-04-17 12:10:45 -07:00
Giovanni Campagna 73ffec5365 Populate "install_requires" package metadata
This is necessary to automatically install dependencies when
the user installs the library with pip.
2019-04-17 11:40:40 -07:00
Giovanni Campagna 68e76f7990 Remove unused dependencies
These are not used anywhere I can see.
2019-04-17 11:39:54 -07:00
Giovanni Campagna 13e1c0335e Load allennlp, cove libraries lazily
These libraries are only needed if one passes --elmo or --cove
on the command line. They are annoyingly big libraries, so
it makes sense to keep them optional.
2019-04-17 11:39:15 -07:00
mehrad 19067c71ba option to retrain encoder embeddings 2019-04-15 16:11:10 -07:00