186 Commits

Author SHA1 Message Date
Giovanni Campagna ad1a15637a Fix missing import 2019-03-01 16:09:58 -08:00
Giovanni Campagna 0866d8a4e8 Add more requirements 2019-03-01 16:08:58 -08:00
Giovanni Campagna 4bddbada45 Fix typo 2019-03-01 16:08:52 -08:00
Giovanni Campagna ac3ac8c680 Add Stanford copyright to all files that we touched 2019-03-01 15:54:54 -08:00
Giovanni Campagna 8e2b519ac3 Add copyright notices to all files
Makes the license clear and explicit
2019-03-01 15:51:45 -08:00
Giovanni Campagna 99e39f9528 Move generic dataset outside of torchtext
There is no reason for it to live inside torchtext, and having it
outside will help with using torchtext as a library
2019-03-01 15:47:11 -08:00
Giovanni Campagna afd39c8660 Add requirements.txt file 2019-03-01 15:43:02 -08:00
Giovanni Campagna 5447d0c37c Add a "decanlp" script that calls out to the different subcommands
Usage:
- decanlp train ...
- decanlp predict ...
- decanlp convert-to-logical-forms ...
2019-03-01 15:43:02 -08:00
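The subcommand layout in the commit above can be pictured as a small dispatcher that maps the first argument to a handler. The sketch below is a hypothetical illustration, not the repository's actual "decanlp" script: the handler functions, their return values, and the usage message are all assumptions made for the example.

```python
# Hypothetical handlers: the real subcommands would run training,
# prediction, or conversion. Here they only echo what they received.
def train(argv):
    return ("train", list(argv))


def predict(argv):
    return ("predict", list(argv))


def convert_to_logical_forms(argv):
    return ("convert-to-logical-forms", list(argv))


# Subcommand names are taken from the commit message; the mapping is assumed.
SUBCOMMANDS = {
    "train": train,
    "predict": predict,
    "convert-to-logical-forms": convert_to_logical_forms,
}


def main(argv):
    """Dispatch argv[0] to its handler, passing the remaining arguments."""
    if not argv or argv[0] not in SUBCOMMANDS:
        names = "|".join(sorted(SUBCOMMANDS))
        raise SystemExit("usage: decanlp {%s} ..." % names)
    return SUBCOMMANDS[argv[0]](argv[1:])
```

An entry point would typically call `main(sys.argv[1:])`; each handler then parses its own options, which keeps the top-level script free of per-command flags.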
Giovanni Campagna a5a203b099 Add setup.py 2019-03-01 15:43:02 -08:00
Giovanni Campagna 41b80bb4f4 Move all python files to a decanlp/ package
As per python conventions
2019-03-01 15:43:02 -08:00
Giovanni Campagna a44bde6693 Don't import unused tensorflow 2019-03-01 15:22:02 -08:00
mehrad 35fe30272b satisfy travis 2019-03-01 15:08:31 -08:00
Mehrad Moradshahi cf534b3523 Merge pull request #4 from Stanford-Mobisocial-IoT-Lab/wip/gcampax/cleanups
Cleanups, server support
2019-03-01 14:35:29 -08:00
Giovanni Campagna c66dde4ca0 Make the server actually work 2019-03-01 12:33:18 -08:00
Giovanni Campagna a0ff18f1fe Remove too many redundant tensorboard scalars
Clutters the UI, wastes disk space and cycles
2019-03-01 12:33:18 -08:00
Giovanni Campagna ca91b6e24f Clean up some really ugly code in predict.py and server.py 2019-03-01 12:33:18 -08:00
Giovanni Campagna 61f93aca9b train: compute the best validation metric during training, and use it for model selection
If a model is found to be better than the previous one, save it
as "best.pth"

Model selection should happen at training time rather than validation
time so we can safely discard checkpoints after training (because
they take too much space)
2019-03-01 12:33:18 -08:00
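The model selection described in the commit above amounts to tracking the best validation metric seen so far and overwriting "best.pth" whenever it improves. The following is a minimal sketch under stated assumptions, not the repository's code: `update_best` is a hypothetical helper, a higher metric is assumed to be better, and `pickle` stands in for whatever serializer the training loop actually uses.

```python
import os
import pickle


def update_best(metric, best_metric, state, save_dir, filename="best.pth"):
    """Save `state` when `metric` beats `best_metric`; return the current best.

    Assumption: higher is better. pickle is a stand-in serializer so the
    sketch runs without any deep learning framework installed.
    """
    if best_metric is None or metric > best_metric:
        with open(os.path.join(save_dir, filename), "wb") as f:
            pickle.dump(state, f)
        return metric
    return best_metric
```

Calling this after every validation pass means the best model is already on disk when training ends, so intermediate checkpoints can be deleted without losing it.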
Giovanni Campagna d4b35d7ae6 train: don't save per-worker checkpoints if we're not doing distributed training
Saves disk space
2019-03-01 10:51:12 -08:00
Giovanni Campagna a36f2efb8c Add a simple json-based RPC server to handle interactive predictions 2019-03-01 10:46:16 -08:00
Giovanni Campagna 9a9b77ec17 Almond: load data from the TSV files directly
Almond is not a translation task, it's a semantic parsing task.
And it has its own established on-disk file format, and we should
respect that, instead of messing with .tt files
2019-03-01 09:36:14 -08:00
Giovanni Campagna 73d94f187b SummaryWriter: replace add_scalars with add_scalar
tensorboardX's add_scalars is bonkers, and will create a different
file for each tag, which makes the tensorboard completely unreadable
(See https://github.com/lanpa/tensorboardX/issues/366 )

Instead, use add_scalar(), which puts everything in the same file,
as tensorboard is designed to do.
2019-03-01 08:44:06 -08:00
Giovanni Campagna ae4bbfea3a arguments: remove bad ugly log dir messing
Stick with what the user says as the --save directory. If the user
needs, they can provide the timestamp on the command line.
2019-03-01 08:30:35 -08:00
mehrad 7738b66e72 updates 2019-02-28 19:52:13 -08:00
mehrad d2656bb53a update post_process_decoded_results.py 2019-02-27 12:01:42 -08:00
mehrad 26f57a3059 don't shuffle dataset during prediction 2019-02-27 11:17:38 -08:00
mehrad eff58ba1eb merge updates 2019-02-27 10:54:01 -08:00
mehrad 2ef63c660e export results to csv files 2019-02-27 10:52:38 -08:00
mehrad fede42e471 add cached option for predict 2019-02-20 11:22:32 -08:00
mehrad b8f4c1b0e5 fix max-margin loss implementation 2019-02-20 11:06:49 -08:00
mehrad f47da57330 fix bug for joining paths 2019-02-19 16:21:34 -08:00
mehrad ad5c9c6dc1 save cached files to an assigned path
useful when you don't have write permissions to dataset directory
2019-02-19 15:55:20 -08:00
mehrad f3545ad9d3 add max-margin loss 2019-02-19 13:48:52 -08:00
Mehrad Moradshahi f2335e456b Merge pull request #1 from Stanford-Mobisocial-IoT-Lab/wip/field
Remove hacks needed to use a different tokenizer for Almond
2019-01-24 10:44:33 -08:00
mehrad e4afb21928 minor fix 2019-01-23 19:13:26 -08:00
mehrad 8c54dc1391 updates and fixes 2019-01-23 16:41:37 -08:00
Giovanni Campagna fdfdd154c4 Remove hacks needed to use a different tokenizer for Almond
Almond data is always pretokenized (because we must preprocess
numbers/quoted strings/etc.).

Previously, we used a bad HACK of hardcoding almond in the generic
Field subclass. Now, we instead thread through the tokenizer/detokenizer
arguments from the right places.

In the future, we will probably want task classes, to clean up
the mess of hacks and hardcoded task-specific tweaks everywhere.
2019-01-23 10:48:49 -08:00
Giovanni Campagna 91ad08211f Fix duplicate elmo option 2019-01-23 10:45:30 -08:00
Giovanni Campagna 9ec8d7cf2d Merge remote-tracking branch 'upstream/master' 2019-01-23 10:44:20 -08:00
Giovanni Campagna 09bcae5dd5 Merge branch 'master' into mehrad/master 2019-01-23 10:34:17 -08:00
Bryan McCann cd997f257f prepend root before save 2019-01-10 21:24:43 +00:00
Bryan McCann eed545bbf7 log_dir needs root prefix 2019-01-09 18:06:04 +00:00
Bryan McCann 81b7ea7e72 options for sgd 2019-01-09 01:13:46 +00:00
Bryan McCann 27bb192249 unique val tasks 2019-01-09 00:28:11 +00:00
Bryan McCann 287b911d9c separate out root dir; add option for experiment name 2019-01-08 02:05:55 +00:00
mehrad 873236860a fix cuda error 2018-12-18 14:56:29 -08:00
mehrad b4bb1ad661 working version of differentiable BLEU loss 2018-12-17 16:43:06 -08:00
Bryan McCann 07dd886f9a Update README.md 2018-12-12 12:31:44 -08:00
Bryan Marcus McCann c096a1f5ba bugs in overwrite; new best mqan model 2018-12-12 20:29:46 +00:00
Bryan Marcus McCann 1dc5f7d28e moving schema raw files to s3; updated pretrained models 2018-12-10 21:52:34 +00:00
Bryan Marcus McCann 1bd922b5bc back compat predict.py 2018-12-10 19:41:27 +00:00