Giovanni Campagna
b950927a2b
Clean up old checkpoints as we go along
...
Introduce a utility Saver class that does what TensorFlow's Saver
does: it keeps track of saved checkpoints in a separate file, and
deletes the old ones before saving a new one.
2019-03-03 22:35:29 -08:00
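
The behaviour described in the commit message above (an index file plus deletion of old checkpoints before each save) could look roughly like the sketch below. This is not the code from the commit: the `checkpoint.json` file name, the `max_to_keep` parameter and the `write_fn` callback are assumptions made for illustration.

```python
import json
import os


class Saver:
    """Keep at most `max_to_keep` checkpoints in `save_dir` (illustrative sketch)."""

    def __init__(self, save_dir, max_to_keep=5):
        self.save_dir = save_dir
        self.max_to_keep = max_to_keep
        # Separate index file listing the checkpoints saved so far, oldest first.
        # The file name is an assumption, not taken from the commit.
        self.index_path = os.path.join(save_dir, "checkpoint.json")
        self.checkpoints = []
        if os.path.exists(self.index_path):
            with open(self.index_path) as fp:
                self.checkpoints = json.load(fp)

    def save(self, data, name, write_fn):
        # Delete the oldest checkpoints *before* writing the new one.
        while self.checkpoints and len(self.checkpoints) >= self.max_to_keep:
            oldest = self.checkpoints.pop(0)
            try:
                os.remove(os.path.join(self.save_dir, oldest))
            except FileNotFoundError:
                pass  # already removed by hand; nothing to clean up
        # write_fn is a hypothetical callback, e.g. torch.save in a real trainer.
        write_fn(data, os.path.join(self.save_dir, name))
        self.checkpoints.append(name)
        with open(self.index_path, "w") as fp:
            json.dump(self.checkpoints, fp)
```

Because the index is reloaded in `__init__`, a resumed training run would keep pruning checkpoints written by the previous run.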
Giovanni Campagna
410c6cd8ec
Merge remote-tracking branch 'origin/master' into wip/more-cleanups
2019-03-03 22:16:09 -08:00
Giovanni Campagna
c2406e62e0
Fix "get_commit" when invoking "decanlp" installed with "pip install -e"
...
sys.argv[0] will be a script living in ~/.bin that loads and executes
the module, so it will not live in the git repository
2019-03-02 18:28:04 +00:00
Giovanni Campagna
1aa6306702
server: automatically grow the embedding matrix when new words are encountered
...
Otherwise, we feed actual unks to the model, while the model
is trained with character embeddings and expects to see some
actual value for everything.
2019-03-02 00:44:51 -08:00
Giovanni Campagna
72175af080
torchtext.field: use <unk> as the unk token
...
" UNK " has spaces, and a token with spaces is asking for trouble
2019-03-02 00:17:34 -08:00
Giovanni Campagna
dac1e88a8e
server: close gracefully on Ctrl-c and EOF
2019-03-02 00:17:34 -08:00
mehrad
3a1b969ad3
updates
2019-03-01 17:51:54 -08:00
mehrad
9c3145846a
add tests
2019-03-01 17:51:54 -08:00
Giovanni Campagna
3a1fb01286
server: flush stdout after a request
2019-03-01 17:40:19 -08:00
Giovanni Campagna
ce5c1aa8df
Use logger instead of print()
...
print() uses stdout by default, which has two problems:
- it is not flushed until later (so messages don't show, or don't show
up in order with other loggers)
- it conflicts with stdin/stdout usage by `decanlp server --stdin`
2019-03-01 17:35:04 -08:00
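
To illustrate the two problems listed above, here is a minimal sketch of a logger that writes to stderr. The logger name and format string are assumptions. `logging.StreamHandler` flushes after every record, and keeping diagnostics on stderr leaves stdout free for a stdin/stdout protocol.

```python
import logging
import sys

# Hypothetical logger name; a real package would typically use __name__.
logger = logging.getLogger("decanlp_example")


def configure_logging(level=logging.INFO):
    # Write diagnostics to stderr so they never mix with responses on stdout.
    # StreamHandler flushes the stream after each emitted record.
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler
```

After calling `configure_logging()`, a call such as `logger.info("loaded model")` replaces the corresponding `print()`.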
Mehrad Moradshahi
f3193836b4
Merge pull request #5 from Stanford-Mobisocial-IoT-Lab/wip/library
...
Convert decanlp into a standard python library
2019-03-01 17:09:39 -08:00
Giovanni Campagna
8d4136a79a
server: add a simpler interface that just works over stdin/stdout
2019-03-01 16:29:05 -08:00
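
A stdin/stdout interface of this kind, combined with the "flush stdout after a request" and "close gracefully on Ctrl-c and EOF" commits listed above, might be structured like the sketch below. The one-JSON-object-per-line framing and the `handle_request` callback are assumptions; the log does not describe the actual wire format.

```python
import json
import sys


def serve(handle_request, stdin=None, stdout=None):
    """Read one JSON request per line and write one JSON response per line."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    try:
        for line in stdin:  # iteration stops by itself on EOF
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            response = handle_request(json.loads(line))
            stdout.write(json.dumps(response) + "\n")
            # Flush after every request so the client sees the reply at once.
            stdout.flush()
    except KeyboardInterrupt:
        pass  # Ctrl-C: leave the loop quietly instead of printing a traceback
```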
Giovanni Campagna
1a2a4a9ea9
One more stanford copyright
2019-03-01 16:18:10 -08:00
Giovanni Campagna
03ce9501ad
server: remove --data argument
...
The server does not load any data file
2019-03-01 16:14:08 -08:00
Giovanni Campagna
bafabac483
Fix argument handling
2019-03-01 16:13:10 -08:00
Giovanni Campagna
ad1a15637a
Fix missing import
2019-03-01 16:09:58 -08:00
Giovanni Campagna
0866d8a4e8
Add more requirements
2019-03-01 16:08:58 -08:00
Giovanni Campagna
4bddbada45
Fix typo
2019-03-01 16:08:52 -08:00
Giovanni Campagna
ac3ac8c680
Add Stanford copyright to all files that we touched
2019-03-01 15:54:54 -08:00
Giovanni Campagna
8e2b519ac3
Add copyright notices to all files
...
Makes the license clear and explicit
2019-03-01 15:51:45 -08:00
Giovanni Campagna
99e39f9528
Move generic dataset outside of torchtext
...
There is no reason for it to live inside torchtext, and having it
outside will help with using torchtext as a library
2019-03-01 15:47:11 -08:00
Giovanni Campagna
afd39c8660
Add requirements.txt file
2019-03-01 15:43:02 -08:00
Giovanni Campagna
5447d0c37c
Add a "decanlp" script that calls out to the different subcommands
...
Usage:
- decanlp train ...
- decanlp predict ...
- decanlp convert-to-logical-forms ...
2019-03-01 15:43:02 -08:00
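
The usage listed above suggests a thin entry point that dispatches on the first argument. A minimal sketch follows, assuming each subcommand exposes a function that takes the remaining arguments; the handlers here are placeholders, not the real decanlp modules.

```python
import sys


def _placeholder(name):
    # Stand-in for the real subcommand entry points, which are not shown in this log.
    def run(argv):
        print("would run %s with %r" % (name, argv), file=sys.stderr)
        return 0
    return run


SUBCOMMANDS = {
    "train": _placeholder("train"),
    "predict": _placeholder("predict"),
    "convert-to-logical-forms": _placeholder("convert-to-logical-forms"),
}


def main(argv=None):
    argv = sys.argv[1:] if argv is None else list(argv)
    if not argv or argv[0] not in SUBCOMMANDS:
        print("usage: decanlp {%s} ..." % ",".join(SUBCOMMANDS), file=sys.stderr)
        return 2
    # Hand everything after the subcommand name to the chosen handler.
    return SUBCOMMANDS[argv[0]](argv[1:])
```

Passing the untouched remainder to each handler lets every subcommand keep its own argument parser.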
Giovanni Campagna
a5a203b099
Add setup.py
2019-03-01 15:43:02 -08:00
Giovanni Campagna
41b80bb4f4
Move all python files to a decanlp/ package
...
As per python conventions
2019-03-01 15:43:02 -08:00
Giovanni Campagna
a44bde6693
Don't import unused tensorflow
2019-03-01 15:22:02 -08:00
mehrad
35fe30272b
satisfy travis
2019-03-01 15:08:31 -08:00
Mehrad Moradshahi
cf534b3523
Merge pull request #4 from Stanford-Mobisocial-IoT-Lab/wip/gcampax/cleanups
...
Cleanups, server support
2019-03-01 14:35:29 -08:00
Giovanni Campagna
c66dde4ca0
Make the server actually work
2019-03-01 12:33:18 -08:00
Giovanni Campagna
a0ff18f1fe
Remove redundant tensorboard scalars
...
Clutters the UI, wastes disk space and cycles
2019-03-01 12:33:18 -08:00
Giovanni Campagna
ca91b6e24f
Clean up some really ugly code in predict.py and server.py
2019-03-01 12:33:18 -08:00
Giovanni Campagna
61f93aca9b
train: compute the best validation metric during training, and use it for model selection
...
If a model is found to be better than the previous one, save it
as "best.pth"
Model selection should happen at training time rather than validation
time so we can safely discard checkpoints after training (because
they take too much space)
2019-03-01 12:33:18 -08:00
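
The model-selection rule described above can be sketched as a small tracker that is updated after each validation pass. Copying the latest checkpoint file to "best.pth" and the `higher_is_better` flag are assumptions; the commit may write the model directly instead.

```python
import os
import shutil


class BestModelTracker:
    """Remember the best validation metric and keep a copy of that checkpoint."""

    def __init__(self, save_dir, higher_is_better=True):
        self.save_dir = save_dir
        self.higher_is_better = higher_is_better
        self.best_metric = None

    def update(self, metric, checkpoint_path):
        if self.best_metric is None:
            improved = True
        elif self.higher_is_better:
            improved = metric > self.best_metric
        else:
            improved = metric < self.best_metric
        if improved:
            self.best_metric = metric
            # Keep the winner under a fixed name, so that the per-iteration
            # checkpoints can be deleted safely once training is over.
            shutil.copyfile(checkpoint_path,
                            os.path.join(self.save_dir, "best.pth"))
        return improved
```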
Giovanni Campagna
d4b35d7ae6
train: don't save per-worker checkpoints if we're not doing distributed training
...
Saves disk space
2019-03-01 10:51:12 -08:00
Giovanni Campagna
a36f2efb8c
Add a simple json-based RPC server to handle interactive predictions
2019-03-01 10:46:16 -08:00
Giovanni Campagna
9a9b77ec17
Almond: load data from the TSV files directly
...
Almond is not a translation task, it's a semantic parsing task.
And it has its own established on-disk file format, and we should
respect that, instead of messing with .tt files
2019-03-01 09:36:14 -08:00
Giovanni Campagna
73d94f187b
SummaryWriter: replace add_scalars with add_scalar
...
tensorboardX's add_scalars is bonkers, and will create a different
file for each tag, which makes the tensorboard completely unreadable
(See https://github.com/lanpa/tensorboardX/issues/366 )
Instead, use add_scalar(), which puts everything in the same file,
as tensorboard is designed to do.
2019-03-01 08:44:06 -08:00
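
As a sketch of the replacement described above, a dictionary of metrics can be logged with one `add_scalar()` call per tag rather than a single `add_scalars()` call. The helper name and the tag prefix are hypothetical; `writer` is assumed to be a tensorboardX `SummaryWriter` or any object with a compatible `add_scalar(tag, value, global_step)` method.

```python
def log_metrics(writer, metrics, step, prefix=""):
    """Log each metric as its own scalar tag in the writer's single event file."""
    for name, value in sorted(metrics.items()):
        # One add_scalar() call per tag; all tags share the same event file.
        writer.add_scalar(prefix + name, value, step)
```

For example, `log_metrics(writer, {"loss": 0.31, "em": 62.5}, step=1000, prefix="val/")` would produce the tags `val/em` and `val/loss`.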
Giovanni Campagna
ae4bbfea3a
arguments: remove bad ugly log dir messing
...
Stick with what the user passes as the --save directory. If users
need a timestamp, they can include it on the command line.
2019-03-01 08:30:35 -08:00
mehrad
7738b66e72
updates
2019-02-28 19:52:13 -08:00
mehrad
d2656bb53a
update post_process_decoded_results.py
2019-02-27 12:01:42 -08:00
mehrad
26f57a3059
don't shuffle dataset during prediction
2019-02-27 11:17:38 -08:00
mehrad
eff58ba1eb
merge updates
2019-02-27 10:54:01 -08:00
mehrad
2ef63c660e
export results to csv files
2019-02-27 10:52:38 -08:00
mehrad
fede42e471
add cached option for predict
2019-02-20 11:22:32 -08:00
mehrad
b8f4c1b0e5
fix max-margin loss implementation
2019-02-20 11:06:49 -08:00
mehrad
f47da57330
fix bug for joining paths
2019-02-19 16:21:34 -08:00
mehrad
ad5c9c6dc1
save cached files to an assigned path
...
useful when you don't have write permissions to dataset directory
2019-02-19 15:55:20 -08:00
mehrad
f3545ad9d3
add max-margin loss
2019-02-19 13:48:52 -08:00
Mehrad Moradshahi
f2335e456b
Merge pull request #1 from Stanford-Mobisocial-IoT-Lab/wip/field
...
Remove hacks needed to use a different tokenizer for Almond
2019-01-24 10:44:33 -08:00
mehrad
e4afb21928
minor fix
2019-01-23 19:13:26 -08:00
mehrad
8c54dc1391
updates and fixes
2019-01-23 16:41:37 -08:00