genienlp

Commit Graph

Author	SHA1	Message	Date
Giovanni Campagna	b950927a2b	Clean up old checkpoints as we go along. Introduce a utility Saver class that does what tensorflow's Saver does: it keeps track of saved checkpoints in a separate file and deletes the old ones before saving a new one.	2019-03-03 22:35:29 -08:00
Giovanni Campagna	410c6cd8ec	Merge remote-tracking branch 'origin/master' into wip/more-cleanups	2019-03-03 22:16:09 -08:00
Giovanni Campagna	c2406e62e0	Fix "get_commit" when invoking "decanlp" installed with "pip install -e". sys.argv will be a script living in ~/.bin that loads and executes the module, so it will not live in the git repository.	2019-03-02 18:28:04 +00:00
Giovanni Campagna	1aa6306702	server: automatically grow the embedding matrix when new words are encountered. Otherwise, we feed actual unks to the model, while the model is trained with character embeddings and expects to see some actual value for everything.	2019-03-02 00:44:51 -08:00
Giovanni Campagna	72175af080	torchtext.field: use <unk> as the unk token. " UNK " has spaces, and a token with spaces is asking for trouble.	2019-03-02 00:17:34 -08:00
Giovanni Campagna	dac1e88a8e	server: close gracefully on Ctrl-c and EOF	2019-03-02 00:17:34 -08:00
mehrad	3a1b969ad3	updates	2019-03-01 17:51:54 -08:00
mehrad	9c3145846a	add tests	2019-03-01 17:51:54 -08:00
Giovanni Campagna	3a1fb01286	server: flush stdout after a request	2019-03-01 17:40:19 -08:00
Giovanni Campagna	ce5c1aa8df	Use logger instead of print(). print() uses stdout by default, which has two problems: (1) it is not flushed until later (so messages don't show, or don't show up in order with other loggers); (2) it conflicts with stdin/stdout usage by `decanlp server --stdin`.	2019-03-01 17:35:04 -08:00
Mehrad Moradshahi	f3193836b4	Merge pull request #5 from Stanford-Mobisocial-IoT-Lab/wip/library: Convert decanlp into a standard python library	2019-03-01 17:09:39 -08:00
Giovanni Campagna	8d4136a79a	server: add a simpler interface that just works over stdin/stdout	2019-03-01 16:29:05 -08:00
Giovanni Campagna	1a2a4a9ea9	One more stanford copyright	2019-03-01 16:18:10 -08:00
Giovanni Campagna	03ce9501ad	server: remove --data argument. The server does not load any data file.	2019-03-01 16:14:08 -08:00
Giovanni Campagna	bafabac483	Fix argument handling	2019-03-01 16:13:10 -08:00
Giovanni Campagna	ad1a15637a	Fix missing import	2019-03-01 16:09:58 -08:00
Giovanni Campagna	0866d8a4e8	Add more requirements	2019-03-01 16:08:58 -08:00
Giovanni Campagna	4bddbada45	Fix typo	2019-03-01 16:08:52 -08:00
Giovanni Campagna	ac3ac8c680	Add Stanford copyright to all files that we touched	2019-03-01 15:54:54 -08:00
Giovanni Campagna	8e2b519ac3	Add copyright notices to all files. Makes the license clear and explicit.	2019-03-01 15:51:45 -08:00
Giovanni Campagna	99e39f9528	Move generic dataset outside of torchtext. There is no reason for it to live inside torchtext, and having it outside will help with using torchtext as a library.	2019-03-01 15:47:11 -08:00
Giovanni Campagna	afd39c8660	Add requirements.txt file	2019-03-01 15:43:02 -08:00
Giovanni Campagna	5447d0c37c	Add a "decanlp" script that calls out to the different subcommands. Usage: decanlp train ...; decanlp predict ...; decanlp convert-to-logical-forms ...	2019-03-01 15:43:02 -08:00
Giovanni Campagna	a5a203b099	Add setup.py	2019-03-01 15:43:02 -08:00
Giovanni Campagna	41b80bb4f4	Move all python files to a decanlp/ package, as per python conventions.	2019-03-01 15:43:02 -08:00
Giovanni Campagna	a44bde6693	Don't import unused tensorflow	2019-03-01 15:22:02 -08:00
mehrad	35fe30272b	satisfy travis	2019-03-01 15:08:31 -08:00
Mehrad Moradshahi	cf534b3523	Merge pull request #4 from Stanford-Mobisocial-IoT-Lab/wip/gcampax/cleanups: Cleanups, server support	2019-03-01 14:35:29 -08:00
Giovanni Campagna	c66dde4ca0	Make the server actually work	2019-03-01 12:33:18 -08:00
Giovanni Campagna	a0ff18f1fe	Remove too many redundant tensorboard scalars. Clutters the UI, wastes disk space and cycles.	2019-03-01 12:33:18 -08:00
Giovanni Campagna	ca91b6e24f	Clean up some really ugly code in predict.py and server.py	2019-03-01 12:33:18 -08:00
Giovanni Campagna	61f93aca9b	train: compute the best validation metric during training, and use it for model selection. If a model is found to be better than the previous one, save it as "best.pth". Model selection should happen at training time rather than validation time so we can safely discard checkpoints after training (because they take too much space).	2019-03-01 12:33:18 -08:00
Giovanni Campagna	d4b35d7ae6	train: don't save per-worker checkpoints if we're not doing distributed training. Saves disk space.	2019-03-01 10:51:12 -08:00
Giovanni Campagna	a36f2efb8c	Add a simple json-based RPC server to handle interactive predictions	2019-03-01 10:46:16 -08:00
Giovanni Campagna	9a9b77ec17	Almond: load data from the TSV files directly. Almond is not a translation task, it's a semantic parsing task. And it has its own established on-disk file format, and we should respect that, instead of messing with .tt files.	2019-03-01 09:36:14 -08:00
Giovanni Campagna	73d94f187b	SummaryWriter: replace add_scalars with add_scalar. tensorboardX's add_scalars is bonkers, and will create a different file for each tag, which makes the tensorboard completely unreadable (see https://github.com/lanpa/tensorboardX/issues/366). Instead, use add_scalar(), which puts everything in the same file, as tensorboard is designed to do.	2019-03-01 08:44:06 -08:00
Giovanni Campagna	ae4bbfea3a	arguments: remove bad ugly log dir messing. Stick with what the user says as the --save directory. If the user needs, they can provide the timestamp on the command line.	2019-03-01 08:30:35 -08:00
mehrad	7738b66e72	updates	2019-02-28 19:52:13 -08:00
mehrad	d2656bb53a	update post_process_decoded_results.py	2019-02-27 12:01:42 -08:00
mehrad	26f57a3059	don't shuffle dataset during prediction	2019-02-27 11:17:38 -08:00
mehrad	eff58ba1eb	merge updates	2019-02-27 10:54:01 -08:00
mehrad	2ef63c660e	export results to csv files	2019-02-27 10:52:38 -08:00
mehrad	fede42e471	add cached option for predict	2019-02-20 11:22:32 -08:00
mehrad	b8f4c1b0e5	fix max-margin loss implementation	2019-02-20 11:06:49 -08:00
mehrad	f47da57330	fix bug for joining paths	2019-02-19 16:21:34 -08:00
mehrad	ad5c9c6dc1	save cached files to an assigned path. Useful when you don't have write permissions to the dataset directory.	2019-02-19 15:55:20 -08:00
mehrad	f3545ad9d3	add max-margin loss	2019-02-19 13:48:52 -08:00
Mehrad Moradshahi	f2335e456b	Merge pull request #1 from Stanford-Mobisocial-IoT-Lab/wip/field: Remove hacks needed to use a different tokenizer for Almond	2019-01-24 10:44:33 -08:00
mehrad	e4afb21928	minor fix	2019-01-23 19:13:26 -08:00
mehrad	8c54dc1391	updates and fixes	2019-01-23 16:41:37 -08:00

201 Commits, All Branches