Commit Graph

24 Commits

Author SHA1 Message Date
mehrad 4919b28627 fixes 2019-05-29 19:38:24 -07:00
mehrad 672aa14117 merge branch wip/mehrad/multi_lingual 2019-05-29 18:43:10 -07:00
Giovanni Campagna 25db020107 Reduce memory usage while loading almond datasets
Don't load all lines in memory
2019-05-15 09:27:00 -07:00
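A minimal Python sketch of the idea behind 25db020107. It assumes a tab-separated file with one example per line. `iter_examples` is a hypothetical name, not the repository's actual loader. Iterating over the file object streams one line at a time, so memory use is bounded by the longest line rather than by the file size.

```python
import io
from typing import Iterator, List, TextIO


def iter_examples(f: TextIO) -> Iterator[List[str]]:
    """Yield one tab-split example per line without reading the whole file.

    Iterating over a file object reads it lazily, one line at a time,
    unlike f.readlines(), which builds a list of every line up front.
    """
    for line in f:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        yield line.split("\t")


# Usage: io.StringIO stands in for a real file, e.g. open("train.tsv")
# (hypothetical path).
for example in iter_examples(io.StringIO("a\tb\n\nc\td\n")):
    print(example)
```

Because `iter_examples` is a generator, callers that need random access must still build their own index. Callers that only make one pass over the data never hold more than one line in memory.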
Giovanni Campagna 488a4feb64 Fix contextual almond 2019-05-15 09:26:54 -07:00
mehrad df94f7dd3a ignore malformed sentences 2019-05-13 14:50:08 -07:00
mehrad 0495d4d0eb Updates
1) use FastText for encoding Persian text
2) let the user choose the question for almond task
3) bug fixes
2019-05-13 13:03:51 -07:00
mehrad 925d839e15 adding an end-to-end combined model 2019-04-29 13:56:39 -07:00
Giovanni Campagna 46eaae8ba8 fix almond dataset name 2019-04-23 09:35:57 -07:00
Giovanni Campagna 64020a497f Add ContextualAlmond task
Its training files have four columns: <id> <context> <sentence> <target_program>
2019-04-23 09:30:49 -07:00
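A small parser for the ContextualAlmond row layout. This sketch assumes the four columns are tab-separated, because the commit message only names them. `ContextualExample` and `parse_contextual_line` are hypothetical names. Returning `None` for rows with the wrong column count is an illustrative choice in the spirit of df94f7dd3a ("ignore malformed sentences"), not that commit's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextualExample:
    id: str
    context: str
    sentence: str
    target_program: str


def parse_contextual_line(line: str) -> Optional[ContextualExample]:
    """Split one training row into <id> <context> <sentence> <target_program>.

    Assumption: columns are tab-separated. Rows without exactly four
    columns are treated as malformed and return None so the caller can skip them.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        return None
    return ContextualExample(*parts)
```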
mehrad a7a2d752d2 Fixes
std() in layer normalization is the culprit for generating NaN.
It happens in the backward pass for inputs with zero variance.
Just update the mean for these batches.
2019-04-08 14:48:23 -07:00
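The commit message above says where the NaN comes from but not why. A hand-derived gradient shows the arithmetic. With population variance (as layer normalization uses), the derivative of the standard deviation with respect to each input is (x_i - mean) / (n * std). When every input is equal, this is 0/0. The `eps` parameter below is shown only for comparison. Adding a small constant to the variance inside the square root is the common remedy; PyTorch's `nn.LayerNorm`, for example, normalizes by sqrt(Var[x] + eps). It is not the fix this commit applied, which updates only the mean for such batches. `std_grad` is a hypothetical helper for illustration.

```python
import math
from typing import List


def std_grad(xs: List[float], eps: float = 0.0) -> List[float]:
    """Gradient of sqrt(var(xs) + eps) w.r.t. each x_i, using population variance.

    d/dx_i sqrt(var + eps) = (x_i - mean) / (n * sqrt(var + eps)).
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    denom = n * math.sqrt(var + eps)
    if denom == 0.0:
        # Zero variance and eps == 0: every numerator is also 0, so each entry
        # is 0/0. IEEE float math (and autograd) yields NaN here; plain Python
        # would raise ZeroDivisionError, so return NaN explicitly.
        return [float("nan")] * n
    return [(x - mean) / denom for x in xs]


print(std_grad([2.0, 2.0, 2.0]))             # constant input: NaN gradient
print(std_grad([2.0, 2.0, 2.0], eps=1e-5))   # eps inside the sqrt: finite (zero) gradient
```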
Giovanni Campagna cea6092f90 Fix evaluating
- fix loading old config.json files that are missing some parameters
- fix expanding the trained embedding
- add a default context for "almond_with_thingpedia_as_context"
  (to include thingpedia)
- fix handling empty sentences
2019-03-23 17:28:22 -07:00
mehrad 91e6f5ded8 merge master + updates 2019-03-21 14:38:34 -07:00
mehrad 7555ec6b82 master updates + additional tweaks 2019-03-21 11:20:48 -07:00
Giovanni Campagna 7f1a8b2578 fix 2019-03-19 18:34:02 -07:00
Giovanni Campagna 63c96cd76a Fix plain thingtalk grammar
I copied the wrong version of genieparser...
2019-03-19 18:32:23 -07:00
Giovanni Campagna 112bb0bbbf Fix 2019-03-19 17:23:36 -07:00
Giovanni Campagna c4ba6d7bcd Add a progbar when loading the almond dataset
Because it takes a while
2019-03-19 14:53:11 -07:00
Giovanni Campagna 7325ca1cc7 Add option to use grammar in Almond task 2019-03-19 14:38:18 -07:00
Giovanni Campagna 17f4381ea3 Import the grammar code from genie-parser
Now purged of unnecessary numpy manipulation and of unnecessary
TensorFlow code
2019-03-19 12:06:22 -07:00
Giovanni Campagna f40f168f17 Reshuffle code around
Move task specific stuff into tasks/
2019-03-19 11:22:54 -07:00
Giovanni Campagna 02e4d6ddac Prepare for supporting grammar
Use a consistent preprocessing function, provided by the task class,
between server and train/predict, and load the tasks once.
2019-03-19 11:14:32 -07:00
Giovanni Campagna 14caf01e49 server: update to use task classes 2019-03-19 10:58:34 -07:00
Giovanni Campagna 5989846771 Make use of task classes
And clean up the metric handling code as well
2019-03-19 10:01:45 -07:00
Giovanni Campagna 92aade6a66 Introduce a registry mechanism and a class hierarchy for tasks
Most tasks can be framed as CQA naturally, with the task specific
code living in the dataset code, but some require extra task-specific
handling (IDs, metrics, tokenization, etc.)

In preparation for having even more task specific handling for Almond,
start cleaning up by creating a class for each task.
2019-03-19 09:15:19 -07:00
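A minimal sketch of a task registry with a class hierarchy, in the spirit of 92aade6a66 and of 02e4d6ddac ("a consistent preprocessing function, provided by the task class ... load the tasks once"). Every name here is hypothetical and not taken from the repository: `TASK_REGISTRY`, `register_task`, `BaseTask`, `AlmondTask`, `get_task`. The lowercasing in `AlmondTask.preprocess` is purely illustrative. The base class carries the generic CQA behavior, and subclasses override only the task-specific hooks.

```python
from functools import lru_cache
from typing import Callable, Dict, List, Type

# Hypothetical registry mapping a task name to its class.
TASK_REGISTRY: Dict[str, Type["BaseTask"]] = {}


def register_task(name: str) -> Callable[[Type["BaseTask"]], Type["BaseTask"]]:
    """Class decorator that records a task class under `name`."""
    def decorator(cls: Type["BaseTask"]) -> Type["BaseTask"]:
        if name in TASK_REGISTRY:
            raise ValueError(f"task {name!r} already registered")
        TASK_REGISTRY[name] = cls
        cls.name = name
        return cls
    return decorator


class BaseTask:
    """Default behavior for tasks framed as plain CQA."""
    name = "base"

    def preprocess(self, sentence: str) -> List[str]:
        # One preprocessing hook, shared by the server and train/predict.
        return sentence.split()


@register_task("almond")
class AlmondTask(BaseTask):
    def preprocess(self, sentence: str) -> List[str]:
        # Task-specific override (illustrative only).
        return sentence.lower().split()


@lru_cache(maxsize=None)
def get_task(name: str) -> BaseTask:
    """Instantiate each task once and reuse the same object afterwards."""
    return TASK_REGISTRY[name]()
```

With this layout, adding a task means defining one decorated subclass. The server and the training scripts both go through `get_task`, so they share the same preprocessing.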