From d33e180395151457f84af0fc0ba80dd45ac55888 Mon Sep 17 00:00:00 2001
From: mehrad
Date: Sun, 1 Aug 2021 10:44:45 -0700
Subject: [PATCH] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 52d66b36..e5e19940 100644
--- a/README.md
+++ b/README.md
@@ -142,14 +142,14 @@ The alignment code has been updated and improved since the 0.6.0 release, so if you
 First run a bootleg model to extract mentions, entity candidates, and contextual embeddings for the mentions.
 
 ```bash
-genienlp bootleg-dump-features --train_tasks --save --preserve_case --data --train_batch_tokens 1200 --val_batch_size 2000 --database_type json --database_dir --ned_features type_id type_prob --ned_features_size 1 1 --ned_features_default_val 0 1.0 --num_workers 0 --min_entity_len 1 --max_entity_len 4 --bootleg_model
+genienlp bootleg-dump-features --train_tasks --save --preserve_case --data --train_batch_tokens 1200 --val_batch_size 2000 --database_type json --database_dir --min_entity_len 1 --max_entity_len 4 --bootleg_model
 ```
 
 This command generates several output files. In `` you should see a `prep` dir, which contains preprocessed data (e.g. data converted to memory-mapped format, several arrays to facilitate embedding lookup, etc.). If your dataset doesn't change, you can reuse the same files. It will also generate several files in folder. In `eval_bootleg/[train|eval]//bootleg_labels.jsonl` you can see the examples, mentions, predicted candidates, and their probabilities according to bootleg.
 
 Now you can use the extracted features from bootleg in downstream tasks such as semantic parsing to improve named entity understanding and, consequently, generation:
 
 ```bash
-genienlp train --train_tasks --train_iterations --preserve_case --save --data --model TransformerLSTM --pretrained_model bert-base-uncased --trainable_decoder_embeddings 50 --train_batch_tokens 1000 --val_batch_size 1000 --do_ned --database_type json --database_dir --ned_retrieve_method bootleg --ned_features type_id type_prob --ned_features_size 1 1 --ned_features_default_val 0 1.0 --num_workers 0 --min_entity_len 1 --max_entity_len 4 --bootleg_model
+genienlp train --train_tasks --train_iterations --preserve_case --save --data --model TransformerSeq2Seq --pretrained_model facebook/bart-base --train_batch_tokens 1000 --val_batch_size 1000 --do_ned --database_dir --ned_retrieve_method bootleg --entity_attributes type_id type_prob --add_entities_to_text append --bootleg_model
 ```
 
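The `bootleg_labels.jsonl` file described in the patched README is a JSON-lines file: one standalone JSON object per line, one labeled example per object. A minimal sketch for loading it for inspection (the exact field names inside each record depend on the bootleg version, so treat any specific keys as assumptions):

```python
import json

def load_bootleg_labels(path):
    """Parse a JSON-lines file: one JSON object (one labeled example) per line."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate trailing/blank lines
                records.append(json.loads(line))
    return records
```

Once loaded, each record can be printed or filtered with ordinary dict access, e.g. `records[0].keys()` to see which fields (mentions, candidates, probabilities) your bootleg version emits.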