2021-10-27 17:14:33 +00:00

0.7.0a3
=======

* Added adafactor optimizer [#204]
* Added option to compute and log validation loss [#204]
* Multiple changes to improve alignment in translation, better heuristics for dates and numbers [#204]
* Added support for changing generation arguments in server requests [#204]
* Misc. code upgrades and bug fixes [#211, #216, #217]
* Updated dependencies (major update for sacrebleu) [#202, #203, #205, #206, #209, #210, #212, #214, #218].

2021-09-08 18:05:13 +00:00

0.7.0a2
=======

* Fix installation from pip [#201].

2021-09-08 04:25:04 +00:00

0.7.0a1
=======

* Added support for sequence classification tasks [#176].
* NED was refactored and cleaned up. Bootleg can now accept an optional file to normalize
  types. Support for ElasticSearch in the naive NED was removed. Option names were simplified
  and documentation was improved [#173, #183, #188, #191].
* kfserving will now terminate with an error in case of a CUDA error, allowing graceful
  restart [#190].

2021-05-26 15:53:03 +00:00

0.6.0
=====

* Added support for token classification tasks, and added AmbigQA and CrossNER tasks [#128].
* Added support for beam search in server mode [#141].
* Misc bug fixes [#142, #143].
* Updated dependencies [#133, #134, #136, #138, #140, #144, #145].
* Build system and code style fixes [#146].

2021-04-23 22:09:40 +00:00

0.6.0a4
=======

* Translation code is now migrated to the main genienlp codebase under the `almond_translate` task [#98, #105].
* Changed data batching code to account for sequence output lengths too [#130].
* Pipenv is removed [#104].
* Bumped Bootleg version to 1.0.1 [#124].
* Bumped Transformers version to 4.5.1 [#114, #127].
* Misc code upgrades and bug fixes [#115, #102, #123].

2021-03-08 03:35:19 +00:00

0.6.0a3
=======

* Loss dropping is now an optional dependency instead of a required one [#96].
* Fixed running Bootleg models in Kubeflow.
* Fixed combining Bootleg and calibration [#99].
* Misc bug fixes [#97, #102].
* Misc build system and test fixes [#101].

2021-02-15 17:56:59 +00:00

0.6.0a2
=======

* Added support for [Bootleg](https://github.com/HazyResearch/bootleg), a state-of-the-art
  named entity recognition system. The output of the NER can be fed as auxiliary information
  to the model in embedding or text form [#83, #93].
* Added support for calibration. Calibration is an additional step applied to the output of
  the model to compute a confidence score that can be interpreted as the probability of producing
  a correct parse. Multiple calibrators can be trained, to separately identify likely incorrect
  parses and out-of-domain inputs [#72, #74, #92, #94].
* Added support for inference in Kubeflow, using the new command `genienlp kfserver`, which
  exposes a compatible HTTP interface [#76, #80, #88, #90].
* Preprocessing of inputs can now use the new fast tokenizers from the huggingface library [#66].
* A number of new hyperparameter options were added, including diverse beam search, loss dropping,
  and a new learning rate schedule [#66].
* Paraphrasing is now a regular task trained with `genienlp train`, and no longer needs a
  different set of commands [#79].
* Misc bug fixes [#67, #68, #69, #70, #71, #85, #95].

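
To illustrate the calibration idea described in the 0.6.0a2 notes, here is a minimal sketch that turns a raw model score into a confidence between 0 and 1. It uses Platt scaling (a logistic fit on held-out data), which is an assumed technique chosen for illustration and not necessarily what genienlp implements. The names `fit_platt` and `confidence` and the toy data are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores: List[float], correct: List[int],
              lr: float = 0.1, steps: int = 2000) -> Tuple[float, float]:
    """Fit confidence = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, correct):
            err = sigmoid(a * s + b) - y  # gradient of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def confidence(score: float, a: float, b: float) -> float:
    """Map a raw score to an estimated probability that the parse is correct."""
    return sigmoid(a * score + b)


# Toy held-out data (hypothetical): raw scores and whether each parse was correct.
dev_scores = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.0, 2.0, 2.5]
dev_correct = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(dev_scores, dev_correct)
```

A second calibrator for out-of-domain detection could be fitted the same way on different labels, which is one way to read the "multiple calibrators" remark above.
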
2020-12-20 03:26:57 +00:00

0.6.0a1
=======

* Preprocessing of code inputs has changed, and code tokens are no longer treated specially.
  Instead, they are treated as normal words and preprocessed using BPE. This allows using any
  Huggingface tokenizer without changes. Tasks can still define certain tokens that should be
  treated as special tokens. These are either added as new tokens, or further preprocessed
  into non-ambiguous sequences of words.
* Old models (MQAN and baselines) were removed. The GloVe vectors and other non-contextual
  word embeddings were also removed. Old training options that were ineffective or unused
  were removed.
* The internals of the library have been refactored to simplify development and allow using any
  Huggingface Seq2Seq or MLM model. As a result, the names of the models have changed: `Seq2Seq`
  is now `TransformerLSTM` and `Bart` is now `TransformerSeq2Seq`. Command-line flags changed as well.

NOTE: due to the change in model names and command-line flags, this release is not backward
compatible with models trained with genienlp <= 0.5.0.

2020-12-16 07:05:58 +00:00

0.5.0
=====

* Paraphrasing and training were made much faster, with improved GPU usage and by removing
  redundant tokenization in the hot-paraphrase path [#37, #38, #47].
* The transformers library was updated to 4.0; PyTorch dependency increased to 1.6 [#44, #59, #62].
* New models: BART, mBART, mT5. As part of this work, the model code was refactored to be more consistent
  with Huggingface models and generation code [#46, #62].
* Paraphrasing scripts are now proper subcommands of genienlp [#53].
* It is now possible to fine-tune MBart and Marian models for neural machine translation
  and sentence denoising [#54].
* genienlp server can now operate in batch mode, improving GPU utilization [#58].
* Misc bug and documentation fixes [#39, #40, #41, #43, #48, #55, #58].

2020-09-15 00:27:54 +00:00

0.4.0
=====

* Added the ability to run paraphrasing in FP-16 mixed precision mode.
* The dependency on matplotlib and seaborn (used to produce plots for analysis) is now
  optional [#36].

Please see the development releases below for the full list of features in this release.

2020-09-01 04:09:47 +00:00

0.4.0b1
=======

* Fixed handling of CJK characters and combining characters in BERT and XLM tokenizers [#34].

2020-08-10 17:47:42 +00:00

0.4.0a1
=======

* The paraphrase generation code was extended and can now use BART instead of GPT2. It can
  now also run as a translation task (using the Marian models) [#26, #27, #29, #31].
* Added the ability to override the context and the question used as input to the model [#23].
* MultiGPU training was tested and fixed [#25].
* Completed support for beam search, including the ability to return multiple results for a given input [#30].
* Misc bug fixes [#32].

2020-06-09 03:50:28 +00:00

0.3.0
=====

* New option: sentence batching. Multiple sentences with related properties can be batched
  together in microbatches within a larger minibatch [#14, #11].
* Added option to append context and question in a single model input [#18, #20, #22].
* Updated Transformers dependency to 2.9, and fixed compatibility with newer versions [#18, #24].

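
The sentence batching option in the 0.3.0 notes can be pictured with the sketch below: sentences that share a property are collected into microbatches, and several microbatches are packed into one minibatch. This is only an illustration under assumptions. The grouping key, the `make_minibatches` helper, and the toy data are hypothetical and do not reflect genienlp's actual batching code.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# (group key, sentence): a hypothetical layout for related sentences.
Example = Tuple[Hashable, str]


def make_minibatches(examples: Iterable[Example],
                     groups_per_minibatch: int) -> List[List[List[str]]]:
    """Group sentences by key into microbatches, then pack them into minibatches."""
    microbatches: Dict[Hashable, List[str]] = defaultdict(list)
    for key, sentence in examples:
        microbatches[key].append(sentence)

    grouped = list(microbatches.values())  # dicts preserve insertion order
    return [grouped[i:i + groups_per_minibatch]
            for i in range(0, len(grouped), groups_per_minibatch)]


data = [
    ("weather", "what is the weather today"),
    ("music", "play some jazz"),
    ("weather", "will it rain tomorrow"),
    ("timer", "set a timer for five minutes"),
    ("music", "play the next song"),
]
batches = make_minibatches(data, groups_per_minibatch=2)
```

Here the first minibatch holds the "weather" and "music" microbatches and the second holds the single "timer" microbatch, so related sentences always stay together.
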
2020-04-03 17:28:04 +00:00

0.2.0
=====

* No changes since 0.2.0b2.

Please see the development releases below for the full list of features in this release.

2020-04-03 01:49:01 +00:00

0.2.0b2
=======

* Misc bug fixes related to inference time [#12, #13].

2020-03-27 20:08:42 +00:00

0.2.0b1
=======

* Added multilingual Almond tasks [#10].

2020-03-25 01:55:59 +00:00

0.2.0a2
=======

* Misc bug fixes [#8, #9]

2020-03-22 02:08:19 +00:00

0.2.0a1
=======

New features:
* Added new tasks for Almond: almond_dialogue_nlu, almond_dialogue_nlg, almond_dialogue_policy.
* Added a new encoder, "Coattention", which encodes the context and question separately, then
  coattends and applies a BiLSTM layer.
* For Coattention and Identity encoder, it is now possible to specify the context and question
  embeddings separately.
* Embeddings in context, question and answer can now be untied, by suffixing the name with '@'
  followed by a unique identifier (e.g. bert-base-uncased@0 and bert-base-uncased@1).
* Added an option to pretrain the context encoder, using MLM objective.
* Added beam search.
* New embedding option: XLM-R (XLM trained with Roberta).
* New task: paraphrasing with GPT2. This is not fully integrated with the other tasks yet,
  but it will be in the future.
* New command "genienlp export" can be used to save a trained model for inference.

Incompatible changes:
* The --save flag is now required when calling train.

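
The '@' suffix convention for untying embeddings, mentioned in the 0.2.0a1 notes, could work along the lines of the sketch below. It is an assumption about how such names might be handled, not genienlp's actual code: `split_embedding_name` and `load_embeddings` are hypothetical helpers, and a plain dictionary stands in for a real embedding model.

```python
from typing import Dict, List, Optional, Tuple


def split_embedding_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'bert-base-uncased@0' into the base name and its identifier."""
    base, sep, ident = name.partition("@")
    return base, (ident if sep else None)


def load_embeddings(names: List[str]) -> Dict[str, dict]:
    """Create one embedding instance per distinct full name (hypothetical loader)."""
    cache: Dict[str, dict] = {}
    for name in names:
        if name not in cache:
            base, _ = split_embedding_name(name)
            # Stand-in for loading the pretrained weights named by `base`.
            cache[name] = {"pretrained": base}
    return cache


# Context and question use different suffixes, so they get separate (untied)
# instances; repeating a full name reuses (ties) the same instance.
embeddings = load_embeddings(
    ["bert-base-uncased@0", "bert-base-uncased@1", "bert-base-uncased@0"]
)
```

Both entries start from the same pretrained weights, but because they are separate objects they can diverge during training.
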
2020-01-29 18:23:44 +00:00

0.1.1
=====

* Fix publishing on pypi

2020-01-29 18:01:06 +00:00

0.1.0
=====

* First release