Giovanni Campagna 2020-12-19 19:26:57 -08:00
parent 2e8173006e
commit 5fd224f16d
2 changed files with 19 additions and 1 deletions

@@ -1,3 +1,21 @@
0.6.0a1
=======
* Preprocessing of code inputs has changed, and code tokens are no longer treated specially.
Instead, they are treated as normal words and preprocessed using BPE. This allows using any
Huggingface tokenizer without changes. Tasks can still define certain tokens that should be
treated as special tokens. These are either added as new tokens, or further preprocessed
into non-ambiguous sequences of words.
* Old models (MQAN and baselines) were removed. The GloVe vectors and other non-contextual
word embeddings were also removed. Old training options that were ineffective or unused
were removed.
* The internals of the library have been refactored to simplify development and allow using any
Huggingface Seq2Seq or MLM model. As a result, the names of the models have changed: `Seq2Seq`
is now `TransformerLSTM` and `Bart` is now `TransformerSeq2Seq`. Command-line flags changed as well.
NOTE: due to the change in model names and command-line flags, this release is not backward
compatible with models trained with genienlp <= 0.5.0.
0.5.0
=====

@@ -34,7 +34,7 @@ with open("README.md", "r") as fh:
setuptools.setup(
name='genienlp',
-version='0.5.0',
+version='0.6.0a1',
packages=setuptools.find_packages(exclude=['tests']),
entry_points= {