Commit Graph

4 Commits

Author SHA1 Message Date
Sofie Van Landeghem 7b96a5e10f Reduce mem usage in training Entity Linker (#4811)
* move nlp processing for el pipe to batch training instead of preprocessing

* adding dev eval back in, and limit in articles instead of entities

* use pipe whenever possible

* few more small doc changes

* access dev data through generator

* tqdm description

* small fixes

* update documentation
2020-01-06 14:59:50 +01:00
Sofie Van Landeghem 12158c1e3a Restore tqdm imports (#4804)
* set 4.38.0 to minimal version with color bug fix

* set imports back to proper place

* add upper range for tqdm
2019-12-16 13:12:19 +01:00
Sofie Van Landeghem 2d249a9502 KB extensions and better parsing of WikiData (#4375)
* fix overflow error on windows

* more documentation & logging fixes

* md fix

* 3 different limit parameters to play with execution time

* bug fixes directory locations

* small fixes

* exclude dev test articles from prior probabilities stats

* small fixes

* filtering wikidata entities, removing numeric and meta items

* adding aliases from wikidata also to the KB

* fix adding WD aliases

* adding also new aliases to previously added entities

* fixing comma's

* small doc fixes

* adding subclassof filtering

* append alias functionality in KB

* prevent appending the same entity-alias pair

* fix for appending WD aliases

* remove date filter

* remove unnecessary import

* small corrections and reformatting

* remove WD aliases for now (too slow)

* removing numeric entities from training and evaluation

* small fixes

* shortcut during prediction if there is only one candidate

* add counts and fscore logging, remove FP NER from evaluation

* fix entity_linker.predict to take docs instead of single sentences

* remove enumeration sentences from the WP dataset

* entity_linker.update to process full doc instead of single sentence

* spelling corrections and dump locations in readme

* NLP IO fix

* reading KB is unnecessary at the end of the pipeline

* small logging fix

* remove empty files
2019-10-14 12:28:53 +02:00
Euan Dowers a6830d60e8 Changes to wiki_entity_linker (#4235)
* Changes to wiki_entity_linker

* No more f-strings

* Make some requested changes

* Add back option to get descriptions from wd not wp

* Fix logs

* Address comments and clean evaluation

* Remove type hints

* Refactor evaluation, add back metrics by label

* Address comments

* Log training performance as well as dev
2019-09-13 17:03:57 +02:00