Yohei Tamura 2020-03-08 21:24:38 +09:00 committed by GitHub
parent 9dd98a4b27
commit 31755630a7
1 changed file with 1 addition and 1 deletion

@@ -2,7 +2,7 @@
### Step 1: Create a Knowledge Base (KB) and training data
-Run `wikipedia_pretrain_kb.py`
+Run `wikidata_pretrain_kb.py`
* This takes as input the locations of a **Wikipedia and a Wikidata dump**, and produces a **KB directory** + **training file**
* WikiData: get `latest-all.json.bz2` from https://dumps.wikimedia.org/wikidatawiki/entities/
* Wikipedia: get `enwiki-latest-pages-articles-multistream.xml.bz2` from https://dumps.wikimedia.org/enwiki/latest/ (or for any other language)
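
Before running the script on a full dump, it can help to confirm that the Wikidata download is readable. The following is a minimal sketch and not part of the repository's scripts: `iter_wikidata_entities` and its `limit` parameter are hypothetical names, and the code assumes the usual layout of the Wikidata JSON dump (one large JSON array with one entity per line).

```python
import bz2
import json
from typing import Iterator


def iter_wikidata_entities(dump_path: str, limit: int = 5) -> Iterator[dict]:
    """Yield up to `limit` entities from a Wikidata JSON dump (.json.bz2).

    Hypothetical helper for inspection only; it streams the file, so the
    whole dump is never decompressed into memory.
    """
    count = 0
    with bz2.open(dump_path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if count >= limit:
                return
            # Assumed layout: the first and last lines are "[" and "]",
            # and every entity line but the last ends with a trailing comma.
            line = line.strip().rstrip(",")
            if line in ("", "[", "]"):
                continue
            yield json.loads(line)
            count += 1
```

Calling `list(iter_wikidata_entities("latest-all.json.bz2", limit=3))` would return the first three entity records as dictionaries, which is usually enough to check that the file is intact.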