svlandeg
cc9ae28a52
custom error and warning messages
2019-06-19 12:35:26 +02:00
svlandeg
791327e3c5
Merge remote-tracking branch 'upstream/master' into feature/nel-wiki
2019-06-19 09:44:05 +02:00
svlandeg
a31648d28b
further code cleanup
2019-06-19 09:15:43 +02:00
svlandeg
478305cd3f
small tweaks and documentation
2019-06-18 18:38:09 +02:00
svlandeg
0d177c1146
clean up code, remove old code, move to bin
2019-06-18 13:20:40 +02:00
svlandeg
ffae7d3555
sentence encoder only (removing article/mention encoder)
2019-06-18 00:05:47 +02:00
svlandeg
6332af40de
baseline performances: oracle KB, random and prior prob
2019-06-17 14:39:40 +02:00
svlandeg
24db1392b9
reprocessing all of wikipedia for training data
2019-06-16 21:14:45 +02:00
Ines Montani
81c12640ab
Auto-format [ci skip]
2019-06-16 14:33:20 +02:00
Greg Werner
9041a72d7f
Update tokenizer.md for construction example ( #3790 )
* Update tokenizer.md for construction example
Self-contained example. You should really say what nlp is so that the example will work as is.
* Update CONTRIBUTOR_AGREEMENT.md
* Restore contributor agreement
* Adjust construction examples
2019-06-16 14:32:56 +02:00
Kabir Khan
1e19f34e29
Add optional `id` property to EntityRuler patterns ( #3591 )
* Adding support for entity_id in EntityRuler pipeline component
* Adding spaCy Contributor agreement
* Updating EntityRuler to use string.format instead of f strings
* Update Entity Ruler to support an 'id' attribute per pattern that explicitly identifies an entity.
* Fixing tests
* Remove custom extension entity_id and use built in ent_id token attribute.
* Changing entity_id to ent_id for consistent naming
* entity_ids => ent_ids
* Removing kb, cleaning up tests, making util functions private, use rsplit instead of split
2019-06-16 13:29:04 +02:00
Suraj Rajan
46c78d0a41
Dependency tree pattern matcher ( #3465 )
* Functional dependency tree pattern matcher
* Tests fail due to inconsistent behaviour
* Renamed dependencymatcher and added optimizations
2019-06-16 13:25:32 +02:00
Paul O'Leary McCann
3f52e12335
Change vector training to work with latest gensim ( fix #3749 ) ( #3757 )
2019-06-16 13:24:06 +02:00
BreakBB
d8573ee715
Update error raising for CLI pretrain to fix #3840 ( #3843 )
* Add check for empty input file to CLI pretrain
* Raise error if JSONL is not a dict or contains neither `tokens` nor `text` key
* Skip empty values for correct pretrain keys and log a counter as warning
* Add tests for CLI pretrain core function make_docs.
* Add a short hint for the `tokens` key to the CLI pretrain docs
* Add success message to CLI pretrain
* Update model loading to fix the tests
* Skip empty values and do not create docs out of it
2019-06-16 13:22:57 +02:00
svlandeg
81731907ba
performance per entity type
2019-06-14 19:55:46 +02:00
svlandeg
b312f2d0e7
redo training data to be independent of KB and entity-level instead of doc-level
2019-06-14 15:55:26 +02:00
Azagh3l
5accfbb938
Update exemples.py ( #3838 )
Added missing hyphen and accent.
2019-06-14 09:31:05 +02:00
svlandeg
0b04d142de
regenerating KB
2019-06-13 22:32:56 +02:00
svlandeg
78dd3e11da
write entity linking pipe to file and keep vocab consistent between kb and nlp
2019-06-13 16:25:39 +02:00
svlandeg
b12001f368
small fixes
2019-06-12 22:05:53 +02:00
Ines Montani
f35ce09776
Add regression test for #3839
2019-06-12 13:38:30 +02:00
Ines Montani
aae9034492
Tidy up [ci skip]
2019-06-12 13:38:23 +02:00
svlandeg
6521cfa132
speeding up training
2019-06-12 13:37:05 +02:00
Motoki Wu
9c064e6ad9
Add resume logic to spacy pretrain ( #3652 )
* Added ability to resume training
* Add to readme
* Remove duplicate entry
2019-06-12 13:29:23 +02:00
svlandeg
66813a1fdc
speed up predictions
2019-06-11 14:18:20 +02:00
svlandeg
fe1ed432ef
eval on dev set, varying combos of prior and context scores
2019-06-11 11:40:58 +02:00
Azagh3l
eb3e4263ee
Update lex_attrs.py ( #3835 )
Corrected typos, added French (from France) versions of some numbers.
2019-06-11 10:59:16 +02:00
Azagh3l
d0d56635ce
Create Azagh3l.md ( #3836 )
2019-06-11 10:58:32 +02:00
svlandeg
83dc7b46fd
first tests with EL pipe
2019-06-10 21:25:26 +02:00
Matthew Honnibal
7f71cf0b02
Merge branch 'master' of https://github.com/explosion/spaCy
2019-06-07 20:41:00 +02:00
Matthew Honnibal
a931d72459
Add merge_subtokens as parser post-process. Re #3830
2019-06-07 20:40:41 +02:00
svlandeg
7de1ee69b8
training loop in proper pipe format
2019-06-07 15:55:10 +02:00
svlandeg
0486ccabfd
introduce goldparse.links
2019-06-07 13:54:45 +02:00
svlandeg
a5c061f506
storing NEL training data in GoldParse objects
2019-06-07 12:58:42 +02:00
Ines Montani
5d6b4bb3bd
Update srsly pin
2019-06-07 11:14:32 +02:00
svlandeg
61f0e2af65
code cleanup
2019-06-06 20:22:14 +02:00
svlandeg
d8b435ceff
pretraining description vectors and storing them in the KB
2019-06-06 19:51:27 +02:00
svlandeg
5c723c32c3
entity vectors in the KB + serialization of them
2019-06-05 18:29:18 +02:00
svlandeg
9abbd0899f
separate entity encoder to get 64D descriptions
2019-06-05 00:09:46 +02:00
Ines Montani
511977ae5e
Update universe [ci skip]
2019-06-04 11:15:51 +02:00
Ramanan Balakrishnan
eb12703d10
minor fix to broken link in documentation ( #3819 ) [ci skip]
2019-06-04 11:15:35 +02:00
svlandeg
fb37cdb2d3
implementing el pipe in pipes.pyx (not tested yet)
2019-06-03 21:32:54 +02:00
intrafind
436a578369
Create intrafindBreno.md ( #3814 )
2019-06-03 18:33:09 +02:00
intrafind
2bba2a3536
Fix for #3811 ( #3815 )
Corrected type of seed parameter.
2019-06-03 18:32:47 +02:00
Ines Montani
62ebc65c62
Update universe [ci skip]
2019-06-03 12:19:13 +02:00
svlandeg
d83a1e3052
Merge branch 'master' into feature/nel-wiki
2019-06-03 09:35:10 +02:00
svlandeg
9e88763dab
60% acc run
2019-06-03 08:04:49 +02:00
Ines Montani
e703301129
Update universe [ci skip]
2019-06-02 13:55:55 +02:00
Ines Montani
892e72451f
Update universe [ci skip]
2019-06-02 12:58:12 +02:00
Ines Montani
42de5be90c
Tidy up universe [ci skip]
2019-06-02 12:38:48 +02:00