Commit Graph

10175 Commits

Author SHA1 Message Date
svlandeg d8b435ceff pretraining description vectors and storing them in the KB 2019-06-06 19:51:27 +02:00
svlandeg 5c723c32c3 entity vectors in the KB + serialization of them 2019-06-05 18:29:18 +02:00
svlandeg 9abbd0899f separate entity encoder to get 64D descriptions 2019-06-05 00:09:46 +02:00
svlandeg fb37cdb2d3 implementing el pipe in pipes.pyx (not tested yet) 2019-06-03 21:32:54 +02:00
svlandeg d83a1e3052 Merge branch 'master' into feature/nel-wiki 2019-06-03 09:35:10 +02:00
svlandeg 9e88763dab 60% acc run 2019-06-03 08:04:49 +02:00
Ines Montani e703301129 Update universe [ci skip] 2019-06-02 13:55:55 +02:00
Ines Montani 892e72451f Update universe [ci skip] 2019-06-02 12:58:12 +02:00
Ines Montani 42de5be90c Tidy up universe [ci skip] 2019-06-02 12:38:48 +02:00
Nirant 638caba9b5 Add multiple packages to universe.json (#3809) [ci skip]
* Add multiple packages to universe.json

Added following packages: NLPArchitect, NLPRe, Chatterbot, alibi, NeuroNER

* Auto-format

* Update slogan (probably just copy-paste mistake)

* Adjust formatting

* Update tags / categories
2019-06-02 12:35:52 +02:00
Germán 86eb817b74 Overwrites default getter for like_num in Spanish by adding _num_words and like_num to lex_attrs.py (#3810) (closes #3803)
* (#3803) Spanish like_num returning false for number-like token

* (#3803) Spanish like_num now returning True for number-like token
2019-06-02 12:22:57 +02:00
Nirant d4d1eab5e1 Add Baderlab/saber to universe.json (#3806) 2019-06-01 17:36:40 +02:00
Nirant a5d92a3035 Create NirantK.md (#3807) [ci skip] 2019-06-01 17:36:06 +02:00
Ines Montani 6be7d07315 Update UNIVERSE.md 2019-06-01 16:37:06 +02:00
Ines Montani 09e78b52cf Improve E024 text for incorrect GoldParse (closes #3558) 2019-06-01 14:37:27 +02:00
Ines Montani 0c74506c9c Fix typos in docs (closes #3802) [ci skip] 2019-06-01 11:35:01 +02:00
Nipun Sadvilkar 1f13005751 Incorrect Token attribute ent_iob_ description (#3800)
* Incorrect Token attribute ent_iob_ description

* Add spaCy contributor agreement
2019-05-31 16:50:45 +02:00
Ramanan Balakrishnan 26c37c5a4d fix all references to BILUO annotation format (#3797) 2019-05-31 12:19:19 +02:00
Ines Montani a7fd42d937 Make jsonschema dependency optional (#3784) 2019-05-30 14:34:58 +02:00
svlandeg 268a52ead7 experimenting with cosine sim for negative examples (not OK yet) 2019-05-29 16:07:53 +02:00
mak 89379a7fa4 Corrected example model URL in requirements.txt (#3786)
The URL used to show how to add a model to the requirements.txt had the old release path (excl. explosion).
2019-05-29 10:51:55 +02:00
svlandeg a761929fa5 context encoder combining sentence and article 2019-05-28 18:14:49 +02:00
Ines Montani a8416c46f7 Use string name in setup.py
Hopefully this will trick GitHub's parser into recognising it as a Python package and show us the dependents / "used by" statistics 🤞
2019-05-28 17:11:39 +02:00
svlandeg 992fa92b66 refactor again to clusters of entities and cosine similarity 2019-05-28 00:05:22 +02:00
svlandeg 8c4aa076bc small fixes 2019-05-27 14:29:38 +02:00
Ujwal Narayan ed7be3f64c Update norm_exceptions.py (#3778)
* Update norm_exceptions.py

Extended the Currency set to include Franc, Indian Rupee, Bangladeshi Taka, Korean Won, Mexican Dollar, and Egyptian Pound

* Fix formatting [ci skip]
2019-05-27 11:52:52 +02:00
svlandeg cfc27d7ff9 using Tok2Vec instead 2019-05-26 23:39:46 +02:00
svlandeg abf9af81c9 learn rate en epochs 2019-05-24 22:04:25 +02:00
estr4ng7d 604acb6ace Marathi Language Support (#3767)
* Adding Marathi language details and folder to it

* Adding few changes and running tests

* Adding few changes and running tests

* Update __init__.py

mh -> mr

* Rename spacy/lang/mh/__init__.py to spacy/lang/mr/__init__.py

* mh -> mr
2019-05-24 14:29:42 +02:00
Ines Montani 7634812172 Document Language.evaluate 2019-05-24 14:06:36 +02:00
Ines Montani 45e6855550 Update Language.update docs 2019-05-24 14:06:26 +02:00
Ines Montani b78a8dc1d2 Update Scorer and add API docs 2019-05-24 14:06:04 +02:00
svlandeg 86ed771e0b adding local sentence encoder 2019-05-23 16:59:11 +02:00
svlandeg 4392c01b7b obtain sentence for each mention 2019-05-23 15:37:05 +02:00
svlandeg 97241a3ed7 upsampling and batch processing 2019-05-22 23:40:10 +02:00
svlandeg 1a16490d20 update per entity 2019-05-22 12:46:40 +02:00
svlandeg eb08bdb11f hidden with for encoders 2019-05-21 23:42:46 +02:00
svlandeg 7b13e3d56f undersampling negatives 2019-05-21 18:35:10 +02:00
svlandeg 2fa3fac851 fix concat bp and more efficient batch calls 2019-05-21 13:43:59 +02:00
svlandeg 0a15ee4541 fix in bp call 2019-05-20 23:54:55 +02:00
svlandeg 89e322a637 small fixes 2019-05-20 17:20:39 +02:00
Ujwal Narayan 4d550a3055 Enhancing Kannada language Resources (#3755)
* Updated stop_words.py

Added more stopwords

* Create ujwal-narayan.md

Enhancing Kannada language resources
2019-05-20 12:56:10 +02:00
svlandeg 7edb2e1711 fix convolution layer 2019-05-20 11:58:48 +02:00
svlandeg dd691d0053 debugging 2019-05-17 17:44:11 +02:00
svlandeg 400b19353d simplify architecture and larger-scale test runs 2019-05-17 01:51:18 +02:00
Ines Montani 321c9f5acc Fix lex_id docs (closes #3743) 2019-05-16 23:15:58 +02:00
svlandeg d51bffe63b clean up code 2019-05-16 18:36:15 +02:00
svlandeg b5470f3d75 various tests, architectures and experiments 2019-05-16 18:25:34 +02:00
svlandeg 9ffe5437ae calculate gradient for entity encoding 2019-05-15 02:23:08 +02:00
svlandeg 2713abc651 implement loss function using dot product and prob estimate per candidate cluster 2019-05-14 22:55:56 +02:00