Commit Graph

10019 Commits

Author SHA1 Message Date
svlandeg 8e70a564f1 custom reader and writer for _EntryC fields (first stab at it - not complete) 2019-04-23 16:33:40 +02:00
svlandeg 004e5e7d1c little fixes 2019-04-19 14:24:02 +02:00
svlandeg 9a8197185b fix alias capitalization 2019-04-18 22:37:50 +02:00
svlandeg 9f308eb5dc fixes for prior prob and linking wikidata IDs with wikipedia titles 2019-04-18 16:14:25 +02:00
svlandeg 10ee8dfea2 poc with few entities and collecting aliases from the WP links 2019-04-18 14:12:17 +02:00
svlandeg 6763e025e1 parse wp dump for links to determine prior probabilities 2019-04-15 11:41:57 +02:00
svlandeg 3163331b1e wikipedia dump parser and mediawiki format regex cleanup 2019-04-14 21:52:01 +02:00
svlandeg b31a390a9a reading types, claims and sitelinks 2019-04-11 21:42:44 +02:00
svlandeg 6e997be4b4 reading wikidata descriptions and aliases 2019-04-11 21:08:22 +02:00
svlandeg 9a7d534b1b enable nogil for cython functions in kb.pxd 2019-04-10 17:25:10 +02:00
svlandeg 61a33f55d2 little fixes 2019-04-10 16:06:09 +02:00
Ines Montani 6ae3b5699e Make sure path is string (resolves #3546) 2019-04-08 12:53:41 +02:00
Ines Montani d0f5e015cb Auto-format 2019-04-08 12:53:16 +02:00
pierremonico 0d26bfe677 Removes duplicate in table (#3550)
* Removes duplicate in table

Just fixing typos.

* Remove newline


Co-authored-by: Ines Montani <ines@ines.io>
2019-04-08 10:30:42 +02:00
Piero Molino 5198aa4ae6 Added Ludwig among the projects (#3548) [ci skip]
* Added Ludwig among the projects

* Create w4nderlust.md

* Add Uber to logo wall
2019-04-07 13:01:26 +02:00
Dobita21 8bf6967eb7 Update Thai stop words (#3545)
* test sPacy commit to git fri 04052019 10:54

* change Data format from my format to master format

* ทัทั้งนี้ ---> ทั้งนี้

* delete stop_word translate from Eng

* Adjust formatting and readability
2019-04-05 12:06:38 +02:00
jeannefukumaru f67d881b30 fix typos in tag_map flagged by `python -m debug-data` (#3542)
## Checklist
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.


Co-authored-by: Ines Montani <ines@ines.io>
2019-04-05 12:06:09 +02:00
Ines Montani cd21778bef Merge pull request #3539 from jeannefukumaru/master
Added tags previously missing from Indonesian `tag_map.py`
2019-04-04 11:57:03 +02:00
Jeanne Choo b6c9807431 Merge remote-tracking branch 'upstream/master' 2019-04-04 14:21:50 +08:00
Jeanne Choo 80e15af76c fixed tag_map.py merge conflict 2019-04-04 14:18:27 +08:00
jeannefukumaru eba4f77526 Merge pull request #2 from jeannefukumaru/update_indonesian_tag_map
updated tag map with missing tags
2019-04-04 06:49:04 +08:00
jeannefukumaru 876ce01567 updated tag map with missing tags 2019-04-03 23:09:11 +08:00
jeannefukumaru 99e04c4ce2 Merge pull request #1 from jeannefukumaru/added-indonesian-tag-map
Added indonesian tag map
2019-04-03 23:05:05 +08:00
Ines Montani 4faf62d515 Merge pull request #3530 from svlandeg/fix/issue_3521
Allow English stopwords with any type of apostrophe
2019-04-03 14:14:03 +02:00
Yves Peirsman 951825532c Improved Dutch language resources and Dutch lemmatization (#3409)
* Improved Dutch language resources and Dutch lemmatization

* Fix conftest

* Update punctuation.py

* Auto-format

* Format and fix tests

* Remove unused test file

* Re-add deleted test

* removed redundant infix regex pattern for ','; note: brackets + simple hyphen remains

* Cleaner lemmatization files
2019-04-03 14:13:26 +02:00
svlandeg 4ff786e113 addressed all comments by Ines 2019-04-03 13:50:33 +02:00
Ines Montani 6a4575a56c Don't make "settings" or "title" required in displaCy data (closes #3531) 2019-04-03 10:13:16 +02:00
Ines Montani 2f0f439c54 Remove non-existent example (closes #3533) 2019-04-03 09:59:17 +02:00
Kamolsit Mongkolsrisawat dcc67f3f51 Update Thai tokenizer_exception list (#3529)
* add tokenizer_exceptions word (ก-น) from https://goo.gl/JpJ2qq

* update tokenizer_exceptions word list

* add contributor file
2019-04-03 09:13:36 +02:00
ivigamberdiev 5e5641616d Update links and http -> https (#3532)
* update links and http -> https

* SCA
2019-04-02 17:36:22 +02:00
svlandeg 85b4319f33 specify encoding in files 2019-04-02 15:05:31 +02:00
svlandeg 673c81bbb4 unicode string for python 2.7 2019-04-02 13:52:07 +02:00
svlandeg eca9cc5417 fixing Issue #3521 by adding all hyphen variants for each stopword 2019-04-02 13:24:59 +02:00
svlandeg e7062cf699 failing test for Issue #3521 2019-04-02 13:15:35 +02:00
svlandeg 1424b12b09 failing test for Issue #3449 2019-04-02 13:06:37 +02:00
Ines Montani 24cecdb44f Update compatibility [ci skip] 2019-04-01 16:25:16 +02:00
jeannefukumaru 6cdb7b2e04 added tag_map for indonesian (#3515)
* added tag_map for indonesian

* changed tag map from .py to .txt to see if tests pass

* added symbols import

* added utf8 encoding flag

* added missing SCONJ symbol

* Auto-format

* Remove unused imports

* Make tag map available in Indonesian defaults
2019-04-01 12:27:48 +02:00
Ines Montani c23e234d65 Auto-format 2019-04-01 12:11:27 +02:00
Ines Montani 5821b020d5 Merge branch 'spacy.io' 2019-04-01 11:47:59 +02:00
Ines Montani 0a0b1087b0 Make tag map available in Indonesian defaults 2019-04-01 11:46:51 +02:00
Ines Montani 5d9212c44c Remove unused imports 2019-04-01 11:46:25 +02:00
Ines Montani 8d6b544632 Auto-format 2019-04-01 11:45:43 +02:00
jeannefukumaru 6567f27849 added missing SCONJ symbol 2019-04-01 17:02:53 +08:00
jeannefukumaru 082a0a2232 added utf8 encoding flag 2019-04-01 16:37:11 +08:00
jeannefukumaru a741bed7a7 added symbols import 2019-04-01 16:21:06 +08:00
jeannefukumaru 745cf0c914 changed tag map from .py to .txt to see if tests pass 2019-04-01 07:04:50 +08:00
jeannefukumaru 3cc897102f added tag_map for indonesian 2019-04-01 00:00:08 +08:00
Matthew Honnibal e64b241f9c Merge branch 'master' of https://github.com/explosion/spaCy 2019-03-31 13:58:38 +02:00
Ines Montani b070e0caf7 Update landing.js 2019-03-30 22:26:46 +01:00
Ines Montani 9d1221943b Merge branch 'master' into spacy.io 2019-03-30 20:32:14 +01:00