Commit Graph

3930 Commits

Author SHA1 Message Date
Ines Montani 4e95737c6c Add base tag map 2016-12-18 16:54:28 +01:00
Ines Montani 2b2ea8ca11 Reorganise language data 2016-12-18 16:54:19 +01:00
Matthew Honnibal 1b31c05bf8 Whitespace 2016-12-18 16:51:40 +01:00
Matthew Honnibal bdcecb3c96 Add import in regression test 2016-12-18 16:51:31 +01:00
Matthew Honnibal 6ee1df93c5 Set tag_map to None if it's not seen in the data by vocab 2016-12-18 16:51:10 +01:00
Matthew Honnibal 33996e770b Update header for morphology class 2016-12-18 16:50:42 +01:00
Matthew Honnibal d58187ffa7 Filter out morphology keys in deprecated attrs 2016-12-18 16:50:26 +01:00
Matthew Honnibal 837a5d4100 Update morphology class so that exceptions can be added one-by-one, and so that arbitrary attributes can be referenced. 2016-12-18 16:49:46 +01:00
Matthew Honnibal 44f4f008bd Wire up lemmatizer rules for English 2016-12-18 15:50:09 +01:00
Matthew Honnibal e6fc4afb04 Whitespace 2016-12-18 15:48:00 +01:00
Ines Montani 32b36c3882 Break language data components into their own files 2016-12-18 15:40:22 +01:00
Ines Montani 1bff59a8db Update English language data 2016-12-18 15:36:53 +01:00
Ines Montani 2eb163c5dd Add lemma rules 2016-12-18 15:36:53 +01:00
Ines Montani 29ad8143d8 Add morph rules 2016-12-18 15:36:53 +01:00
Ines Montani bc40dad7d9 Add entity rules 2016-12-18 15:36:53 +01:00
Ines Montani eaa3b1319d Fix formatting 2016-12-18 15:36:53 +01:00
Ines Montani 704c7442e0 Break language data components into their own files 2016-12-18 15:36:53 +01:00
Ines Montani 62655fd36f Add ENT_ID constant 2016-12-18 15:36:53 +01:00
Matthew Honnibal fa272fdf12 Merge branch 'organize-language-data' of ssh://github.com/explosion/spaCy into organize-language-data 2016-12-18 15:00:21 +01:00
Matthew Honnibal 57c4341453 Refactor loading of morphology exceptions, adding a method add_special_case. 2016-12-18 14:59:44 +01:00
Ines Montani 77cf2fb0f6 Remove unnecessary argument in test 2016-12-18 14:06:27 +01:00
Ines Montani 121c310566 Remove trailing whitespace 2016-12-18 14:06:27 +01:00
Matthew Honnibal 46e98ec029 Move init_model.py script from repo. These meta-tools should live elsewhere 2016-12-18 14:03:40 +01:00
Matthew Honnibal d5840c488b Clean unused code from fabfile 2016-12-18 13:53:30 +01:00
Ines Montani 0fc4e45cb3 Fix tag map for German 2016-12-18 13:30:03 +01:00
Ines Montani 28326649f3 Fix typo 2016-12-18 13:30:03 +01:00
Matthew Honnibal 0595cc0635 Change test595 to mock data, instead of requiring model. 2016-12-18 13:28:51 +01:00
Matthew Honnibal a4eb5c2bff Check POS key in lemmatizer, to update it for new data format 2016-12-18 13:28:20 +01:00
Matthew Honnibal 28d63ec58e Restore missing '' character in tokenizer exceptions. 2016-12-18 05:34:51 +01:00
Ines Montani a9421652c9 Remove duplicates in tag map 2016-12-17 22:44:31 +01:00
Ines Montani 69baf1c9a8 Fix tag map 2016-12-17 22:44:22 +01:00
Ines Montani 577adad945 Fix formatting 2016-12-17 14:00:52 +01:00
Ines Montani fc4ad17136 Fix typo 2016-12-17 14:00:47 +01:00
Ines Montani bb94e784dc Fix typo 2016-12-17 13:59:30 +01:00
Ines Montani afda532595 Use symbols in tag map 2016-12-17 13:56:24 +01:00
Ines Montani 07249145c9 Fix formatting 2016-12-17 13:34:46 +01:00
Ines Montani dd55d085b6 Reformat dutch language data to match new style 2016-12-17 13:26:01 +01:00
Ines Montani f2c48ef504 Resolve stopwords conflict to merge Dutch 2016-12-17 13:08:16 +01:00
Ines Montani 3dded56ae1 Add contributors from #688 2016-12-17 12:52:57 +01:00
Matthew Honnibal ff03ade08f Merge pull request #688 from nlesc-sherlock/dutch (Support for Dutch in SpaCy) 2016-12-17 22:44:58 +11:00
Ines Montani a22322187f Add missing lemmas to tokenizer exceptions (fixes #674) 2016-12-17 12:42:41 +01:00
Ines Montani 5445074cbd Expand tokenizer exceptions with unicode apostrophe (fixes #685) 2016-12-17 12:34:08 +01:00
Ines Montani e0a7b5c612 Fix formatting 2016-12-17 12:33:09 +01:00
Ines Montani 08162dce67 Move shared functions and constants to global language data 2016-12-17 12:32:48 +01:00
Ines Montani 6a60a61086 Move update_exc to global language data utils 2016-12-17 12:29:02 +01:00
Ines Montani f324311249 Add global language data utils 2016-12-17 12:27:41 +01:00
Ines Montani 487ce1e20a Add encoding declaration 2016-12-17 12:25:44 +01:00
Ines Montani d8d50a0334 Add tokenizer exception for "gonna" (fixes #691) 2016-12-17 11:59:28 +01:00
Ines Montani c69b77d8aa Revert "Add exception for "gonna"" (This reverts commit 280c03f67b.) 2016-12-17 11:56:44 +01:00
Ines Montani 280c03f67b Add exception for "gonna" 2016-12-17 11:54:59 +01:00