Commit Graph

4859 Commits

Author SHA1 Message Date
DuyguA cca87756d7 added Sti 2018-03-08 18:07:52 +01:00
DuyguA 3c994311c5 added abbrevs 2018-03-08 18:03:27 +01:00
DuyguA 56d6fb180e added like_num to lex 2018-03-08 15:25:25 +01:00
DuyguA 26ee0590a3 added some commonly used cases 2018-03-08 12:43:58 +01:00
DuyguA ae6473e4d5 removed some words with negation particle. 2018-03-08 12:20:32 +01:00
DuyguA 6ed59a2198 removed number words to be carried to the lexical 2018-03-08 12:19:23 +01:00
DuyguA 04784a44a6 made alphabetical order for Turkish characters 2018-03-08 12:11:32 +01:00
DuyguA af33e022a5 added example sentences for Turkish 2018-03-08 12:06:03 +01:00
Matthew Honnibal a1be01185c Fix array out of bounds error in Span 2018-02-28 12:27:09 +01:00
Thomas Opsomer 8df9e52829 lemma property to return hash instead of unicode 2018-02-27 19:50:01 +01:00
Ines Montani 35634352fe Merge pull request #2025 from dejanmarich/patch-1: Update stop_words.py for Croatian language 2018-02-26 18:22:32 +01:00
Matthew Honnibal 14f729c72a Add subtok label to parser 2018-02-26 12:26:35 +01:00
Matthew Honnibal 7137ad8b0b Make label filtering clearer for projectivisation 2018-02-26 12:02:01 +01:00
Matthew Honnibal b8d52cb285 Fix inconsistent label freq cutoff for projectivisation 2018-02-26 12:01:44 +01:00
Matthew Honnibal 7b66ec896a Revert "Revert "Improve parser oracle around sentence breaks."" This reverts commit 36e481c584. 2018-02-26 10:57:37 +01:00
Matthew Honnibal 36e481c584 Revert "Improve parser oracle around sentence breaks." This reverts commit 50817dc9ad. 2018-02-26 10:53:55 +01:00
Matthew Honnibal 5faae803c6 Add option to not use Janome for Japanese tokenization 2018-02-26 09:39:46 +01:00
Matthew Honnibal 9b406181cd Add Chinese.Defaults.use_jieba setting, for UD 2018-02-25 15:12:38 +01:00
Matthew Honnibal 9ccd0c643b Add Vietnamese 2018-02-25 15:00:46 +01:00
Matthew Honnibal d4fdb97c87 Fix alignment for words with spaces 2018-02-25 14:55:00 +01:00
Matthew Honnibal 6d2c1ef52c Fix SP tag in generic tag map 2018-02-24 16:04:56 +01:00
Matthew Honnibal 5cc3bd1c1d Update alignment tests 2018-02-24 16:03:58 +01:00
Matthew Honnibal 6138439469 Fix many-to-one alignment 2018-02-24 16:03:50 +01:00
Matthew Honnibal 4890ee1732 Fix scoring of tokenization for punct 2018-02-24 10:32:32 +01:00
Matthew Honnibal 12b39f87da Move cython declarations in matcher.pyx 2018-02-24 10:32:18 +01:00
Matthew Honnibal 01d1b7abdf Support many-to-one alignment in GoldParse 2018-02-24 10:17:01 +01:00
Matthew Honnibal 7865746574 Support many-to-one alignment 2018-02-24 02:09:53 +01:00
Matthew Honnibal 458710b831 Poke matcher test for appveyor 2018-02-23 23:53:48 +01:00
Matthew Honnibal 968dabdde4 Fix bug in multi-task objective 2018-02-23 23:48:09 +01:00
Matthew Honnibal 2c9c8b8d72 Try commenting out emoji test in matcher 2018-02-23 23:34:35 +01:00
Matthew Honnibal 980ad68cbe Try to find test that fails on appveyor 2018-02-23 21:27:53 +01:00
Matthew Honnibal 39de8cd4d3 Try to find test failing on appveyor 2018-02-23 20:59:21 +01:00
Matthew Honnibal 4492a33a9d Fix sent_start multi-task objective when alignment fails 2018-02-23 16:50:59 +01:00
Matthew Honnibal 5fa44e93f1 Set unicode_literals in matcher 2018-02-23 16:48:54 +01:00
Matthew Honnibal 12264f9296 Add multi-task objective for sentence segmentation 2018-02-23 16:25:57 +01:00
Matthew Honnibal e7deadb519 Set version to 2.1.0.dev1 2018-02-23 16:22:24 +01:00
Matthew Honnibal 7b575a119e Try to reduce memory usage of test_matcher 2018-02-23 15:34:37 +01:00
Matthew Honnibal 24563f4026 Fix data typing in align 2018-02-23 15:08:06 +01:00
Matthew Honnibal 7a5ba20692 Fix integer typing in _align 2018-02-23 14:51:24 +01:00
Matthew Honnibal 875411b875 Set unicode types in _align.pyx and test 2018-02-23 14:35:38 +01:00
Matthew Honnibal 51d9679aa3 Fix broken span.as_doc test 2018-02-23 14:22:24 +01:00
dejanmarich 71c261d58b Update stop_words.py: Added more words 2018-02-23 10:31:01 +01:00
Matthew Honnibal 3e6c1111b7 Remove obsolete test 2018-02-23 03:22:07 +01:00
Matthew Honnibal a4fdec524a Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-gold 2018-02-22 21:44:28 +01:00
Matthew Honnibal 50817dc9ad Improve parser oracle around sentence breaks. 2018-02-22 19:22:26 +01:00
Matthew Honnibal 661873ee4c Randomize the rebatch size in parser 2018-02-21 21:02:07 +01:00
Matthew Honnibal 0872cf611d Don't lower-case lemmas of proper nouns 2018-02-21 16:01:16 +01:00
Matthew Honnibal a0ddb803fd Make error when no label found more helpful 2018-02-21 16:00:59 +01:00
Matthew Honnibal ea2fc5d45f Improve length and freq cutoffs in parser 2018-02-21 16:00:38 +01:00
Matthew Honnibal e5757d4bf0 Add labels property to parser 2018-02-21 16:00:00 +01:00