Commit Graph

2944 Commits

Author SHA1 Message Date
Wolfgang Seeker dae6bc05eb define German dummy lemmatizer until morphology is done 2016-05-02 16:04:53 +02:00
Matthew Honnibal 6e1f1c4b9e Merge pull request #357 from wbwseeker/german_ner
German ner
2016-05-02 23:39:34 +10:00
Wolfgang Seeker b6b96b233c don't require read_json_file to expect particular annotations 2016-05-02 15:29:30 +02:00
Matthew Honnibal 902a389d85 * Fix merge conflict in test_parse 2016-05-02 15:28:07 +02:00
Matthew Honnibal 276fbe9996 * Fix assignment of iterator on Doc object 2016-05-02 15:26:24 +02:00
Matthew Honnibal 02c23cc1d0 * Fix sentence boundary test 2016-05-02 15:26:07 +02:00
Matthew Honnibal d2f469b809 * Fix parsing tests, so that labels are added if they're missing, and so that the branching test values are correct 2016-05-02 15:25:27 +02:00
Wolfgang Seeker b11cbb06c6 remove old tests for sentence boundary detection 2016-05-02 14:36:35 +02:00
Matthew Honnibal 508fd1f6dc * Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples. 2016-05-02 14:25:10 +02:00
Matthew Honnibal e526be5602 Merge branch 'master' of ssh://github.com/spacy-io/spaCy 2016-05-02 13:08:08 +02:00
Wolfgang Seeker fa961ea694 add tests for serialization bug 2016-05-02 11:01:56 +02:00
Henning Peters 9b142d4438 can't work around build issue on windows 2016-05-01 12:30:59 +02:00
Henning Peters 749cbd359e Update LICENSE 2016-04-29 09:49:28 +02:00
Henning Peters fac209cb7e add stdint.h fallback (vs 2008) 2016-04-29 00:08:14 +02:00
Henning Peters 2bf34687ea add stdint.h fallback (vs 2008) 2016-04-28 22:10:43 +02:00
Matthew Honnibal 97b2bba249 * Merge updated/simplified Break approach 2016-04-25 19:44:42 +00:00
Matthew Honnibal 77609588b6 * Fix assignment of root label to words left as root implicitly, after parsing ends. 2016-04-25 19:41:59 +00:00
Matthew Honnibal 7c2d2deaa7 * Revise transition system so that the Break transition retains sole responsibility for setting sentence boundaries. Re Issue #322 2016-04-25 19:41:59 +00:00
Wolfgang Seeker c2f76a4024 Merge branch 'master' into german_ner 2016-04-25 13:21:23 +02:00
Matthew Honnibal feb65fcaa1 Merge pull request #346 from wbwseeker/sentbnd_bug
introduce sentence boundaries for additional root tokens
2016-04-25 20:31:27 +10:00
Wolfgang Seeker 1003e7ccec remove debug output from tests 2016-04-25 12:12:40 +02:00
Wolfgang Seeker f57f843e85 fix bug in updating tree structure when introducing additional roots 2016-04-25 12:01:19 +02:00
Matthew Honnibal 478a8d1829 * Register Chinese language in spacy/__init__.py 2016-04-24 18:45:16 +02:00
Matthew Honnibal 8569dbc2d0 * Add initial stuff for Chinese parsing 2016-04-24 18:44:24 +02:00
Wolfgang Seeker 4d7f393fae don't require json-files to have syntactic annotation 2016-04-22 16:32:27 +02:00
Wolfgang Seeker b6477fc4f4 adjusted tests to Travis Setup 2016-04-21 17:15:10 +02:00
Wolfgang Seeker 736ffcb9a2 remove whitespace 2016-04-21 16:55:55 +02:00
Wolfgang Seeker 6c7301cc6d the parser now introduces sentence boundaries properly when predicting dependents with root labels 2016-04-21 16:50:53 +02:00
Wolfgang Seeker 12024b0b0a bugfix: introducing multiple roots now updates original head's properties
adjust tests to rely less on statistical model
2016-04-20 16:42:41 +02:00
Henning Peters c356251f45 Merge branch 'master' of github.com:spacy-io/spaCy 2016-04-19 19:50:55 +02:00
Henning Peters bb3238bcdd pin numpy to >=1.7, ship headers 2016-04-19 19:50:42 +02:00
Matthew Honnibal 67ce96c9c9 * Make patterns argument to Matcher class optional 2016-04-17 21:32:24 +02:00
Matthew Honnibal 8b4677d34d * Add missing keyword arguments to spacy.load() function 2016-04-17 21:31:50 +02:00
Matthew Honnibal 2add5206aa * Fix description of matcher test 2016-04-17 15:40:21 +02:00
Matthew Honnibal 2b419d5b8c * Update test for Issue #242 2016-04-17 15:34:23 +02:00
Matthew Honnibal f12b043308 * Add test for Issue #242: Overlapping matches not well recognised. 2016-04-17 15:19:17 +02:00
Wolfgang Seeker b98cc3266d bugfix: iterators now reset properly when called a second time 2016-04-15 17:49:16 +02:00
Wolfgang Seeker e6945c4d0e bugfix: uppercase attr values before looking them up 2016-04-15 15:46:31 +02:00
Matthew Honnibal c0909afe22 Merge pull request #312 from wbwseeker/space_head_bug
add restrictions to L-arc and R-arc to prevent space heads
2016-04-15 20:36:03 +10:00
Wolfgang Seeker 289b10f441 remove some comments 2016-04-14 15:37:51 +02:00
Matthew Honnibal fe9299a118 * Fix long-standing issue with coarse-grained tags: proper nouns weren't receiving the PROPN tag, and personal pronouns weren't receiving the PRON tag. This should fix Issue #191, and also Issue #325, which reported that proper nouns were being lemmatized using the common noun policies. This lemmatization will be prevented if the universal tag is PROPN, not NOUN, as no lemmatization rules are loaded for the PROPN tag. 2016-04-14 12:46:43 +02:00
Matthew Honnibal 6f82065761 * Fix infixed commas in tokenizer, re Issue #326. Need to benchmark on empirical data, to make sure this doesn't break other cases. 2016-04-14 11:36:03 +02:00
Matthew Honnibal 0f957dd586 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2016-04-14 10:37:56 +02:00
Matthew Honnibal 108aca0e50 * Make Matcher use attrs from the attrs.pyx file, rather than having an incomplete function doing the mapping. 2016-04-14 10:37:39 +02:00
Matthew Honnibal 61d20de35d * Fix language.py docstring 2016-04-14 10:36:57 +02:00
Wolfgang Seeker d99a9cbce9 different handling of space tokens
space tokens are now always attached to the previous non-space token
there are two exceptions:
leading space tokens are attached to the first following non-space token
in input that consists exclusively of space tokens, the last space token
is the head of all others.
2016-04-13 15:28:28 +02:00
Matthew Honnibal 04d0209be9 * Recognise multiple infixes in a token. 2016-04-13 18:38:26 +10:00
Henning Peters a473d6e937 fix tests (use english model) 2016-04-12 16:41:57 +02:00
Henning Peters f2d011c034 avoid polluting spacy namespace with lang classes 2016-04-12 16:31:16 +02:00
Henning Peters ff690f76ba fix loading non-german models 2016-04-12 16:00:56 +02:00