Commit Graph

1533 Commits

Author SHA1 Message Date
Wolfgang Seeker c2f76a4024 Merge branch 'master' into german_ner 2016-04-25 13:21:23 +02:00
Wolfgang Seeker 1003e7ccec remove debug output from tests 2016-04-25 12:12:40 +02:00
Wolfgang Seeker f57f843e85 fix bug in updating tree structure when introducing additional roots 2016-04-25 12:01:19 +02:00
Wolfgang Seeker 4d7f393fae don't require json-files to have syntactic annotation 2016-04-22 16:32:27 +02:00
Wolfgang Seeker b6477fc4f4 adjusted tests to Travis Setup 2016-04-21 17:15:10 +02:00
Wolfgang Seeker 736ffcb9a2 remove whitespace 2016-04-21 16:55:55 +02:00
Wolfgang Seeker 6c7301cc6d the parser now introduces sentence boundaries properly when predicting dependents with root labels 2016-04-21 16:50:53 +02:00
Wolfgang Seeker 12024b0b0a bugfix: introducing multiple roots now updates original head's properties
adjust tests to rely less on statistical model
2016-04-20 16:42:41 +02:00
Matthew Honnibal 2add5206aa * Fix description of matcher test 2016-04-17 15:40:21 +02:00
Matthew Honnibal 2b419d5b8c * Update test for Issue #242 2016-04-17 15:34:23 +02:00
Matthew Honnibal f12b043308 * Add test for Issue #242: Overlapping matches not well recognised. 2016-04-17 15:19:17 +02:00
Wolfgang Seeker b98cc3266d bugfix: iterators now reset properly when called a second time 2016-04-15 17:49:16 +02:00
Wolfgang Seeker e6945c4d0e bugfix: uppercase attr values before looking them up 2016-04-15 15:46:31 +02:00
Matthew Honnibal c0909afe22 Merge pull request #312 from wbwseeker/space_head_bug
add restrictions to L-arc and R-arc to prevent space heads
2016-04-15 20:36:03 +10:00
Wolfgang Seeker 289b10f441 remove some comments 2016-04-14 15:37:51 +02:00
Matthew Honnibal 6f82065761 * Fix infixed commas in tokenizer, re Issue #326. Need to benchmark on empirical data, to make sure this doesn't break other cases. 2016-04-14 11:36:03 +02:00
Matthew Honnibal 0f957dd586 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2016-04-14 10:37:56 +02:00
Matthew Honnibal 108aca0e50 * Make Matcher use attrs from the attrs.pyx file, rather than having an incomplete function doing the mapping. 2016-04-14 10:37:39 +02:00
Matthew Honnibal 61d20de35d * Fix language.py docstring 2016-04-14 10:36:57 +02:00
Wolfgang Seeker d99a9cbce9 different handling of space tokens
space tokens are now always attached to the previous non-space token
there are two exceptions:
leading space tokens are attached to the first following non-space token
in input that consists exclusively of space tokens, the last space token
is the head of all others.
2016-04-13 15:28:28 +02:00
Matthew Honnibal 04d0209be9 * Recognise multiple infixes in a token. 2016-04-13 18:38:26 +10:00
Henning Peters a473d6e937 fix tests (use english model) 2016-04-12 16:41:57 +02:00
Henning Peters f2d011c034 avoid polluting spacy namespace with lang classes 2016-04-12 16:31:16 +02:00
Henning Peters ff690f76ba fix loading non-german models 2016-04-12 16:00:56 +02:00
Henning Peters 6215272786 remove ujson as default non-dev dependency (still works as fallback if installed), because ujson doesn't ship wheels 2016-04-12 11:28:07 +02:00
Matthew Honnibal 6df3858dbc * Fix Issue #323: Incorrect semantics of Token.__str__ built-in. Add flag to allow users to switch the old semantics back on, to ease transition. 2016-04-12 13:17:59 +10:00
Wolfgang Seeker d328e0b4a8 Merge branch 'master' into space_head_bug 2016-04-11 12:11:01 +02:00
Wolfgang Seeker 80bea62842 bugfix in unit test 2016-04-08 16:46:44 +02:00
Wolfgang Seeker be4903a1b2 update version numbers 2016-04-08 13:54:05 +02:00
Wolfgang Seeker 1fe911cdb0 bigfix 2016-04-07 18:19:51 +02:00
Matthew Honnibal 872695759d Merge pull request #306 from wbwseeker/german_noun_chunks
add German noun chunk functionality
2016-04-08 00:54:24 +10:00
Henning Peters 470cdf5bf9 remove deprecated LOCAL_DATA_DIR 2016-04-05 11:25:54 +02:00
Matthew Honnibal 26622f0ffc Merge branch 'master' of ssh://github.com/honnibal/spaCy 2016-03-29 14:31:52 +11:00
Matthew Honnibal b1fe41b45d * Extend infix test, commenting on limitation of tokenizer w.r.t. infixes at the moment. 2016-03-29 14:31:05 +11:00
Matthew Honnibal 9c73983bdd * Add test for hyphenation problem in Issue #302 2016-03-29 14:27:13 +11:00
Matthew Honnibal ad119c074f * Fix incorrect whitespacing in Doc.text. This change is potentially breaking, to anyone who was relying on the previous incorrect semantics. 2016-03-29 13:02:42 +11:00
Matthew Honnibal 8c7a1908ee Merge pull request #307 from scoder/faster_string_store
remove internal redundancy and overhead from StringStore
2016-03-29 12:59:52 +11:00
Wolfgang Seeker 7195b6742d add restrictions to L-arc and R-arc to prevent space heads 2016-03-28 10:40:52 +02:00
Matthew Honnibal 8c77a994c6 Merge pull request #305 from henningpeters/master
multiple langs in download script
2016-03-26 21:54:59 +11:00
Henning Peters c90d4a6f17 relative imports in __init__.py 2016-03-26 11:44:53 +01:00
Henning Peters db095a162c fix 2016-03-25 18:59:47 +01:00
Henning Peters b8f63071eb add lang registration facility 2016-03-25 18:54:45 +01:00
Matthew Honnibal 4a37fdcee1 Merge pull request #287 from wbwseeker/deproj_sentbnd_bug
add function to Token for setting head and dep (and dep_)
2016-03-25 09:47:45 +11:00
Stefan Behnel f18805ee1c make StringStore.__contains__() return True for the empty string (which is also contained in iteration) 2016-03-24 15:42:12 +01:00
Stefan Behnel f2cfbfc412 remove internal redundancy and overhead from StringStore 2016-03-24 15:25:27 +01:00
Wolfgang Seeker d65ef41d08 make error messages language independent 2016-03-24 11:47:09 +01:00
Henning Peters 963570aa49 Merge branch 'master' of github.com:spacy-io/spaCy 2016-03-24 11:19:47 +01:00
Henning Peters a7d7ea3afa first idea for supporting multiple langs in download script 2016-03-24 11:19:43 +01:00
Wolfgang Seeker 5080077097 revert init_model.py back to pre-german state (because it makes more sense)
simplify token.n_rights and token.n_lefts
2016-03-21 16:10:25 +01:00
Wolfgang Seeker 5e2e8e951a add baseclass DocIterator for iterators over documents
add classes for English and German noun chunks

the respective iterators are set for the document when created by the parser
as they depend on the annotation scheme of the parsing model
2016-03-16 15:53:35 +01:00