Matthew Honnibal
6e1f1c4b9e
Merge pull request #357 from wbwseeker/german_ner
...
German ner
2016-05-02 23:39:34 +10:00
Wolfgang Seeker
b6b96b233c
don't require read_json_file to expect particular annotations
2016-05-02 15:29:30 +02:00
Matthew Honnibal
902a389d85
* Fix merge conflict in test_parse
2016-05-02 15:28:07 +02:00
Matthew Honnibal
276fbe9996
* Fix assignment of iterator on Doc object
2016-05-02 15:26:24 +02:00
Matthew Honnibal
02c23cc1d0
* Fix sentence boundary test
2016-05-02 15:26:07 +02:00
Matthew Honnibal
d2f469b809
* Fix parsing tests, so that labels are added if they're missing, and so that the branching test values are correct
2016-05-02 15:25:27 +02:00
Wolfgang Seeker
b11cbb06c6
remove old tests for sentence boundary detection
2016-05-02 14:36:35 +02:00
Matthew Honnibal
508fd1f6dc
* Refactor noun chunk iterators, so that they're simple functions. Install the iterator when the Doc is created, but allow users to write to the noun_chunk_iterator attribute. The iterator functions accept an object and yield (int start, int end, int label) triples.
2016-05-02 14:25:10 +02:00
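A minimal sketch of the design this commit describes, not spaCy's actual code: a noun chunk iterator is a plain function that accepts a document-like object and yields (start, end, label) integer triples, and the document keeps it in a writable attribute. `ToyDoc`, `pos_run_chunks`, `NP_LABEL`, the part-of-speech rule, and the exclusive `end` index are all assumptions made for illustration.

```python
NP_LABEL = 0  # hypothetical integer id standing in for a label such as "NP"
CHUNK_POS = {"DET", "ADJ", "NOUN"}  # assumed tag set for this toy rule


def pos_run_chunks(doc):
    """Yield (start, end, label) for runs of DET/ADJ/NOUN that contain a NOUN.

    `end` is exclusive here; the commit message does not say which convention
    the real iterators use.
    """
    start = None
    has_noun = False
    # A sentinel token closes a run that reaches the end of the document.
    for i, (_, pos) in enumerate(doc.tokens + [("", "EOS")]):
        if pos in CHUNK_POS:
            if start is None:
                start = i
            has_noun = has_noun or pos == "NOUN"
        else:
            if start is not None and has_noun:
                yield (start, i, NP_LABEL)
            start, has_noun = None, False


class ToyDoc:
    """Hypothetical stand-in for a Doc: installs an iterator at creation."""

    def __init__(self, tokens, noun_chunk_iterator=pos_run_chunks):
        self.tokens = list(tokens)
        # Users may overwrite this attribute with their own function.
        self.noun_chunk_iterator = noun_chunk_iterator

    @property
    def noun_chunks(self):
        for start, end, label in self.noun_chunk_iterator(self):
            yield self.tokens[start:end], label


doc = ToyDoc([("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"),
              ("jumps", "VERB"), ("over", "ADP"),
              ("lazy", "ADJ"), ("dogs", "NOUN")])
triples = list(pos_run_chunks(doc))
```

Because the iterator is an ordinary function, swapping in language-specific behaviour amounts to assigning a different function to `doc.noun_chunk_iterator`.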
Matthew Honnibal
e526be5602
Merge branch 'master' of ssh://github.com/spacy-io/spaCy
2016-05-02 13:08:08 +02:00
Wolfgang Seeker
fa961ea694
add tests for serialization bug
2016-05-02 11:01:56 +02:00
Matthew Honnibal
97b2bba249
* Merge updated/simplified Break approach
2016-04-25 19:44:42 +00:00
Matthew Honnibal
77609588b6
* Fix assignment of root label to words left as root implicitly, after parsing ends.
2016-04-25 19:41:59 +00:00
Matthew Honnibal
7c2d2deaa7
* Revise transition system so that the Break transition retains sole responsibility for setting sentence boundaries. Re Issue #322
2016-04-25 19:41:59 +00:00
Wolfgang Seeker
c2f76a4024
Merge branch 'master' into german_ner
2016-04-25 13:21:23 +02:00
Wolfgang Seeker
1003e7ccec
remove debug output from tests
2016-04-25 12:12:40 +02:00
Wolfgang Seeker
f57f843e85
fix bug in updating tree structure when introducing additional roots
2016-04-25 12:01:19 +02:00
Matthew Honnibal
478a8d1829
* Register Chinese language in spacy/__init__.py
2016-04-24 18:45:16 +02:00
Matthew Honnibal
8569dbc2d0
* Add initial stuff for Chinese parsing
2016-04-24 18:44:24 +02:00
Wolfgang Seeker
4d7f393fae
don't require json-files to have syntactic annotation
2016-04-22 16:32:27 +02:00
Wolfgang Seeker
b6477fc4f4
adjusted tests to Travis Setup
2016-04-21 17:15:10 +02:00
Wolfgang Seeker
736ffcb9a2
remove whitespace
2016-04-21 16:55:55 +02:00
Wolfgang Seeker
6c7301cc6d
the parser now introduces sentence boundaries properly when predicting dependents with root labels
2016-04-21 16:50:53 +02:00
Wolfgang Seeker
12024b0b0a
bugfix: introducing multiple roots now updates original head's properties
...
adjust tests to rely less on statistical model
2016-04-20 16:42:41 +02:00
Matthew Honnibal
67ce96c9c9
* Make patterns argument to Matcher class optional
2016-04-17 21:32:24 +02:00
Matthew Honnibal
8b4677d34d
* Add missing keyword arguments to spacy.load() function
2016-04-17 21:31:50 +02:00
Matthew Honnibal
2add5206aa
* Fix description of matcher test
2016-04-17 15:40:21 +02:00
Matthew Honnibal
2b419d5b8c
* Update test for Issue #242
2016-04-17 15:34:23 +02:00
Matthew Honnibal
f12b043308
* Add test for Issue #242 : Overlapping matches not well recognised.
2016-04-17 15:19:17 +02:00
Wolfgang Seeker
b98cc3266d
bugfix: iterators now reset properly when called a second time
2016-04-15 17:49:16 +02:00
Wolfgang Seeker
e6945c4d0e
bugfix: uppercase attr values before looking them up
2016-04-15 15:46:31 +02:00
Matthew Honnibal
c0909afe22
Merge pull request #312 from wbwseeker/space_head_bug
...
add restrictions to L-arc and R-arc to prevent space heads
2016-04-15 20:36:03 +10:00
Wolfgang Seeker
289b10f441
remove some comments
2016-04-14 15:37:51 +02:00
Matthew Honnibal
6f82065761
* Fix infixed commas in tokenizer, re Issue #326 . Need to benchmark on empirical data, to make sure this doesn't break other cases.
2016-04-14 11:36:03 +02:00
Matthew Honnibal
0f957dd586
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2016-04-14 10:37:56 +02:00
Matthew Honnibal
108aca0e50
* Make Matcher use attrs from the attrs.pyx file, rather than having an incomplete function doing the mapping.
2016-04-14 10:37:39 +02:00
Matthew Honnibal
61d20de35d
* Fix language.py docstring
2016-04-14 10:36:57 +02:00
Wolfgang Seeker
d99a9cbce9
different handling of space tokens
...
space tokens are now always attached to the previous non-space token
there are two exceptions:
leading space tokens are attached to the first following non-space token
in input that consists exclusively of space tokens, the last space token is the head of all others.
2016-04-13 15:28:28 +02:00
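The attachment rules in this commit message can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the parser's implementation: `assign_space_heads` is a hypothetical helper, tokens are reduced to a list of booleans, and the head of the last token in an all-space input is left unspecified because the message does not state it.

```python
def assign_space_heads(is_space):
    """Map each space token index to its head index.

    `is_space` is a list of booleans, one per token (True for a space token).
    Non-space tokens are not included in the result.
    """
    n = len(is_space)
    non_space = [i for i, flag in enumerate(is_space) if not flag]
    heads = {}

    if not non_space:
        # Exception 2: only space tokens, so the last one heads all others.
        for i in range(n - 1):
            heads[i] = n - 1
        return heads

    first_non_space = non_space[0]
    previous_non_space = None
    for i, flag in enumerate(is_space):
        if not flag:
            previous_non_space = i
        elif previous_non_space is None:
            # Exception 1: leading spaces attach to the first following
            # non-space token.
            heads[i] = first_non_space
        else:
            # Default: attach to the previous non-space token.
            heads[i] = previous_non_space
    return heads


mixed = assign_space_heads([True, False, True, True, False, True])
only_spaces = assign_space_heads([True, True, True])
```

Here `mixed` attaches token 0 to token 1 (leading space), tokens 2 and 3 to token 1, and token 5 to token 4, while `only_spaces` attaches tokens 0 and 1 to token 2.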
Matthew Honnibal
04d0209be9
* Recognise multiple infixes in a token.
2016-04-13 18:38:26 +10:00
Henning Peters
a473d6e937
fix tests (use english model)
2016-04-12 16:41:57 +02:00
Henning Peters
f2d011c034
avoid polluting spacy namespace with lang classes
2016-04-12 16:31:16 +02:00
Henning Peters
ff690f76ba
fix loading non-german models
2016-04-12 16:00:56 +02:00
Henning Peters
6215272786
remove ujson as default non-dev dependency (still works as fallback if installed), because ujson doesn't ship wheels
2016-04-12 11:28:07 +02:00
Matthew Honnibal
6df3858dbc
* Fix Issue #323 : Incorrect semantics of Token.__str__ built-in. Add flag to allow users to switch the old semantics back on, to ease transition.
2016-04-12 13:17:59 +10:00
Wolfgang Seeker
d328e0b4a8
Merge branch 'master' into space_head_bug
2016-04-11 12:11:01 +02:00
Wolfgang Seeker
80bea62842
bugfix in unit test
2016-04-08 16:46:44 +02:00
Wolfgang Seeker
be4903a1b2
update version numbers
2016-04-08 13:54:05 +02:00
Wolfgang Seeker
1fe911cdb0
bugfix
2016-04-07 18:19:51 +02:00
Matthew Honnibal
872695759d
Merge pull request #306 from wbwseeker/german_noun_chunks
...
add German noun chunk functionality
2016-04-08 00:54:24 +10:00
Henning Peters
470cdf5bf9
remove deprecated LOCAL_DATA_DIR
2016-04-05 11:25:54 +02:00
Matthew Honnibal
26622f0ffc
Merge branch 'master' of ssh://github.com/honnibal/spaCy
2016-03-29 14:31:52 +11:00