Commit Graph

124 Commits

Author SHA1 Message Date
Matthew Honnibal 5e94b5d581 * Have Tokens return proper numpy arrays, not Cython views. 2015-06-23 00:07:34 +02:00
Matthew Honnibal c04e6ebca6 * Allow user to load different sized vectors. 2015-06-05 16:26:39 +02:00
Matthew Honnibal f8843906ad Merge branch 'constituency': Add beam parsing and training from JSON files, with Levenshtein alignment. 2015-06-03 06:07:24 +02:00
Matthew Honnibal ca320afe86 * Add docstring for ents attribute 2015-05-13 21:20:47 +02:00
Matthew Honnibal d48218f4b2 * Add left_edge and right_edge properties 2015-05-12 20:27:55 +02:00
Matthew Honnibal fb8d50b3d5 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-04-30 12:45:15 +02:00
Jordan Suchow 3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Matthew Honnibal f7ffd94e6a * Add Token.conjuncts property 2015-04-17 01:40:53 +02:00
Matthew Honnibal 2ef170a991 * Fix Issue #54: Error merging multi-word token when there's a mid-token match. 2015-04-16 04:28:06 +02:00
Matthew Honnibal bf0aff5124 * Fix bug in Tokens.ents where entity wasn't being emitted if another started immediately after 2015-04-13 21:34:33 +02:00
Matthew Honnibal 2b84a90bbb * Fix Issue #50: Python 3 compatibility of v0.80 2015-04-13 05:59:43 +02:00
Matthew Honnibal fbd48c571d * Rearrange code in tokens.pyx 2015-04-13 05:41:25 +02:00
Matthew Honnibal cff2b13fef * Fix Issue #44: Broken Token.string attribute when single word sentence 2015-04-07 06:08:25 +02:00
Matthew Honnibal b64b2bd910 * Fix Issue #43: TAG attr not supported. Also add DEP attr, while I'm at it. Need better way of ensuring future changes don't break in similar way. 2015-04-07 06:00:30 +02:00
Matthew Honnibal f9e510a893 * Whitespace 2015-04-07 04:53:59 +02:00
Matthew Honnibal fbf19049cf * Add ent_type_ property 2015-03-31 02:01:29 +02:00
Matthew Honnibal e70b87efeb * Add merge() method to Tokens, with fairly brittle/hacky implementation, but quite easy to test. Passing minimal tests. Still need to fix left/right deps in C data 2015-03-30 01:37:41 +02:00
Matthew Honnibal 6f47a667cf * Move Span class to own file 2015-03-26 16:45:38 +01:00
Matthew Honnibal 2b2dec95d3 * Add comment to set_parse 2015-03-26 16:44:47 +01:00
Matthew Honnibal e770fade1e * Don't set dependency labels in set_parse, as this may be used by the Entity recogniser instead. Need to clean this method up... 2015-03-26 16:44:47 +01:00
Matthew Honnibal 670959f40c * Fix iteration order on Tokens.rights 2015-03-26 16:44:46 +01:00
Matthew Honnibal 231ce2dae5 * Assign ROOT label by default. May be papering over another bug. 2015-03-26 16:44:46 +01:00
Matthew Honnibal 3105c7f8ba * Don't pass label_ids dict to Tokens, since we now use the StringStore to manage string-to-int mapping for labels 2015-03-26 16:44:45 +01:00
Matthew Honnibal 31fad99518 * Use StringStore to encode label names, instead of label_ids 2015-03-26 16:44:45 +01:00
Matthew Honnibal 64db61bff1 * Add Span class to Python API 2015-03-26 16:44:45 +01:00
Matthew Honnibal 8057a95f20 * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. 2015-03-26 16:44:44 +01:00
Matthew Honnibal 6865c2fb4d * Fix assignment of dep strings in tokens.pyx 2015-03-26 16:44:43 +01:00
Matthew Honnibal 01bc4d6815 * Add set_parse method, to assign parse to tokens in a less hacky way. 2015-03-26 16:44:42 +01:00
Matthew Honnibal 23c1f6fc04 * Merge changes from stash 2015-03-26 16:44:41 +01:00
Matthew Honnibal 2e3dc3dfe2 * Merge changes in tokens.pyx 2015-03-26 16:44:41 +01:00
Matthew Honnibal 0962ffc095 * Fix issue #37: missing check_flag attribute from Token class 2015-03-26 15:06:26 +01:00
Matthew Honnibal dbe26f5793 * Add children and subtree methods to Token, which are generators to assist parse-tree navigation. 2015-03-03 04:18:41 -05:00
Matthew Honnibal cae077b583 * Work on fixing orphaned Token objects bug 2015-02-16 15:20:31 -05:00
Matthew Honnibal 7572e31f5e * Pass ownership of C data to Token instances if Tokens object is being garbage-collected, but Token instances are staying alive. 2015-02-11 18:05:06 -05:00
Matthew Honnibal ab8bb047d0 * Fix negative index for __getitem__ 2015-02-07 12:58:46 -05:00
Matthew Honnibal c7d8644149 * Fix regression on 'prob' attr of Token. 2015-02-03 03:32:18 +11:00
Matthew Honnibal de772088e6 * Use parse tree for sbd in Tokens.sents 2015-02-02 12:17:32 +11:00
Matthew Honnibal 7de00c5a79 * Try not holding a reference to Pool, since that seems to confuse the GC 2015-01-31 22:10:22 +11:00
Matthew Honnibal 018e0bfa24 * Bug fixes to parse navigation 2015-01-31 16:37:13 +11:00
Matthew Honnibal 77d62d0179 * Large refactor of Token objects, making them much thinner. This is to support fast parse-tree navigation. 2015-01-31 13:42:58 +11:00
Matthew Honnibal 251dbf24d7 * Fix uninitialised variable error 2015-01-30 20:46:34 +11:00
Matthew Honnibal 1a7a1c2771 * Fix Issue #16: tokens recurse when printing 2015-01-30 19:47:50 +11:00
Matthew Honnibal b68f563c2f * Fix Issue #14: Improve parsing API 2015-01-30 18:04:41 +11:00
Matthew Honnibal e6c3d3471f * Tweak documentation for Tokens, and hide constructor as __cinit__ 2015-01-27 18:57:52 +11:00
Matthew Honnibal 12b034e3ef * Move POS tag definitions to parts_of_speech.pxd 2015-01-25 16:31:07 +11:00
Matthew Honnibal 7431c133d8 * Add error if try to access head and not is_parsed 2015-01-25 15:33:54 +11:00
Matthew Honnibal a97bed9359 * Fix POS and dependency label tag names. Add parse and string navigation functions. 2015-01-24 17:29:04 +11:00
Matthew Honnibal 76cd024095 * Add whitespace property to Token 2015-01-24 07:41:21 +11:00
Matthew Honnibal 5fd72bc220 * Have 'string' refer to the whitespace-padded string 2015-01-24 07:32:38 +11:00
Matthew Honnibal fda94271af * Rename NORM1 and NORM2 attrs to lower and norm 2015-01-24 06:17:03 +11:00