313 Commits

Author SHA1 Message Date
Matthew Honnibal e10e9ad2c5 Improve efficiency of Doc.to_array 2017-11-23 12:33:27 +00:00
Matthew Honnibal fa62427300 Remove lookup-based lemmatization 2017-11-23 12:32:22 +00:00
Matthew Honnibal fb26b2cb12 Use lookup lemmatizer if lemma unset 2017-11-23 12:31:58 +00:00
Burton DeWilde a5c6869b2d Fix bug where span.orth_ != span.text (see #1612) 2017-11-20 12:05:43 -06:00
Motoki Wu a52e195a0a Fixes Issue #1207 where `noun_chunks` of `Span` gives an error.
Make sure to reference `self.doc` when getting the noun chunks.

Same fix as 9750a0128c
2017-11-17 17:16:20 -08:00
ines 1c218397f6 Ensure path in Doc.to_disk/from_disk (resolves #1521)
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal 144a93c2a5 Back-off to tensor for similarity if no vectors 2017-11-03 20:56:33 +01:00
Matthew Honnibal 62ed58935a Add Doc.extend_tensor() method 2017-11-03 11:20:31 +01:00
ines 9659391944 Update deprecated methods and add warnings 2017-11-01 16:49:42 +01:00
ines 705a4e3e4a Fix formatting 2017-11-01 16:44:08 +01:00
Matthew Honnibal 9e0ebee81c Add Token.is_sent_start property, so can deprecate Token.sent_start 2017-11-01 13:27:14 +01:00
Matthew Honnibal 7e7116cdf7 Fix Doc.to_array when only one string attr provided 2017-11-01 13:26:43 +01:00
Matthew Honnibal 301fb2bb60 Implement Span.n_lefts and Span.n_rights 2017-11-01 13:25:12 +01:00
Matthew Honnibal 86eba61fae Fix token.vector when vectors are missing 2017-11-01 00:47:35 +01:00
ines d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
ines d2df81d907 Fix not implemented Span getters 2017-10-27 18:09:28 +02:00
ines 544a407b93 Tidy up Doc, Token and Span and add missing docs 2017-10-27 17:07:26 +02:00
ines 6a0483b7aa Tidy up and document Doc, Token and Span 2017-10-27 15:41:45 +02:00
ines 1a559d4c95 Remove old, unused file 2017-10-27 15:34:35 +02:00
ines ea4a41c8fb Tidy up util and helpers 2017-10-27 14:39:09 +02:00
Matthew Honnibal b66b8f028b Fix #1375 -- out-of-bounds on token.nbor() 2017-10-24 12:10:39 +02:00
Matthew Honnibal ccd2ab1a62 Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
Add LCA matrix for spans and docs
2017-10-24 11:22:46 +02:00
Matthew Honnibal fdf25d10ba Merge pull request #1440 from ramananbalakrishnan/develop
Support single value for attribute list in doc.to_array
2017-10-24 10:23:12 +02:00
ines a31f048b4d Fix formatting 2017-10-23 10:38:06 +02:00
Ramanan Balakrishnan d2fe56a577 Add LCA matrix for spans and docs 2017-10-20 23:58:00 +05:30
Ramanan Balakrishnan 0726946563 Clean up to_array implementation using fixes on master 2017-10-20 17:09:37 +05:30
Ramanan Balakrishnan b3ab124fc5 Support strings for attribute list in doc.to_array 2017-10-20 11:46:57 +05:30
Ramanan Balakrishnan 7b9b1be44c Support single value for attribute list in doc.to_array 2017-10-19 17:00:41 +05:30
Matthew Honnibal 394633efce Make doc pickling support hooks 2017-10-17 19:44:09 +02:00
Matthew Honnibal cdb0c426d8 Improve deserialization of user_data, esp. for Underscore 2017-10-17 19:29:20 +02:00
Matthew Honnibal 32a8564c79 Fix doc pickling 2017-10-17 18:20:24 +02:00
Matthew Honnibal 92c1eb2d6f Fix Doc pickling. This also removes need for Binder class 2017-10-17 16:11:13 +02:00
Matthew Honnibal a002264fec Remove caching of Token in Doc, as caused cycle. 2017-10-16 19:34:21 +02:00
Matthew Honnibal 59c216196c Allow weakrefs on Doc objects 2017-10-16 19:22:11 +02:00
ines e0ff145a8b Merge branch 'develop' into feature/dot-underscore 2017-10-11 11:57:05 +02:00
Matthew Honnibal 3b527fa52b Call morphology.assign_untagged when pushing token to Doc 2017-10-11 03:23:57 +02:00
Matthew Honnibal e0a9b02b67 Merge Span._ and Span.as_doc methods 2017-10-09 22:00:15 -05:00
ines 3fc4fe61d2 Fix typo 2017-10-10 04:15:14 +02:00
ines 59c4f27499 Add get, set and has methods to Underscore 2017-10-10 04:14:35 +02:00
Matthew Honnibal 51d18937af Partially apply doc/span/token into method
We want methods to act like they're "bound" to the object, so that you can make your method conditional on the `doc`, `span` or `token` instance --- like, well, a method. We therefore partially apply the function, which works like this:

```
def partial(unbound_method, constant_arg):
    def bound_method(*args, **kwargs):
        return unbound_method(constant_arg, *args, **kwargs)
    return bound_method
```
2017-10-10 02:21:28 +02:00
Matthew Honnibal e938bce320 Adjust parsing transition system to allow preset sentence segments. 2017-10-08 23:53:34 +02:00
Matthew Honnibal 080afd4924 Add ternary value setting to Token.sent_start 2017-10-08 23:51:58 +02:00
Matthew Honnibal 7ae67ec6a1 Add Span.as_doc method 2017-10-08 23:50:20 +02:00
Matthew Honnibal 668a0ea640 Pass extensions into Underscore class 2017-10-07 18:56:01 +02:00
Matthew Honnibal 1289129fd9 Add Underscore class 2017-10-07 18:00:14 +02:00
Matthew Honnibal 9bfd585a11 Fix parameter name in .pxd file 2017-09-26 07:28:50 -05:00
ines 2480f8f521 Add missing return in Doc.from_disk() (closes #1330) 2017-09-18 15:32:00 +02:00
Matthew Honnibal 03b5b9727a Fix Doc.vector for empty doc objects 2017-08-22 19:52:19 +02:00
Matthew Honnibal 0551b7b03a Fix doc.vector 2017-08-22 19:46:52 +02:00
Matthew Honnibal d55d6e1cfa Fix comparison of Token from different docs. Closes #1257 2017-08-19 16:39:32 +02:00