Commit Graph

328 Commits

Author SHA1 Message Date
Matthew Honnibal a1be01185c Fix array out of bounds error in Span 2018-02-28 12:27:09 +01:00
Thomas Opsomer 8df9e52829 lemma property to return hash instead of unicode 2018-02-27 19:50:01 +01:00
Matthew Honnibal cf0e320f2b Add doc.is_sentenced attribute, re #1959 2018-02-18 14:16:55 +01:00
Matthew Honnibal 1e5aeb4eec
Merge pull request #1987 from thomasopsomer/span-sent
Make span.sent work when only manual / custom sbd
2018-02-18 14:05:37 +01:00
Thomas Opsomer deab391cbf correct check on sent_start & raise if no boundaries 2018-02-15 16:58:30 +01:00
Thomas Opsomer b902731313 Find span sentence when only sentence boundaries (no parser) 2018-02-14 22:18:54 +01:00
4altinok ca8728035d added new lex feat to token 2018-02-11 18:55:48 +01:00
Thomas Opsomer 515e25910e fix sent_start in serialization 2018-01-28 19:50:42 +01:00
Matthew Honnibal 56164ab688 Set l_edge and r_edge correctly for non-projective parses. Fixes #1799 2018-01-22 20:18:04 +01:00
Matthew Honnibal ccb51a9f36 Make .similarity() return 1.0 if all orth attrs match 2018-01-15 16:29:48 +01:00
Matthew Honnibal b904d81e9a Fix rich comparison against None objects. Closes #1757 2018-01-15 15:51:25 +01:00
Matthew Honnibal ab7c45b12d Fix error message and handling of doc.sents 2018-01-15 15:21:11 +01:00
Matthew Honnibal 465a6f6452 Add missing Span.vocab property. Closes #1633 2018-01-14 15:06:30 +01:00
Matthew Honnibal 0cb090e526 Fix infinite recursion in token.sent_start. Closes #1640 2018-01-14 15:02:15 +01:00
Matthew Honnibal 5cbe913b6f Don't raise deprecation warning in property. Closes #1813, #1712 2018-01-14 14:55:58 +01:00
Matthew Honnibal e10e9ad2c5 Improve efficiency of Doc.to_array 2017-11-23 12:33:27 +00:00
Matthew Honnibal fa62427300 Remove lookup-based lemmatization 2017-11-23 12:32:22 +00:00
Matthew Honnibal fb26b2cb12 Use lookup lemmatizer if lemma unset 2017-11-23 12:31:58 +00:00
Burton DeWilde a5c6869b2d Fix bug where span.orth_ != span.text (see #1612) 2017-11-20 12:05:43 -06:00
Motoki Wu a52e195a0a Fixes Issue #1207 where `noun_chunks` of `Span` gives an error.
Make sure to reference `self.doc` when getting the noun chunks.

Same fix as 9750a0128c
2017-11-17 17:16:20 -08:00
ines 1c218397f6 Ensure path in Doc.to_disk/from_disk (resolves ##1521)
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal 144a93c2a5 Back-off to tensor for similarity if no vectors 2017-11-03 20:56:33 +01:00
Matthew Honnibal 62ed58935a Add Doc.extend_tensor() method 2017-11-03 11:20:31 +01:00
ines 9659391944 Update deprecated methods and add warnings 2017-11-01 16:49:42 +01:00
ines 705a4e3e4a Fix formatting 2017-11-01 16:44:08 +01:00
Matthew Honnibal 9e0ebee81c Add Token.is_sent_start property, so can deprecate Token.sent_start 2017-11-01 13:27:14 +01:00
Matthew Honnibal 7e7116cdf7 Fix Doc.to_array when only one string attr provided 2017-11-01 13:26:43 +01:00
Matthew Honnibal 301fb2bb60 Implement Span.n_lefts and Span.n_rights 2017-11-01 13:25:12 +01:00
Matthew Honnibal 86eba61fae Fix token.vector when vectors are missing 2017-11-01 00:47:35 +01:00
ines d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
ines d2df81d907 Fix not implemented Span getters 2017-10-27 18:09:28 +02:00
ines 544a407b93 Tidy up Doc, Token and Span and add missing docs 2017-10-27 17:07:26 +02:00
ines 6a0483b7aa Tidy up and document Doc, Token and Span 2017-10-27 15:41:45 +02:00
ines 1a559d4c95 Remove old, unused file 2017-10-27 15:34:35 +02:00
ines ea4a41c8fb Tidy up util and helpers 2017-10-27 14:39:09 +02:00
Matthew Honnibal b66b8f028b Fix #1375 -- out-of-bounds on token.nbor() 2017-10-24 12:10:39 +02:00
Matthew Honnibal ccd2ab1a62 Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
Add LCA matrix for spans and docs
2017-10-24 11:22:46 +02:00
Matthew Honnibal fdf25d10ba Merge pull request #1440 from ramananbalakrishnan/develop
Support single value for attribute list in doc.to_array
2017-10-24 10:23:12 +02:00
ines a31f048b4d Fix formatting 2017-10-23 10:38:06 +02:00
Ramanan Balakrishnan d2fe56a577
Add LCA matrix for spans and docs 2017-10-20 23:58:00 +05:30
Ramanan Balakrishnan 0726946563
cleanup to_array implementation using fixes on master 2017-10-20 17:09:37 +05:30
Ramanan Balakrishnan b3ab124fc5
Support strings for attribute list in doc.to_array 2017-10-20 11:46:57 +05:30
Ramanan Balakrishnan 7b9b1be44c
Support single value for attribute list in doc.to_array 2017-10-19 17:00:41 +05:30
Matthew Honnibal 394633efce Make doc pickling support hooks 2017-10-17 19:44:09 +02:00
Matthew Honnibal cdb0c426d8 Improve deserialization of user_data, esp. for Underscore 2017-10-17 19:29:20 +02:00
Matthew Honnibal 32a8564c79 Fix doc pickling 2017-10-17 18:20:24 +02:00
Matthew Honnibal 92c1eb2d6f Fix Doc pickling. This also removes need for Binder class 2017-10-17 16:11:13 +02:00
Matthew Honnibal a002264fec Remove caching of Token in Doc, as caused cycle. 2017-10-16 19:34:21 +02:00
Matthew Honnibal 59c216196c Allow weakrefs on Doc objects 2017-10-16 19:22:11 +02:00
ines e0ff145a8b Merge branch 'develop' into feature/dot-underscore 2017-10-11 11:57:05 +02:00