Commit Graph

181 Commits

Author SHA1 Message Date
Thomas Opsomer 515e25910e fix sent_start in serialization 2018-01-28 19:50:42 +01:00
Matthew Honnibal 56164ab688 Set l_edge and r_edge correctly for non-projective parses. Fixes #1799 2018-01-22 20:18:04 +01:00
Matthew Honnibal ccb51a9f36 Make .similarity() return 1.0 if all orth attrs match 2018-01-15 16:29:48 +01:00
Matthew Honnibal ab7c45b12d Fix error message and handling of doc.sents 2018-01-15 15:21:11 +01:00
Matthew Honnibal e10e9ad2c5 Improve efficiency of Doc.to_array 2017-11-23 12:33:27 +00:00
Matthew Honnibal fa62427300 Remove lookup-based lemmatization 2017-11-23 12:32:22 +00:00
ines 1c218397f6 Ensure path in Doc.to_disk/from_disk (resolves ##1521)
Also add Doc serialization tests with both Path and string path options
2017-11-09 02:29:03 +01:00
Matthew Honnibal 144a93c2a5 Back-off to tensor for similarity if no vectors 2017-11-03 20:56:33 +01:00
Matthew Honnibal 62ed58935a Add Doc.extend_tensor() method 2017-11-03 11:20:31 +01:00
ines 9659391944 Update deprecated methods and add warnings 2017-11-01 16:49:42 +01:00
ines 705a4e3e4a Fix formatting 2017-11-01 16:44:08 +01:00
Matthew Honnibal 7e7116cdf7 Fix Doc.to_array when only one string attr provided 2017-11-01 13:26:43 +01:00
ines 544a407b93 Tidy up Doc, Token and Span and add missing docs 2017-10-27 17:07:26 +02:00
ines 6a0483b7aa Tidy up and document Doc, Token and Span 2017-10-27 15:41:45 +02:00
Matthew Honnibal ccd2ab1a62 Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
Add LCA matrix for spans and docs
2017-10-24 11:22:46 +02:00
Ramanan Balakrishnan d2fe56a577
Add LCA matrix for spans and docs 2017-10-20 23:58:00 +05:30
Ramanan Balakrishnan 0726946563
cleanup to_array implementation using fixes on master 2017-10-20 17:09:37 +05:30
Ramanan Balakrishnan b3ab124fc5
Support strings for attribute list in doc.to_array 2017-10-20 11:46:57 +05:30
Ramanan Balakrishnan 7b9b1be44c
Support single value for attribute list in doc.to_array 2017-10-19 17:00:41 +05:30
Matthew Honnibal 394633efce Make doc pickling support hooks 2017-10-17 19:44:09 +02:00
Matthew Honnibal cdb0c426d8 Improve deserialization of user_data, esp. for Underscore 2017-10-17 19:29:20 +02:00
Matthew Honnibal 32a8564c79 Fix doc pickling 2017-10-17 18:20:24 +02:00
Matthew Honnibal 92c1eb2d6f Fix Doc pickling. This also removes need for Binder class 2017-10-17 16:11:13 +02:00
Matthew Honnibal a002264fec Remove caching of Token in Doc, as caused cycle. 2017-10-16 19:34:21 +02:00
ines e0ff145a8b Merge branch 'develop' into feature/dot-underscore 2017-10-11 11:57:05 +02:00
Matthew Honnibal 3b527fa52b Call morphology.assign_untagged when pushing token to Doc 2017-10-11 03:23:57 +02:00
Matthew Honnibal e0a9b02b67 Merge Span._ and Span.as_doc methods 2017-10-09 22:00:15 -05:00
Matthew Honnibal e938bce320 Adjust parsing transition system to allow preset sentence segments. 2017-10-08 23:53:34 +02:00
Matthew Honnibal 668a0ea640 Pass extensions into Underscore class 2017-10-07 18:56:01 +02:00
ines 2480f8f521 Add missing return in Doc.from_disk() (closes #1330) 2017-09-18 15:32:00 +02:00
Matthew Honnibal 03b5b9727a Fix Doc.vector for empty doc objects 2017-08-22 19:52:19 +02:00
Matthew Honnibal 0551b7b03a Fix doc.vector 2017-08-22 19:46:52 +02:00
Matthew Honnibal 8b7ac77c23 Allow span label to be string in Doc.char_span 2017-08-19 16:18:09 +02:00
Matthew Honnibal 80236116a6 Add Doc.char_span method, to get a span by character offset 2017-08-19 12:21:09 +02:00
Matthew Honnibal a6a2159969 Add slot for text categories to Doc 2017-07-22 00:34:15 +02:00
Matthew Honnibal 2a3bd5ee90 Fix fetching of noun chunk iterator 2017-06-04 15:53:05 -05:00
Matthew Honnibal 92ae36f84e Improve way noun chunks iterator is looked up 2017-06-04 21:53:39 +02:00
Matthew Honnibal 675f448313 Fix vector linkage on Doc 2017-06-04 14:25:30 -05:00
ines 459a1e8470 Fix whitespace 2017-06-03 11:31:18 +02:00
ines 5109bba910 Port over fix from #1070 2017-06-03 11:31:11 +02:00
Matthew Honnibal 498ad85309 Try using tensor for vector/similarity methdos 2017-05-30 23:35:17 +02:00
Matthew Honnibal 4ddff020c3 Fix compile error 2017-05-28 23:30:40 +02:00
Matthew Honnibal 6d3caeadd2 Fix type check for long 2017-05-28 23:22:45 +02:00
Matthew Honnibal 7996d21717 Fixes for new StringStore 2017-05-28 11:09:27 -05:00
Matthew Honnibal fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
Matthew Honnibal 84e66ca6d4 WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
ines 66088851dc Add Doc.to_disk() and Doc.from_disk() methods 2017-05-24 11:58:17 +02:00
Matthew Honnibal d44b1eafc4 Fix conflict artefacts 2017-05-23 18:47:11 +02:00
Matthew Honnibal d68dd1f251 Add SENT_START attribute, for custom sentence boundary detection 2017-05-23 18:37:58 +02:00
ines 23f9a3ccc8 Update docstrings and API docs for Doc 2017-05-19 18:47:39 +02:00