Thomas Opsomer
|
515e25910e
|
fix sent_start in serialization
|
2018-01-28 19:50:42 +01:00 |
Matthew Honnibal
|
56164ab688
|
Set l_edge and r_edge correctly for non-projective parses. Fixes #1799
|
2018-01-22 20:18:04 +01:00 |
Matthew Honnibal
|
ccb51a9f36
|
Make .similarity() return 1.0 if all orth attrs match
|
2018-01-15 16:29:48 +01:00 |
Matthew Honnibal
|
ab7c45b12d
|
Fix error message and handling of doc.sents
|
2018-01-15 15:21:11 +01:00 |
Matthew Honnibal
|
e10e9ad2c5
|
Improve efficiency of Doc.to_array
|
2017-11-23 12:33:27 +00:00 |
Matthew Honnibal
|
fa62427300
|
Remove lookup-based lemmatization
|
2017-11-23 12:32:22 +00:00 |
ines
|
1c218397f6
|
Ensure path in Doc.to_disk/from_disk (resolves #1521)
Also add Doc serialization tests with both Path and string path options
|
2017-11-09 02:29:03 +01:00 |
Matthew Honnibal
|
144a93c2a5
|
Back-off to tensor for similarity if no vectors
|
2017-11-03 20:56:33 +01:00 |
Matthew Honnibal
|
62ed58935a
|
Add Doc.extend_tensor() method
|
2017-11-03 11:20:31 +01:00 |
ines
|
9659391944
|
Update deprecated methods and add warnings
|
2017-11-01 16:49:42 +01:00 |
ines
|
705a4e3e4a
|
Fix formatting
|
2017-11-01 16:44:08 +01:00 |
Matthew Honnibal
|
7e7116cdf7
|
Fix Doc.to_array when only one string attr provided
|
2017-11-01 13:26:43 +01:00 |
ines
|
544a407b93
|
Tidy up Doc, Token and Span and add missing docs
|
2017-10-27 17:07:26 +02:00 |
ines
|
6a0483b7aa
|
Tidy up and document Doc, Token and Span
|
2017-10-27 15:41:45 +02:00 |
Matthew Honnibal
|
ccd2ab1a62
|
Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
Add LCA matrix for spans and docs
|
2017-10-24 11:22:46 +02:00 |
Ramanan Balakrishnan
|
d2fe56a577
|
Add LCA matrix for spans and docs
|
2017-10-20 23:58:00 +05:30 |
Ramanan Balakrishnan
|
0726946563
|
cleanup to_array implementation using fixes on master
|
2017-10-20 17:09:37 +05:30 |
Ramanan Balakrishnan
|
b3ab124fc5
|
Support strings for attribute list in doc.to_array
|
2017-10-20 11:46:57 +05:30 |
Ramanan Balakrishnan
|
7b9b1be44c
|
Support single value for attribute list in doc.to_array
|
2017-10-19 17:00:41 +05:30 |
Matthew Honnibal
|
394633efce
|
Make doc pickling support hooks
|
2017-10-17 19:44:09 +02:00 |
Matthew Honnibal
|
cdb0c426d8
|
Improve deserialization of user_data, esp. for Underscore
|
2017-10-17 19:29:20 +02:00 |
Matthew Honnibal
|
32a8564c79
|
Fix doc pickling
|
2017-10-17 18:20:24 +02:00 |
Matthew Honnibal
|
92c1eb2d6f
|
Fix Doc pickling. This also removes need for Binder class
|
2017-10-17 16:11:13 +02:00 |
Matthew Honnibal
|
a002264fec
|
Remove caching of Token in Doc, as it caused a cycle.
|
2017-10-16 19:34:21 +02:00 |
ines
|
e0ff145a8b
|
Merge branch 'develop' into feature/dot-underscore
|
2017-10-11 11:57:05 +02:00 |
Matthew Honnibal
|
3b527fa52b
|
Call morphology.assign_untagged when pushing token to Doc
|
2017-10-11 03:23:57 +02:00 |
Matthew Honnibal
|
e0a9b02b67
|
Merge Span._ and Span.as_doc methods
|
2017-10-09 22:00:15 -05:00 |
Matthew Honnibal
|
e938bce320
|
Adjust parsing transition system to allow preset sentence segments.
|
2017-10-08 23:53:34 +02:00 |
Matthew Honnibal
|
668a0ea640
|
Pass extensions into Underscore class
|
2017-10-07 18:56:01 +02:00 |
ines
|
2480f8f521
|
Add missing return in Doc.from_disk() (closes #1330)
|
2017-09-18 15:32:00 +02:00 |
Matthew Honnibal
|
03b5b9727a
|
Fix Doc.vector for empty doc objects
|
2017-08-22 19:52:19 +02:00 |
Matthew Honnibal
|
0551b7b03a
|
Fix doc.vector
|
2017-08-22 19:46:52 +02:00 |
Matthew Honnibal
|
8b7ac77c23
|
Allow span label to be string in Doc.char_span
|
2017-08-19 16:18:09 +02:00 |
Matthew Honnibal
|
80236116a6
|
Add Doc.char_span method, to get a span by character offset
|
2017-08-19 12:21:09 +02:00 |
Matthew Honnibal
|
a6a2159969
|
Add slot for text categories to Doc
|
2017-07-22 00:34:15 +02:00 |
Matthew Honnibal
|
2a3bd5ee90
|
Fix fetching of noun chunk iterator
|
2017-06-04 15:53:05 -05:00 |
Matthew Honnibal
|
92ae36f84e
|
Improve way noun chunks iterator is looked up
|
2017-06-04 21:53:39 +02:00 |
Matthew Honnibal
|
675f448313
|
Fix vector linkage on Doc
|
2017-06-04 14:25:30 -05:00 |
ines
|
459a1e8470
|
Fix whitespace
|
2017-06-03 11:31:18 +02:00 |
ines
|
5109bba910
|
Port over fix from #1070
|
2017-06-03 11:31:11 +02:00 |
Matthew Honnibal
|
498ad85309
|
Try using tensor for vector/similarity methods
|
2017-05-30 23:35:17 +02:00 |
Matthew Honnibal
|
4ddff020c3
|
Fix compile error
|
2017-05-28 23:30:40 +02:00 |
Matthew Honnibal
|
6d3caeadd2
|
Fix type check for long
|
2017-05-28 23:22:45 +02:00 |
Matthew Honnibal
|
7996d21717
|
Fixes for new StringStore
|
2017-05-28 11:09:27 -05:00 |
Matthew Honnibal
|
fe11564b8e
|
Finish stringstore change. Also xfail vectors tests
|
2017-05-28 15:10:22 +02:00 |
Matthew Honnibal
|
84e66ca6d4
|
WIP on stringstore change. 27 failures
|
2017-05-28 14:06:40 +02:00 |
ines
|
66088851dc
|
Add Doc.to_disk() and Doc.from_disk() methods
|
2017-05-24 11:58:17 +02:00 |
Matthew Honnibal
|
d44b1eafc4
|
Fix conflict artefacts
|
2017-05-23 18:47:11 +02:00 |
Matthew Honnibal
|
d68dd1f251
|
Add SENT_START attribute, for custom sentence boundary detection
|
2017-05-23 18:37:58 +02:00 |
ines
|
23f9a3ccc8
|
Update docstrings and API docs for Doc
|
2017-05-19 18:47:39 +02:00 |