Commit Graph

83 Commits

Author SHA1 Message Date
Ines Montani 5651a0d052 💫 Replace {Doc,Span}.merge with Doc.retokenize (#3280)
* Add deprecation warning to Doc.merge and Span.merge

* Replace {Doc,Span}.merge with Doc.retokenize
2019-02-15 10:29:44 +01:00
Ines Montani fbf9f1edf1 Also raise error in Span.__reduce__ 2019-02-13 13:22:05 +01:00
Ines Montani ea07f3022e Only run noun chunks iterator in Span if available (closes #3199) 2019-02-08 18:33:16 +01:00
Ines Montani ff36b14cb2 Fix whitespace 2019-02-08 18:31:31 +01:00
Ines Montani 5d0b60999d Merge branch 'master' into develop 2019-02-07 20:54:07 +01:00
Álvaro Abella Bascarán 1cd8f9823f Correct docs of `Token.subtree` and `Span.subtree` (issue #3122) (#3124)
* solve inconsistency between docs and Span.subtree (issue #3122)

* solve inconsistency between docs and Token.subtree (issue #3122)
2019-01-09 03:11:15 +01:00
Matthew Honnibal 63b7accd74
💫 Make span.as_doc() return a copy, not a view. Closes #1537 (#3107)
Initially span.as_doc() was designed to return a view of the span's contents, as a Doc object. This was a nice idea, but it fails due to the token.idx property, which refers to the character offset within the string. In a span, the idx of the first token might not be 0. Because this data is different, we can't have a view --- it'll be inconsistent.

This patch changes span.as_doc() to instead return a copy. The docs are updated accordingly. Closes #1537

* Update test for span.as_doc()

* Make span.as_doc() return a copy. Closes #1537

* Document change to Span.as_doc()
2018-12-30 15:17:46 +01:00
Álvaro Abella Bascarán 9bc4cc1352 Fix issue 2396 (#3089)
* Test on #2396: bug in Doc.get_lca_matrix()

* reimplementation of Doc.get_lca_matrix(), (closes #2396)

* reimplement Span.get_lca_matrix(), and call it from Doc.get_lca_matrix()

* tests Span.get_lca_matrix() as well as Doc.get_lca_matrix()

* implement _get_lca_matrix as a helper function in doc.pyx; call it from Doc.get_lca_matrix and Span.get_lca_matrix

* use memory view instead of np.ndarray in _get_lca_matrix (faster)

* fix bug when calling Span.get_lca_matrix; return lca matrix as np.array instead of memoryview

* cleaner conditional, add comment
2018-12-29 18:05:52 +01:00
Álvaro Abella Bascarán 6fe276f85d Fix issue 2396 (#3089)
* Test on #2396: bug in Doc.get_lca_matrix()

* reimplementation of Doc.get_lca_matrix(), (closes #2396)

* reimplement Span.get_lca_matrix(), and call it from Doc.get_lca_matrix()

* tests Span.get_lca_matrix() as well as Doc.get_lca_matrix()

* implement _get_lca_matrix as a helper function in doc.pyx; call it from Doc.get_lca_matrix and Span.get_lca_matrix

* use memory view instead of np.ndarray in _get_lca_matrix (faster)

* fix bug when calling Span.get_lca_matrix; return lca matrix as np.array instead of memoryview

* cleaner conditional, add comment
2018-12-29 18:02:26 +01:00
Matthew Honnibal 2c2db0c492 💫 Allow Span to take text label (#3031)
Fixes #3027.

* Allow Span.__init__ to take unicode values for the `label` argument.
* Allow `Span.label_` to be writeable.

- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-12-08 13:08:41 +01:00
Matthew Honnibal 4336397ecb Update develop from master 2018-08-14 03:04:28 +02:00
Ole Henrik Skogstrøm 0473add369 Feature/span ents (#2599)
* Created Span.ents property

* Add tests for span.ents

* Add tests for start and end of sentence
2018-08-07 13:52:32 +02:00
Ines Montani cae4457c38 💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197)
* Add logic to filter out warning IDs via environment variable

Usage: SPACY_WARNING_EXCLUDE=W001,W007

* Add warnings for empty vectors

* Add warning if no word vectors are used in .similarity methods

For example, if only tensors are available in small models – should hopefully clear up some confusion around this

* Capture warnings in tests

* Rename SPACY_WARNING_EXCLUDE to SPACY_WARNING_IGNORE
2018-05-21 01:22:38 +02:00
Matthew Honnibal 2c4a6d66fa Merge master into develop. Big merge, many conflicts -- need to review 2018-04-29 14:49:26 +02:00
ines 1c6d77610c Add remove_extension method on Doc, Token and Span (closes #2242) 2018-04-28 23:33:09 +02:00
ines 62b4b527d7 Don't raise error if set_extension has getter and setter (closes #2177)
Improve error messages, raise error if setter is specified without a getter and compare against _unset to allow default=None. Also add more tests.
2018-04-03 18:30:17 +02:00
Ines Montani 3141e04822
💫 New system for error messages and warnings (#2163)
* Add spacy.errors module

* Update deprecation and user warnings

* Replace errors and asserts with new error message system

* Remove redundant asserts

* Fix whitespace

* Add messages for print/util.prints statements

* Fix typo

* Fix typos

* Move CLI messages to spacy.cli._messages

* Add decorator to display error code with message

An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.

* Remove unused link in spacy.about

* Update errors for invalid pipeline components

* Improve error for unknown factories

* Add displaCy warnings

* Update formatting consistency

* Move error message to spacy.errors

* Update errors and check if doc returned by component is None
2018-04-03 15:50:31 +02:00
Matthew Honnibal 1f7229f40f Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing
changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
Matthew Honnibal ccb51a9f36 Make .similarity() return 1.0 if all orth attrs match 2018-01-15 16:29:48 +01:00
Matthew Honnibal b904d81e9a Fix rich comparison against None objects. Closes #1757 2018-01-15 15:51:25 +01:00
Matthew Honnibal 465a6f6452 Add missing Span.vocab property. Closes #1633 2018-01-14 15:06:30 +01:00
Burton DeWilde a5c6869b2d Fix bug where span.orth_ != span.text (see #1612) 2017-11-20 12:05:43 -06:00
Motoki Wu a52e195a0a Fixes Issue #1207 where `noun_chunks` of `Span` gives an error.
Make sure to reference `self.doc` when getting the noun chunks.

Same fix as 9750a0128c
2017-11-17 17:16:20 -08:00
Matthew Honnibal 144a93c2a5 Back-off to tensor for similarity if no vectors 2017-11-03 20:56:33 +01:00
Matthew Honnibal 301fb2bb60 Implement Span.n_lefts and Span.n_rights 2017-11-01 13:25:12 +01:00
ines d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
ines d2df81d907 Fix not implemented Span getters 2017-10-27 18:09:28 +02:00
ines 544a407b93 Tidy up Doc, Token and Span and add missing docs 2017-10-27 17:07:26 +02:00
ines 6a0483b7aa Tidy up and document Doc, Token and Span 2017-10-27 15:41:45 +02:00
Matthew Honnibal ccd2ab1a62 Merge pull request #1443 from ramananbalakrishnan/develop-get-lca-matrix
Add LCA matrix for spans and docs
2017-10-24 11:22:46 +02:00
ines a31f048b4d Fix formatting 2017-10-23 10:38:06 +02:00
Ramanan Balakrishnan d2fe56a577
Add LCA matrix for spans and docs 2017-10-20 23:58:00 +05:30
Matthew Honnibal e0a9b02b67 Merge Span._ and Span.as_doc methods 2017-10-09 22:00:15 -05:00
Matthew Honnibal 7ae67ec6a1 Add Span.as_doc method 2017-10-08 23:50:20 +02:00
Matthew Honnibal 668a0ea640 Pass extensions into Underscore class 2017-10-07 18:56:01 +02:00
Matthew Honnibal dea229c634 Fix Span.to_array method 2017-08-19 16:24:28 +02:00
Matthew Honnibal 482bba1722 Add Span.to_array method 2017-08-19 12:20:45 +02:00
Matthew Honnibal 7996d21717 Fixes for new StringStore 2017-05-28 11:09:27 -05:00
Matthew Honnibal 84e66ca6d4 WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
ines 62ceec4fc6 Update docstrings and API docs for Span 2017-05-19 18:47:46 +02:00
ines 0791f0aae6 Update docstrings and API docs for Span class 2017-05-19 00:31:31 +02:00
ines 593361ee3c Update docstrings for Span class 2017-05-18 22:17:41 +02:00
ines 9d85cda8e4 Fix models error message and use about.__docs_models__ (see #1051) 2017-05-13 13:05:47 +02:00
Matthew Honnibal 4d98511db7 Make Span hashable. Closes #1019 2017-04-26 19:01:05 +02:00
ines 0739ae7b76 Tidy up and fix formatting and imports 2017-04-15 13:05:15 +02:00
ines 3b667a24d4 Remove whitespace 2017-04-01 10:21:08 +02:00
ines e71a1f4bd0 Fix download commands in error messages (see #946) 2017-04-01 10:20:57 +02:00
Em 9c809efc25 Removed mapStr 2017-03-11 16:23:26 -08:00
Em 426d17167f Added string manipulation for spans 2017-03-10 16:50:02 -08:00
Matthew Honnibal f6e356aada Add (and test) Span.sentiment attribute. By default we average token.span, but can override with custom hook. Re Issue #667 2016-12-02 11:05:50 +01:00