Commit Graph

5710 Commits

Author SHA1 Message Date
Ines Montani 9d6ca18a10 Tidy up and only use self.vector once 2019-03-07 01:06:12 +01:00
Ines Montani a8f1efd2f5 Merge branch 'master' into develop 2019-03-07 00:56:31 +01:00
Daniel King 5f40229397 Don't use numpy directly for similarity (#3362)
* Don't use numpy directly for similarity

* Contributor agreement
2019-03-06 22:58:38 +00:00
Ines Montani 6bd34e9d54 Expose Japanese stop words (closes #3346) 2019-03-06 14:21:15 +01:00
Ines Montani 85deb96278 Fix whitespace 2019-03-06 14:20:34 +01:00
Ines Montani 23f6ebf0f3 Add missing " (closes #3343) 2019-02-27 16:37:03 +01:00
Ines Montani 533b580c19 Add test for stray print statements in languages (see #3342) 2019-02-27 16:04:30 +01:00
Ines Montani 48a2046d1c Remove stray print statement (closes #3342) 2019-02-27 15:35:04 +01:00
Ines Montani 07d7c0a1af Fix whitespace 2019-02-27 15:34:21 +01:00
Ines Montani 9b62639d19 Auto-format [ci skip] 2019-02-27 14:24:55 +01:00
Matthew Honnibal 656edcb984 Set version to v2.1.0a10 2019-02-27 12:26:13 +01:00
Matthew Honnibal f1d77eb140
💫 Improve handling of missing NER tags (closes #2603) (#3341)
* Improve handling of missing NER tags

GoldParse can accept missing NER tags, if entities is provided
in BILUO format (rather than as spans). Missing tags can be provided
as None values.

Fix bug that occurred when first tag was a None value. Closes #2603.

* Document specification of missing NER tags.
2019-02-27 12:06:32 +01:00
Ines Montani e359bdd0e3 Auto-format 2019-02-27 11:56:45 +01:00
Matthew Honnibal 4a3371acd5
Make doc[0].is_sent_start == True (closes #2869) (#3340)
* Make doc[0] have sent_start True. Closes #2869

* Document that doc[0].is_sent_start defaults True.
2019-02-27 11:17:17 +01:00
Matthew Honnibal 2d3ce89b78 Improve matcher tests re issue #3328 2019-02-27 10:25:56 +01:00
Matthew Honnibal 8d6954e0e7 Fix matcher bug #3328 2019-02-27 10:25:39 +01:00
Ines Montani aadf586789 Add xfailing test for #3331 2019-02-25 22:33:30 +01:00
Matthew Honnibal 3cdd3eb518 Set version to v2.1.0a9 2019-02-25 21:55:19 +01:00
Matthew Honnibal b449be0f04 Add comment re issue #3170 2019-02-25 21:24:03 +01:00
Matthew Honnibal 9ccd6a3062 Fix head-outside-sentence bug. Fixes #3170 2019-02-25 21:21:44 +01:00
Matthew Honnibal f2fae1f186 Add batch size argument to Language.evaluate(). Closes #3263 2019-02-25 19:30:33 +01:00
Ines Montani f135d663f7 Update conftest.py 2019-02-25 15:55:29 +01:00
Ines Montani 76ce8b2662 Merge branch 'master' into develop 2019-02-25 15:54:55 +01:00
Julia Makogon f1c3108d52 Fixing pymorphy2 dependency issue (#3329) (closes #3327)
* Classes for Ukrainian; small fix in Russian.

* Contributor agreement

* pymorphy2 initialization split for ru and uk (#3327)

* stop-words fixed

* Unit-tests updated
2019-02-25 15:48:17 +01:00
Ines Montani 1a735e0f1f Add regression test for #3328 2019-02-25 10:12:58 +01:00
Ines Montani dfbed07d3b Remove unused temp errors 2019-02-24 22:26:08 +01:00
Ines Montani 62b558ab72 💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325)
* Fix formatting and whitespace

* Add support for lexical attributes (closes #2390)

* Document lexical attribute setting during retokenization

* Assign variable oputside of nested loop
2019-02-24 21:13:51 +01:00
Ines Montani a48deb4081 Merge regression tests 2019-02-24 21:03:39 +01:00
Ines Montani 8f6c193a4d Delete _test_issue1622.py 2019-02-24 20:33:31 +01:00
Ines Montani c8e967c78d Try include previously segfaulting test 2019-02-24 20:32:46 +01:00
Ines Montani 328b589deb Merge regression tests 2019-02-24 20:31:38 +01:00
Ines Montani 3bc53905cc Remove print statements from test 2019-02-24 20:31:15 +01:00
Ines Montani 1ae0df3da9 Un-x-fail passing test 2019-02-24 20:24:15 +01:00
Ines Montani 399a5803d0 Tidy up tests [ci skip] 2019-02-24 19:02:16 +01:00
Ines Montani 2011563c51 Update docstrings [ci skip] 2019-02-24 18:39:59 +01:00
Ines Montani df19e2bff6
💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324)
<!--- Provide a general summary of your changes in the title. -->

## Description

This PR adds the abilility to override custom extension attributes during merging. This will only work for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attribute that have both a getter *and* a setter implemented.

```python
Token.set_extension('is_musician', default=False)

doc = nlp("I like David Bowie.")
with doc.retokenize() as retokenizer:
    attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}}
    retokenizer.merge(doc[2:4], attrs=attrs)

assert doc[2].text == "David Bowie"
assert doc[2].lemma_ == "David Bowie"
assert doc[2]._.is_musician
```

### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-24 18:38:47 +01:00
Ines Montani 1ea1bc98e7 Document regex utilities [ci skip] 2019-02-24 18:34:10 +01:00
Matthew Honnibal 1f7c56cd93 Fix parser.add_label() 2019-02-24 16:53:22 +01:00
Matthew Honnibal 893aa40d73 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2019-02-24 16:43:01 +01:00
Matthew Honnibal 5882d82915 Set version to v2.1.0a9.dev2 2019-02-24 16:42:06 +01:00
Matthew Honnibal 0367f864fe Fix handling of added labels. Resolves #3189 2019-02-24 16:41:41 +01:00
Matthew Honnibal d74dbde828 Fix order of actions when labels added to parser
When labels were added to the parser or NER, we weren't loading back the
classes in the correct order. Re issue #3189
2019-02-24 16:36:29 +01:00
Ines Montani 6de81ae310 Fix formatting of errors 2019-02-24 15:11:28 +01:00
Ines Montani d8f69d592f Tidy up retokenizer tests 2019-02-24 14:14:11 +01:00
Ines Montani 723e27cb8c Tidy up tests 2019-02-24 14:11:23 +01:00
Ines Montani 2982f82934 Auto-format 2019-02-24 14:09:15 +01:00
Matthew Honnibal 909a9d9932 Set version to v2.1.0a9.dev1 2019-02-23 13:10:42 +01:00
Matthew Honnibal 6b0008afc6 Clean up TextCategorizer slightly 2019-02-23 12:28:06 +01:00
Matthew Honnibal d13b9373bf Improve initialization for mutually textcat 2019-02-23 12:27:45 +01:00
Matthew Honnibal e9dd5943b9 Support exclusive_classes setting for textcat models 2019-02-23 11:57:16 +01:00