Commit Graph

10514 Commits

Author SHA1 Message Date
Matthew Honnibal 04113a844d Set version to v2.1.8 2019-08-07 13:53:58 +02:00
Ines Montani 36ac044937 Update README.md [ci skip] 2019-08-07 13:38:59 +02:00
Ines Montani 3e60afacf9 Add Serbian to languages [ci skip] 2019-08-07 13:38:25 +02:00
Ines Montani 1dc28a9ecb Update Binder version [ci skip] 2019-08-07 13:38:12 +02:00
Ines Montani 6bec24cdd0 Require downloaded model in pkg_resources (#4090) 2019-08-07 13:18:11 +02:00
Ines Montani 8b4a0fabbb Adjust docs example [ci skip] 2019-08-07 00:46:47 +02:00
adrianeboyd 69aca7d839 Add validate option to EntityRuler (#4089)
* Add validate option to EntityRuler

* Add validate to EntityRuler, passed to Matcher and PhraseMatcher

* Add validate to usage and API docs

* Update website/docs/usage/rule-based-matching.md

Co-Authored-By: Ines Montani <ines@ines.io>

* Update website/docs/usage/rule-based-matching.md

Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-07 00:40:53 +02:00
Ines Montani 4ae320e5c2 Use consistent casing for entity ruler patterns (see #4063) [ci skip] 2019-08-06 12:20:22 +02:00
Ines Montani 223bde5cf6 Improve docs on matcher attributes [ci skip] (closes #4063) 2019-08-06 12:13:42 +02:00
Ines Montani 2bfae0b167 Auto-format 2019-08-06 12:13:31 +02:00
Jeno 15be09ceb0 Raise error if annotation dict in simple training style has unexpected keys #4074 (#4079)
* adding enhancement #4074.

* modified behavior to strictly require top level dictionary keys - issue #4074

* pass expected keys to error message and add links as expected top level key
2019-08-06 11:01:25 +02:00
Sofie Van Landeghem ad09b0d6f3 fetch norm from lex if necessary for matching (#4080) 2019-08-05 23:51:04 +02:00
Ines Montani 7f3212e2f5
💫 Sync branches (#4084) [ci skip]
* Update from master

* Re-added Universe readme (#3688) (closes #3680)

* Fix typo

* Add version tag to `--base-model` argument (closes #3720)

* fixing regex matcher examples (#3708) (#3719)

* Improve Token.prob and Lexeme.prob docs (resolves #3701)

* Fix DependencyParser.predict docs (resolves #3561)

* Update languages.json


Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
Co-authored-by: Aaron Kub <aaronkub@gmail.com>
2019-08-05 14:32:54 +02:00
Ines Montani 0f740fad1a Update universe.json [ci skip] 2019-08-05 14:30:07 +02:00
Pavle Vidanović e1a935d71c Stopwords for Serbian language. (#4078)
* Serbian stopwords added. (cyrillic alphabet)

* spaCy Contribution agreement included.

* Test initialize updated
2019-08-05 10:22:27 +02:00
Sebastian Jordan 878302a55d Fix typo in requirements section of pyproject.toml (#4081) 2019-08-05 10:21:14 +02:00
veer-bains 874bd8c8dd Fixed syntax error in lang/ko when using python 2 (#4082) (closes #4068)
* fixed syntax error in declaring variables with python 2.7 in spacy/lang/ko/__init__.py

* fixed syntax error in declaring variables with python 2.7 in spacy/lang/ko/__init__.py

* Update __init__.py

* Create veer-bains.md

* Update __init__.py

fixed syntax errors in variable datatype assignment when calling spacy.blank("ko") with python 2.7
2019-08-05 10:19:32 +02:00
Ines Montani 87ddbdc33e Fix handling of kwargs in Language.evaluate
Makes it consistent with other methods
2019-08-04 13:44:21 +02:00
Muhammad Irfan d1d30b0442 added missing punctuation following conventions. (#4066) 2019-08-04 13:41:18 +02:00
Anastassia 33b14724a5 Update gold corpus code to properly ingest a directory of jsonl… (#4067)
* Update gold corpus code to properly ingest a directory of jsonlines files

In response to: https://github.com/explosion/spaCy/issues/3975

* Update spacy/gold.pyx

Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-02 09:58:51 +02:00
Ines Montani 0f76e0022d Update .tensor docs [ci skip] 2019-08-01 18:37:09 +02:00
Ines Montani 3072eb28c2 Support and render Markdown in model meta [ci skip] 2019-08-01 18:33:10 +02:00
Matthew Honnibal 944a66c326 Add span.tensor and token.tensor attributes 2019-08-01 18:30:50 +02:00
Matthew Honnibal d3071ecdbc Set version to v2.1.7 2019-08-01 18:09:19 +02:00
Matthew Honnibal 97c51ef93b Set version to v2.1.7.dev1 2019-08-01 17:29:25 +02:00
Matthew Honnibal 4632c597e7 Fix Pipe base class 2019-08-01 17:29:01 +02:00
Ines Montani 8718ca8b1f
Fix init_model if there's no vocab (closes #4048) (#4049) 2019-08-01 17:26:09 +02:00
adrianeboyd 925a852bb6 Improve NER per type scoring (#4052)
* Improve NER per type scoring

* include all gold labels in per type scoring, not only when recall > 0
* improve efficiency of per type scoring

* Create Scorer tests, initially with NER tests

* move regression test #3968 (per type NER scoring) to Scorer tests

* add new test for per type NER scoring with imperfect P/R/F and per
type P/R/F including a case where R == 0.0
2019-08-01 17:15:36 +02:00
Sofie Van Landeghem f7d950de6d ensure the lang of vocab and nlp stay consistent (#4057)
* ensure the language of vocab and nlp stay consistent across serialization

* equality with =
2019-08-01 17:13:01 +02:00
Björn Böing a83c0add2e Add links to tokenizer API docs to refer relevant information. (#4064)
* Add links to tokenizer API docs to refer relevant information.

* Add suggested changes

Co-Authored-By: Ines Montani <ines@ines.io>
2019-08-01 14:28:38 +02:00
Ejar 2cdf7d39e7 Corrected imported fucntion (#4062)
The example showed an incorrected import
2019-08-01 12:43:36 +02:00
Mohammed Daudali 23ec07debd Correct typo for AllenAI url on homepage (#4050)
* Typo fix for AllenAI url

Changed incorrect home page url for AllenAI from appenai.org to allenai.org

* Sign contributor agreement

* Change date format
2019-07-31 00:16:33 +02:00
Sofie Van Landeghem 7de3b129ab Resolve edge case when calling textcat.predict with empty doc (#4035)
* resolve edge case where no doc has tokens when calling textcat.predict

* more explicit value test
2019-07-30 14:58:01 +02:00
Ines Montani fcd2f7f656 Fix version introducing Span.ents (closes #4045) [ci skip] 2019-07-30 10:32:33 +02:00
Matthew Honnibal 89c92c65fb Update version 2019-07-28 17:56:38 +02:00
Matthew Honnibal 06eb428ed1 Make pipe base class a bit less presumptuous 2019-07-28 17:56:11 +02:00
Matthew Honnibal 16b5144095 Don't raise NotImplemented in Pipe.update 2019-07-28 17:54:11 +02:00
Ines Montani fc69da0acb
💫 Support simple training format in nlp.evaluate and add tests (#4033)
* Support simple training format in nlp.evaluate and add tests

* Update docs [ci skip]
2019-07-27 17:30:18 +02:00
Ines Montani a3723f439c Fix formatting [ci skip] 2019-07-27 16:35:42 +02:00
Ines Montani d5bce35fb1 Fix bug in Span.similarity when called via hook 2019-07-27 15:33:27 +02:00
Ines Montani 109b5e1798 Fix bug in Token.similarity when called via hook 2019-07-27 15:26:01 +02:00
Ines Montani e000b5ed82 Also support "requirements" in model.json 2019-07-27 13:34:57 +02:00
Ines Montani 307ffe472d
Support custom language factory setting in meta.json (#4031) 2019-07-27 13:17:43 +02:00
Ines Montani b7cd58c736 Tidy up and auto-format [ci skip] 2019-07-27 12:19:35 +02:00
Bae Yong-Ju 05fbf5d976 Fix error when Korean text contains regexp special characters. (#4022) 2019-07-25 17:53:33 +02:00
Ines Montani bd39e5e630 Add "Processing text" section [ci skip] 2019-07-25 17:38:03 +02:00
Ines Montani a5e3d2f318 Improve section on disabling pipes [ci skip] 2019-07-25 14:25:34 +02:00
Ines Montani 02e444ec7c Add section on special tokenizer component [ci skip] 2019-07-25 14:25:03 +02:00
Ines Montani 1fa6d6ba55 Improve consistency of docs examples [ci skip] 2019-07-25 14:24:56 +02:00
adrianeboyd 784a5f4284 Update GoldParse attributes in API docs (#4023)
* add `words`
* update name of entity list to `ner`

I think it might be a bit more consistent to have `ner` named `entities`
or `ents` (and `ents` is actually set somewhere to `None`, which is a
bit confusing), but it looks like renaming it would be a non-trivial
decision.
2019-07-25 12:14:02 +02:00