Commit Graph

15838 Commits

Author SHA1 Message Date
Raphael Mitsch 6aa6b86d49
Make generation of empty `KnowledgeBase` instances configurable in `EntityLinker` (#12320)
* Make empty_kb() configurable.

* Format.

* Update docs.

* Be more specific in KB serialization test.

* Update KB serialization tests. Update docs.

* Remove doc update for batched candidate generation.

* Fix serialization of subclassed KB in tests.

* Format.

* Update docstring.

* Update docstring.

* Switch from pickle to json for custom field serialization.
2023-03-01 16:02:55 +01:00
kadarakos 56aa0cc75f
Displacy doc fix (#12352)
* more details for color setting

* more details for color setting

* prettier
2023-03-01 15:38:23 +01:00
Sofie Van Landeghem 74cae47bf6
rely on is_empty property instead of __len__ (#12347) 2023-03-01 12:06:07 +01:00
Raphael Mitsch efbc3d37b3
Update docs w.r.t. spacy.CandidateBatchGenerator.v1. (#12350) 2023-03-01 11:01:35 +01:00
Adriane Boyd 33864f1d07
Add new tags in docs for #12334 (#12348) 2023-03-01 10:46:13 +01:00
Adriane Boyd 8f058e39bd
Fix error message for displacy auto_select_port (#12343) 2023-02-28 16:36:03 +01:00
TAN Long 071667376a
Add new REL_OPs: `>+`, `>-`, `<+`, and `<-` (#12334)
* Add immediate left/right child/parent dependency relations

* Add tests for new REL_OPs: `>+`, `>-`, `<+`, and `<-`.

---------

Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-02-28 14:36:33 +01:00
lise-brinck e2de188cf1
Bugfix/swedish tokenizer (#12315)
* add unittest for explosion#12311

* create punctuation.py for swedish

* removed : from infixes in swedish punctuation.py

* allow : as infix if succeeding char is uppercase
2023-02-27 10:53:45 +01:00
Adriane Boyd 4539fbae17
Revert "Fix FUZZY operator definition (#12318)" (#12336)
This reverts commit daedc45d05.

The default length depends on the length of the pattern string and was
correct for this example.
2023-02-27 09:48:36 +01:00
Kevin Humphreys acdd993071
Matcher performance fix for extension predicates: use shared key function (#12272)
* standardize predicate key format

* single key function

* Make optional args in key function keyword-only

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-02-27 08:35:08 +01:00
Paul O'Leary McCann 1e8bac99f3
Add tests for projects to master (#12303)
* Add tests for projects to master

* Fix git clone related issues on Windows

* Add stat import
2023-02-23 10:22:57 +01:00
andyjessen daedc45d05
Fix FUZZY operator definition (#12318)
* Fix FUZZY operator definition

The default length of the FUZZY operator is 2 and not 3.

* adjust edit distance in matcher usage docs too

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2023-02-23 09:37:40 +01:00
Adriane Boyd 80bc140533
Add grc to langs with lexeme norms in spacy-lookups-data (#12287) 2023-02-16 17:57:02 +01:00
Edward 61b8454137
Adjust return type of `registry.find` (#12227)
* Fix registry find return type

* add dot

* Add type ignore for mypy

* update black formatting version

* add mypy ignore to package cli

* mypy type fix (for real)

* Update find description in spacy/util.py

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* adjust mypy directive

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-02-15 12:32:53 +01:00
Raphael Mitsch 2d4fb94ba0
Fix wrong file name in docs for rule-based matcher. (#12262) 2023-02-09 12:58:14 +01:00
Adriane Boyd 9d920bafcf
Extend mypy to v1.0.x (#12245) 2023-02-08 14:33:16 +01:00
Raphael Mitsch d38a88f0f3
Remove negation. (#12252) 2023-02-08 14:18:33 +01:00
Adriane Boyd 9a454676f3
Use black version constraints from requirements.txt (#12220) 2023-02-03 11:44:10 +01:00
Sofie Van Landeghem 79ef6cf0f9
Have logging calls use string formatting types (#12215)
* change logging call for spacy.LookupsDataLoader.v1

* substitutions in language and _util

* various more substitutions

* add string formatting guidelines to contribution guidelines
2023-02-02 11:15:22 +01:00
Sofie Van Landeghem 4c60afb946
Backslash fixes in docs (#12213)
* backslash fixes

* revert unrelated change
2023-02-01 10:15:38 +01:00
Raphael Mitsch 02af17a5c8
Remove flaky assertions. (#12210) 2023-01-31 16:52:06 +01:00
Adriane Boyd 0e51c918ae
Normalize whitespace in evaluate CLI output test (#12157)
* Normalize whitespace in evaluate CLI output test

Depending on terminal settings, lines may be padded to the screen width
so the comparison is too strict with only the command string replacement.

* Move to test util method

* Change to normalization method
2023-01-30 17:51:27 +01:00
Paul O'Leary McCann 8932f4dc35
Add extra flag to assets docs (#12194)
* Add extra flag to assets docs

For some reason this wasn't included.

* Add new tag to docs
2023-01-30 10:05:23 +01:00
Adriane Boyd 606273f7e4
Normalize whitespace in evaluate CLI output test (#12157)
* Normalize whitespace in evaluate CLI output test

Depending on terminal settings, lines may be padded to the screen width
so the comparison is too strict with only the command string replacement.

* Move to test util method

* Change to normalization method
2023-01-27 16:13:34 +01:00
Sofie Van Landeghem bd739e67d6
explain KB change and how to remedy (#12189) 2023-01-27 15:13:20 +01:00
Adriane Boyd 5f8a398bb9
Add span_id to Span.char_span, update Doc/Span.char_span docs (#12196)
* Add span_id to Span.char_span, update Doc/Span.char_span docs

`Span.char_span(id=)` should be removed in the future.

* Also use Union[int, str] in Doc docstring
2023-01-27 15:09:17 +01:00
Simon Gurcke 774c10fa39
Add alignment_mode argument to Span.char_span() (#12145)
* Add alignment_mode argument to Span.char_span()

* Update website

* Update spacy/tokens/span.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-01-27 11:43:40 +01:00
Peter Baumgartner c68e6b8a96
`trainable_lemmatizer` in `debug data` (#11419)
* WIP

* rm ipython embeds

* rm total

* WIP

* cleanup

* cleanup + reword

* rm component function

* remove migration support form

* fix reference dataset for dev data

* additional fixes

- set approach to identifying unique trees
- adjust line length on messages
- add logic for detecting docs without annotations

* use 0 instead of none for no annotation

* partial annotation support

* initial tests for _compile_gold lemma attributes

Using the example data from the edit tree lemmatizer tests for:
- lemmatizer_trees
- partial_lemma_annotations
- n_low_cardinality_lemmas
- no_lemma_annotations

* adds output test for cli app

* switch msg level

* rm unclear uniqueness check

* Revert "rm unclear uniqueness check"

This reverts commit 6ea2b3524b.

* remove good message on uniqueness

* formatting

* use en_vocab fixture

* clarify data set source in messages

* remove unnecessary import

Co-authored-by: svlandeg <svlandeg@github.com>
2023-01-26 17:36:50 +01:00
Daniël de Kok 8d69874afb
Add `spacy.PlainTextCorpusReader.v1` (#12122)
* Add `spacy.PlainTextCorpusReader.v1`

This is a corpus reader that reads plain text corpora with the following
format:

- UTF-8 encoding
- One line per document.
- Blank lines are ignored.

It is useful for applications where we deal with very large corpora,
such as distillation, and don't want to deal with the space overhead of
serialized formats. Additionally, many large corpora already use such
a text format, keeping the necessary preprocessing to a minimum.

* Update spacy/training/corpus.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* docs: add version to `PlainTextCorpus`

* Add docstring to registry function

* Add plain text corpus tests

* Only strip newline/carriage return

* Add return type _string_to_tmp_file helper

* Use a temporary directory in place of file name

Different OS auto delete/sharing semantics are just wonky.

* This will be new in 3.5.1 (rather than 4)

* Test improvements from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-01-26 11:33:22 +01:00
Marcus Blättermann a37117abd0
Fix text colors in docs (#12186) 2023-01-26 10:30:24 +01:00
Marcus Blättermann 056b73468c
Load components dynamically (decrease initial file size for docs) (#12175)
* Extract `CodeBlock` component into own file

* Extract `InlineCode` component into own file

* Extract `TypeAnnotation` component into own file

* Convert named `export` to `default export`

* Remove unused `export`

* Simplify `TypeAnnotation` to remove dependency for Prism

* Load `Code` component dynamically

* Extract `MarkdownToReact` component into own file

* WIP Code Dynamic

* Load `MarkdownToReact` component dynamically

* Extract `htmlToReact` to own file

* Load `htmlToReact` component dynamically

* Dynamically load `Juniper`
2023-01-25 17:30:41 +01:00
Adriane Boyd 07dfa54669
CI: Extend website excludes (#12185) 2023-01-25 15:35:17 +01:00
Marcus Blättermann 11f10fff60
Fix frontpage image (#12184) 2023-01-25 13:17:35 +01:00
Marcus Blättermann 5a6000fb8b
Fix text color in docs (#12183)
* Fix text color on landing page

* Fix code color
2023-01-25 13:14:32 +01:00
Adriane Boyd 8ea15240ca
Update binder version to v3.5 (#12153) 2023-01-25 13:14:23 +01:00
Adriane Boyd 2dbb764183
CI: Add black formatting check to validation (#12182) 2023-01-25 12:51:37 +01:00
Marcus Blättermann 99a05734a8
Add `aria-label` to quickstart widget (#12179) 2023-01-25 11:46:55 +01:00
Marcus Blättermann 0298b1a863
WEB-28 Increase contrast of grey text (#12178)
* Use transparent colors to increase contrast on darker backgrounds

* Increase color contrast of grey text
2023-01-25 11:46:43 +01:00
Marcus Blättermann 3062fae2ca
Fix broken URL (#12176) 2023-01-25 11:42:19 +01:00
Marcus Blättermann 57ba37bc52
Fix regression with links in prompts (#12172) 2023-01-25 08:51:40 +01:00
Marcus Blättermann 05a3685849
Fix broken syntax for type annotations (#12171) 2023-01-25 08:51:25 +01:00
Marcus Blättermann f3c586f74a
Fix navigation alert (#12169)
Fixes a regression introduced in #12163
2023-01-24 16:40:40 +01:00
Marcus Blättermann 49237f05a6
Fix `aria-hidden` element (#12163)
* Rename CSS class to make use more clear

* Rename component prop to improve code readability

* Fix `aria-hidden` directly on a link element

This link wouldn't have been clickable by screenreaders

* Refactor component

This removes a unnessary `div` and a duplicate link

Co-authored-by: Ines Montani <ines@ines.io>
2023-01-24 14:44:47 +01:00
Marcus Blättermann 0a70696923
Fix wrong HTML element attribute (#12151)
Originally introduced in 62b9c9c6d7

Original error: Warning: Invalid DOM property `class`. Did you mean `className`?

React doesn't have `class`, it uses `className`.
2023-01-24 14:35:31 +01:00
Marcus Blättermann 9555e7aecf
Remove unnessary links (#12159)
There is no need to link to the image we are already viewing and this is also considered an accessibility issue.
2023-01-24 14:01:00 +01:00
Marcus Blättermann 031f6c7b60
WEB-27 Add `alt` tags to images (#12166)
* Update spaCy badge `alt` text

* Add `next/image` component to Universe

* Add missing `alt`texts
2023-01-24 13:56:14 +01:00
Marcus Blättermann c9beb47ab7
Increase contrast of text and theme color (#12165) 2023-01-24 13:55:20 +01:00
Marcus Blättermann a7d6a62f7c
Remove zoom locking (#12164)
* Fix missing comma

* Activate user zoom for website

This is recommended by lighthouse:

> Disabling zooming is problematic for users with low vision who rely on screen magnification to properly see the contents of a web page. Learn more.

Also iOS already ignores this attribute anyway.
2023-01-24 13:54:49 +01:00
Marcus Blättermann 48159e1d60
Update explosion logo (#12162)
This fixes a misalignment of the explosion logo
2023-01-24 13:53:51 +01:00
Marcus Blättermann 7160f7835d
Fix GitHub badge (#12161)
* Extract component

* Remove rounded border form GitHub Stars badge

* Add `alt` text
2023-01-24 13:53:28 +01:00