spaCy

Connor Brinton 6dd56868de 📝 Fix formula for receptive field in docs (#12918 )

SpaCy's HashEmbedCNN layer performs convolutions over tokens to produce contextualized embeddings using a `MaxoutWindowEncoder` layer. These convolutions are implemented using Thinc's `expand_window` layer, which concatenates `window_size` neighboring sequence items on either side of the sequence item being processed. This is repeated across `depth` convolutional layers.

For example, consider the sequence "ABCDE" and a `MaxoutWindowEncoder` layer with a context window of 1 and a depth of 2. We'll focus on the token "C". We can visually represent the contextual embedding produced for "C" as:

```mermaid
flowchart LR
    A0(A<sub>0</sub>)
    B0(B<sub>0</sub>)
    C0(C<sub>0</sub>)
    D0(D<sub>0</sub>)
    E0(E<sub>0</sub>)
    B1(B<sub>1</sub>)
    C1(C<sub>1</sub>)
    D1(D<sub>1</sub>)
    C2(C<sub>2</sub>)
    A0 --> B1
    B0 --> B1
    C0 --> B1
    B0 --> C1
    C0 --> C1
    D0 --> C1
    C0 --> D1
    D0 --> D1
    E0 --> D1
    B1 --> C2
    C1 --> C2
    D1 --> C2
```

Described in words, this graph shows that before the first layer of the convolution, the "receptive field" centered at each token consists only of that same token. That is to say, we have a receptive field of 1. The first layer of the convolution adds one neighboring token on either side to the receptive field. Since this is done on both sides, the receptive field increases by 2, giving the first layer a receptive field of 3. The second layer of the convolution adds an _additional_ neighboring token on either side to the receptive field, giving a final receptive field of 5.

However, this doesn't match the formula currently given in the docs, which reads:

> The receptive field of the CNN will be
> `depth * (window_size * 2 + 1)`, so a 4-layer network with a window
> size of `2` will be sensitive to 20 words at a time.

Substituting in our depth of 2 and window size of 1, this formula gives us a receptive field of:

```
depth * (window_size * 2 + 1)
= 2 * (1 * 2 + 1)
= 2 * (2 + 1)
= 2 * 3
= 6
```

This not only doesn't match our computations from above, it's also an even number! This is suspicious, since the receptive field is supposed to be centered on a token, and not between tokens. Generally, this formula results in an even number for any even value of `depth`.

The error in this formula is that the adjustment for the center token is multiplied by the depth, when it should occur only once. The corrected formula, `depth * window_size * 2 + 1`, gives the correct value for our small example from above:

```
depth * window_size * 2 + 1
= 2 * 1 * 2 + 1
= 4 + 1
= 5
```

These changes update the docs to correct the receptive field formula and the example receptive field size.

2023-08-21 10:52:32 +02:00
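
The argument in this commit message can be checked with a short, self-contained sketch. It does not use spaCy or Thinc: the function names (`receptive_field`, `old_receptive_field`, `simulate_receptive_field`) are hypothetical, and the simulation is an assumption-level model that only tracks which input positions can reach a given output position when each layer looks `window_size` items to either side. It compares the corrected formula and the previous one against a brute-force count.

```python
def receptive_field(depth: int, window_size: int) -> int:
    """Corrected formula from the commit message: the center token is counted once."""
    return depth * window_size * 2 + 1


def old_receptive_field(depth: int, window_size: int) -> int:
    """Previous docs formula, which counts the center token once per layer."""
    return depth * (window_size * 2 + 1)


def simulate_receptive_field(depth: int, window_size: int, seq_len: int = 101) -> int:
    """Brute-force count of input positions that can influence the center position.

    Illustrative model only (not Thinc's expand_window): each layer makes
    position i depend on everything positions i - window_size .. i + window_size
    depended on in the previous layer.
    """
    # Before any layer, each position depends only on itself.
    deps = [{i} for i in range(seq_len)]
    for _ in range(depth):
        new_deps = []
        for i in range(seq_len):
            lo = max(0, i - window_size)
            hi = min(seq_len, i + window_size + 1)
            new_deps.append(set().union(*deps[lo:hi]))
        deps = new_deps
    return len(deps[seq_len // 2])


if __name__ == "__main__":
    for depth, window_size in [(2, 1), (4, 2)]:
        print(
            depth,
            window_size,
            receptive_field(depth, window_size),
            old_receptive_field(depth, window_size),
            simulate_receptive_field(depth, window_size),
        )
```

For the "ABCDE" example (depth 2, window size 1), the corrected formula and the simulation both give 5, while the old formula gives 6. For the docs example of a 4-layer network with a window size of 2, the corrected formula and the simulation give 17 rather than 20. The default `seq_len` of 101 is only there to keep the center position far enough from the sequence edges that the count is not truncated.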
cli	Update br tags (#12882 )	2023-08-04 10:52:41 +02:00
displacy	Update br tags (#12882 )	2023-08-04 10:52:41 +02:00
kb	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
lang	Update examples.py (#12895 )	2023-08-11 15:38:06 +02:00
matcher	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
ml	📝 Fix formula for receptive field in docs (#12918 )	2023-08-21 10:52:32 +02:00
pipeline	Allow pydantic v2 using transitional v1 support (#12888 )	2023-08-08 11:27:28 +02:00
tests	Allow pydantic v2 using transitional v1 support (#12888 )	2023-08-08 11:27:28 +02:00
tokens	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
training	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
__init__.pxd	…
__init__.py	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
__main__.py	…
about.py	Set version to v3.6.1 (#12892 )	2023-08-08 15:04:13 +02:00
attrs.pxd	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
attrs.pyx	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
compat.py	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
default_config.cfg	Add `training.before_update` callback (#11739 )	2022-11-23 17:54:58 +01:00
default_config_pretraining.cfg	Add new parameter for saving every n epoch in pretraining (#8912 )	2021-08-12 11:14:48 +02:00
errors.py	Support custom token/lexeme attribute for vectors (#12625 )	2023-06-28 09:43:14 +02:00
glossary.py	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
language.py	Clean up unused code in Language (#12836 )	2023-07-18 14:10:30 +02:00
lexeme.pxd	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
lexeme.pyi	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
lexeme.pyx	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
lookups.py	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
morphology.pxd	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
morphology.pyx	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
parts_of_speech.pxd	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
parts_of_speech.pyx	…
pipe_analysis.py	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
py.typed	Add py.typed	2021-03-16 09:48:31 +01:00
schemas.py	Allow pydantic v2 using transitional v1 support (#12888 )	2023-08-08 11:27:28 +02:00
scorer.py	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
strings.pxd	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
strings.pyi	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
strings.pyx	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
structs.pxd	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
symbols.pxd	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
symbols.pyx	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
tokenizer.pxd	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
tokenizer.pyx	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
ty.py	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
typedefs.pxd	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
typedefs.pyx	…
util.py	Display model's full base version string in incompatiblity warning (#12857 )	2023-08-02 08:06:41 +02:00
vectors.pyx	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
vocab.pxd	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00
vocab.pyi	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 )	2023-06-14 17:48:41 +02:00
vocab.pyx	ci: add cython linter (#12694 )	2023-07-19 12:03:31 +02:00