diff --git a/website/docs/api/architectures.md b/website/docs/api/architectures.md
index fd88434f1..3089fa1b3 100644
--- a/website/docs/api/architectures.md
+++ b/website/docs/api/architectures.md
@@ -11,9 +11,17 @@ menu:
- ['Entity Linking', 'entitylinker']
---
-TODO: intro and how architectures work, link to
-[`registry`](/api/top-level#registry),
-[custom functions](/usage/training#custom-functions) usage etc.
+A **model architecture** is a function that wires up a
+[`Model`](https://thinc.ai/docs/api-model) instance, which you can then use in a
+pipeline component or as a layer of a larger network. This page documents
+spaCy's built-in architectures that are used for different NLP tasks. All
+trainable [built-in components](/api#architecture-pipeline) expect a `model`
+argument defined in the config and document their default architecture.
+Custom architectures can be registered using the
+[`@spacy.registry.architectures`](/api/top-level#registry) decorator and used as
+part of the [training config](/usage/training#custom-functions). Also see the
+usage documentation on
+[layers and model architectures](/usage/layers-architectures).
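+
+As a minimal sketch, a custom architecture might be registered like this (the
+name `"CustomTagger.v1"` and the function below are hypothetical examples, not
+built into spaCy):
+
+```python
+import spacy
+from thinc.api import Model, Softmax, chain, with_array
+
+# Hypothetical example architecture, not a built-in spaCy model
+@spacy.registry.architectures("CustomTagger.v1")
+def build_custom_tagger(tok2vec: Model, nO: int) -> Model:
+    # Chain a token-to-vector embedding layer with a per-token softmax output
+    return chain(tok2vec, with_array(Softmax(nO)))
+```
+
+A model block in the training config can then refer to the registered function
+via `@architectures = "CustomTagger.v1"`.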
## Tok2Vec architectures {#tok2vec-arch source="spacy/ml/models/tok2vec.py"}
@@ -284,8 +292,18 @@ on [static vectors](/usage/embeddings-transformers#static-vectors) for details.
The following architectures are provided by the package
[`spacy-transformers`](https://github.com/explosion/spacy-transformers). See the
-[usage documentation](/usage/embeddings-transformers) for how to integrate the
-architectures into your training config.
+[usage documentation](/usage/embeddings-transformers#transformers) for how to
+integrate the architectures into your training config.
+
+
+
+Note that in order to use these architectures in your config, you need to
+install the
+[`spacy-transformers`](https://github.com/explosion/spacy-transformers)
+package. See the
+[installation docs](/usage/embeddings-transformers#transformers-installation)
+for details and system requirements.
+
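+
+As a quick sketch of what installing the package gives you: `spacy-transformers`
+registers its architectures via entry points, so once it is installed they can
+be resolved by name from spaCy's registry and referenced in your config (the
+exact version suffix may differ between releases):
+
+```python
+from spacy import registry
+
+# Look up the registered transformer architecture by its string name, the same
+# name you would put under @architectures in the config
+create_transformer_model = registry.architectures.get(
+    "spacy-transformers.TransformerModel.v1"
+)
+```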
+
### spacy-transformers.TransformerModel.v1 {#TransformerModel}
diff --git a/website/docs/usage/layers-architectures.md b/website/docs/usage/layers-architectures.md
index 1ee0f4fae..eebcc4681 100644
--- a/website/docs/usage/layers-architectures.md
+++ b/website/docs/usage/layers-architectures.md
@@ -9,7 +9,7 @@ menu:
next: /usage/projects
---
- A **model architecture** is a function that wires up a
+A **model architecture** is a function that wires up a
[Thinc `Model`](https://thinc.ai/docs/api-model) instance, which you can then
use in a component or as a layer of a larger network. You can use Thinc as a
thin wrapper around frameworks such as PyTorch, TensorFlow or MXNet, or you can
diff --git a/website/docs/usage/training.md b/website/docs/usage/training.md
index c04d3ca77..59766bada 100644
--- a/website/docs/usage/training.md
+++ b/website/docs/usage/training.md
@@ -6,8 +6,7 @@ menu:
- ['Quickstart', 'quickstart']
- ['Config System', 'config']
- ['Custom Functions', 'custom-functions']
- - ['Transfer Learning', 'transfer-learning']
- - ['Parallel Training', 'parallel-training']
+ # - ['Parallel Training', 'parallel-training']
- ['Internal API', 'api']
---
@@ -92,16 +91,6 @@ spaCy's binary `.spacy` format. You can either include the data paths in the
$ python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy
```
-
-
## Training config {#config}
Training config files include all **settings and hyperparameters** for training
@@ -400,13 +389,11 @@ recipe once the dish has already been prepared. You have to make a new one.
spaCy includes a variety of built-in [architectures](/api/architectures) for
different tasks. For example:
-
-
| Architecture | Description |
| ----------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [HashEmbedCNN](/api/architectures#HashEmbedCNN) | Build spaCy’s "standard" embedding layer, which uses hash embedding with subword features and a CNN with layer-normalized maxout. ~~Model[List[Doc], List[Floats2d]]~~ |
| [TransitionBasedParser](/api/architectures#TransitionBasedParser) | Build a [transition-based parser](https://explosion.ai/blog/parsing-english-in-python) model used in the default [`EntityRecognizer`](/api/entityrecognizer) and [`DependencyParser`](/api/dependencyparser). ~~Model[List[Doc], List[List[Floats2d]]]~~ |
-| [TextCatEnsemble](/api/architectures#TextCatEnsemble) | Stacked ensemble of a bag-of-words model and a neural network model with an internal CNN embedding layer. Used in the default [`TextCategorizer`](/api/textcategorizer). ~~Model~~ |
+| [TextCatEnsemble](/api/architectures#TextCatEnsemble) | Stacked ensemble of a bag-of-words model and a neural network model with an internal CNN embedding layer. Used in the default [`TextCategorizer`](/api/textcategorizer). ~~Model[List[Doc], Floats2d]~~ |
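+
+As an illustrative sketch, each entry in this table corresponds to a registered
+function, so you can also build one of these models directly in Python. The
+argument values below are just example settings, and the exact version suffix
+and parameters are documented on the [architectures page](/api/architectures):
+
+```python
+from spacy import registry
+
+# Resolve the registered architecture by name and call it to create the model
+build_tok2vec = registry.architectures.get("spacy.HashEmbedCNN.v1")
+tok2vec = build_tok2vec(
+    width=96,
+    depth=4,
+    embed_size=2000,
+    window_size=1,
+    maxout_pieces=3,
+    subword_features=True,
+    pretrained_vectors=None,
+)
+```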
@@ -755,71 +742,10 @@ def filter_batch(size: int) -> Callable[[Iterable[Example]], Iterator[List[Examp
return create_filtered_batches
```
-
-
### Defining custom architectures {#custom-architectures}
-## Transfer learning {#transfer-learning}
-
-
-
-### Using transformer models like BERT {#transformers}
-
-spaCy v3.0 lets you use almost any statistical model to power your pipeline. You
-can use models implemented in a variety of frameworks. A transformer model is
-just a statistical model, so the
-[`spacy-transformers`](https://github.com/explosion/spacy-transformers) package
-actually has very little work to do: it just has to provide a few functions that
-do the required plumbing. It also provides a pipeline component,
-[`Transformer`](/api/transformer), that lets you do multi-task learning and lets
-you save the transformer outputs for later use.
-
-
-
-For more details on how to integrate transformer models into your training
-config and customize the implementations, see the usage guide on
-[training transformers](/usage/embeddings-transformers#transformers-training).
-
-### Pretraining with spaCy {#pretraining}
-
-
-
-## Parallel Training with Ray {#parallel-training}
-
-
-
## Internal training API {#api}
@@ -880,8 +806,8 @@ example = Example.from_dict(predicted, {"tags": tags})
Here's another example that shows how to define gold-standard named entities.
The letters added before the labels refer to the tags of the
[BILUO scheme](/usage/linguistic-features#updating-biluo) – `O` is a token
-outside an entity, `U` a single entity unit, `B` the beginning of an entity,
-`I` a token inside an entity and `L` the last token of an entity.
+outside an entity, `U` a single entity unit, `B` the beginning of an entity, `I`
+a token inside an entity and `L` the last token of an entity.
```python
doc = Doc(nlp.vocab, words=["Facebook", "released", "React", "in", "2014"])
diff --git a/website/src/styles/layout.sass b/website/src/styles/layout.sass
index b71eccd80..775523190 100644
--- a/website/src/styles/layout.sass
+++ b/website/src/styles/layout.sass
@@ -363,7 +363,7 @@ body [id]:target
color: var(--color-red-medium)
background: var(--color-red-transparent)
- &.italic, &.comment
+ &.italic
font-style: italic
@@ -384,11 +384,9 @@ body [id]:target
// Settings for ini syntax (config files)
[class*="language-ini"]
color: var(--syntax-comment)
- font-style: italic !important
.token
color: var(--color-subtle)
- font-style: normal !important
.gatsby-highlight-code-line
@@ -426,7 +424,6 @@ body [id]:target
.cm-comment
color: var(--syntax-comment)
- font-style: italic
.cm-keyword
color: var(--syntax-keyword)