docs: minor spelling tweaks (#5022)
parent 6d2aeff26a
commit ddd3eda26f

@@ -21,7 +21,7 @@ To link up arbitrary hardware, implement your own Accelerator subclass
 class MyAccelerator(Accelerator):
     def __init__(self, trainer, cluster_environment=None):
         super().__init__(trainer, cluster_environment)
-        self.nickname = 'my_accelator'
+        self.nickname = 'my_accelerator'

     def setup(self):
         # find local rank, etc, custom things to implement

@@ -324,13 +324,13 @@ that are included with NeMo:
 - `Language Modeling (BERT Pretraining) <https://github.com/NVIDIA/NeMo/blob/v1.0.0b1/tutorials/nlp/01_Pretrained_Language_Models_for_Downstream_Tasks.ipynb>`_
 - `Question Answering <https://github.com/NVIDIA/NeMo/blob/v1.0.0b1/tutorials/nlp/Question_Answering_Squad.ipynb>`_
 - `Text Classification <https://github.com/NVIDIA/NeMo/tree/v1.0.0b1/examples/nlp/text_classification>`_ (including Sentiment Analysis)
-- `Token Classifcation <https://github.com/NVIDIA/NeMo/tree/v1.0.0b1/examples/nlp/token_classification>`_ (including Named Entity Recognition)
+- `Token Classification <https://github.com/NVIDIA/NeMo/tree/v1.0.0b1/examples/nlp/token_classification>`_ (including Named Entity Recognition)
 - `Punctuation and Capitalization <https://github.com/NVIDIA/NeMo/blob/v1.0.0b1/tutorials/nlp/Punctuation_and_Capitalization.ipynb>`_

 Named Entity Recognition (NER)
 ------------------------------

-NER (or more generally token classifcation) is the NLP task of detecting and classifying key information (entities) in text.
+NER (or more generally token classification) is the NLP task of detecting and classifying key information (entities) in text.
 This task is very popular in Healthcare and Finance. In finance, for example, it can be important to identify
 geographical, geopolitical, organizational, persons, events, and natural phenomenon entities.
 See this `NER notebook <https://github.com/NVIDIA/NeMo/blob/v1.0.0b1/tutorials/nlp/Token_Classification_Named_Entity_Recognition.ipynb>`_

@@ -435,7 +435,7 @@ Hydra makes every aspect of the NeMo model, including the PyTorch Lightning Trai
 Tokenizers
 ----------

-Tokenization is the process of converting natural langauge text into integer arrays
+Tokenization is the process of converting natural language text into integer arrays
 which can be used for machine learning.
 For NLP tasks, tokenization is an essential part of data preprocessing.
 NeMo supports all BERT-like model tokenizers from

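Purely as an illustration of the text-to-integer-array conversion described in this hunk, here is a minimal sketch using a HuggingFace BERT tokenizer directly (using `transformers` rather than any NeMo-specific wrapper is an assumption outside this diff):

.. code-block:: python

    # Sketch only: a HuggingFace BERT-like tokenizer, not a NeMo wrapper.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    # Natural language text -> integer token ids
    encoded = tokenizer("Tokenization turns text into integer arrays.")
    print(encoded["input_ids"])
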
@@ -462,7 +462,7 @@ Much of the state-of-the-art in natural language processing is achieved
 by fine-tuning pretrained language models on the downstream task.

 With NeMo, you can either `pretrain <https://github.com/NVIDIA/NeMo/blob/v1.0.0b1/examples/nlp/language_modeling/bert_pretraining.py>`_
-a BERT model on your data or use a pretrained lanugage model from `HuggingFace Transformers <https://github.com/huggingface/transformers>`_
+a BERT model on your data or use a pretrained language model from `HuggingFace Transformers <https://github.com/huggingface/transformers>`_
 or `NVIDIA Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_.

 To see the list of language models available in NeMo:

@@ -46,7 +46,7 @@ Example 1: Pretrained, prebuilt models
 Example 2: Extend for faster research
 -------------------------------------
 Bolts are contributed with benchmarks and continuous-integration tests. This means
-you can trust the implementations and use them to bootstrap your resarch much faster.
+you can trust the implementations and use them to bootstrap your research much faster.

 .. code-block:: python


@@ -10,7 +10,7 @@ Loggers
 *******

 Lightning supports the most popular logging frameworks (TensorBoard, Comet, etc...). TensorBoard is used by default,
-but you can pass to the :class:`~pytorch_lightning.trainer.trainer.Trainer` any combintation of the following loggers.
+but you can pass to the :class:`~pytorch_lightning.trainer.trainer.Trainer` any combination of the following loggers.

 .. note::


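A short sketch of what "any combination of the following loggers" looks like in practice (the save directories here are arbitrary):

.. code-block:: python

    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import CSVLogger, TensorBoardLogger

    # A single logger, or a list of loggers, can be handed to the Trainer.
    trainer = Trainer(
        logger=[
            TensorBoardLogger(save_dir="logs/"),
            CSVLogger(save_dir="logs/"),
        ]
    )
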
@@ -102,7 +102,7 @@ method of the trainer. A typical example of this would look like
     trainer.fit(model)

 The figure produced by ``lr_finder.plot()`` should look something like the figure
-below. It is recommended to not pick the learning rate that achives the lowest
+below. It is recommended to not pick the learning rate that achieves the lowest
 loss, but instead something in the middle of the sharpest downward slope (red point).
 This is the point returned py ``lr_finder.suggestion()``.


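For context, a minimal sketch of the workflow this paragraph describes (assuming the tuner-style ``lr_find`` entry point; ``model`` is an existing LightningModule with an ``hparams.lr`` attribute):

.. code-block:: python

    from pytorch_lightning import Trainer

    trainer = Trainer()
    # `model` is assumed to be a LightningModule defined elsewhere
    lr_finder = trainer.tuner.lr_find(model)

    # Inspect the loss-vs-learning-rate curve; the red point marks the suggestion
    fig = lr_finder.plot(suggest=True)

    # Use the suggested rate (mid-slope, not the loss minimum) and train
    model.hparams.lr = lr_finder.suggestion()
    trainer.fit(model)
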
@@ -17,7 +17,7 @@ common metric implementations.

 The metrics API provides ``update()``, ``compute()``, ``reset()`` functions to the user. The metric base class inherits
 ``nn.Module`` which allows us to call ``metric(...)`` directly. The ``forward()`` method of the base ``Metric`` class
-serves the dual purpose of calling ``update()`` on its input and simultanously returning the value of the metric over the
+serves the dual purpose of calling ``update()`` on its input and simultaneously returning the value of the metric over the
 provided input.

 These metrics work with DDP in PyTorch and PyTorch Lightning by default. When ``.compute()`` is called in

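To make the ``update()`` / ``compute()`` / ``forward()`` relationship concrete, a small sketch with the built-in ``Accuracy`` metric (import path assumed from the 1.x ``pytorch_lightning.metrics`` package):

.. code-block:: python

    import torch
    from pytorch_lightning.metrics import Accuracy

    accuracy = Accuracy()
    preds = torch.tensor([0, 1, 1, 0])
    target = torch.tensor([0, 1, 0, 0])

    # forward() updates the internal state and returns the metric on this batch
    batch_acc = accuracy(preds, target)

    # compute() returns the metric accumulated over all update()/forward() calls
    total_acc = accuracy.compute()
    accuracy.reset()
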
@@ -224,7 +224,7 @@ The accelerator backend to use (previously known as distributed_backend).
 - (```ddp```) is DistributedDataParallel (each gpu on each node trains, and syncs grads)
 - (```ddp_cpu```) is DistributedDataParallel on CPU (same as `ddp`, but does not use GPUs.
   Useful for multi-node CPU training or single-node debugging. Note that this will **not** give
-  a speedup on a single node, since Torch already makes effient use of multiple CPUs on a single
+  a speedup on a single node, since Torch already makes efficient use of multiple CPUs on a single
   machine.)
 - (```ddp2```) dp on node, ddp across nodes. Useful for things like increasing
   the number of negative samples

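A hedged example of selecting these backends via the flag this hunk documents (the gpu and node counts are arbitrary):

.. code-block:: python

    from pytorch_lightning import Trainer

    # DistributedDataParallel: one process per GPU, gradients synced across them
    trainer = Trainer(accelerator="ddp", gpus=2)

    # ddp2: DP within each node, DDP across nodes
    trainer = Trainer(accelerator="ddp2", gpus=2, num_nodes=2)
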
@@ -982,7 +982,7 @@ Number of processes to train with. Automatically set to the number of GPUs
 when using ``accelerator="ddp"``. Set to a number greater than 1 when
 using ``accelerator="ddp_cpu"`` to mimic distributed training on a
 machine without GPUs. This is useful for debugging, but **will not** provide
-any speedup, since single-process Torch already makes effient use of multiple
+any speedup, since single-process Torch already makes efficient use of multiple
 CPUs.

 .. testcode::

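A minimal sketch of the debugging setup described here (the process count is arbitrary):

.. code-block:: python

    from pytorch_lightning import Trainer

    # Emulate 2-process distributed training on a machine without GPUs.
    # Useful for debugging DDP logic; it will not make training faster.
    trainer = Trainer(accelerator="ddp_cpu", num_processes=2)
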
@@ -110,11 +110,11 @@ The algorithm in short works by:
 2. Iteratively until convergence or maximum number of tries `max_trials` (default 25) has been reached:
     - Call `fit()` method of trainer. This evaluates `steps_per_trial` (default 3) number of
       training steps. Each training step can trigger an OOM error if the tensors
-      (training batch, weights, gradients ect.) allocated during the steps have a
+      (training batch, weights, gradients, etc.) allocated during the steps have a
       too large memory footprint.
     - If an OOM error is encountered, decrease batch size else increase it.
-      How much the batch size is increased/decreased is determined by the choosen
-      stratrgy.
+      How much the batch size is increased/decreased is determined by the chosen
+      strategy.
 3. The found batch size is saved to either `model.batch_size` or `model.hparams.batch_size`
 4. Restore the initial state of model and trainer

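As a short sketch of how this batch-size search is invoked (assuming the ``auto_scale_batch_size`` flag and ``trainer.tune()`` entry point; ``model`` is an existing LightningModule that defines ``batch_size``):

.. code-block:: python

    from pytorch_lightning import Trainer

    # 'power' doubles the batch size until an OOM; 'binsearch' then bisects further
    trainer = Trainer(auto_scale_batch_size="binsearch")

    # Runs the search and writes the result back to model.batch_size
    # (or model.hparams.batch_size), then restores model and trainer state.
    trainer.tune(model)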