lightning/tests
ananthsub 851f9e3997
Move NaN/Inf detection to a separate utilities file (#6834)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-04-09 01:47:02 +02:00
accelerators [Model Parallel] Add configure sharded model hook (#6679) 2021-03-29 14:50:51 -06:00
base Do not add return dict items to callback_metrics (#6682) 2021-03-26 14:05:20 +01:00
callbacks Fix finetuning complex models correctly unfreezes. (#6880) 2021-04-08 12:59:06 +05:30
checkpointing Update Changelog for v1.2.7 (#6874) 2021-04-07 22:58:41 +00:00
core Support teardown hook on DataModule (#4673) 2021-03-25 07:51:55 -05:00
deprecated_api Move NaN/Inf detection to a separate utilities file (#6834) 2021-04-09 01:47:02 +02:00
helpers Remove legacy support for the magic `log`/`progress_bar` keys in dict returns (#6734) 2021-03-31 00:28:04 +02:00
loggers Fix support for symlink save_dir in TensorBoardLogger (#6730) 2021-04-06 11:36:25 +02:00
metrics Simplify deprecations (#6620) 2021-03-25 15:26:38 +01:00
models Add `Trainer(gradient_clip_algorithm='value'|'norm')` (#6123) 2021-04-06 08:27:37 -05:00
overrides Flash predict step (#6577) 2021-03-23 11:13:13 -04:00
plugins TPUSpawn + IterableDataset error message (#6875) 2021-04-08 19:57:48 +05:30
trainer Move NaN/Inf detection to a separate utilities file (#6834) 2021-04-09 01:47:02 +02:00
tuner Fix tuner.scale_batch_size not finding batch size attribute when using datamodule (#5968) 2021-03-14 09:16:19 +01:00
utilities [fix] Better support for rank_zero_only setting for SLURM and torchelastic (#6802) 2021-04-07 12:25:13 +01:00
README.md support python 3.9 (#4944) 2021-03-29 12:20:13 -04:00
__init__.py fixing examples (#6600) 2021-03-20 18:58:59 +00:00
collect_env_details.py add copyright to tests (#5143) 2021-01-05 09:57:37 +01:00
conftest.py CI: fixture for global rank variable reset (#6839) 2021-04-06 09:37:17 -07:00
mnode_tests.txt Mnodes (#5020) 2021-02-04 20:55:40 +01:00
special_tests.sh DeepSpeed ZeRO Update (#6546) 2021-03-30 13:39:02 -04:00
test_profiler.py Add PyTorch 1.8 Profiler 5/5 (#6618) 2021-03-23 20:43:21 +00:00

README.md

PyTorch-Lightning Tests

Most PL tests train a full MNIST model under various trainer conditions (ddp, ddp2+amp, etc.). This covers most combinations of important settings. To pass, a test expects the trained model to reach a reasonable test accuracy.
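The pattern described above (train under several configurations, then assert a minimum accuracy) can be sketched without any dependencies. This is only an illustration and not code from the test suite: the toy 2-D dataset stands in for MNIST, the nearest-centroid model stands in for a real network, and `CONFIGS`, `run_accuracy_checks`, and the 0.8 threshold are all hypothetical.

```python
import random


def make_toy_data(n, seed):
    """Generate n labelled 2-D points; label is 1 when x + y > 0 (stand-in for MNIST)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        samples.append(((x, y), 1 if x + y > 0 else 0))
    return samples


def train_centroids(samples):
    """'Train' a nearest-centroid classifier: compute the mean point of each class."""
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for (x, y), label in samples:
        sums[label][0] += x
        sums[label][1] += y
        sums[label][2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(model, point):
    """Return the label whose centroid is closest to the point."""
    x, y = point
    return min(model, key=lambda lb: (x - model[lb][0]) ** 2 + (y - model[lb][1]) ** 2)


def accuracy(model, samples):
    correct = sum(predict(model, p) == label for p, label in samples)
    return correct / len(samples)


# Hypothetical "trainer conditions": here just different training-set sizes.
CONFIGS = [{"n_train": 200}, {"n_train": 400}, {"n_train": 800}]


def run_accuracy_checks(min_accuracy=0.8):
    """Train once per configuration and fail if any run falls below the threshold."""
    results = []
    for i, cfg in enumerate(CONFIGS):
        model = train_centroids(make_toy_data(cfg["n_train"], seed=i))
        acc = accuracy(model, make_toy_data(500, seed=100 + i))
        assert acc >= min_accuracy, f"{cfg}: accuracy {acc:.2f} below {min_accuracy}"
        results.append(acc)
    return results
```

A threshold well below the accuracy the model usually reaches keeps such a check from failing on ordinary run-to-run variation while still catching a broken training loop.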

Running tests

The automatic Travis tests run ONLY CPU-based tests. These cover most use cases, but to validate the full test suite you need to run it on a machine with 2 GPUs.

To run all tests, do the following:

Install Open MPI or another MPI implementation. See the Open MPI documentation for installation instructions.

git clone https://github.com/PyTorchLightning/pytorch-lightning
cd pytorch-lightning

# install AMP support
bash requirements/install_Apex.sh

# install dev deps
pip install -r requirements/devel.txt

# run tests
py.test -v

To test models that require a GPU, run the above commands on a GPU machine. The GPU machine must have:

  1. At least 2 GPUs.
  2. NVIDIA-apex installed.
  3. Horovod with NCCL support: HOROVOD_GPU_OPERATIONS=NCCL pip install horovod
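Before starting a long GPU run, it can help to check the requirements listed above. The following is a minimal sketch, not part of the repository: `check_gpu_test_prerequisites` is a hypothetical helper, and it only checks that the `apex` and `horovod` packages are importable. It does not verify that Horovod was built with NCCL support.

```python
import importlib.util


def check_gpu_test_prerequisites(min_gpus=2):
    """Return a dict mapping each requirement to True or False.

    Hypothetical helper: only package presence and GPU count are checked.
    """

    def has_module(name):
        # find_spec returns None when a top-level package is not installed.
        return importlib.util.find_spec(name) is not None

    gpu_count = 0
    if has_module("torch"):
        import torch

        gpu_count = torch.cuda.device_count()

    return {
        "gpus": gpu_count >= min_gpus,
        "apex": has_module("apex"),
        "horovod": has_module("horovod"),
    }


if __name__ == "__main__":
    for requirement, ok in check_gpu_test_prerequisites().items():
        print(f"{requirement}: {'ok' if ok else 'MISSING'}")
```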

Running Coverage

Make sure to run coverage on a GPU machine with at least 2 GPUs and NVIDIA apex installed.

cd pytorch-lightning

# generate coverage (coverage is also installed as part of dev dependencies under requirements/devel.txt)
coverage run --source pytorch_lightning -m py.test pytorch_lightning tests examples -v

# print coverage stats
coverage report -m

# exporting results
coverage xml

Building test image

You can build the image yourself. Note that the build takes a long time.

git clone <git-repository>
docker image build -t pytorch_lightning:devel-torch1.4 -f dockers/cuda-extras/Dockerfile --build-arg TORCH_VERSION=1.4 .

To build other versions, select a different Dockerfile.

docker image list
docker run --rm -it pytorch_lightning:devel-torch1.4 bash
docker image rm pytorch_lightning:devel-torch1.4
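If you build images for several PyTorch versions, the build command from this section can be generated rather than retyped. The sketch below only assembles the command; `docker_build_command` is a hypothetical helper, and it assumes the tag pattern `pytorch_lightning:devel-torch<version>` and the Dockerfile path used in the example above.

```python
import shlex


def docker_build_command(torch_version, dockerfile="dockers/cuda-extras/Dockerfile"):
    """Assemble the `docker image build` argument list for one PyTorch version."""
    tag = f"pytorch_lightning:devel-torch{torch_version}"
    return [
        "docker", "image", "build",
        "-t", tag,
        "-f", dockerfile,
        "--build-arg", f"TORCH_VERSION={torch_version}",
        ".",
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before it is run.
    print(shlex.join(docker_build_command("1.4")))
```

Returning a list of arguments rather than one string lets the result be passed directly to `subprocess.run` without shell quoting concerns.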