`lightning/tests` directory contents: `accelerators/`, `base/`, `callbacks/`, `checkpointing/`, `core/`, `deprecated_api/`, `helpers/`, `loggers/`, `loops/`, `models/`, `overrides/`, `plugins/`, `profiler/`, `trainer/`, `tuner/`, `utilities/`, plus `README.md`, `__init__.py`, `conftest.py`, `mnode_tests.txt`, and `special_tests.sh`.


# PyTorch-Lightning Tests

Most PL tests train a full MNIST model under various trainer conditions (ddp, ddp2+amp, etc.). This covers most combinations of important settings. For a test to pass, the trained model is expected to reach a reasonable test accuracy.
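For illustration, here is a minimal sketch of that pattern, assuming `torch` and `pytorch_lightning` are installed. `TinyClassifier`, the random dataset, and the final assertion are hypothetical stand-ins, not code from the actual suite:

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class TinyClassifier(pl.LightningModule):
    """Hypothetical stand-in for the MNIST models the real tests train."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


def test_model_trains():
    # random data keeps the sketch self-contained; the real tests use MNIST
    data = DataLoader(TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,))), batch_size=8)
    trainer = pl.Trainer(max_epochs=1, logger=False)
    trainer.fit(TinyClassifier(), data)
    # the real tests assert a reasonable accuracy; here we only check that training ran
    assert trainer.global_step > 0
```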

## Running tests

```bash
git clone https://github.com/PyTorchLightning/pytorch-lightning
cd pytorch-lightning

# install dev deps
pip install -r requirements/devel.txt

# run tests
py.test -v
```

To test models that require a GPU, make sure to run the above command on a GPU machine. The machine must have at least 2 GPUs to run the distributed tests.
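One common way such tests gate themselves on available hardware is pytest's `skipif` marker; the snippet below is an illustrative pattern, not necessarily the suite's exact mechanism:

```python
import pytest
import torch


@pytest.mark.skipif(torch.cuda.device_count() < 2, reason="requires at least 2 GPUs")
def test_something_distributed():
    # hypothetical body; a real test would train a model with a ddp Trainer
    assert torch.cuda.is_available()
```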

Note that this setup will not run tests that require specific packages to be installed, such as Horovod, FairScale, NVIDIA/apex, or NVIDIA/DALI. You can rely on our CI to make sure all of these tests pass.
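For such optional dependencies, pytest's standard `importorskip` covers the same need; this is a hedged sketch, not the suite's actual helper:

```python
import pytest


def test_with_optional_package():
    # skips (rather than fails) when horovod is not installed
    hvd = pytest.importorskip("horovod.torch")
    hvd.init()
```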

## Running Coverage

Make sure to run coverage on a GPU machine with at least 2 GPUs and NVIDIA apex installed.

```bash
cd pytorch-lightning

# generate coverage (coverage is also installed as part of dev dependencies under requirements/devel.txt)
coverage run --source pytorch_lightning -m py.test pytorch_lightning tests examples -v

# print coverage stats
coverage report -m

# export results
coverage xml
```

## Building test image

You can build the image yourself; note that this takes a long time, so be prepared.

```bash
git clone <git-repository>
docker image build -t pytorch_lightning:devel-torch1.9 -f dockers/cuda-extras/Dockerfile --build-arg TORCH_VERSION=1.9 .
```

To build other versions, select a different Dockerfile.

```bash
# list local images
docker image list
# run the container interactively
docker run --rm -it pytorch_lightning:devel-torch1.9 bash
# remove the image when done
docker image rm pytorch_lightning:devel-torch1.9
```