
PyTorch-Lightning Tests

Most PL tests train a full MNIST model under various trainer conditions (ddp, ddp2+amp, etc.). This covers most combinations of the important settings. To pass, a test expects the model to reach a reasonable test accuracy.

Running tests

The automated Travis CI builds run ONLY CPU-based tests. Although these cover most use cases, run the tests on a machine with at least 2 GPUs to validate the full test suite.

To run all tests, do the following:

Install Open MPI or another MPI implementation; the Open MPI documentation describes how to install it. Then run:

git clone https://github.com/PyTorchLightning/pytorch-lightning
cd pytorch-lightning

# install AMP support
bash requirements/install_AMP.sh

# install dev deps
pip install -r requirements/devel.txt

# run tests
py.test -v

To test models that require a GPU, run the above commands on a GPU machine. The GPU machine must have:

At least 2 GPUs.
NVIDIA-apex installed.
Horovod with NCCL support: HOROVOD_GPU_OPERATIONS=NCCL pip install horovod
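
Before launching the GPU tests, it can help to check the requirements listed above programmatically. The following is a minimal sketch, not part of the test suite: the function names `missing_gpu_prereqs` and `detect_environment` are hypothetical, and the check only confirms that the `apex` and `horovod` packages are importable; it does not verify that Horovod was built with NCCL support.

```python
import importlib.util


def missing_gpu_prereqs(gpu_count, has_apex, has_horovod):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if gpu_count < 2:
        problems.append(f"need at least 2 GPUs, found {gpu_count}")
    if not has_apex:
        problems.append("NVIDIA apex is not importable")
    if not has_horovod:
        problems.append("Horovod is not importable")
    return problems


def detect_environment():
    """Probe the current machine; a missing torch install is treated as 0 GPUs."""
    try:
        import torch
        gpu_count = torch.cuda.device_count()
    except ImportError:
        gpu_count = 0
    # find_spec returns None when a top-level package cannot be found
    has_apex = importlib.util.find_spec("apex") is not None
    has_horovod = importlib.util.find_spec("horovod") is not None
    return gpu_count, has_apex, has_horovod


if __name__ == "__main__":
    for problem in missing_gpu_prereqs(*detect_environment()):
        print("MISSING:", problem)
```

Run as a script, it prints one line per unmet requirement and nothing when all three checks pass.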

Running Coverage

Make sure to run coverage on a GPU machine with at least 2 GPUs and NVIDIA apex installed.

cd pytorch-lightning

# generate coverage (coverage is also installed as part of dev dependencies under requirements/devel.txt)
coverage run --source pytorch_lightning -m py.test pytorch_lightning tests examples -v

# print coverage stats
coverage report -m

# export results to coverage.xml
coverage xml
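
The XML report written by `coverage xml` (by default `coverage.xml`, in Cobertura format) can be read back in a script, for example to fail a build when coverage drops. The following is a minimal sketch that assumes the root `<coverage>` element carries a `line-rate` attribute between 0 and 1; the helper name `line_coverage_percent` and the sample XML string are illustrative, not output from this repository.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(xml_text):
    """Return overall line coverage as a percentage from a Cobertura-style report."""
    root = ET.fromstring(xml_text)
    # line-rate is a fraction in [0, 1] stored on the root <coverage> element
    return 100.0 * float(root.attrib["line-rate"])


# Illustrative report fragment, not real project data
SAMPLE = '<coverage line-rate="0.75" branch-rate="0"><packages/></coverage>'

if __name__ == "__main__":
    print(f"line coverage: {line_coverage_percent(SAMPLE):.1f}%")
```

To use it on a real report, read the file first, e.g. `line_coverage_percent(open("coverage.xml").read())`.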

Building test image

You can build the test image yourself. Note that the build takes a long time.

git clone <git-repository>
docker image build -t pytorch_lightning:devel-torch1.4 -f dockers/cuda-extras/Dockerfile --build-arg TORCH_VERSION=1.4 .

To build other versions, select a different Dockerfile. Once built, the image can be listed, run, and removed:

docker image list
docker run --rm -it pytorch_lightning:devel-torch1.4 bash
docker image rm pytorch_lightning:devel-torch1.4