lightning/tests
Justus Schock f23f5e5648
Fix DP Logging Aggregation (#4138)
* add option to step result to do aggregation on a specific device

* in dp: do aggregation on root gpu

* Update CHANGELOG.md

* pep8

* trailing whitespace

* move to root



* move chlog entry

* clean

* remove outdated changes

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2020-12-04 19:10:07 +01:00
..
backends Tpu save (#4309) 2020-12-02 13:05:11 +00:00
base CI cleaning (#4941) 2020-12-02 10:00:05 +00:00
callbacks [TEST] Min steps override early stopping (#4283) 2020-12-04 17:10:14 +01:00
checkpointing optimizer clean up (#4658) 2020-12-01 00:09:46 +00:00
core optimizer clean up (#4658) 2020-12-01 00:09:46 +00:00
loggers [bugfix] Accumulated_gradient and TensoBoard (#4738) 2020-11-25 19:44:05 +00:00
metrics Update test for logging a metric object and state reset (#4825) 2020-11-24 11:28:02 +01:00
models refactor imports of optional dependencies (#4859) 2020-12-04 10:26:10 +01:00
plugins Allow string plugins (#4888) 2020-12-01 20:30:49 +00:00
trainer Fix DP Logging Aggregation (#4138) 2020-12-04 19:10:07 +01:00
tuner fix: `nb` is set total number of devices, when nb is -1. (#4209) 2020-10-29 10:50:37 +01:00
utilities Tpu save (#4309) 2020-12-02 13:05:11 +00:00
README.md Horovod: fixed early stopping and added metrics aggregation (#3775) 2020-11-05 12:52:02 -05:00
__init__.py CI cleaning (#4941) 2020-12-02 10:00:05 +00:00
collect_env_details.py
conftest.py Apply import formatting to files in the 2nd top level (#4717) 2020-11-18 00:29:09 +01:00
test_deprecated.py [docs] Added description of saving using ddp (#4660) 2020-12-04 17:59:38 +01:00
test_profiler.py update 2020-11-27 17:48:51 +00:00

README.md

PyTorch-Lightning Tests

Most PL tests train a full MNIST model under various trainer conditions (ddp, ddp2+amp, etc.). This covers most combinations of important settings. To pass, a test expects the model to reach a reasonable test accuracy.
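The pattern described above (train under several configurations, then require a minimum accuracy) can be sketched roughly as follows. This is a standard-library illustration only, not code from the suite: `train_and_evaluate`, `make_data`, `MIN_ACCURACY`, and the configuration dictionaries are hypothetical stand-ins, and a toy threshold classifier on synthetic data replaces the MNIST model and the Lightning trainer.

```python
import random

MIN_ACCURACY = 0.9  # hypothetical threshold; the real suite defines its own


def make_data(n, seed):
    """Two well-separated 1-D clusters standing in for a real dataset."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 if label else -2.0, 1.0)
        data.append((x, label))
    return data


def train_and_evaluate(config):
    """Hypothetical stand-in for fitting and testing a model under one configuration."""
    train = make_data(500, seed=config["seed"])
    test = make_data(200, seed=config["seed"] + 1)
    # "Training": put the decision threshold midway between the class means.
    xs0 = [x for x, label in train if label == 0]
    xs1 = [x for x, label in train if label == 1]
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    # "Testing": fraction of held-out points on the correct side of the threshold.
    correct = sum((x > threshold) == bool(label) for x, label in test)
    return correct / len(test)


# The names are labels only; nothing here actually runs on multiple devices.
CONFIGS = [{"name": "cpu", "seed": 0}, {"name": "ddp", "seed": 1}]

for config in CONFIGS:
    accuracy = train_and_evaluate(config)
    assert accuracy >= MIN_ACCURACY, f"{config['name']}: accuracy {accuracy:.2f} too low"
```

The point of the design is that each configuration is judged by the same end-to-end criterion, so a setting that silently breaks training shows up as a failed accuracy check rather than passing unnoticed.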

Running tests

The automated Travis CI runs ONLY CPU-based tests. Although these cover most use cases, run the suite on a machine with at least 2 GPUs to validate the full test suite.

To run all tests, do the following:

Install Open MPI or another MPI implementation. See the Open MPI documentation for installation instructions.

git clone https://github.com/PyTorchLightning/pytorch-lightning
cd pytorch-lightning

# install AMP support
bash requirements/install_AMP.sh

# install dev deps
pip install -r requirements/devel.txt

# run tests
py.test -v
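When only part of the suite is of interest, pytest accepts a path argument, so a single subdirectory such as `tests/trainer` can be run on its own. A minimal sketch of a helper that does this from Python; `pytest_command` and `run_tests` are hypothetical names introduced here, not part of the repository:

```python
import subprocess
import sys


def pytest_command(test_path="tests", verbose=True):
    """Build the command line for running pytest on one path."""
    cmd = [sys.executable, "-m", "pytest", test_path]
    if verbose:
        cmd.append("-v")
    return cmd


def run_tests(test_path="tests"):
    """Run pytest on the given path and return its exit code."""
    return subprocess.run(pytest_command(test_path)).returncode
```

Calling `run_tests("tests/trainer")` from the repository root would run only the trainer tests. Invoking pytest through `sys.executable` keeps the run inside the interpreter where the dev dependencies were installed.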

To test models that require a GPU, run the above command on a GPU machine. The GPU machine must have:

  1. At least 2 GPUs.
  2. NVIDIA-apex installed.
  3. Horovod with NCCL support: HOROVOD_GPU_OPERATIONS=NCCL pip install horovod
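Before starting a long GPU run, it can help to check these prerequisites up front. The sketch below is one possible pre-flight check, not part of the repository: `gpu_count` and `check_gpu_test_requirements` are hypothetical helpers. It only checks that the `apex` and `horovod` packages are importable and does not verify that Horovod was built with NCCL support.

```python
import importlib.util


def gpu_count():
    """Return the number of CUDA devices visible to PyTorch, or 0 if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return 0
    import torch

    return torch.cuda.device_count()


def check_gpu_test_requirements():
    """Report which of the GPU test prerequisites appear to be satisfied."""
    return {
        "at_least_2_gpus": gpu_count() >= 2,
        "apex_installed": importlib.util.find_spec("apex") is not None,
        "horovod_installed": importlib.util.find_spec("horovod") is not None,
    }


for name, ok in check_gpu_test_requirements().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```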

Running Coverage

Make sure to run coverage on a GPU machine with at least 2 GPUs and NVIDIA apex installed.

cd pytorch-lightning

# generate coverage (coverage is also installed as part of dev dependencies under requirements/devel.txt)
coverage run --source pytorch_lightning -m py.test pytorch_lightning tests examples -v

# print coverage stats
coverage report -m

# exporting results
coverage xml
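By default, `coverage xml` writes a Cobertura-style report to `coverage.xml`, whose root element carries the overall `line-rate`. A small sketch of reading that value with the standard library; `total_line_rate` is a hypothetical helper, and the sample string is hand-written for illustration rather than taken from a real run:

```python
import xml.etree.ElementTree as ET


def total_line_rate(xml_text):
    """Return overall line coverage, as a percentage, from a Cobertura-style report."""
    root = ET.fromstring(xml_text)
    return 100.0 * float(root.attrib["line-rate"])


# Tiny hand-written sample in the same shape as a report; not real results.
SAMPLE = '<coverage line-rate="0.5" branch-rate="0"><packages/></coverage>'
print(total_line_rate(SAMPLE))  # prints 50.0
```

For a real report, pass the contents of the generated file instead, for example `total_line_rate(open("coverage.xml").read())`.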

Building test image

You can build the test image yourself. Note that the build takes a long time.

git clone <git-repository>
docker image build -t pytorch_lightning:devel-torch1.4 -f dockers/cuda-extras/Dockerfile --build-arg TORCH_VERSION=1.4 .

To build other versions, select a different Dockerfile.

docker image list
docker run --rm -it pytorch_lightning:devel-torch1.4 bash
docker image rm pytorch_lightning:devel-torch1.4