lightning/tests
Lezwon Castelino 9446390779
fix TPU parsing and TPU tests (#2094)
* added tpu params test

* added tests

* removed xla imports

* added test cases for TPU

* fix pep 8 issues

* refactorings and comments

* add message to MisconfigurationException

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* test if device is set correctly

* added TPU device check
removed mark.spawn

* removed device selection

* remove xla_device call

* readded spawn due to test failures

* add TODO for tpu check

* Apply suggestions from code review

* Apply suggestions from code review

* flake8

* added tpu args to cli tests

* added support for tpu_core selection via cli

* fixed flake formatting

* replaced default_save_path with default_root_dir

* added check for data type for tpu_cores

* fixed flake indent

* protected

* protected

* chlog

* fixed tpu cores error

* rebased with latest changes

* flake fix

* Update pytorch_lightning/trainer/distrib_parts.py

added suggesstion

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-06-23 12:06:57 -04:00
base Add missing test for "multiple dataloader + percent_check fix" (#2226) 2020-06-23 11:21:24 -04:00
callbacks Revert/Fix: epoch indexing from 1, to be from 0 (#2289) 2020-06-19 23:39:53 -04:00
core Fix summary hook handles not getting removed (#2298) 2020-06-20 07:38:47 -04:00
loggers replace train_percent_check with limit_train_batches (#2220) 2020-06-17 13:42:28 -04:00
metrics Fix ROC metric for CUDA tensors (#2304) 2020-06-23 15:19:16 +02:00
models fix TPU parsing and TPU tests (#2094) 2020-06-23 12:06:57 -04:00
trainer fix TPU parsing and TPU tests (#2094) 2020-06-23 12:06:57 -04:00
utilities New metric classes (#1326) (#1877) 2020-05-19 11:05:07 -04:00
Dockerfile clean requirements (#2128) 2020-06-13 10:15:22 -04:00
README.md clean requirements (#2128) 2020-06-13 10:15:22 -04:00
__init__.py default test logger (#1478) 2020-04-21 20:33:10 -04:00
collect_env_details.py cleaning (#2030) 2020-06-04 11:25:07 -04:00
conftest.py cleaning tests (#2201) 2020-06-15 22:03:40 -04:00
install_AMP.sh CI: split tests-examples (#990) 2020-03-25 07:46:27 -04:00
test_deprecated.py Add missing test for "multiple dataloader + percent_check fix" (#2226) 2020-06-23 11:21:24 -04:00
test_profiler.py RC & Docs/changelog (#1776) 2020-05-11 21:57:53 -04:00

README.md

PyTorch-Lightning Tests

Most PL tests train a full MNIST model under various trainer conditions (ddp, ddp2+amp, etc.). This covers most combinations of important settings. To pass, a test expects the trained model to reach a reasonable test accuracy.

Running tests

The automated Travis CI runs ONLY CPU-based tests. Although these cover most use cases, run the tests on a machine with at least 2 GPUs to validate the full test suite.

To run all tests do the following:

Install Open MPI or another MPI implementation. The Open MPI documentation explains how to install it.

git clone https://github.com/PyTorchLightning/pytorch-lightning
cd pytorch-lightning

# install AMP support
bash tests/install_AMP.sh

# install dev deps
pip install -r requirements/devel.txt

# run tests
py.test -v

To test models that require a GPU, run the above commands on a GPU machine. The GPU machine must have:

  1. At least 2 GPUs.
  2. NVIDIA-apex installed.
  3. Horovod with NCCL support: HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_GPU_BROADCAST=NCCL pip install horovod
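Before launching the full suite, it can help to check these requirements up front. The following is a minimal sketch, not part of the test suite: the function name `missing_gpu_test_requirements` is hypothetical, and it assumes that apex and Horovod are importable as `apex` and `horovod`. It only checks that the packages can be found; it cannot confirm that Horovod was built with NCCL support.

```python
import importlib.util
from typing import Callable, List


def _module_available(name: str) -> bool:
    # True if a top-level module with this name can be located on sys.path.
    return importlib.util.find_spec(name) is not None


def missing_gpu_test_requirements(
    gpu_count: int,
    has_module: Callable[[str], bool] = _module_available,
) -> List[str]:
    """Return a list of unmet requirements; an empty list means none were found.

    Hypothetical helper: `gpu_count` is passed in by the caller so the
    function itself does not need torch to be installed.
    """
    problems = []
    if gpu_count < 2:
        problems.append(f"need at least 2 GPUs, found {gpu_count}")
    if not has_module("apex"):
        problems.append("NVIDIA apex is not importable")
    if not has_module("horovod"):
        # Importability only; NCCL support is not verified here.
        problems.append("horovod is not importable")
    return problems
```

On a machine with PyTorch installed, the GPU count could be supplied as `torch.cuda.device_count()`, for example `print(missing_gpu_test_requirements(torch.cuda.device_count()))`.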

Running Coverage

Make sure to run coverage on a GPU machine with at least 2 GPUs and NVIDIA apex installed.

cd pytorch-lightning

# generate coverage (coverage is also installed as part of dev dependencies under requirements/devel.txt)
coverage run --source pytorch_lightning -m py.test pytorch_lightning tests examples -v

# print coverage stats
coverage report -m

# exporting results
coverage xml
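The XML export can also be read programmatically, for instance to print the overall line coverage in a script. The sketch below assumes the report has a root `coverage` element carrying a `line-rate` attribute (a fraction between 0 and 1), which is the Cobertura-style layout that `coverage xml` writes; the helper name `line_coverage_percent` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(xml_text: str) -> float:
    """Return overall line coverage as a percentage.

    Assumes the root element exposes a `line-rate` attribute
    holding a fraction between 0 and 1.
    """
    root = ET.fromstring(xml_text)
    return float(root.attrib["line-rate"]) * 100.0


# Tiny hand-written sample, not real coverage data.
SAMPLE_REPORT = '<coverage line-rate="0.875" branch-rate="0"><packages/></coverage>'

if __name__ == "__main__":
    print(f"{line_coverage_percent(SAMPLE_REPORT):.1f}%")
```

To use it on a real report, read the file first, for example `line_coverage_percent(open("coverage.xml").read())`, assuming the report was written to the default `coverage.xml` in the current directory.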

Building test image

You can build the test image yourself. Note that the build takes a long time.

git clone <git-repository>
docker image build -t pytorch_lightning:devel-pt_1_4 -f tests/Dockerfile --build-arg TORCH_VERSION=1.4 .

To build other versions, select a different Dockerfile.

docker image list
docker run --rm -it pytorch_lightning:devel-pt_1_4 bash
docker image rm pytorch_lightning:devel-pt_1_4
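The build command above pairs an image tag (`devel-pt_1_4`) with a `TORCH_VERSION` build argument (`1.4`). The sketch below composes that command for a given version string, which may be handy when scripting builds. It assumes the tag follows the `devel-pt_<major>_<minor>` pattern seen above; the function name `docker_build_command` is hypothetical, and whether the Dockerfile accepts versions other than 1.4 is not confirmed here.

```python
import shlex
from typing import List


def docker_build_command(
    torch_version: str,
    dockerfile: str = "tests/Dockerfile",
) -> List[str]:
    """Compose the `docker image build` argument list for one PyTorch version.

    Assumption: the image tag mirrors the version with dots replaced by
    underscores, e.g. "1.4" -> "pytorch_lightning:devel-pt_1_4".
    """
    tag = "pytorch_lightning:devel-pt_" + torch_version.replace(".", "_")
    return [
        "docker", "image", "build",
        "-t", tag,
        "-f", dockerfile,
        "--build-arg", f"TORCH_VERSION={torch_version}",
        ".",  # build context: the repository root
    ]


if __name__ == "__main__":
    # shlex.join (Python 3.8+) renders the list as a copy-pasteable shell line.
    print(shlex.join(docker_build_command("1.4")))
```

The argument list can be passed directly to `subprocess.run(..., check=True)` from the repository root, which avoids shell quoting issues.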