lightning/dockers
Lezwon Castelino 12cb9942a1
Tpu save (#4309)
* convert xla tensor to cpu before save

* move_to_cpu

* updated CHANGELOG.md

* added on_save to accelerators

* if accelerator is not None

* refactors

* change filename to run test

* run test_tpu_backend

* added xla_device_utils to tests

* added xla_device_utils to test

* removed tests

* Revert "added xla_device_utils to test"

This reverts commit 0c9316bb

* fixed pep

* increase timeout and print traceback

* lazy check tpu exists

* increased timeout
removed barrier for tpu during test
reduced epochs

* fixed torch_xla imports

* fix tests

* define xla utils

* fix test

* aval

* chlog

* docs

* aval

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-02 13:05:11 +00:00
..
base-conda drop fairscale for PT <= 1.4 (#4910) 2020-11-30 23:19:30 +00:00
base-cuda drop fairscale for PT <= 1.4 (#4910) 2020-11-30 23:19:30 +00:00
base-xla [dockers] install nvidia-dali-cudaXXX (#4532) 2020-11-09 21:18:24 +06:30
release Drone: use nightly build cuda docker images (#3658) 2020-10-26 10:47:09 +00:00
tpu-tests Tpu save (#4309) 2020-12-02 13:05:11 +00:00
README.md [dockers] install nvidia-dali-cudaXXX (#4532) 2020-11-09 21:18:24 +06:30

README.md

Docker images

Builds images form attached Dockerfiles

You can build it on your own, note it takes lots of time, be prepared.

git clone <git-repository>
docker image build -t pytorch-lightning:latest -f dockers/conda/Dockerfile .

or with specific arguments

git clone <git-repository>
docker image build \
    -t pytorch-lightning:py3.8-pt1.6 \
    -f dockers/base-cuda/Dockerfile \
    --build-arg PYTHON_VERSION=3.8 \
    --build-arg PYTORCH_VERSION=1.6 \
    .

To run your docker use

docker image list
docker run --rm -it pytorch-lightning:latest bash

and if you do not need it anymore, just clean it:

docker image list
docker image rm pytorch-lightning:latest

Run docker image with GPUs

To run docker image with access to you GPUs you need to install

# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

and later run the docker image with --gpus all so for example

docker run --rm -it --gpus all pytorchlightning/pytorch_lightning:base-cuda-py3.7-torch1.6