
Docker images

Build images from Dockerfiles

You can build the images on your own; note that building takes a long time, so be prepared.

git clone https://github.com/Lightning-AI/lightning.git
cd lightning

# build with the default arguments
docker image build -t pytorch-lightning:latest -f dockers/base-cuda/Dockerfile .

# build with specific arguments
docker image build -t pytorch-lightning:base-cuda-py3.9-torch1.12-cuda11.6.1 \
  -f dockers/base-cuda/Dockerfile \
  --build-arg PYTHON_VERSION=3.9 \
  --build-arg PYTORCH_VERSION=1.12 \
  --build-arg CUDA_VERSION=11.6.1 .

To run the docker image, use:

docker image list
docker run --rm -it pytorch-lightning:latest bash
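
If you plan to develop inside the container, one common pattern (a general Docker technique, not part of this repository's tooling; the /workspace path is just an example) is to mount your local checkout so that code changes on the host are immediately visible inside the container:

# mount the current checkout into the container at /workspace and start a shell there
docker run --rm -it -v $(pwd):/workspace -w /workspace pytorch-lightning:latest bash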

If you do not need the image anymore, remove it:

docker image list
docker image rm pytorch-lightning:latest

Run docker image with GPUs

To run the docker image with access to your GPUs, you first need to install the NVIDIA Container Toolkit:

# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
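
As a quick sanity check (this step is not part of the original instructions and assumes the public nvidia/cuda:11.6.1-base-ubuntu20.04 image), you can confirm that Docker can see your GPUs before pulling the larger Lightning image:

# should print the same table as running nvidia-smi directly on the host
docker run --rm --gpus all nvidia/cuda:11.6.1-base-ubuntu20.04 nvidia-smi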

Then run the docker image with the --gpus all flag. For example:

docker run --rm -it --gpus all pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12-cuda11.6.1
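
To double-check that PyTorch inside the container actually sees the GPUs (a minimal sketch, assuming the image ships with PyTorch on the default PATH), you can run:

# prints whether CUDA is available and how many devices are visible
docker run --rm -it --gpus all pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12-cuda11.6.1 \
  python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"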

Run Jupyter server

Inspiration comes from https://u.group/thinking/how-to-put-jupyter-notebooks-in-a-dockerfile

  1. Build the docker image:
    docker image build -t pytorch-lightning:v1.6.5 -f dockers/nvidia/Dockerfile --build-arg LIGHTNING_VERSION=1.6.5 .
    
  2. Start the server and map the ports:
    docker run --rm -it --gpus=all -p 8888:8888 pytorch-lightning:v1.6.5
    
  3. Connect in a local browser:
    • Copy the generated URL, e.g. http://hostname:8888/?token=0719fa7e1729778b0cec363541a608d5003e26d4910983c6
    • Replace the hostname with localhost
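
If the container runs on a remote machine rather than your laptop (an assumption about your setup, not something covered above), a common workaround is to forward the Jupyter port over SSH and then open the localhost URL as usual:

# forward remote port 8888 to local port 8888 (replace user@remote-host with your own)
ssh -N -L 8888:localhost:8888 user@remote-host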