lightning/dockers
Jirka Borovec 64e8e8eb4b
CI: debug HPU flow (#13419)
* Update the hpu-tests.yml to pull docker from vault
* fire & sudo
* habana-gaudi-hpus
* Check the driver status on gaudi server (#13718)

Co-authored-by: arao <arao@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>
2022-07-20 12:35:01 +02:00
base-conda add testing PT 1.12 (#13386) 2022-07-15 19:41:23 +02:00
base-cuda Future 5/n: Move requirements (#13306) 2022-06-21 17:11:33 +02:00
base-ipu CI: fix requirements freeze (#13441) 2022-06-29 09:35:57 -04:00
base-xla Future 5/n: Move requirements (#13306) 2022-06-21 17:11:33 +02:00
ci-runner-hpu CI: debug HPU flow (#13419) 2022-07-20 12:35:01 +02:00
ci-runner-ipu Drop PyTorch 1.8 support (#13155) 2022-06-14 20:46:44 -04:00
nvidia bump base NGC image (#13346) 2022-07-15 21:36:19 +00:00
release fix PL release docker (#13439) 2022-06-29 19:36:36 +02:00
tpu-tests CI: debug TPU failing tests (#13679) 2022-07-15 17:40:04 -04:00
README.md build more dockers & slack fails (#12675) 2022-04-13 17:24:08 +02:00

README.md

Docker images

Build images from the attached Dockerfiles

You can build the images yourself. Note that a build takes a long time, so be prepared.

git clone <git-repository>
docker image build -t pytorch-lightning:latest -f dockers/base-conda/Dockerfile .

or with specific build arguments:

git clone <git-repository>
docker image build \
    -t pytorch-lightning:base-cuda-py3.9-pt1.10 \
    -f dockers/base-cuda/Dockerfile \
    --build-arg PYTHON_VERSION=3.9 \
    --build-arg PYTORCH_VERSION=1.10 \
    .

or the nightly version from Conda:

git clone <git-repository>
docker image build \
    -t pytorch-lightning:base-conda-py3.9-pt1.11 \
    -f dockers/base-conda/Dockerfile \
    --build-arg PYTHON_VERSION=3.9 \
    --build-arg PYTORCH_VERSION=1.11 \
    .
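
The two tagged builds above follow the same pattern: the directory under dockers/ selects the image flavor, and the Python and PyTorch versions appear both in the tag and in the build arguments. A minimal sketch of a wrapper that captures this pattern is shown here. The function name build_image and the DRY_RUN switch are hypothetical and not part of the repository, and the sketch assumes every flavor accepts the same two build arguments, which may not hold for all Dockerfiles.

```shell
# Hypothetical helper, not part of the repository: builds one image flavor
# (e.g. base-cuda or base-conda) for a given Python and PyTorch version.
# Set DRY_RUN=1 to print the docker command instead of running it.
build_image() (
    flavor="$1"
    python_version="$2"
    pytorch_version="$3"
    tag="pytorch-lightning:${flavor}-py${python_version}-pt${pytorch_version}"
    # Assemble the command as positional parameters so each argument stays intact.
    set -- docker image build \
        -t "$tag" \
        -f "dockers/${flavor}/Dockerfile" \
        --build-arg "PYTHON_VERSION=${python_version}" \
        --build-arg "PYTORCH_VERSION=${pytorch_version}" \
        .
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
)

# Usage, run from the repository root:
#   DRY_RUN=1 build_image base-cuda 3.9 1.10    # preview the command
#   build_image base-conda 3.9 1.11             # actually build
```

Previewing with DRY_RUN=1 first is a cheap way to check the tag and Dockerfile path before starting a long build.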

To run your Docker image, use:

docker image list
docker run --rm -it pytorch-lightning:latest bash

When you no longer need the image, remove it:

docker image list
docker image rm pytorch-lightning:latest

Run docker image with GPUs

To run a Docker image with access to your GPUs, you need to install the NVIDIA Container Toolkit:

# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Then run the Docker image with --gpus all, for example:

docker run --rm -it --gpus all pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.10
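
Before starting a long job, it can help to confirm that the container actually sees the GPUs. The sketch below runs nvidia-smi inside the image instead of an interactive shell. The function name check_gpu and the DRY_RUN switch are hypothetical, and the sketch assumes the NVIDIA runtime makes nvidia-smi available inside the container, which is the usual behavior but depends on the driver capabilities configured for the image.

```shell
# Hypothetical helper, not part of the repository: runs nvidia-smi inside the
# image to check GPU visibility. Set DRY_RUN=1 to only print the command.
check_gpu() (
    # Default to the image from the example above; pass another tag as $1.
    image="${1:-pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.10}"
    set -- docker run --rm --gpus all "$image" nvidia-smi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
)

# Usage:
#   check_gpu                              # check the default image
#   check_gpu pytorch-lightning:latest     # check a locally built image
```

If the command fails or lists no devices, the toolkit installation or the Docker restart above is the first thing to revisit.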

Run Jupyter server

Inspiration comes from https://u.group/thinking/how-to-put-jupyter-notebooks-in-a-dockerfile

  1. Build the docker image:
    docker image build \
        -t pytorch-lightning:v1.3.1 \
        -f dockers/nvidia/Dockerfile \
        --build-arg LIGHTNING_VERSION=1.3.1 \
        .
    
  2. Start the server and map ports:
    docker run --rm -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -p 8888:8888 pytorch-lightning:v1.3.1
    
  3. Connect in local browser:
    • Copy the generated URL, e.g. http://hostname:8888/?token=0719fa7e1729778b0cec363541a608d5003e26d4910983c6
    • Replace the hostname with localhost
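
The hostname replacement in step 3 can also be scripted. This is a minimal sketch, assuming the URL printed by the server has the form scheme://host:port/...; the function name to_localhost is hypothetical.

```shell
# Hypothetical helper, not part of the repository: replaces the host part of a
# Jupyter URL with localhost, keeping scheme, port, path, and token unchanged.
to_localhost() {
    printf '%s\n' "$1" | sed -E 's#^(https?://)[^:/]+#\1localhost#'
}

# Usage:
#   to_localhost "http://hostname:8888/?token=abc"
#   prints: http://localhost:8888/?token=abc
```

This only works because step 2 maps container port 8888 to the same port on the host; with a different -p mapping the port in the URL would need adjusting as well.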