
Docker images

Build images from the attached Dockerfiles

You can build the images yourself. Note that a build takes a long time, so be prepared.

git clone <git-repository>
docker image build -t pytorch-lightning:latest -f dockers/conda/Dockerfile .

or with specific build arguments:

git clone <git-repository>
docker image build \
    -t pytorch-lightning:py3.8-pt1.6 \
    -f dockers/base-cuda/Dockerfile \
    --build-arg PYTHON_VERSION=3.8 \
    --build-arg PYTORCH_VERSION=1.6 \
    .

or the nightly version from Conda:

git clone <git-repository>
docker image build \
    -t pytorch-lightning:py3.7-pt1.8 \
    -f dockers/base-conda/Dockerfile \
    --build-arg PYTHON_VERSION=3.7 \
    --build-arg PYTORCH_VERSION=1.8 \
    .
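
The two parameterized builds above differ only in the Dockerfile path and the two version arguments, so they can be wrapped in a small helper. The following is a minimal sketch, not part of the repository: the function names `image_tag` and `build_image` and the `DRY_RUN` variable are hypothetical, and the tag scheme simply mirrors the `py<python>-pt<pytorch>` pattern used in the commands above. Run it from the root of the cloned repository.

```shell
#!/bin/sh
# Hypothetical helper: derive the image tag from the two build arguments,
# following the py<python>-pt<pytorch> pattern used in this README.
image_tag() {
    printf 'pytorch-lightning:py%s-pt%s\n' "$1" "$2"
}

# Hypothetical helper: build one image from the given Dockerfile.
# With DRY_RUN=1 the docker command is only printed, not executed.
# Note: plain sh has no local variables, so these names are global.
build_image() {
    dockerfile="$1"
    python_version="$2"
    pytorch_version="$3"
    # Reuse the positional parameters to hold the full command line.
    set -- docker image build \
        -t "$(image_tag "$python_version" "$pytorch_version")" \
        -f "$dockerfile" \
        --build-arg "PYTHON_VERSION=$python_version" \
        --build-arg "PYTORCH_VERSION=$pytorch_version" \
        .
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Preview the command for the base-cuda example without running docker.
DRY_RUN=1 build_image dockers/base-cuda/Dockerfile 3.8 1.6
```

Dropping `DRY_RUN=1` runs the build for real. Previewing first is a cheap way to catch a wrong path or version before starting a long build.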

To run your Docker image, use

docker image list
docker run --rm -it pytorch-lightning:latest bash

and if you no longer need it, remove it:

docker image list
docker image rm pytorch-lightning:latest

Run a Docker image with GPUs

To run a Docker image with access to your GPUs, you need to install the NVIDIA Container Toolkit:

# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

and then run the Docker image with --gpus all, for example:

docker run --rm -it --gpus all pytorchlightning/pytorch_lightning:base-cuda-py3.7-torch1.6
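
To confirm that the container actually sees a GPU, one option is to ask PyTorch from inside it. The sketch below is illustrative only: `check_gpu` and the `DOCKER` override are hypothetical names, and it assumes the image has `python` on its `PATH` with PyTorch installed.

```shell
#!/bin/sh
# Hypothetical helper: ask PyTorch inside the container whether CUDA is usable.
# DOCKER can be overridden (for example DOCKER=echo) to preview the arguments
# that would be passed to docker instead of running a container.
check_gpu() {
    image="$1"
    ${DOCKER:-docker} run --rm --gpus all "$image" \
        python -c 'import torch; print(torch.cuda.is_available())'
}
```

Calling `check_gpu pytorchlightning/pytorch_lightning:base-cuda-py3.7-torch1.6` should print `True` when the GPU is reachable. `False` usually means the container runtime or the driver setup needs another look.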