Docker images
Build images from the attached Dockerfiles
You can build the images yourself. Note that a build takes a long time, so be prepared.
git clone <git-repository>
docker image build -t pytorch-lightning:latest -f dockers/conda/Dockerfile .
or with specific build arguments:
git clone <git-repository>
docker image build \
-t pytorch-lightning:py3.8-pt1.6 \
-f dockers/base-cuda/Dockerfile \
--build-arg PYTHON_VERSION=3.8 \
--build-arg PYTORCH_VERSION=1.6 \
.
or the nightly version from Conda:
git clone <git-repository>
docker image build \
-t pytorch-lightning:py3.7-pt1.8 \
-f dockers/base-conda/Dockerfile \
--build-arg PYTHON_VERSION=3.7 \
--build-arg PYTORCH_VERSION=1.8 \
.
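The tags in the build examples above follow a py<python>-pt<pytorch> pattern, and the two build arguments repeat the same version numbers. A minimal sketch of how the command could be assembled from those values is shown below. The function names lightning_image_tag and lightning_build_cmd are hypothetical helpers, not part of the repository, and the sketch only prints the command instead of running it.

```shell
# Hypothetical helper (not part of the repository): compose the image tag
# used in the examples above from a Python and a PyTorch version.
lightning_image_tag() {
  python_version="$1"
  pytorch_version="$2"
  printf 'pytorch-lightning:py%s-pt%s\n' "$python_version" "$pytorch_version"
}

# Hypothetical helper: print (do not run) the build command for one of the
# Dockerfile directories, e.g. base-cuda or base-conda.
lightning_build_cmd() {
  flavour="$1"
  python_version="$2"
  pytorch_version="$3"
  printf 'docker image build -t %s -f dockers/%s/Dockerfile --build-arg PYTHON_VERSION=%s --build-arg PYTORCH_VERSION=%s .\n' \
    "$(lightning_image_tag "$python_version" "$pytorch_version")" \
    "$flavour" "$python_version" "$pytorch_version"
}

# Dry run: show the command for the base-cuda example above
lightning_build_cmd base-cuda 3.8 1.6
```

Printing the command first makes it easy to check the tag and arguments before starting a long build. To actually run it, the output can be passed to the shell from the repository root, for example with eval "$(lightning_build_cmd base-cuda 3.8 1.6)".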
To run your Docker image, use:
docker image list
docker run --rm -it pytorch-lightning:latest bash
and if you do not need it anymore, just clean it:
docker image list
docker image rm pytorch-lightning:latest
Run docker image with GPUs
To run a Docker image with access to your GPUs, you need to install the nvidia-container-toolkit package:
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
Then run the Docker image with the --gpus all option, for example:
docker run --rm -it --gpus all pytorchlightning/pytorch_lightning:base-cuda-py3.7-torch1.6
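Once the container starts with --gpus all, it can be useful to confirm that PyTorch inside it actually sees a GPU. The sketch below assembles such a check as a one-off docker run command. It assumes that python is on the PATH inside the image and that torch is installed there, which the image tag suggests but this README does not state. The function name gpu_check_cmd is hypothetical, and the sketch only prints the command.

```shell
# Image name taken from the example above; swap in your own tag if needed.
IMAGE="pytorchlightning/pytorch_lightning:base-cuda-py3.7-torch1.6"

# Hypothetical helper: print (do not run) a command that asks PyTorch inside
# the container whether CUDA is available.
# Assumption: the image provides `python` with `torch` installed.
gpu_check_cmd() {
  printf 'docker run --rm --gpus all %s python -c "import torch; print(torch.cuda.is_available())"\n' "$1"
}

# Dry run: show the command for the example image
gpu_check_cmd "$IMAGE"
```

Running the printed command, for example with eval "$(gpu_check_cmd "$IMAGE")", should print True when the GPU setup works, under the assumptions above.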