43 Commits

Author SHA1 Message Date
Carlos Mocholí 3ba0f56b18
Remove support for the deprecated torchtext legacy (#14375) 2022-08-26 20:01:51 +00:00
otaj 1ae14ca754
[CI] fix horovod tests (#14382) 2022-08-25 17:30:06 +00:00
otaj 0bd5703b81
[CI] Trick Bagua into installing appropriate wheel in GPU tests (#14380)
Bagua trick needs to be replicated everywhere applicable
2022-08-24 08:59:49 +00:00
otaj bb634310e7
[CI] Bump CUDA in Docker images to 11.6.1 (#14348)
* bump cuda in docker images to 11.6.1

* PUSH TO HUB. REVERT THIS!

* conda forge for 11.6

* cuda 11.5

* revert conda changes

* 11.6 back again

* 11.6 back again, all of them

* maybe all passes now

* maybe all passes now

* final push

* Revert "PUSH TO HUB. REVERT THIS!"

This reverts commit 602bfce224.

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-23 12:10:52 -04:00
Jirka Borovec aa62fe36df
add testing PT 1.12 (#13386)
* add testing PT 1.12
* Fix quantization tests
* Fix another set of tests
* Fix check since https://github.com/pytorch/pytorch/pull/80139 is only going to be available for 1.13
* Skip this test for now for 1.12

Co-authored-by: SeanNaren <sean@grid.ai>
2022-07-15 19:41:23 +02:00
Carlos Mocholí ad87d2cad0
Future 5/n: Move requirements (#13306)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-06-21 17:11:33 +02:00
Carlos Mocholí 0cf9d73d28
Drop PyTorch 1.8 support (#13155)
* Drop PyTorch 1.8 support

* Missed update

* Skip profiler test until supported

* Upgrade ipu dockerfile pytorch version

* Update XLA version
2022-06-14 20:46:44 -04:00
Jirka Borovec 78ff201c7e
Update CI setup (#13291)
* drop mamba
* use legacy GPU machines
2022-06-14 17:11:54 +00:00
Jirka Borovec fec9a09672
add freeze for development and full range for install (#12994)
* freeze versions

* unfreeze

* dependabot

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix all req

* ...

* use base

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix refs

* Apply suggestions from code review

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Apply suggestions from code review

* dockers

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-05-12 09:14:18 -04:00
Jirka Borovec bb51e2a55b
Merge pull request #12723 from PyTorchLightning/req/strategies
Separate strategies' requirements
2022-05-04 10:06:02 -04:00
Akihiro Nitta ecd135e939
Update nvidia gpg key to fix nightly docker builds (#12930)
* Update gpg key
* Use curl instead of wget
* Install key manually
2022-05-02 09:00:44 +02:00
Akihiro Nitta 98b206e836
Use cmake installed with apt (#12907) 2022-04-28 07:44:52 +00:00
Jirka Borovec f9b69ce5b0
CI: check docker requires (#12677)
* check docker requires
* ci update
* bagua
* conda
* cuda
2022-04-12 00:29:54 +09:00
Jirka Borovec fe940e195d
CI: update prune_pkgs (#12382) 2022-03-21 12:50:50 +00:00
Jirka Borovec efa870eebc
Docker: fix NCCL building Horovod (#12318)
* Horovod w. MPI
* nccl_built
* fix
2022-03-18 14:23:19 +00:00
Jirka Borovec 7ee690758c
CI: fix running PT 1.11 (#12304)
* fix fire
* horovod
* assistant
* cmake
* u20
* cuda
* -j2
* fix mypy

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-12 09:00:20 +00:00
Jirka Borovec bc8172856f
aggregate multiple helper scripts to single CLI (#11147)
* nightly release
* min version
* fire
2022-03-11 11:13:43 +00:00
Jirka Borovec 1144673cd9
CI: sanity check for req. pkgs (#11819)
* CI: sanity check for req. pkgs
* scripts
* rename
* gcsfs ?
* rich !
* install extra
* move
* set -e

Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-11 09:20:47 +00:00
Jirka Borovec 3b4061f39a
CI: enable testing for PT 1.11 (#11792)
* enable PT 1.11
* horovod
* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-10 18:38:47 +00:00
Jirka Borovec 8577ef7bba
Skip horovod 0.24.0 only (#12248)
* try skip horovod 0.24.0 only
* HOROVOD_BUILD_CUDA_CC_LIST
* fix test

Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-10 16:01:08 +00:00
Aki Nitta 0a1b8b880d
Fix horovod installation `base-cuda` Dockerfile (#11811)
* pip install --user

* add checks

* rm unrelated comment

* consistent format

* Fail if horovod not found

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-10 16:48:33 +09:00
Carlos Mocholí 70570f9eaa
Minimize the number of docker jobs (#10202)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-10-29 07:48:05 +01:00
Carlos Mocholí a0e45dc071
Some minor CI cleanup (#10088) 2021-10-26 13:58:20 +02:00
Jirka Borovec abbcfa1ab7
fix CI for PT 1.10 (#8526)
* fix CI for PT 1.10
* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-23 19:24:31 +02:00
Jirka Borovec 74a09a23f1
CI: support PT 1.10 (#8133)
* prepare PT 1.10

* dockers

* fixes

* readme
2021-07-14 18:04:33 +03:00
Carlos Mocholí 6ce77a102b
Set minimum PyTorch version to 1.6 (#8288)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-07-13 17:12:49 +00:00
Jirka Borovec 6e56f56aa1
docker use $(nproc) (#7606)
* docker use $(nproc)

* Update typo

Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>

Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2021-05-19 21:48:14 +02:00
Jirka Borovec 626ef08694
enable Dockers for PT 1.9 (#7363)
* enable PT 1.9

* fix versions

* args

* fix
2021-05-05 14:26:22 +02:00
Sean Naren 5d8610955a
Fix `apex` version in Docker due to broken upstream (#7146)
* Set Apex commit before introduction of new MLP extensions

* Refactor install command

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-21 23:58:55 +01:00
Jirka Borovec 85c8074bee
require: adjust versions (#6363)
* adjust versions

* release

* manifest

* pep8

* CI

* fix

* build
2021-03-06 14:34:54 +01:00
Sean Naren 5157ba5509
Add openmpi to our base cuda container for MPI support (#6026)
* Add openmpi to our base container for DeepSpeed MPI support

* conda

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 12:15:49 +00:00
Jirka Borovec 1ac9164f91
create new Conda images (#5877)
* create new Conda images

* .

* .
2021-02-09 15:30:48 +00:00
Jirka Borovec 937f11c05b
try fix: Docker with Conda & PT 1.8 (#5842)
* ci

* ver

* list

* pt

* nk

* ch

* 4.9
2021-02-09 08:22:35 +00:00
Sumanth Ratna 8732475701
Remove unnecessary intermediate layers in base-conda Dockerfile (#5697)
* [docker][base-conda] Combine ENV+COPY instructions

* [docker][base-cuda] Combine ENV+COPY instructions

* [docker][base-xla] Combine ENV+COPY instructions

* [docker][base-cuda] Fix COPY instruction

* [docker][base-xla] Fix quote in ENV

* [docker][base-xla] Fix $PATH in ENV

* [docker][base-conda] Fix COPY instruction

* chlog

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-05 21:40:40 +01:00
Jirka Borovec 9dd04028d5
tests for legacy checkpoints (#5223)
* wip

* generate

* clean

* tests

* copy

* download

* download

* download

* download

* download

* download

* download

* download

* download

* download

* download

* flake8

* extend

* aws

* extension

* pull

* pull

* pull

* pull

* pull

* pull

* pull

* try

* try

* try

* got it

* Apply suggestions from code review

(cherry picked from commit 72525f0a83)
2021-01-26 14:27:56 +01:00
Jeff Yang e1a4c2e448
docker: run ci only when docker related files are changed (#5203)
* only run ci on docker related files

* docker related files changed!

* install pytorch along with cudatoolkit

* build docker only on SUN

* conda exit status has been fixed

* reverts back to old conda version

* add more docker related files

* conda env update --name

* create env and install pytorch again

* create env and install pytorch again

* ${PYTORCH_CHANNEL}

* dont update pytorch with conda env update

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update dockers/base-conda/Dockerfile

* Apply suggestions from code review

* remove checks in cron job

* Apply suggestions from code review

* readd #

* readd #

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
(cherry picked from commit cc624358c8)
2021-01-26 14:27:56 +01:00
Jirka Borovec 9be04c1c0b
try to update failing dockers (#5611) 2021-01-25 17:10:56 -05:00
Jirka Borovec 7e4d6cbe48
set minimal req. PT 1.4 (#5418)
* set minimal req. PT 1.4

* chlog
2021-01-12 19:15:35 -05:00
Jirka Borovec 2fe1eff85d
drop fairscale for PT <= 1.4 (#4910)
* drop fairscale for PT <= 1.4

* fix

* Add extra check to remove fairscale from minimal testing if using minimal torch version 1.3

* Update ci_test-full.yml

* Update gym to .3 to see if this fixes examples CI

* Update omegaconf to minimum for hydra v1.0

* Revert "Update gym to .3 to see if this fixes examples CI"

This reverts commit 4221d4b9

* Revert "Update omegaconf to minimum for hydra v1.0"

This reverts commit 4f579217

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2020-11-30 23:19:30 +00:00
Jirka Borovec bd6c413829
Conda: PT 1.8 (#3833)
* PT 1.8

* unfreeze PT

* drop nightly from full

* add PT 1.8 to workflow

* readme table

* cuda

* skip cuda

* test 1.8

* unfreeze torch vision

Co-authored-by: ydcjeff <ydcjeff@outlook.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-12 15:03:43 +01:00
Jeff Yang 23719e3c05
[dockers] install nvidia-dali-cudaXXX (#4532)
* [dockers] install nvidia-dali-cuda100

* Apply suggestions from code review

* build DALI

* build DALI

* build DALI

* dali from source

* dali from source

* use binaries

* qq

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-09 21:18:24 +06:30
Jeff Yang 1d594c5d0c
[docker] Lock cuda version (#4453)
* lock cuda version

* back to normal
2020-10-31 20:17:07 +06:30
Jirka Borovec ce8abd6255
Drone: use nightly build cuda docker images (#3658)
* upgrade PT version

* update docker

* docker

* try 1.5

* badge

* fix typo: dor -> for (#3918)

* prune

* prune

* env

* echo

* try

* notes

* env

* env

* env

* notes

* docker

* prune

* maintainer

* CI

* update

* just 1.5

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* docker

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* push

* try

* prune

* CI

* CI

* CI

* CI

Co-authored-by: Klyukin Valeriy <mr.clyukin@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-26 10:47:09 +00:00