Jirka Borovec
c2c82dad62
CI: Azure ( #5882 )
...
* add base Azure pipeline
* skip
2021-02-10 04:43:26 -05:00
Jirka Borovec
1ac9164f91
create new Conda images ( #5877 )
...
* create new Conda images
* .
* .
2021-02-09 15:30:48 +00:00
Jirka Borovec
937f11c05b
try fix: Docker with Conda & PT 1.8 ( #5842 )
...
* ci
* ver
* list
* pt
* nk
* ch
* 4.9
2021-02-09 08:22:35 +00:00
tchaton
77be6f6e24
resolve conflits
...
resolve doc
boring commit
docs
torchvision
tpu
Update dockers/tpu-tests/tpu_test_cases.jsonnet
Update dockers/tpu-tests/tpu_test_cases.jsonnet
2021-02-05 21:43:10 +01:00
Jirka Borovec
a39b382fe1
hotfix for GHA tpu ( #5762 )
...
* -y
* t
* .
* t
2021-02-05 21:43:10 +01:00
Sumanth Ratna
8732475701
Remove unnecessary intermediate layers in base-conda Dockerfile ( #5697 )
...
* [docker][base-conda] Combine ENV+COPY instructions
* [docker][base-cuda] Combine ENV+COPY instructions
* [docker][base-xla] Combine ENV+COPY instructions
* [docker][base-cuda] Fix COPY instruction
* [docker][base-xla] Fix quote in ENV
* [docker][base-xla] Fix $PATH in ENV
* [docker][base-conda] Fix COPY instruction
* chlog
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-05 21:40:40 +01:00
Jirka Borovec
07f24d2438
add nvidia docker image ( #5668 )
...
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-01-29 11:01:03 -05:00
Jirka Borovec
7e2e874d95
Refactor: legacy accelerators and plugins ( #5645 )
...
* tests: legacy
* legacy: accel
* legacy: plug
* fix imports
* mypy
* flake8
2021-01-26 20:04:36 -05:00
Jirka Borovec
9dd04028d5
tests for legacy checkpoints ( #5223 )
...
* wip
* generate
* clean
* tests
* copy
* download
* download
* download
* download
* download
* download
* download
* download
* download
* download
* download
* flake8
* extend
* aws
* extension
* pull
* pull
* pull
* pull
* pull
* pull
* pull
* try
* try
* try
* got it
* Apply suggestions from code review
(cherry picked from commit 72525f0a83)
2021-01-26 14:27:56 +01:00
Jeff Yang
e1a4c2e448
docker: run ci only docker related files are changed ( #5203 )
...
* only run ci on docker related files
* docker related files changed!
* install pytorch along with cudatoolkit
* build docker only on SUN
* conda exit status has been fixed
* reverts back to old conda version
* add more docker related files
* conda env update --name
* create env and install pytorch again
* create env and install pytorch again
* ${PYTORCH_CHANNEL}
* dont update pytorch with conda env update
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update dockers/base-conda/Dockerfile
* Apply suggestions from code review
* remove checks in cron job
* Apply suggestions from code review
* readd #
* readd #
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
(cherry picked from commit cc624358c8)
2021-01-26 14:27:56 +01:00
Jirka Borovec
9be04c1c0b
try to update failing dockers ( #5611 )
2021-01-25 17:10:56 -05:00
Jirka Borovec
7e4d6cbe48
set minimal req. PT 1.4 ( #5418 )
...
* set minimal req. PT 1.4
* chlog
2021-01-12 19:15:35 -05:00
Jirka Borovec
5119013c81
drop install FairScale for TPU ( #5113 )
...
* drop install FairScale for TPU
* typo
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2021-01-05 09:58:37 +01:00
Lezwon Castelino
12cb9942a1
Tpu save ( #4309 )
...
* convert xla tensor to cpu before save
* move_to_cpu
* updated CHANGELOG.md
* added on_save to accelerators
* if accelerator is not None
* refactors
* change filename to run test
* run test_tpu_backend
* added xla_device_utils to tests
* added xla_device_utils to test
* removed tests
* Revert "added xla_device_utils to test"
This reverts commit 0c9316bb
* fixed pep
* increase timeout and print traceback
* lazy check tpu exists
* increased timeout
removed barrier for tpu during test
reduced epochs
* fixed torch_xla imports
* fix tests
* define xla utils
* fix test
* aval
* chlog
* docs
* aval
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-02 13:05:11 +00:00
Jirka Borovec
2fe1eff85d
drop fairscale for PT <= 1.4 ( #4910 )
...
* drop fairscale for PT <= 1.4
* fix
* Add extra check to remove fairscale from minimal testing if using minimal torch version 1.3
* Update ci_test-full.yml
* Update gym to .3 to see if this fixes examples CI
* Update omegaconf to minimum for hydra v1.0
* Revert "Update gym to .3 to see if this fixes examples CI"
This reverts commit 4221d4b9
* Revert "Update omegaconf to minimum for hydra v1.0"
This reverts commit 4f579217
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2020-11-30 23:19:30 +00:00
Jirka Borovec
597dfa174c
build dockers XLA 1.7 ( #4891 )
...
* build XLA 1.7
* night XLA 1.7
* rename
* use 1.7
* tpu ver
2020-11-29 15:14:19 -04:00
Jirka Borovec
bddc6cd77a
pytest default color ( #4703 )
...
* pytest default color
* time
Co-authored-by: chaton <thomas@grid.ai>
2020-11-18 10:53:44 +00:00
Jirka Borovec
7940ea5aaf
CI: TPU drop install horovod ( #4622 )
...
Co-authored-by: chaton <thomas@grid.ai>
2020-11-13 11:33:52 +01:00
Jirka Borovec
bd6c413829
Conda: PT 1.8 ( #3833 )
...
* PT 1.8
* unfreeze PT
* drop nightly from full
* add PT 1.8 to workflow
* readme table
* cuda
* skip cuda
* test 1.8
* unfreeze torch vision
Co-authored-by: ydcjeff <ydcjeff@outlook.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-12 15:03:43 +01:00
Jeff Yang
23719e3c05
[dockers] install nvidia-dali-cudaXXX ( #4532 )
...
* [dockers] install nvidia-dali-cuda100
* Apply suggestions from code review
* build DALI
* build DALI
* build DALI
* dali from source
* dali from source
* use binaries
* qq
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-09 21:18:24 +06:30
Jeff Yang
1d594c5d0c
[docker] Lock cuda version ( #4453 )
...
* lock cuda version
* back to normal
2020-10-31 20:17:07 +06:30
Jeff Yang
0f584faa6b
PyTorch 1.7 Stable support ( #3821 )
...
* prepare for 1.7 support [ci skip]
* tpu [ci skip]
* test run 1.7
* all 1.7, needs to fix tests
* couple with torchvision
* windows try
* remove windows
* 1.7 is here
* on purpose fail [ci skip]
* return [ci skip]
* 1.7 docker
* back to normal [ci skip]
* change to some_val [ci skip]
* add seed [ci skip]
* 4 places [ci skip]
* fail on purpose [ci skip]
* verbose=True [ci skip]
* use filename to track
* use filename to track
* monitor epoch + changelog
* Update tests/checkpointing/test_model_checkpoint.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-30 15:42:14 +00:00
Jirka Borovec
ce8abd6255
Drone: use nightly build cuda docker images ( #3658 )
...
* upgrade PT version
* update docker
* docker
* try 1.5
* badge
* fix typo: dor -> for (#3918)
* prune
* prune
* env
* echo
* try
* notes
* env
* env
* env
* notes
* docker
* prune
* maintainer
* CI
* update
* just 1.5
* CI
* CI
* CI
* CI
* CI
* CI
* CI
* CI
* CI
* CI
* CI
* docker
* CI
* CI
* CI
* CI
* CI
* CI
* CI
* CI
* CI
* push
* try
* prune
* CI
* CI
* CI
* CI
Co-authored-by: Klyukin Valeriy <mr.clyukin@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-26 10:47:09 +00:00
Jeff Yang
d83c4e4d69
Cache docker builds ( #3659 )
...
* parent faa357648f
author ydcjeff <ydcjeff@outlook.com> 1601049378 +0630
committer ydcjeff <ydcjeff@outlook.com> 1601469495 +0630
cache docker builds
lock horovod at 0.19.5
done [ci skip] [CI SKIP]
use --cache-from [ci skip]
typo and horovod [ci skip]
exclude pt 1.3 py3.8 [ci skip]
conda no cache [ci skip]
fix
* revert
* align with master [ci skip]
* retry
* remove empty continuation lines
* add comment
* fix build-args
2020-10-25 18:46:10 +06:30
chaton
829d90b257
activated color in all pytest runs ( #4254 )
...
* activated color in all pytest runs
* Update .drone.yml
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-20 16:38:17 +02:00
Jirka Borovec
d3567c33a6
move base req. to root ( #4219 )
...
* move base req. to root
* check-manifest
* check-manifest
* manifest
* req
2020-10-18 20:40:18 +02:00
Jeff Yang
90929fa433
Fix apt repo issue for docker ( #3823 )
...
* fix docker repo issue
* docker
* docker
* docker
* no cudnn
* no cudnn
* try 16.04
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-10-05 23:18:14 -04:00
Jirka Borovec
1160270882
fix path in CI for release & python version in all dockers & duplicated badges ( #3765 )
...
* typo
* path
* check
* trigger
* fix conda
* pip ver
* fix cuda
* fix XLA
* fix xla
* ci
* docker
* BIULD
* unBIULD
* update
* py 3.8
* apex
* apex
2020-10-02 05:26:21 -04:00
Jirka Borovec
ab508dae0c
run TPU tests with multiple versions ( #3024 )
...
* rename
* multi build
* multi build
* copy
* copy
* copy
* copy
* copy
* copy
* clean
* note
* docker
* formatting
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-09-30 08:36:02 -04:00
Jirka Borovec
a0968e4bdf
fix PT version in CUDA docker images ( #3739 )
...
* upgrade PT version
* update docker
* docker
* try 1.5
* fix docker versions
* old
* badge
2020-09-30 08:33:22 -04:00
Jirka Borovec
a94728c99b
spec Horovod version ( #3661 )
...
* spec Horovod version
* MAKEFLAGS="-j2"
* tests
* CI
* docker
* CI
* docker
2020-09-26 19:30:25 +02:00
Jirka Borovec
0784cf3ab4
dockers nightly ( #3615 )
...
* dockers nightly
* typo
* Apply suggestions from code review
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-09-25 15:58:01 +02:00
Jeff Yang
a2120130ed
Lightning docker image based on base-cuda ( #3637 )
...
* use lightning CI docker
* exclude py3.8 and torch1.3
* torch 1.7
* mergify
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-09-24 23:14:15 +02:00
Jirka Borovec
37a59be21b
build more docker configs ( #3533 )
...
* update build cases
* list
* matrix
* matrix
* builds
* docker
* -j1
* -q
* -q
* sep
* docker
* docker
* mergify
* -j1
* -j1
* horovod
* copy
2020-09-23 01:41:35 +02:00
Jeff Yang
8be79a9a96
stable, dev PyTorch in Dockerfile and conda gh actions ( #3074 )
...
* dockerfile and actions file
* dockerfile and actions file
* added pytorch conda cpu nightly
* added pytorch conda cpu nightly
* recopy base reqs
* gh action `include` torch nightly
* add pytorch nightly & conda gh badge
* rebase
* fix horovod
* proposal refactor
* Update .github/workflows/ci_pt-conda.yml
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update .github/workflows/ci_pt-conda.yml
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update
* update
* fix cmd
* filled &&
* fix
* add -y
* torchvision >0.7 allowed
* explicitly install torchvision
* use HOROVOD_GPU_OPERATIONS env variable
* CI
* skip 1.7
* table
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-09-17 20:30:39 +02:00
Jirka Borovec
cbc4f6f8a4
add CI for building dockers ( #3383 )
...
* rename
* fix badges
* add docker build
* mergify
* update
* env
* ci
* times
* CI
* name
* comment
2020-09-10 18:38:29 -04:00
Jirka Borovec
9f2b29a7cd
build XLA with py3.6 ( #2863 )
...
* build py3.6
* info
* conda
* update
* version
* version
* builds
* builds
* builds
* builds
* builds
2020-08-15 15:39:44 -04:00
Jirka Borovec
a6e7aa7796
allow using apex with any PT version ( #2865 )
...
* wip
* setup
* type
* name
* wip
* docs
* imports
* fix if
* fix if
* use_amp
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* fix tests
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* fix tests
* todos
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-08 11:07:32 +02:00
Jirka Borovec
448be60701
update GPU to PT 1.5 ( #2779 )
...
* update gpu PT 1.6
* fix docker
* use PT 1.5
* Update tests/install_AMP.sh
Co-authored-by: Nathan Raw <nxr9266@g.rit.edu>
Co-authored-by: Nathan Raw <nxr9266@g.rit.edu>
2020-08-02 08:14:53 -04:00
Jirka Borovec
bc7a08fbe0
test dockers & add AMP in pt-1.6 ( #1584 )
...
* exist images
* names
* images
* args
* pt 1.6 dev
* circleci
* update
* refactor
* build
* fix
* MKL
2020-07-31 08:23:13 -04:00
zcain117
d0b8e850a4
integrate with CircleCI ( #2486 )
...
* add circleCI
* wip
* CircleCI setup that worked on my private repo. Use a working pytorch-lightning commit
* Fix the orb imports
* Update circleci header comment
* Try to pull the GITHUB_REF from the CI_PULL_REQUEST
* Use null instead of space for 'sed'
* Add TODO for codecov
* Remove echo of GKE_CLUSTER since it will be redacted by CircleCI.
* Try running codecov upload.
* Try using codecov orb
* Use pip install codecov
* Use codecov orb again since it should be approved
* dockers/tpu-tests/Dockerfile
* action
* suggestions
* drop suggestion
* suggestion
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-07-23 12:13:10 -04:00
Jirka Borovec
fb85d493d0
use XLA base image for TPU testing ( #2536 )
...
* drop py3.6
* use base image
* typo
* skip extra
* drop cache
2020-07-07 07:05:17 -04:00
Jirka Borovec
977df6ed31
Docker: building XLA base image ( #2494 )
...
* refactor
* add TPU base
* wip
* builds
* typo
* extras
* simple
* unzip
* rename
2020-07-06 14:21:36 -04:00