27 Commits

Author SHA1 Message Date
Jirka Borovec bddc6cd77a
pytest default color (#4703)
* pytest default color

* time

Co-authored-by: chaton <thomas@grid.ai>
2020-11-18 10:53:44 +00:00
Jirka Borovec 7940ea5aaf
CI: TPU drop install horovod (#4622)
Co-authored-by: chaton <thomas@grid.ai>
2020-11-13 11:33:52 +01:00
Jirka Borovec bd6c413829
Conda: PT 1.8 (#3833)
* PT 1.8

* unfreeze PT

* drop nightly from full

* add PT 1.8 to workflow

* readme table

* cuda

* skip cuda

* test 1.8

* unfreeze torch vision

Co-authored-by: ydcjeff <ydcjeff@outlook.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-12 15:03:43 +01:00
Jeff Yang 23719e3c05
[dockers] install nvidia-dali-cudaXXX (#4532)
* [dockers] install nvidia-dali-cuda100

* Apply suggestions from code review

* build DALI

* build DALI

* build DALI

* dali from source

* dali from source

* use binaries

* qq

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-09 21:18:24 +06:30
Jeff Yang 1d594c5d0c
[docker] Lock cuda version (#4453)
* lock cuda version

* back to normal
2020-10-31 20:17:07 +06:30
Jeff Yang 0f584faa6b
PyTorch 1.7 Stable support (#3821)
* prepare for 1.7 support [ci skip]

* tpu [ci skip]

* test run 1.7

* all 1.7, needs to fix tests

* couple with torchvision

* windows try

* remove windows

* 1.7 is here

* on purpose fail [ci skip]

* return [ci skip]

* 1.7 docker

* back to normal [ci skip]

* change to some_val [ci skip]

* add seed [ci skip]

* 4 places [ci skip]

* fail on purpose [ci skip]

* verbose=True [ci skip]

* use filename to track

* use filename to track

* monitor epoch + changelog

* Update tests/checkpointing/test_model_checkpoint.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-30 15:42:14 +00:00
Jirka Borovec ce8abd6255
Drone: use nightly build cuda docker images (#3658)
* upgrade PT version

* update docker

* docker

* try 1.5

* badge

* fix typo: dor -> for (#3918)

* prune

* prune

* env

* echo

* try

* notes

* env

* env

* env

* notes

* docker

* prune

* maintainer

* CI

* update

* just 1.5

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* docker

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* CI

* push

* try

* prune

* CI

* CI

* CI

* CI

Co-authored-by: Klyukin Valeriy <mr.clyukin@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-26 10:47:09 +00:00
Jeff Yang d83c4e4d69
Cache docker builds (#3659)
* parent faa357648f
author ydcjeff <ydcjeff@outlook.com> 1601049378 +0630
committer ydcjeff <ydcjeff@outlook.com> 1601469495 +0630

cache docker builds

lock horovod at 0.19.5

done [ci skip] [CI SKIP]

use --cache-from [ci skip]

typo and horovod [ci skip]

exclude pt 1.3 py3.8 [ci skip]

conda no cache [ci skip]

fix

* revert

* align with master [ci skip]

* retry

* remove empty continuation lines

* add comment

* fix build-args
2020-10-25 18:46:10 +06:30
chaton 829d90b257
activated color in all pytest runs (#4254)
* activated color in all pytest runs

* Update .drone.yml

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-20 16:38:17 +02:00
Jirka Borovec d3567c33a6
move base req. to root (#4219)
* move base req. to root

* check-manifest

* check-manifest

* manifest

* req
2020-10-18 20:40:18 +02:00
Jeff Yang 90929fa433
Fix apt repo issue for docker (#3823)
* fix docker repo issue

* docker

* docker

* docker

* no cudnn

* no cudnn

* try 16.04

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-10-05 23:18:14 -04:00
Jirka Borovec 1160270882
fix path in CI for release & python version in all dockers & duplicated badges (#3765)
* typo

* path

* check

* trigger

* fix conda

* pip ver

* fix cuda

* fix XLA

* fix xla

* ci

* docker

* BIULD

* unBIULD

* update

* py 3.8

* apex

* apex
2020-10-02 05:26:21 -04:00
Jirka Borovec ab508dae0c
run TPU tests with multiple versions (#3024)
* rename

* multi build

* multi build

* copy

* copy

* copy

* copy

* copy

* copy

* clean

* note

* docker

* formatting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-09-30 08:36:02 -04:00
Jirka Borovec a0968e4bdf
fix PT version in CUDA docker images (#3739)
* upgrade PT version

* update docker

* docker

* try 1.5

* fix docker versions

* old

* badge
2020-09-30 08:33:22 -04:00
Jirka Borovec a94728c99b
spec Horovod version (#3661)
* spec Horovod version

* MAKEFLAGS="-j2"

* tests

* CI

* docker

* CI

* docker
2020-09-26 19:30:25 +02:00
Jirka Borovec 0784cf3ab4
dockers nightly (#3615)
* dockers nightly

* typo

* Apply suggestions from code review

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-09-25 15:58:01 +02:00
Jeff Yang a2120130ed
Lightning docker image based on base-cuda (#3637)
* use lightning CI docker

* exclude py3.8 and torch1.3

* torch 1.7

* mergify

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-09-24 23:14:15 +02:00
Jirka Borovec 37a59be21b
build more docker configs (#3533)
* update build cases

* list

* matrix

* matrix

* builds

* docker

* -j1

* -q

* -q

* sep

* docker

* docker

* mergify

* -j1

* -j1

* horovod

* copy
2020-09-23 01:41:35 +02:00
Jeff Yang 8be79a9a96
stable, dev PyTorch in Dockerfile and conda gh actions (#3074)
* dockerfile and actions file

* dockerfile and actions file

* added pytorch conda cpu nightly

* added pytorch conda cpu nightly

* recopy base reqs

* gh action `include` torch nightly

* add pytorch nightly & conda gh badge

* rebase

* fix horovod

* proposal refactor

* Update .github/workflows/ci_pt-conda.yml

* Update .github/workflows/ci_pt-conda.yml

* update

* update

* fix cmd

* filled &&

* fix

* add -y

* torchvision >0.7 allowed

* explicitly install torchvision

* use HOROVOD_GPU_OPERATIONS env variable

* CI

* skip 1.7

* table

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-09-17 20:30:39 +02:00
Jirka Borovec cbc4f6f8a4
add CI for building dockers (#3383)
* rename

* fix badges

* add docker build

* mergify

* update

* env

* ci

* times

* CI

* name

* comment
2020-09-10 18:38:29 -04:00
Jirka Borovec 9f2b29a7cd
build XLA with py3.6 (#2863)
* build py3.6

* info

* conda

* update

* version

* version

* builds

* builds

* builds

* builds

* builds
2020-08-15 15:39:44 -04:00
Jirka Borovec a6e7aa7796
allow using apex with any PT version (#2865)
* wip

* setup

* type

* name

* wip

* docs

* imports

* fix if

* fix if

* use_amp

* Apply suggestions from code review

* Apply suggestions from code review

* fix tests

* Apply suggestions from code review

* fix tests

* todos

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-08 11:07:32 +02:00
Jirka Borovec 448be60701
update GPU to PT 1.5 (#2779)
* update gpu PT 1.6

* fix docker

* use PT 1.5

* Update tests/install_AMP.sh

Co-authored-by: Nathan Raw <nxr9266@g.rit.edu>
2020-08-02 08:14:53 -04:00
Jirka Borovec bc7a08fbe0
test dockers & add AMP in pt-1.6 (#1584)
* exist images

* names

* images

* args

* pt 1.6 dev

* circleci

* update

* refactor

* build

* fix

* MKL
2020-07-31 08:23:13 -04:00
zcain117 d0b8e850a4
integrate with CircleCI (#2486)
* add circleCI

* wip

* CircleCI setup that worked on my private repo. Use a working pytorch-lightning commit

* Fix the orb imports

* Update circleci header comment

* Try to pull the GITHUB_REF from the CI_PULL_REQUEST

* Use null instead of space for 'sed'

* Add TODO for codecov

* Remove echo of GKE_CLUSTER since it will be redacted by CircleCI.

* Try running codecov upload.

* Try using codecov orb

* Use pip install codecov

* Use codecov orb again since it should be approved

* dockers/tpu-tests/Dockerfile

* action

* suggestions

* drop suggestion

* suggestion

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-07-23 12:13:10 -04:00
Jirka Borovec fb85d493d0
use XLA base image for TPU testing (#2536)
* drop py3.6

* use base image

* typo

* skip extra

* drop cache
2020-07-07 07:05:17 -04:00
Jirka Borovec 977df6ed31
Docker: building XLA base image (#2494)
* refactor

* add TPU base

* wip

* builds

* typo

* extras

* simple

* unzip

* rename
2020-07-06 14:21:36 -04:00