Jirka Borovec
448be60701
update GPU to PT 1.5 ( #2779 )
...
* update gpu PT 1.6
* fix docker
* use PT 1.5
* Update tests/install_AMP.sh
Co-authored-by: Nathan Raw <nxr9266@g.rit.edu>
2020-08-02 08:14:53 -04:00
Jirka Borovec
3772601cd6
update CI testing with pip upgrade ( #2380 )
...
* try pt1.5
* cpu
* upgrade
* tpu
* user
* [blocked by #2380 ] freeze GPU PT 1.4 (#2780 )
* freeze
* user
2020-07-31 14:50:06 -04:00
Jirka Borovec
bc7a08fbe0
test dockers & add AMP in pt-1.6 ( #1584 )
...
* exist images
* names
* images
* args
* pt 1.6 dev
* circleci
* update
* refactor
* build
* fix
* MKL
2020-07-31 08:23:13 -04:00
Adrian Wälchli
7ef73f242a
try remove pr ( #2543 )
2020-07-07 15:26:58 -04:00
Jirka Borovec
977df6ed31
Docker: building XLA base image ( #2494 )
...
* refactor
* add TPU base
* wip
* builds
* typo
* extras
* simple
* unzip
* rename
2020-07-06 14:21:36 -04:00
Jirka Borovec
39a6435726
Revert "Revert "join coverage ( #2460 )" ( #2499 )" ( #2500 )
...
This reverts commit 355918af8d.
2020-07-04 11:31:12 -04:00
William Falcon
355918af8d
Revert "join coverage ( #2460 )" ( #2499 )
...
This reverts commit 944ffba305.
2020-07-04 10:29:50 -04:00
Jirka Borovec
944ffba305
join coverage ( #2460 )
...
* join coverage
* full TPU test
* codecov
* typo
* report
* docker
* timeout
* base
* show
* cd dir
* req
* docker
* docker
* docker
* coverage
* upload
* drop main
* report
* report
* python
* upload
* drone
* drone
* drone
* drone
* drone
* drone
* drone
* drone
* drone
2020-07-04 10:22:58 -04:00
zcain117
1a40963d1d
Add Github Action to run TPU tests. ( #2376 )
...
* Add Github Action to run TPU tests.
* Trigger new Github Actions run.
* Clean up more comments.
* Use different fixed version of ml-testing-accelerators and update config to match.
* use cluster in us-central1-a
* Run 'gcloud logging read' directly without 'echo' to preserve newlines.
* cat coverage.xml on the TPU VM side and upload xml on the Github Action side
* Use new commit on ml-testing-accelerators so command runs fully.
* Preserve newlines in the xml and use if: always() temporarily to upload codecov
* Use pytorch_lightning for coverage instead of pytorch-lightning
* Remove the debug cat of coverage xml
* Apply suggestions from code review
* jsonnet rename
* name
* add codecov flags
* add codecov flags
* codecov
* codecov
* revert codecov
* Clean up after apt-get and remove old TODOs.
* More codefactor cleanups.
* drone
* drone
* disable codecov
* cleaning
* docker py versions
* docker py 3.7
* readme
* bash
* docker
* freeze conda
* py3.6
* Stop using apt-get clean.
* Dont rm pytorch-lightning
* Update docker/tpu/Dockerfile
* Longer timeout in the Github Action to wait for GKE to finish.
* job1
* job2
* job3
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-07-01 21:44:19 -04:00
Jirka Borovec
4e13e419ea
add CLI test for examples ( #2285 )
...
* cli examples
* ddp
* CI
* CI
* req
* tests
* skip DDP
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-27 09:13:29 -04:00
Jirka Borovec
bfaabd7b7f
clean requirements ( #2128 )
...
* clean requirements
* missing
* missing
* req
* min
* default >> base
* base.txt
2020-06-13 10:15:22 -04:00
Jirka Borovec
2674976f2c
remove deprecated API for v0.8 ( #2073 )
...
* remove deprecated API
* chlog
* times
* missed
* formatting check
* missing
* missing
* miss
* fix docs build error
* fix pep whitespace error
* docs
* wip
* amp_level
* amp_level
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-06-12 14:37:52 -04:00
Jirka Borovec
c438d0dd90
increase acc ( #2039 )
...
* increase acc
* try 0.45
* @pytest
* @pytest
* try .50
* duration
* pytest
2020-06-03 08:28:19 -04:00
Adrian Wälchli
a6de1b8d75
doctest for .rst files ( #1511 )
...
* add doctest to circleci
* Revert "add doctest to circleci"
This reverts commit c45b34ea911a81f87989f6c3a832b1e8d8c471c6.
* Revert "Revert "add doctest to circleci""
This reverts commit 41fca97fdcfe1cf4f6bdb3bbba75d25fa3b11f70.
* doctest docs rst files
* Revert "doctest docs rst files"
This reverts commit b4a2e83e3da5ed1909de500ec14b6b614527c07f.
* doctest only rst
* doctest debugging.rst
* doctest apex
* doctest callbacks
* doctest early stopping
* doctest for child modules
* doctest experiment reporting
* indentation
* doctest fast training
* doctest for hyperparams
* doctests for lr_finder
* doctests multi-gpu
* more doctest
* make doctest drone
* fix label build error
* update fast training
* update invalid imports
* fix problem with int device count
* rebase stuff
* wip
* wip
* wip
* intro guide
* add missing code block
* circleci
* logger import for doctest
* test if doctest runs on drone
* fix mnist download
* also run install deps for building docs
* install cmake
* try sudo
* hide output
* try pip stuff
* try to mock horovod
* Tranfer -> Transfer
* add torchvision to extras
* revert pip stuff
* mlflow file location
* do not mock torch
* torchvision
* drone extra req.
* try higher sphinx version
* Revert "try higher sphinx version"
This reverts commit 490ac28e46d6fd52352640dfdf0d765befa56988.
* try coverage command
* try coverage command
* try undoc flag
* newline
* undo drone
* report coverage
* review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* remove torchvision from extras
* skip tests only if torchvision not available
* fix testoutput torchvision
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-05-04 22:16:54 -04:00
William Falcon
29ebe92208
support for native amp ( #1561 )
...
* adding native amp suppport
* adding native amp suppport
* adding native amp suppport
* adding native amp suppport
* autocast
* autocast
* autocast
* autocast
* autocast
* autocast
* removed comments
* removed comments
* added state saving
* added state saving
* try install amp again
* added state saving
* drop Apex reinstall
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-04-23 14:47:08 -04:00
Jirka Borovec
0b22b64a10
Tests/docker ( #1573 )
...
* devel image
* try parallel
* new image
2020-04-23 12:52:59 -04:00
Travis Addair
7024177f7d
Added Horovod distributed backend ( #1529 )
...
* Initial commit of Horovod distributed backend implementation
* Update distrib_data_parallel.py
* Update distrib_data_parallel.py
* Update tests/models/test_horovod.py
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/models/test_horovod.py
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Fixed tests
* Added six
* tests
* Install tox for GitHub CI
* Retry tests
* Catch all exceptions
* Skip cache
* Remove tox
* Restore pip cache
* Remove the cache
* Restore pip cache
* Remove AMP
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
2020-04-22 17:39:08 -04:00
Jirka Borovec
724b787cd1
faster CI testing ( #1323 )
...
* MNIST digits
* increase test acc
* smaller parity
* drone builds
* increase GH action timeout
* drone format
* fix paths
* drone cache
* circle cache
* fix test
* lower nb epochs
* circleCI
* user orb
* fix test
* fix test
* circle cache
* circle cache
* circle cache
* comment caches
* benchmark batch size
* cache dataset
* smaller dataset
* smaller dataset
* fix nb samples
* batch size
* fix test
2020-04-02 12:28:44 -04:00
William Falcon
18d055a390
Parity test ( #1284 )
...
* adding test
* adding test
* added base parity model
* added base parity model
* added parity test
* added parity test
* added parity test
* added parity test
* added parity test
* added parity test
* added parity test
* added parity test
* added parity test
* added parity test
* added parity test
* added parity test
* added parity test
* added parity test
* added parity test
* move parity to benchmark
* formatting
* fixed gradient acc sched
* move parity to benchmark
* formatting
* fixed gradient acc sched
* skip for CPU
* call last
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
2020-03-30 18:16:32 -04:00
Jirka Borovec
61177cd1c8
system info ( #1234 )
...
* system info
* update big info
* test script
* update config
* rename script
* import path
2020-03-27 08:45:52 -04:00
Jirka Borovec
45d671a4a8
CI: split tests-examples ( #990 )
...
* CI: split tests-examples
* tests without template
* comment depends
* CircleCI typo
* add doctest
* update test req.
* CI tests
* setup macOS
* longer train
* lover pred acc
* fix model
* rename default model
* lower tests acc
* typo
* imports
* fix test optimizer
* update calls
* fix Win
* lower Drone image
* fix call
* pytorch image
* fix test
* add dev image
* add dev image
* update image
* drone volume
* lint
* update test notes
* rename tests/models >> tests/base
* group models
* conftest
* optim imports
* typos
* fix import
* fix tests
* install AMP
* tests
* fix import
2020-03-25 07:46:27 -04:00
Jirka Borovec
22a7264e9a
improve partial Codecov ( #1172 )
...
* ignore in setup
* show report
* abs imports
* abstract pass
* cover loggers
* doctest trains
* locals
* pass
* revert tensorboard
* use tensorboardX
* revert tensorboardX
* fix trains
* Add TrainsLogger.set_credentials (#1179 )
* Add TrainsLogger.set_credentials to control trains server configuration and authentication from code. Sync trains package version.
Fix CI Trains tests
* Add global TrainsLogger set_bypass_mode (#1187 )
* Add global TrainsLogger set_bypass_mode skips all external communication
Co-authored-by: bmartinn <>
* rm some no-cov
Co-authored-by: Martin.B <51887611+bmartinn@users.noreply.github.com>
2020-03-19 09:14:29 -04:00
Jirka Borovec
f6a7a5278a
enable Codecov ( #1133 )
...
* update config
* try Drone cache
* drop Drone cache
* move import
* remove token
2020-03-14 13:01:57 -04:00
Jirka Borovec
5691ffb160
add Drone CI ( #1115 )
...
* add Drone config
* update Drone config
* add Drone config
* list GPUs
* add type
* native torch
* native torch
* fix image
* update
* SLURM_LOCALID
* add badge
* simple test
2020-03-11 15:39:59 -04:00