Commit Graph

10 Commits

Author SHA1 Message Date
William Falcon 29ebe92208
support for native amp (#1561)
* adding native amp support

* autocast

* removed comments

* added state saving

* try install amp again

* added state saving

* drop Apex reinstall

Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-04-23 14:47:08 -04:00
Jirka Borovec 0b22b64a10
Tests/docker (#1573)
* devel image

* try parallel

* new image
2020-04-23 12:52:59 -04:00
Travis Addair 7024177f7d
Added Horovod distributed backend (#1529)
* Initial commit of Horovod distributed backend implementation

* Update distrib_data_parallel.py

* Update tests/models/test_horovod.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Fixed tests

* Added six

* tests

* Install tox for GitHub CI

* Retry tests

* Catch all exceptions

* Skip cache

* Remove tox

* Restore pip cache

* Remove the cache

* Restore pip cache

* Remove AMP

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
2020-04-22 17:39:08 -04:00
Jirka Borovec 724b787cd1
faster CI testing (#1323)
* MNIST digits

* increase test acc

* smaller parity

* drone builds

* increase GH action timeout

* drone format

* fix paths

* drone cache

* circle cache

* fix test

* lower nb epochs

* circleCI

* user orb

* fix test

* circle cache

* comment caches

* benchmark batch size

* cache dataset

* smaller dataset

* fix nb samples

* batch size

* fix test
2020-04-02 12:28:44 -04:00
William Falcon 18d055a390
Parity test (#1284)
* adding test

* added base parity model

* added parity test

* move parity to benchmark

* formatting

* fixed gradient acc sched

* move parity to benchmark

* formatting

* fixed gradient acc sched

* skip for CPU

* call last

Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
2020-03-30 18:16:32 -04:00
Jirka Borovec 61177cd1c8
system info (#1234)
* system info

* update big info

* test script

* update config

* rename script

* import path
2020-03-27 08:45:52 -04:00
Jirka Borovec 45d671a4a8
CI: split tests-examples (#990)
* CI: split tests-examples

* tests without template

* comment depends

* CircleCI typo

* add doctest

* update test req.

* CI tests

* setup macOS

* longer train

* lower pred acc

* fix model

* rename default model

* lower tests acc

* typo

* imports

* fix test optimizer

* update calls

* fix Win

* lower Drone image

* fix call

* pytorch image

* fix test

* add dev image

* update image

* drone volume

* lint

* update test notes

* rename tests/models >> tests/base

* group models

* conftest

* optim imports

* typos

* fix import

* fix tests

* install AMP

* tests

* fix import
2020-03-25 07:46:27 -04:00
Jirka Borovec 22a7264e9a
improve partial Codecov (#1172)
* ignore in setup

* show report

* abs imports

* abstract pass

* cover loggers

* doctest trains

* locals

* pass

* revert tensorboard

* use tensorboardX

* revert tensorboardX

* fix trains

* Add TrainsLogger.set_credentials (#1179)

* Add TrainsLogger.set_credentials to control trains server configuration and authentication from code. Sync trains package version.
Fix CI Trains tests

* Add global TrainsLogger set_bypass_mode (#1187)

* Add global TrainsLogger set_bypass_mode skips all external communication

Co-authored-by: bmartinn <>

* rm some no-cov

Co-authored-by: Martin.B <51887611+bmartinn@users.noreply.github.com>
2020-03-19 09:14:29 -04:00
Jirka Borovec f6a7a5278a
enable Codecov (#1133)
* update config

* try Drone cache

* drop Drone cache

* move import

* remove token
2020-03-14 13:01:57 -04:00
Jirka Borovec 5691ffb160
add Drone CI (#1115)
* add Drone config

* update Drone config

* add Drone config

* list GPUs

* add type

* native torch

* fix image

* update

* SLURM_LOCALID

* add badge

* simple test
2020-03-11 15:39:59 -04:00