Sean Naren
8439aead66
Update FairScale on CI (#7017)
* Try updating CI to latest fairscale
* Update availability of imports.py
* Remove some of the fairscale custom ci stuff
* Update grad scaler within the new process as reference is incorrect for spawn
* Remove fairscale from mocks
* Install fairscale 0.3.4 into the base container, remove from extra.txt
* Update docs/source/conf.py
* Fix import issues
* Mock fairscale for docs
* Fix DeepSpeed and FairScale to specific versions
* Swap back to greater than
* extras
* Revert "extras"
This reverts commit 7353479f
* ci
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-23 12:37:00 +01:00
Jirka Borovec
aa7d3dc6cc
Fix `torchmetrics` compatibility (#7131)
* get_num_classes
* tmp
* fix one test
* fix deprecated tests
* fix deprecate
* pep8
* deprecate 0.3
* wip
* wip
* HaCK
* branch
* branch
* format
* Apply suggestions from code review
* prune
* rev
* multilabel
* Apply suggestions from code review
* master
* rev
* .
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-04-22 20:45:46 +00:00
Jirka Borovec
1e4bc69a16
Ban `tensorboard==2.5.0` and `deepspeed==0.3.15` (#7159)
* ban TB 2.5
* note
* push
* Ban tb==2.5.0 and deepspeed==0.3.15
* Fix pip command
* pull
* up
* up
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-22 11:08:21 -04:00
Mauricio Villegas
f852a4f592
Changed basic_examples to use `LightningCLI` (#6862)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-15 15:01:16 +00:00
Carlos Mocholí
745aed0ad4
Remove lightning-dtrun installation (#7018)
2021-04-14 23:44:52 +02:00
Jirka Borovec
dcf6e4e310
remake nvidia docker (#6686)
* use latest
* remake
* examples
2021-03-29 09:39:06 +01:00
Carlos Mocholí
21fc5eb21e
Automatically find and run special tests (#6669)
2021-03-26 17:04:59 +00:00
Jirka Borovec
64d0fa4472
update coverage config (#6524)
* update coverage config
* parallel
* parallel
* Apply suggestions from code review
* Apply suggestions from code review
* parallel
* parallel
* parallel
* combine
* combine
* .
* ..
* ..
* ..
* rev
* cb
* cb
* drop
* drop
* .
* ..
* ...
* ...
* ...
* .
2021-03-23 23:05:04 +01:00
Jirka Borovec
e62c7c7839
hotfix: mock examples (#6632)
* mock examples
* drop from GA
2021-03-22 16:49:01 +00:00
Jirka Borovec
cb59039288
fixing examples (#6600)
* try Azure
* -e
* path
2021-03-20 18:58:59 +00:00
Jirka Borovec
eb3ff413a9
CI: Azure publish results (#6514)
2021-03-15 14:38:40 +00:00
Jirka Borovec
85c8074bee
require: adjust versions (#6363)
* adjust versions
* release
* manifest
* pep8
* CI
* fix
* build
2021-03-06 14:34:54 +01:00
Jirka Borovec
e84854264f
CI: fix examples - patch download MNIST (#6357)
* patch download
* CI
* isort
* extra
2021-03-05 16:50:21 +00:00
Jirka Borovec
e038e747a0
hotfix for PT1.6 and torchtext (#6323)
* ci: azure reinstall torchtext
* move
* todos
* 0.6.0
* skip examples
* formatter
* skip
* todo
* Apply suggestions from code review
2021-03-04 17:48:17 +01:00
Jirka Borovec
6788dbabff
switch agents pool (#6270)
2021-03-01 22:14:55 +01:00
Jirka Borovec
f2660acbf9
add sanity check on nb available GPUs (#6092)
2021-02-19 21:45:53 +00:00
Jirka Borovec
e12c8a7254
add Azure tags trigger (#6066)
* add Azure tags trigger
* fix
* mnodes
2021-02-18 16:41:16 -05:00
Sean Naren
8440595b26
[CI] Move DeepSpeed into CUDA image, remove DeepSpeed install from azure (#6043)
* Move to CUDA image
* Remove deepspeed install as deepspeed now in the cuda image
* Remove path setting, as ninja should be in the container now
2021-02-17 18:51:31 -05:00
Sean Naren
7189d673f6
DeepSpeed Integration (#5954)
* Add initial deepspeed changes
* Address code review
* Move static method outside of function
* Fixes
* Add missing annotation
* Remove seed setting
* Doc changes
* Doc changes, add address reviews
* Fix docs
* Try fixing issue by moving to torch adam
* Clean up check
* Changes, better APIs!
* Add wrapper, swap to git install revision
* Add special test
* Add warning
* Address review
* Add better disclaimer
* Turn off ZeRO for testing due to compilation
* Add description on modifying parameters via the plugin
* Doc strings clear
* Small doc fixes
* Fix hash, reduce test
* Added CI change
* Move to azure pipeline
* Fix test name
* Add missing flag
* Remove sudo...
* Try conda instead
* Swap to conda base
* Try suggested install
* Apply suggestions from code review
* Apply suggestions from code review
* Revert "Apply suggestions from code review"
This reverts commit 41cca05a
* Revert "Apply suggestions from code review"
This reverts commit e06ec29e
* Remove setter
* Address most review
* Move out function, remove DeepSpeed from requirements
* Install deepspeed/mpi4py within container
* Use special tests, move to master commit for deepspeed
* Export path
* Force compile to happen first
* Remove!
* Debugging ninja
* Fix error in optimizer step logic
* Attempt to fix symbolic link
* Reverse to aid debugging
* Export path again
* Clean up mess
* var
* Revert "var"
This reverts commit 3450eaca
* Address review, add todo
* Add note about unsupported functionality
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 15:23:42 -05:00
Jirka Borovec
c0ee1f19fc
fix install dtrun (#6025)
2021-02-17 11:43:51 +00:00
Jirka Borovec
ba806c8ee0
enable testing DDP examples (#4995)
* enable testing DDP examples
* args
* ddp_spawn
* ddp as extra script
* path
# Conflicts:
# .drone.yml
* install
* -u
* q
2021-02-15 15:36:13 +00:00
Nicki Skafte
979c879e45
drop DDP CLI test (#5938)
* fix tests
* =
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-12 17:42:32 +01:00
Jirka Borovec
373a31e63e
add azure timeout (#5907)
* add azure timeout
* rework
2021-02-10 20:21:20 +00:00
Jirka Borovec
c2c82dad62
CI: Azure (#5882)
* add base Azure pipeline
* skip
2021-02-10 04:43:26 -05:00