Sean Naren
|
8440595b26
|
[CI] Move DeepSpeed into CUDA image, remove DeepSpeed install from azure (#6043)
* Move to CUDA image
* Remove deepspeed install as deepspeed now in the cuda image
* Remove path setting, as ninja should be in the container now
|
2021-02-17 18:51:31 -05:00 |
Sean Naren
|
7189d673f6
|
DeepSpeed Integration (#5954)
* Add initial deepspeed changes
* Address code review
* Move static method outside of function
* Fixes
* Add missing annotation
* Remove seed setting
* Doc changes
* Doc changes, add address reviews
* Fix docs
* Try fixing issue by moving to torch adam
* Clean up check
* Changes, better APIs!
* Add wrapper, swap to git install revision
* Add special test
* Add warning
* Address review
* Add better disclaimer
* Turn off ZeRO for testing due to compilation
* Add description on modifying parameters via the plugin
* Doc strings clear
* Small doc fixes
* Fix hash, reduce test
* Added CI change
* Move to azure pipeline
* Fix test name
* Add missing flag
* Remove sudo...
* Try conda instead
* Swap to conda base
* Try suggested install
* Apply suggestions from code review
* Apply suggestions from code review
* Revert "Apply suggestions from code review"
This reverts commit 41cca05a
* Revert "Apply suggestions from code review"
This reverts commit e06ec29e
* Remove setter
* Address most review
* Move out function, remove DeepSpeed from requirements
* Install deepspeed/mpi4py within container
* Use special tests, move to master commit for deepspeed
* Export path
* Force compile to happen first
* Remove!
* Debugging ninja
* Fix error in optimizer step logic
* Attempt to fix symbolic link
* Reverse to aid debugging
* Export path again
* Clean up mess
* var
* Revert "var"
This reverts commit 3450eaca
* Address review, add todo
* Add note about unsupported functionality
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
|
2021-02-17 15:23:42 -05:00 |
Jirka Borovec
|
c0ee1f19fc
|
fix install dtrun (#6025)
|
2021-02-17 11:43:51 +00:00 |
Jirka Borovec
|
ba806c8ee0
|
enable testing DDP examples (#4995)
* enable testing DDP examples
* args
* ddp_spawn
* ddp as extra script
* path
* install
* -u
* q
|
2021-02-15 15:36:13 +00:00 |
Nicki Skafte
|
979c879e45
|
drop DDP CLI test (#5938)
* fix tests
* =
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
|
2021-02-12 17:42:32 +01:00 |
Jirka Borovec
|
373a31e63e
|
add azure timeout (#5907)
* add azure timeout
* rework
|
2021-02-10 20:21:20 +00:00 |
Jirka Borovec
|
c2c82dad62
|
CI: Azure (#5882)
* add base Azure pipeline
* skip
|
2021-02-10 04:43:26 -05:00 |