Sean Naren
07b7dc9c17
[Fix] Add delay property for checkpointing, refactor loading checkpoint (DeepSpeed Checkpointing Fix 1/n) (#8627)
...
* Add property to delay checkpointing, move loading checkpoint file into the run function to allow deepspeed engine to be loaded
* Add a small test
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/accelerators/accelerator.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Address review
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-30 11:31:08 +01:00
Carlos Mocholí
a64cc37394
Replace `yapf` with `black` (#7783)
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Sean Naren
6388c29e87
[IPU] Add reset dataloader hooks to training type plugin 3/n (#7861)
...
* Add hooks
* Add tests for hooks
* Add changelog
* Test changes, add typing
2021-06-07 10:37:09 +00:00
Martin Kristiansen
c3fc0313ef
Updating docs and error message: half precision not available on CPU (#7384)
...
* Updating docs and error message to specify that half precision is not available on CPU
* update messages
Co-authored-by: Martin Kristiansen <martinkristiansen@sixgill.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-05-06 09:05:50 +00:00
thomas chaton
fd5cb7fcc3
Add PyTorch 1.8 Profiler 5/5 (#6618)
...
* Refactor profilers
* Update PassThrough
* WIP - This is broken and will change
* Update pytorch_lightning/profiler/pytorch.py
Co-authored-by: thomas chaton <thomas@grid.ai>
* resolve tests
* resolve tests
* find output
* try something
* update
* add support for test and predict
* update
* update
* use getattr
* test
* test
* update
* tests
* update
* update
* update
* update
* update
* remove file
* update
* update
* update
* update
* update
* test
* update
* update
* update tests
* update
* add support for 1.8
* rename records
* add support for 1.8
* update
* resolve flake8
* resolve test
* Refactor basic profilers
* Fixes
* Unused import
* Introduce setup
* Profile on all ranks. Print to stdout on 0
* Introduce dirpath + filename
* CHANGELOG
* Add tests. Address comments
* add `on_run_stage_setup`
* add on_run_stage_setup function
* update
* add test for RegisterRecordFunction
* update lightning flow direction
* move variable to private
* remove trace
* Undo code that should be in 3/4
* Multi-stage multi-rank
* 2/5 changes
* Pass stage in __del__
* Remove TODOs
* Describe on_evaluation_end. Add tests
* Typo
* Address comments
* deepcopy tests
* Advanced teardown
* Fix teardown test
* Fix tests
* Minor change
* Update CHANGELOG.md
* Fix test
* Quick fixes
* Fix 6522
* resolve ddp tests
* resolve tests
* resolve some tests
* update tests
* resolve tests
* update
* resolve tests
* resolve some tests
* Missed fixes from 3/5
* Fixes
* resolve some tests
* resolve test for 1.7.1
* Broken refactor
* Missed stage
* Minor changes
* resolve tests
* Update CHANGELOG
* resolve bug
* remove print
* Typo
* Cleanup
* resolve ddp test
* remove barrier
* update profiler
* update
* Smaller model
* update
* resolve tests
* update
* Minor changes. CHANGELOG
* Minimize diff
* update to 1.8.1
* RunIf. Extra code. Check segfault
* resolve tests
* Typo. Bad merge
* Fixing a bad merge
* replace for kineto
* Update pytorch_lightning/profiler/pytorch.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Update pytorch_lightning/profiler/pytorch.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Minor changes
* Bad merge
* Use lists for flexibility
* Use sets
* predict_step
* Ananth's suggestion
* update
* Docs
* Update pl_examples/basic_examples/profiler_example.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update example
* update example
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-23 20:43:21 +00:00
Jirka Borovec
efce2b7777
Prune metrics: regression 8/n (#6636)
...
* explained_variance
* tests
* mean_absolute_error
* mean_squared_error
* mean_relative_error
* mean_squared_log_error
* chlog
2021-03-23 09:35:51 +01:00
Sean Naren
58c9fa7edb
Allow training type plugin to delay optimizer creation (FSDP 2/n) (#6331)
...
* Allow training_type_plugin to delay optimizer configure
* Add missing references to trainer, add a CPU accelerator based test
2021-03-22 11:43:53 +00:00
Jirka Borovec
09baf29ecb
prune deprecated profiler as bool (#6164)
...
* prune profiler
* chlog
2021-02-24 09:08:21 +00:00
Adrian Wälchli
ae6ce17598
fix amp/apex misconfiguration error for cpu (#6107)
...
* fix weird test
* fix apex plugin test
* fix raise
* cpu test
* fix type
* add changelog
2021-02-22 01:02:31 +01:00