5916 Commits

Author SHA1 Message Date
Kaushik B 762af9505b
Add missing test for testing custom registered training plugin (#10225) 2021-10-29 04:06:06 +00:00
Adrian Wälchli 6ed7a0c172
Fix sigterm signal handling (#10189)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-29 00:01:39 +00:00
thomas chaton 255e3edc98
resolve failing test (#10191)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-28 15:27:03 +00:00
Carlos Mocholí 03f01fb5ec
Fix gradient norm tracking and gradient clipping (#9287)
* WIP

* Progress

* Undo test change

* Fix plugin closure execution order

* Update CHANGELOG

* Fix manual optimization on AMP and skipping backward

* Fix for deepspeed

* Typo

* Hook test for manual closure

* Add skipping test with AMP

* You are hideous, apex

* Add deepspeed test

* Update CHANGELOG

* Fix for broken master

* Add RunIf

* FIXMEs

* Rename

* Fix grad norm

* add a simple test

* update test

* update test

* update test

* fix merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sea of changes

* Undo change

* Introduce TPUPrecisionPlugin

* Undo changes

* Undo changes

* Resolve FIXME

* Undo change

* Undo change

* Undo change

* Fix FIXMEs

* Fix FIXME

* Correct value

* Bad merge

* Fix circular imports

* WIP

* Fixing clipping

* Fixes

* Bad merge

* Move optimizer step and clipping into the `PrecisionPlugin`

* Fix AMP

* Update CHANGELOG

* Fix tests

* Underscore

* Progress

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove pre_optimizer_step

* Missed one

* Progress

* Progress

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FIXMEs

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix test

* DeepSpeed warning. mypy

* Rename

* Finish tests

* Update CHANGELOG

* Dumb fixes

* accelerator=auto

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update on comments

* Use ClassifModule

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
Carlos Mocholí 5262b63dff
Pass the scaler as an input to `NativeMixedPrecisionPlugin` (#10055)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 14:13:53 +00:00
Low Weng Fei 83d74bb385
Fix `reset_seed()` converting the `PL_SEED_WORKERS` environment variable `str` read to `bool` (#10099)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-28 12:57:41 +00:00
Rohit Gupta 9af1dd7443
Deprecate `lr_sch_names` from `LearningRateMonitor` (#10066) 2021-10-28 12:57:04 +00:00
Adam J. Stewart b8ac17624d
Docs: fix mistakes in New Project docs (#10137)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 12:31:02 +00:00
Rohit Gupta 85eb17cde5
initialize poptorch_models based on trainer_fn (#10149) 2021-10-28 11:59:52 +00:00
Kaushik B d1985ebf96
Add Plugins Registry to docs (#10181) 2021-10-28 16:44:08 +05:30
Adrian Wälchli 63015b5c87
Let `DDPSpawnPlugin.spawn` return a result from rank 0 (#10162)
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-10-28 11:39:13 +02:00
Adrian Wälchli 07b1b56d5c
Fix setting device when creating "inf" monitor value in `ModelCheckpoint` (#10118)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 09:10:55 +00:00
Adrian Wälchli afd1ae124e
Update deepspeed precision plugin for Lite (#10164) 2021-10-28 08:33:56 +00:00
Carlos Mocholí 3a4e9970d6
Pin fairscale version (#10200) 2021-10-27 23:24:17 +00:00
Carlos Mocholí dbe1662dc3
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` (#10157) 2021-10-27 13:38:39 +01:00
Adrian Wälchli 808edcdebf
update type (#10163) 2021-10-27 11:16:09 +00:00
Kaushik B c33df2639f
Set `dataset` attribute to `MpDeviceLoader` used in TPU Spawn (#10151) 2021-10-27 01:23:01 +05:30
Adrian Wälchli 5ade197580
Update README page in pl_examples folder (#10114)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-26 17:38:56 +00:00
Adrian Wälchli 4a4a27db05
Update docutils package version in requirements.txt (#10158) 2021-10-26 16:32:47 +00:00
Carlos Mocholí 48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` (#10143) 2021-10-26 17:26:26 +02:00
Rohit Gupta 93266e2c22
Avoid deprecated warnings from accelerator and checkpoint connector #10142 2021-10-26 14:10:30 +02:00
Carlos Mocholí a0e45dc071
Some minor CI cleanup (#10088) 2021-10-26 13:58:20 +02:00
Danielle Pintz 38090e47d7
Small code simplification in `training_epoch_loop.py` (#10146)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-26 13:22:36 +02:00
twsl 971281d27d
Make sure file and folder exists in Profiler (#10073)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-26 11:13:31 +00:00
Charlie_Tang 84ce1d095c
add 'sanity_checking' to datamodule 'on_after_batch_transfer' docs (#10067)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-26 11:12:57 +00:00
Danielle Pintz a5235d5b01
Remove `model_connector.py` (#10111) 2021-10-26 11:52:14 +02:00
Adrian Wälchli 871a96701a
Rename `master_params` to `main_params` (#10105)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-26 11:17:32 +02:00
Rohit Gupta 34d5980df6
Raise `MisconfigurationException` if `trainer.eval` is missing required methods (#10016) 2021-10-25 23:12:08 -07:00
Danielle Pintz 13d6d7bad1
Remove `optimizer_connector.py` (#10120) 2021-10-26 00:52:43 +00:00
Adrian Wälchli 21a5867dad
Rename `ClusterEnvironment.creates_processes` (#10106)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 23:15:41 +00:00
Adrian Wälchli f1623355bd
Add example table to loop docs (#10058)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 22:42:15 +00:00
Rajat Goel 47e7a2860f
Fix Enums parsing in generated hparms yaml (#9170)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 21:23:20 +00:00
Jirka Borovec 0e0247a4d4
docker Conda timeout (#10087) 2021-10-25 20:56:47 +00:00
Eric Wiener 0e20119d24
Change default value of the `max_steps` Trainer argument from `None` to `-1` (#9460)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 20:21:33 +00:00
Rohit Gupta d9dfb2e920
fix tests (#10138) 2021-10-25 19:37:47 +00:00
Danielle Pintz 1f7bd6650c
Mark accelerator connector as protected (#10032) 2021-10-25 19:24:54 +00:00
jjenniferdai 6d79184ec5
Unify checkpoint load paths [redo #9693] (#10061) 2021-10-25 19:05:31 +00:00
Adrian Wälchli 76081fb846
Mark SLURM detection methods in `AcceleratorConnector` as protected (#10101)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-10-25 17:52:15 +00:00
Carlos Mocholí 2ee3127661
Use `torch.autocast` (#10053) 2021-10-25 17:33:52 +00:00
Carlos Mocholí 43c70ece17
Fix `optimizers` overloads typing annotation (#10069) 2021-10-25 16:51:46 +00:00
Carlos Mocholí b376799430
Minor fixes related to clipping (#10130)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Adrian Wälchli d3e5a43546
Restrict setup methods to accept a single model (#10064) 2021-10-25 16:32:57 +00:00
manipopopo cfb2d87765
Disable quantization aware training observers (#8540)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 15:46:09 +00:00
Adrian Wälchli f8a7f3fde0
Add Yield loop example (#9983)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 14:26:36 +00:00
Adrian Wälchli aff80477b7
Remove dead code in accelerator connector (#10100)
* remove dead code in accelerator connector

* remove slurm "fake_slurm_managing_tasks" dead code
2021-10-25 13:37:40 +00:00
Adrian Wälchli 7eb2edf421
rename set_random_master_port (#10104)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:09:05 +00:00
Kaushik B 64fc0d4257
Add method to TPUSpawn plugin to override how models are setup (#10039) 2021-10-25 11:44:32 +00:00
Danielle Pintz e94dcf6936
Mark `trainer.data_connector` as protected (#10031)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Carlos Mocholí f95ba20012
Do not use the base version by default in `_compare_version` (#10051) 2021-10-25 16:41:32 +05:30
Adrian Wälchli 225989363b
update links in callback examples pointing to bolts (#10117) 2021-10-25 10:27:14 +00:00