Kaushik B
762af9505b
Add missing test for testing custom registered training plugin ( #10225 )
2021-10-29 04:06:06 +00:00
Adrian Wälchli
6ed7a0c172
Fix sigterm signal handling ( #10189 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-29 00:01:39 +00:00
thomas chaton
255e3edc98
resolve failing test ( #10191 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-28 15:27:03 +00:00
Carlos Mocholí
03f01fb5ec
Fix gradient norm tracking and gradient clipping ( #9287 )
* WIP
* Progress
* Undo test change
* Fix plugin closure execution order
* Update CHANGELOG
* Fix manual optimization on AMP and skipping backward
* Fix for deepspeed
* Typo
* Hook test for manual closure
* Add skipping test with AMP
* You are hideous, apex
* Add deepspeed test
* Update CHANGELOG
* Fix for broken master
* Add RunIf
* FIXMEs
* Rename
* Fix grad norm
* add a simple test
* update test
* update test
* update test
* fix merge conflicts
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Sea of changes
* Undo change
* Introduce TPUPrecisionPlugin
* Undo changes
* Undo changes
* Resolve FIXME
* Undo change
* Undo change
* Undo change
* Fix FIXMEs
* Fix FIXME
* Correct value
* Bad merge
* Fix circular imports
* WIP
* Fixing clipping
* Fixes
* Bad merge
* Move optimizer step and clipping into the `PrecisionPlugin`
* Fix AMP
* Update CHANGELOG
* Fix tests
* Underscore
* Progress
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove pre_optimizer_step
* Missed one
* Progress
* Progress
* Fix test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update FIXMEs
* Fix test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix test
* DeepSpeed warning. mypy
* Rename
* Finish tests
* Update CHANGELOG
* Dumb fixes
* accelerator=auto
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update on comments
* Use ClassifModule
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
Carlos Mocholí
5262b63dff
Pass the scaler as an input to `NativeMixedPrecisionPlugin` ( #10055 )
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 14:13:53 +00:00
Low Weng Fei
83d74bb385
Fix `reset_seed()` converting the `PL_SEED_WORKERS` environment variable `str` read to `bool` ( #10099 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-28 12:57:41 +00:00
Rohit Gupta
9af1dd7443
Deprecate `lr_sch_names` from `LearningRateMonitor` ( #10066 )
2021-10-28 12:57:04 +00:00
Adam J. Stewart
b8ac17624d
Docs: fix mistakes in New Project docs ( #10137 )
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 12:31:02 +00:00
Rohit Gupta
85eb17cde5
initialize poptorch_models based on trainer_fn ( #10149 )
2021-10-28 11:59:52 +00:00
Kaushik B
d1985ebf96
Add Plugins Registry to docs ( #10181 )
2021-10-28 16:44:08 +05:30
Adrian Wälchli
63015b5c87
Let `DDPSpawnPlugin.spawn` return a result from rank 0 ( #10162 )
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-10-28 11:39:13 +02:00
Adrian Wälchli
07b1b56d5c
Fix setting device when creating "inf" monitor value in `ModelCheckpoint` ( #10118 )
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 09:10:55 +00:00
Adrian Wälchli
afd1ae124e
Update deepspeed precision plugin for Lite ( #10164 )
2021-10-28 08:33:56 +00:00
Carlos Mocholí
3a4e9970d6
Pin fairscale version ( #10200 )
2021-10-27 23:24:17 +00:00
Carlos Mocholí
dbe1662dc3
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` ( #10157 )
2021-10-27 13:38:39 +01:00
Adrian Wälchli
808edcdebf
update type ( #10163 )
2021-10-27 11:16:09 +00:00
Kaushik B
c33df2639f
Set `dataset` attribute to `MpDeviceLoader` used in TPU Spawn ( #10151 )
2021-10-27 01:23:01 +05:30
Adrian Wälchli
5ade197580
Update README page in pl_examples folder ( #10114 )
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-26 17:38:56 +00:00
Adrian Wälchli
4a4a27db05
Update docutils package version in requirements.txt ( #10158 )
2021-10-26 16:32:47 +00:00
Carlos Mocholí
48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` ( #10143 )
2021-10-26 17:26:26 +02:00
Rohit Gupta
93266e2c22
Avoid deprecated warnings from accelerator and checkpoint connector ( #10142 )
2021-10-26 14:10:30 +02:00
Carlos Mocholí
a0e45dc071
Some minor CI cleanup ( #10088 )
2021-10-26 13:58:20 +02:00
Danielle Pintz
38090e47d7
Small code simplification in `training_epoch_loop.py` ( #10146 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-26 13:22:36 +02:00
twsl
971281d27d
Make sure file and folder exist in Profiler ( #10073 )
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-26 11:13:31 +00:00
Charlie_Tang
84ce1d095c
add 'sanity_checking' to datamodule 'on_after_batch_transfer' docs ( #10067 )
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-26 11:12:57 +00:00
Danielle Pintz
a5235d5b01
Remove `model_connector.py` ( #10111 )
2021-10-26 11:52:14 +02:00
Adrian Wälchli
871a96701a
Rename `master_params` to `main_params` ( #10105 )
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-26 11:17:32 +02:00
Rohit Gupta
34d5980df6
Raise `MisconfigurationException` if `trainer.eval` is missing required methods ( #10016 )
2021-10-25 23:12:08 -07:00
Danielle Pintz
13d6d7bad1
Remove `optimizer_connector.py` ( #10120 )
2021-10-26 00:52:43 +00:00
Adrian Wälchli
21a5867dad
Rename `ClusterEnvironment.creates_processes` ( #10106 )
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 23:15:41 +00:00
Adrian Wälchli
f1623355bd
Add example table to loop docs ( #10058 )
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 22:42:15 +00:00
Rajat Goel
47e7a2860f
Fix Enums parsing in generated hparams yaml ( #9170 )
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 21:23:20 +00:00
Jirka Borovec
0e0247a4d4
docker Conda timeout ( #10087 )
2021-10-25 20:56:47 +00:00
Eric Wiener
0e20119d24
Change default value of the `max_steps` Trainer argument from `None` to `-1` ( #9460 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 20:21:33 +00:00
Rohit Gupta
d9dfb2e920
fix tests ( #10138 )
2021-10-25 19:37:47 +00:00
Danielle Pintz
1f7bd6650c
Mark accelerator connector as protected ( #10032 )
2021-10-25 19:24:54 +00:00
jjenniferdai
6d79184ec5
Unify checkpoint load paths [redo #9693 ] ( #10061 )
2021-10-25 19:05:31 +00:00
Adrian Wälchli
76081fb846
Mark SLURM detection methods in `AcceleratorConnector` as protected ( #10101 )
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-10-25 17:52:15 +00:00
Carlos Mocholí
2ee3127661
Use `torch.autocast` ( #10053 )
2021-10-25 17:33:52 +00:00
Carlos Mocholí
43c70ece17
Fix `optimizers` overloads typing annotation ( #10069 )
2021-10-25 16:51:46 +00:00
Carlos Mocholí
b376799430
Minor fixes related to clipping ( #10130 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Adrian Wälchli
d3e5a43546
Restrict setup methods to accept a single model ( #10064 )
2021-10-25 16:32:57 +00:00
manipopopo
cfb2d87765
Disable quantization aware training observers ( #8540 )
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 15:46:09 +00:00
Adrian Wälchli
f8a7f3fde0
Add Yield loop example ( #9983 )
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 14:26:36 +00:00
Adrian Wälchli
aff80477b7
Remove dead code in accelerator connector ( #10100 )
* remove dead code in accelerator connector
* remove slurm "fake_slurm_managing_tasks" dead code
2021-10-25 13:37:40 +00:00
Adrian Wälchli
7eb2edf421
rename set_random_master_port ( #10104 )
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:09:05 +00:00
Kaushik B
64fc0d4257
Add method to TPUSpawn plugin to override how models are set up ( #10039 )
2021-10-25 11:44:32 +00:00
Danielle Pintz
e94dcf6936
Mark `trainer.data_connector` as protected ( #10031 )
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Carlos Mocholí
f95ba20012
Do not use the base version by default in `_compare_version` ( #10051 )
2021-10-25 16:41:32 +05:30
Adrian Wälchli
225989363b
update links in callback examples pointing to bolts ( #10117 )
2021-10-25 10:27:14 +00:00