Adrian Wälchli
9d136a9fc5
Lightning Lite core and tests (#10175)
2021-10-29 21:46:39 +00:00
Adrian Wälchli
b4f43b1695
Update docs for sync_dist logging option (#10186)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-29 20:44:23 +00:00
Kaushik B
cedaebfcbb
Add `auto_device_count` method to `Accelerators` (#10222)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-29 22:31:32 +02:00
Danielle Pintz
848ad3f41d
Remove `training_tricks_connector.py` (#10112)
* deprecate training tricks connector
* fixes
2021-10-29 18:20:17 +00:00
Gili Tzabari
a967b6eba0
del iterator on_run_end() (#9915)
2021-10-29 16:29:44 +00:00
Carlos Mocholí
e4eb61d812
Raise exception for `strategy=ddp_cpu|tpu_spawn` (#10185)
2021-10-29 16:15:24 +00:00
Carlos Mocholí
81d15c5986
Implement double optimizer closure for hook structure consistency (#10167)
2021-10-29 13:03:04 +00:00
Danielle Pintz
c211adb579
Mark `callback_connector` as protected (#10121)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-29 12:58:47 +00:00
thomas chaton
bd77f65463
Resolve batch_size in ResultCollection not reset to 1 on epoch end (#10242)
2021-10-29 13:55:11 +01:00
thomas chaton
843bf26297
Fix `log(sync_dist=True, on_epoch=True, on_step=True)` not reducing on step (#10227)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-29 12:08:32 +00:00
Carlos Mocholí
4bc73b2b76
Avoid deprecated usage in accelerator connector tests (#10184)
2021-10-29 12:36:21 +01:00
Ning
dbfadedfe7
Revert "Add support for `len(datamodule)` (#9895)" (#10072)
This reverts commit 6429de8944.
2021-10-29 13:33:51 +02:00
Rohit Gupta
6a9adf26f7
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` (#10240)
2021-10-29 10:36:02 +00:00
thomas chaton
5f4ffdee41
cleanup (#10081)
2021-10-29 08:40:43 +00:00
Adrian Wälchli
3f9dfe4949
Fix iterating over a DummyLogger when `fast_dev_run > 0` (#10232)
2021-10-29 07:22:59 +00:00
Adrian Wälchli
6ed7a0c172
Fix sigterm signal handling (#10189)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-29 00:01:39 +00:00
Carlos Mocholí
03f01fb5ec
Fix gradient norm tracking and gradient clipping (#9287)
* WIP
* Progress
* Undo test change
* Fix plugin closure execution order
* Update CHANGELOG
* Fix manual optimization on AMP and skipping backward
* Fix for deepspeed
* Typo
* Hook test for manual closure
* Add skipping test with AMP
* You are hideous, apex
* Add deepspeed test
* Update CHANGELOG
* Fix for broken master
* Add RunIf
* FIXMEs
* Rename
* Fix grad norm
* add a simple test
* update test
* update test
* update test
* fix merge conflicts
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Sea of changes
* Undo change
* Introduce TPUPrecisionPlugin
* Undo changes
* Undo changes
* Resolve FIXME
* Undo change
* Undo change
* Undo change
* Fix FIXMEs
* Fix FIXME
* Correct value
* Bad merge
* Fix circular imports
* WIP
* Fixing clipping
* Fixes
* Bad merge
* Move optimizer step and clipping into the `PrecisionPlugin`
* Fix AMP
* Update CHANGELOG
* Fix tests
* Underscore
* Progress
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove pre_optimizer_step
* Missed one
* Progress
* Progress
* Fix test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update FIXMEs
* Fix test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix test
* DeepSpeed warning. mypy
* Rename
* Finish tests
* Update CHANGELOG
* Dumb fixes
* accelerator=auto
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update on comments
* Use ClassifModule
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
Carlos Mocholí
5262b63dff
Pass the scaler as an input to `NativeMixedPrecisionPlugin` (#10055)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 14:13:53 +00:00
Low Weng Fei
83d74bb385
Fix `reset_seed()` converting the `PL_SEED_WORKERS` environment variable `str` read to `bool` (#10099)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-28 12:57:41 +00:00
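The pitfall behind #10099 can be illustrated with a minimal sketch, assuming the flag is stored as the string "0" or "1": calling `bool()` on any non-empty string returns `True`, so the value has to go through `int()` first. The helper `read_seed_workers_flag` is hypothetical and is not Lightning's implementation.

```python
import os
from typing import Mapping, Optional


def read_seed_workers_flag(env: Optional[Mapping[str, str]] = None) -> bool:
    """Parse a PL_SEED_WORKERS-style flag ("0" or "1") into a bool.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    raw = env.get("PL_SEED_WORKERS", "0")
    # bool("0") is True because the string is non-empty,
    # so convert to int before converting to bool.
    return bool(int(raw))


print(bool("0"))  # True: the naive conversion is wrong
print(read_seed_workers_flag({"PL_SEED_WORKERS": "0"}))  # False
print(read_seed_workers_flag({"PL_SEED_WORKERS": "1"}))  # True
print(read_seed_workers_flag({}))  # False when the variable is unset
```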
Rohit Gupta
9af1dd7443
Deprecate `lr_sch_names` from `LearningRateMonitor` (#10066)
2021-10-28 12:57:04 +00:00
Rohit Gupta
85eb17cde5
initialize poptorch_models based on trainer_fn (#10149)
2021-10-28 11:59:52 +00:00
Adrian Wälchli
63015b5c87
Let `DDPSpawnPlugin.spawn` return a result from rank 0 (#10162)
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-10-28 11:39:13 +02:00
Adrian Wälchli
07b1b56d5c
Fix setting device when creating "inf" monitor value in `ModelCheckpoint` (#10118)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 09:10:55 +00:00
Adrian Wälchli
afd1ae124e
Update deepspeed precision plugin for Lite (#10164)
2021-10-28 08:33:56 +00:00
Carlos Mocholí
dbe1662dc3
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` (#10157)
2021-10-27 13:38:39 +01:00
Adrian Wälchli
808edcdebf
update type (#10163)
2021-10-27 11:16:09 +00:00
Kaushik B
c33df2639f
Set `dataset` attribute to `MpDeviceLoader` used in TPU Spawn (#10151)
2021-10-27 01:23:01 +05:30
Carlos Mocholí
48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` (#10143)
2021-10-26 17:26:26 +02:00
Rohit Gupta
93266e2c22
Avoid deprecated warnings from accelerator and checkpoint connector #10142
2021-10-26 14:10:30 +02:00
Danielle Pintz
38090e47d7
Small code simplification in `training_epoch_loop.py` (#10146)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-26 13:22:36 +02:00
twsl
971281d27d
Make sure file and folder exist in Profiler (#10073)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-26 11:13:31 +00:00
Danielle Pintz
a5235d5b01
Remove `model_connector.py` (#10111)
2021-10-26 11:52:14 +02:00
Adrian Wälchli
871a96701a
Rename `master_params` to `main_params` (#10105)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-26 11:17:32 +02:00
Rohit Gupta
34d5980df6
Raise `MisconfigurationException` if `trainer.eval` is missing required methods (#10016)
2021-10-25 23:12:08 -07:00
Danielle Pintz
13d6d7bad1
Remove `optimizer_connector.py` (#10120)
2021-10-26 00:52:43 +00:00
Adrian Wälchli
21a5867dad
Rename `ClusterEnvironment.creates_processes` (#10106)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 23:15:41 +00:00
Rajat Goel
47e7a2860f
Fix Enums parsing in generated hparams yaml (#9170)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 21:23:20 +00:00
Eric Wiener
0e20119d24
Change default value of the `max_steps` Trainer argument from `None` to `-1` (#9460)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 20:21:33 +00:00
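With #9460, `-1` rather than `None` is the default for `max_steps`, with `-1` meaning "no step limit". A minimal sketch of checking such a sentinel, assuming a hypothetical `max_steps_reached` helper (not Lightning's actual code):

```python
def max_steps_reached(global_step: int, max_steps: int = -1) -> bool:
    """Return True once training should stop because of the step limit.

    Hypothetical helper: -1 means "no limit", mirroring the new default.
    """
    if max_steps == -1:
        return False  # unlimited: never stop because of the step count
    return global_step >= max_steps


print(max_steps_reached(10_000))  # False: the default -1 disables the limit
print(max_steps_reached(99, max_steps=100))  # False
print(max_steps_reached(100, max_steps=100))  # True
```

Using an integer sentinel keeps the argument's type a plain `int` instead of `Optional[int]`.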
Danielle Pintz
1f7bd6650c
Mark accelerator connector as protected (#10032)
2021-10-25 19:24:54 +00:00
jjenniferdai
6d79184ec5
Unify checkpoint load paths [redo #9693] (#10061)
2021-10-25 19:05:31 +00:00
Adrian Wälchli
76081fb846
Mark SLURM detection methods in `AcceleratorConnector` as protected (#10101)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-10-25 17:52:15 +00:00
Carlos Mocholí
2ee3127661
Use `torch.autocast` (#10053)
2021-10-25 17:33:52 +00:00
Carlos Mocholí
43c70ece17
Fix `optimizers` overloads typing annotation (#10069)
2021-10-25 16:51:46 +00:00
Carlos Mocholí
b376799430
Minor fixes related to clipping (#10130)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Adrian Wälchli
d3e5a43546
Restrict setup methods to accept a single model (#10064)
2021-10-25 16:32:57 +00:00
manipopopo
cfb2d87765
Disable quantization aware training observers (#8540)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 15:46:09 +00:00
Adrian Wälchli
aff80477b7
Remove dead code in accelerator connector (#10100)
* remove dead code in accelerator connector
* remove slurm "fake_slurm_managing_tasks" dead code
2021-10-25 13:37:40 +00:00
Kaushik B
64fc0d4257
Add method to TPUSpawn plugin to override how models are set up (#10039)
2021-10-25 11:44:32 +00:00
Danielle Pintz
e94dcf6936
Mark `trainer.data_connector` as protected (#10031)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Carlos Mocholí
f95ba20012
Do not use the base version by default in `_compare_version` (#10051)
2021-10-25 16:41:32 +05:30