Commit Graph

3537 Commits

Author SHA1 Message Date
Adrian Wälchli 9d136a9fc5
Lightning Lite core and tests (#10175) 2021-10-29 21:46:39 +00:00
Adrian Wälchli b4f43b1695
Update docs for sync_dist logging option (#10186)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-29 20:44:23 +00:00
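As context for the `sync_dist` option documented in the commit above, here is a minimal sketch of how it is typically passed to `LightningModule.log`; the module, layer sizes, and metric name are illustrative and not taken from the PR.

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def forward(self, x):
        return self.layer(x)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self(x), y)
        # sync_dist=True reduces the logged value across processes
        # when running under a distributed strategy such as DDP.
        self.log("val_loss", loss, sync_dist=True)
        return loss
```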
Kaushik B cedaebfcbb
Add `auto_device_count` method to `Accelerators` (#10222)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-29 22:31:32 +02:00
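A hedged, partial sketch of what the `auto_device_count` hook named in the commit above could look like on a custom accelerator; the method name comes from the commit title, while the GPU-counting body and class name are illustrative assumptions, not code from the PR.

```python
import torch
from pytorch_lightning.accelerators import Accelerator


class MyGPUAccelerator(Accelerator):
    @staticmethod
    def auto_device_count() -> int:
        # Illustrative assumption: report how many devices are available
        # when the user does not specify a device count explicitly.
        return torch.cuda.device_count()
```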
Danielle Pintz 848ad3f41d
Remove `training_tricks_connector.py` (#10112)
* deprecate training tricks connector

* fixes
2021-10-29 18:20:17 +00:00
Gili Tzabari a967b6eba0
del iterator on_run_end() (#9915) 2021-10-29 16:29:44 +00:00
Carlos Mocholí e4eb61d812
Raise exception for `strategy=ddp_cpu|tpu_spawn` (#10185) 2021-10-29 16:15:24 +00:00
Carlos Mocholí 81d15c5986
Implement double optimizer closure for hook structure consistency (#10167) 2021-10-29 13:03:04 +00:00
Danielle Pintz c211adb579
Mark `callback_connector` as protected (#10121)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-29 12:58:47 +00:00
thomas chaton bd77f65463
Resolve batch_size in ResultCollection not reset to 1 on epoch end (#10242) 2021-10-29 13:55:11 +01:00

thomas chaton 843bf26297
Fix `log(sync_dist=True, on_epoch=True, on_step=True)` not reducing on step (#10227)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-29 12:08:32 +00:00
Carlos Mocholí 4bc73b2b76
Avoid deprecated usage in accelerator connector tests (#10184) 2021-10-29 12:36:21 +01:00
Ning dbfadedfe7
Revert "Add support for `len(datamodule)` (#9895)" (#10072)
This reverts commit 6429de8944.
2021-10-29 13:33:51 +02:00
Rohit Gupta 6a9adf26f7
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` (#10240) 2021-10-29 10:36:02 +00:00
thomas chaton 5f4ffdee41
cleanup (#10081) 2021-10-29 08:40:43 +00:00
Adrian Wälchli 3f9dfe4949
Fix iterating over a DummyLogger when `fast_dev_run > 0` (#10232) 2021-10-29 07:22:59 +00:00
Adrian Wälchli 6ed7a0c172
Fix sigterm signal handling (#10189)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-29 00:01:39 +00:00
Carlos Mocholí 03f01fb5ec
Fix gradient norm tracking and gradient clipping (#9287)
* WIP

* Progress

* Undo test change

* Fix plugin closure execution order

* Update CHANGELOG

* Fix manual optimization on AMP and skipping backward

* Fix for deepspeed

* Typo

* Hook test for manual closure

* Add skipping test with AMP

* You are hideous, apex

* Add deepspeed test

* Update CHANGELOG

* Fix for broken master

* Add RunIf

* FIXMEs

* Rename

* Fix grad norm

* add a simple test

* update test

* update  test

* update test

* fix merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sea of changes

* Undo change

* Introduce TPUPrecisionPlugin

* Undo changes

* Undo changes

* Resolve FIXME

* Undo change

* Undo change

* Undo change

* Fix FIXMEs

* Fix FIXME

* Correct value

* Bad merge

* Fix circular imports

* WIP

* Fixing clipping

* Fixes

* Bad merge

* Move optimizer step and clipping into the `PrecisionPlugin`

* Fix AMP

* Update CHANGELOG

* Fix tests

* Underscore

* Progress

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove pre_optimizer_step

* Missed one

* Progress

* Progress

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FIXMEs

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix test

* DeepSpeed warning. mypy

* Rename

* Finish tests

* Update CHANGELOG

* Dumb fixes

* accelerator=auto

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update on comments

* Use ClassifModule

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
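For the gradient clipping and gradient-norm tracking touched by the PR above, a minimal usage sketch from the Trainer side; the specific values are arbitrary examples, not settings from the PR.

```python
import pytorch_lightning as pl

# Clip gradients by norm and track the gradient 2-norm during training;
# the values below are arbitrary example settings.
trainer = pl.Trainer(
    gradient_clip_val=0.5,
    gradient_clip_algorithm="norm",
    track_grad_norm=2,
)
```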
Carlos Mocholí 5262b63dff
Pass the scaler as an input to `NativeMixedPrecisionPlugin` (#10055)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 14:13:53 +00:00
Low Weng Fei 83d74bb385
Fix `reset_seed()` converting the `PL_SEED_WORKERS` environment variable `str` read to `bool` (#10099)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-28 12:57:41 +00:00
Rohit Gupta 9af1dd7443
Deprecate `lr_sch_names` from `LearningRateMonitor` (#10066) 2021-10-28 12:57:04 +00:00
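For reference, the `LearningRateMonitor` callback named in the deprecation above is normally used as below; the logging interval shown is just an example.

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor

# Log the learning rate of each configured scheduler on every optimizer step.
lr_monitor = LearningRateMonitor(logging_interval="step")
trainer = pl.Trainer(callbacks=[lr_monitor])
```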
Rohit Gupta 85eb17cde5
initialize poptorch_models based on trainer_fn (#10149) 2021-10-28 11:59:52 +00:00
Adrian Wälchli 63015b5c87
Let `DDPSpawnPlugin.spawn` return a result from rank 0 (#10162)
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-10-28 11:39:13 +02:00
Adrian Wälchli 07b1b56d5c
Fix setting device when creating "inf" monitor value in `ModelCheckpoint` (#10118)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 09:10:55 +00:00
Adrian Wälchli afd1ae124e
Update deepspeed precision plugin for Lite (#10164) 2021-10-28 08:33:56 +00:00
Carlos Mocholí dbe1662dc3
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` (#10157) 2021-10-27 13:38:39 +01:00
Adrian Wälchli 808edcdebf
update type (#10163) 2021-10-27 11:16:09 +00:00
Kaushik B c33df2639f
Set `dataset` attribute to `MpDeviceLoader` used in TPU Spawn (#10151) 2021-10-27 01:23:01 +05:30
Carlos Mocholí 48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` (#10143) 2021-10-26 17:26:26 +02:00
Rohit Gupta 93266e2c22
Avoid deprecated warnings from accelerator and checkpoint connector (#10142) 2021-10-26 14:10:30 +02:00
Danielle Pintz 38090e47d7
Small code simplification in `training_epoch_loop.py` (#10146)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-26 13:22:36 +02:00
twsl 971281d27d
Make sure file and folder exists in Profiler (#10073)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-26 11:13:31 +00:00
Danielle Pintz a5235d5b01
Remove `model_connector.py` (#10111) 2021-10-26 11:52:14 +02:00
Adrian Wälchli 871a96701a
Rename `master_params` to `main_params` (#10105)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-26 11:17:32 +02:00
Rohit Gupta 34d5980df6
Raise `MisconfigurationException` if `trainer.eval` is missing required methods (#10016) 2021-10-25 23:12:08 -07:00
Danielle Pintz 13d6d7bad1
Remove `optimizer_connector.py` (#10120) 2021-10-26 00:52:43 +00:00
Adrian Wälchli 21a5867dad
Rename `ClusterEnvironment.creates_processes` (#10106)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 23:15:41 +00:00
Rajat Goel 47e7a2860f
Fix Enums parsing in generated hparams yaml (#9170)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 21:23:20 +00:00
Eric Wiener 0e20119d24
Change default value of the `max_steps` Trainer argument from `None` to `-1` (#9460)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 20:21:33 +00:00
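A quick illustration of the `max_steps` default change described above: `-1` (the new default) means no step limit, while a positive value caps training; the numbers are arbitrary.

```python
import pytorch_lightning as pl

# -1 disables the step limit; training is bounded by max_epochs
# or other stopping conditions instead.
trainer_unbounded = pl.Trainer(max_steps=-1, max_epochs=3)

# A positive value stops training after that many optimizer steps.
trainer_capped = pl.Trainer(max_steps=1000)
```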
Danielle Pintz 1f7bd6650c
Mark accelerator connector as protected (#10032) 2021-10-25 19:24:54 +00:00
jjenniferdai 6d79184ec5
Unify checkpoint load paths [redo #9693] (#10061) 2021-10-25 19:05:31 +00:00
Adrian Wälchli 76081fb846
Mark SLURM detection methods in `AcceleratorConnector` as protected (#10101)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-10-25 17:52:15 +00:00
Carlos Mocholí 2ee3127661
Use `torch.autocast` (#10053) 2021-10-25 17:33:52 +00:00
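`torch.autocast`, referenced in the commit above, is the autocast context manager available in PyTorch 1.10+; a minimal standalone usage sketch (assuming a CUDA device), unrelated to the internals of the PR:

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
x = torch.randn(4, 8, device="cuda")

# Run the forward pass in mixed precision on CUDA; eligible ops are
# automatically cast to float16 inside the context.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(x)
```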
Carlos Mocholí 43c70ece17
Fix `optimizers` overloads typing annotation (#10069) 2021-10-25 16:51:46 +00:00
Carlos Mocholí b376799430
Minor fixes related to clipping (#10130)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Adrian Wälchli d3e5a43546
Restrict setup methods to accept a single model (#10064) 2021-10-25 16:32:57 +00:00
manipopopo cfb2d87765
Disable quantization aware training observers (#8540)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 15:46:09 +00:00
Adrian Wälchli aff80477b7
Remove dead code in accelerator connector (#10100)
* remove dead code in accelerator connector

* remove slurm "fake_slurm_managing_tasks" dead code
2021-10-25 13:37:40 +00:00
Kaushik B 64fc0d4257
Add method to TPUSpawn plugin to override how models are setup (#10039) 2021-10-25 11:44:32 +00:00
Danielle Pintz e94dcf6936
Mark `trainer.data_connector` as protected (#10031)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Carlos Mocholí f95ba20012
Do not use the base version by default in `_compare_version` (#10051) 2021-10-25 16:41:32 +05:30