Commit Graph

1378 Commits

Author SHA1 Message Date
Danielle Pintz 01f5f99919
Deprecate callback hooks `on_init_start` and `on_init_end` (#10940) 2021-12-08 07:42:19 +00:00
Danielle Pintz aeb0b5595f
Deprecate `call_hook` (#10979) 2021-12-08 00:52:47 +00:00
Rohit Gupta 6369e3b77f
Update Changelog after 1.5.5 release (#10977) 2021-12-07 12:35:20 -08:00
Adrian Wälchli 6bfc0bbc56
Remove `TrainingTypePlugin.post_dispatch` in favor of `teardown` (#10939)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-12-06 22:27:30 +00:00
four4fish 629ca09e09
Fix TypeError causing failure in signal_connector teardown (#10961) 2021-12-06 21:48:31 +00:00
four4fish 63bb4ec77d
4/n Move Accelerator into strategy - remove X_step() from accelerator (#10890)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-06 20:16:54 +00:00
Adrian Wälchli 6c79b2e969
Change temporary spawn checkpoint name (#10934) 2021-12-06 16:08:55 +00:00
Adrian Wälchli 3e1f8aa312
Fix spawn plugins not deleting temp checkpoint (#10935) 2021-12-06 13:41:19 +00:00
four4fish 2fc64e9656
2/n Move Accelerator into strategy - remove dispatch functions from Accelerator (#10885)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-06 09:51:14 +00:00
Rajath Bharadwaj 7914e5c157
added UserWarnings if max_epochs not set in the Trainer class (#10700) 2021-12-06 09:44:25 +00:00
Kaushik B 6599ced17d
Don't import torch_xla.debug for torch-xla<1.8 (#10836) 2021-12-06 06:31:38 +00:00
Luca Moschella 7792b77932
Resolve: 'DummyExperiment' object does not support item assignment (#10917)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-12-03 17:54:05 +00:00
four4fish 6fe3211573
Unroll dict input before call Accelerator X_steps (#10908)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-03 17:00:52 +00:00
Rohit Gupta 8ba3b383c0
Fix filtration logic for eval results with multiple dataloaders (#10810)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-12-03 14:34:46 +00:00
four4fish e646ca1d59
Remove `setup_optimizers_in_pre_dispatch` logic (#10906) 2021-12-03 15:05:08 +01:00
Adrian Wälchli c55bc433ce
Fix retrieval of batch indices when dataloader num_workers > 0 (#10870)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-02 10:36:10 +00:00
Adrian Wälchli 98cb7e8790
1/n Simplify spawn plugins: Simplify handling of multiprocessing queue (#10034)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-12-02 10:30:44 +00:00
Rohit Gupta 5b9995da04
Fix schedule reset logic in pytorch profiler (#10837) 2021-12-02 14:22:49 +05:30
four4fish 9beeabbced
Removed unnecessary `_move_optimizer_state` method overrides (#10849)
* Update tpu tp share same logic with ttp

* run test

* Update tpu_spawn.py

* debug

* Add changelog

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update training_type_plugin.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update training_type_plugin.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-12-02 05:03:30 +00:00
four4fish 45dd8066e7
3/n Move Accelerator into strategy - remove model_sharded_context() (#10886)
* 3/n Move Accelerator into strategy - remove model_sharded_context()

* update ttp function

* update changelog

* update changelog

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-12-02 03:34:51 +00:00
four4fish 44cd412e91
Remove precision_plugin pre_dispatch() method (#10887)
* Remove precision_plugin pre_dispatch() method

* update changelog
2021-12-01 18:42:17 -08:00
Carlos Mocholí a7aed2af7a
[CLI] Add support for `ReduceLROnPlateau` (#10860) 2021-12-01 15:41:22 +00:00
Rafał Jankowski c6478414ee
Fixed uploading best model checkpoint in NeptuneLogger (#10369) 2021-12-01 13:58:54 +00:00
Aka.Fido 72cc8b7ca9
Disable validation completely when `overfit_batches>0` (#9709)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-12-01 13:57:57 +00:00
Adrian Wälchli 7514adf814
Remove `return_result` argument from `DDPSpawnPlugin.spawn()` (#10867) 2021-12-01 13:29:08 +00:00
Kaushik B ec0fb2fd95
Raise exception if rich is less than 10.2.2 (#10839) 2021-12-01 06:14:19 +00:00
Kaushik B 3c9488f62f
Update changelog after v1.5.4 release (#10843) 2021-11-30 23:26:25 +00:00
Mauricio Villegas f3b0a06e90
Fix `SignalConnector._has_already_handler` check for callable type (#10483)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-30 22:47:52 +00:00
Adrian Wälchli 25473acddb
Restore signals on teardown (#10611)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-30 22:07:14 +00:00
Rohit Gupta 1437be5e98
Disable batch_size extraction for torchmetric instances (#10815)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-30 20:47:05 +00:00
four4fish 1d2878523a
2/n Move Precision Plugin into strategy - move optimizer related logics (#10596)
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-30 08:31:23 +00:00
four4fish 8bf7f9cce7
1/n Move Accelerator into strategy - move batch_to_device to strategy (#10649)
* 1/n Integrate Device Specific Accelerator Logic with strategy - move batch_to_device to strategy

* add changelog

* add model is not none check

* Apply suggestions from code review

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update CHANGELOG.md

* Update test_datamodules.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_hooks.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dp.py

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-29 12:11:21 -08:00
Rohit Gupta 753cc4dfad
Fix default logging levels for train step specific hooks (#10756) 2021-11-29 19:51:17 +00:00
Carlos Mocholí d3b7492bd0
[CLI] Add support for `--key.help=class` (#10767) 2021-11-29 14:12:53 +00:00
Adrian Wälchli 49d09aa28b
Update changelog after 1.5.3 release (#10744) 2021-11-27 05:28:23 +00:00
Adrian Wälchli c752060712
Consolidate state when retrieving sharded state dict in Lite (#10746)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-27 04:54:45 +00:00
thomas chaton e94aff1c5b
Fault Tolerant: Add support for fault tolerant dataloader validator (#10465) 2021-11-26 19:33:47 +00:00
thomas chaton 6fe6e9e414
Delete TensorBoardLogger experiment before spawning the processes. (#10777) 2021-11-26 17:07:57 +00:00
thomas chaton 412d507a73
Fault Tolerant: move signal to SIGTERM (#10605) 2021-11-26 13:37:27 +00:00
Kaushik B e507bc9027
Fix compare version for packages (#10762) 2021-11-26 09:15:22 +00:00
thomas chaton 3d6262b7a9
Fault Tolerant Manual: Add support for DDP (#10638) 2021-11-25 18:31:53 +01:00
Kaushik B e0b4bb2ea3
Deprecate `DeviceType` in favor of `_AcceleratorType` (#10503)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-25 16:41:03 +01:00
Carlos Mocholí f8b2d5b128
Improve error message on `TypeError` during `DataLoader` reconstruction (#10719) 2021-11-24 21:51:11 +00:00
thomas chaton 0066ff0129
Fault Tolerant Manual: Enable the feature (#10707) 2021-11-24 17:36:08 +00:00
Adrian Wälchli 30ec4815cb
Support re-instantiation for custom DataLoader in Lightning (#10680)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-11-24 15:58:51 +01:00
thomas chaton e51a8ee7a3
Fault Tolerant Manual: utilities cleanup (#10703)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-24 15:01:55 +01:00
thomas chaton b28ab34ff5
Fault Tolerant Manual: Add loading to reload the states (#10699)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-23 17:18:36 +00:00
thomas chaton 7cf6374bd0
Fault Tolerant Manual: Add support for collecting states across processes (#10639) 2021-11-23 14:27:33 +00:00
Adrian Wälchli ee9f7c0421
Update DeepSpeed precision handling after moving PrecisionPlugin (#10657) 2021-11-23 13:51:41 +00:00
thomas chaton 1702036c14
Fault Tolerant Manual: Add stateful dataloader iter (#10674) 2021-11-23 12:30:50 +00:00
thomas chaton 2036dfb5df
Fault Tolerant Manual: Add _rotate_worker_indices utility (#10647) 2021-11-22 19:52:04 +00:00
thomas chaton 6acfef680f
Fault Tolerant Manual: Add is_obj_stateful utility (#10646) 2021-11-22 18:48:32 +00:00
Andres Algaba 6fc7c54c3a
refactor slurm_job_id (#10622)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-11-22 17:41:08 +00:00
Rohit Gupta d431ce14a1
Raise an error if batch_size cannot be inferred from current batch (#10541)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-22 16:55:19 +00:00
Carlos Mocholí a6dedcf492
Fix `move_metrics_to_cpu` with evaluation (#10631) 2021-11-22 15:58:21 +00:00
thomas chaton 991cd895c6
1/n Add `FaultTolerantMode` (#10645) 2021-11-22 14:58:23 +00:00
Kaushik B ce0a977742
Moved `env_vars_connector._defaults_from_env_vars` to `utilities.argparse._defaults_from_env_vars` (#10501)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-22 08:06:35 +00:00
ananthsub a18b6409d1
Check torch.distributed availability before sharded tensor state dict hook registration (#10621)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-19 17:34:23 +00:00
Mauricio Villegas 5d748e560b
LightningCLI changes for jsonargparse>=4.0.0 (#10426)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-19 17:03:14 +00:00
Rohit Gupta ec27313be2
Fix batch size extraction when set by the user in `LightningModule.log` (#10408)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-19 16:48:26 +00:00
Biho-Kim e83e8ae305
Respect the passed dtype with `self.log` (#10076)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-19 15:16:33 +00:00
thomas chaton 94390aba56
Lite: Don't pop values if they don't exist (#10613) 2021-11-19 14:04:33 +00:00
Kaushik B 137b62d80d
Add `refresh_rate` to RichProgressBar (#10497)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-19 05:59:57 +00:00
thomas chaton 7d3ad5b76e
Don't register signal in thread (#10610) 2021-11-19 04:13:35 +01:00
four4fish 700521c7d3
1/n Move precision plugin into strategy - update reference (#10570)
* 1/n move precision plugin into strategy - update reference

* update precision plugin reference in tpu_spawn

* add missing reference in error message

* add back removed license line

* update references in tests

* update reference in trainer

* update return annotation for precision_plugin property on TTP

* simplify access to precision plugin reference in sharded plug

* add changelog

* remove precision property from ttp and add deprecation message

* fix make doc and update precision reference

* simplify a reference to precision

accidentally overrode Adrian's change, now adding it back

* Update CHANGELOG.md

add Adrian's change back

* Update accelerator precision

Add Adrian's change back

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add none check for precision plugin

just to be safe

* Update ipu.py

* update precision_plugin param deprecation message

* Update accelerator.py

* Remove deprecated warning 

Tests will fail after 9940

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-19 00:39:01 +00:00
Adrian Wälchli 0f6d89422b
Control automatic resubmission on SLURM (#10601) 2021-11-18 17:48:53 +00:00
Adrian Wälchli 261ea90822
Update changelog after 1.5.2 release (#10590) 2021-11-17 23:31:09 +00:00
Adrian Wälchli d50e1696f9
Fix propagation of device and dtype properties in Lite modules (#10559) 2021-11-16 17:26:46 +00:00
Carlos Mocholí edebd8a3bc
Fix scripting causing false positive deprecation warnings (#10555)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-16 15:52:09 +00:00
Sean Naren e98ace3adc
[DeepSpeed] Do not fail if batch size could not be inferred for logging (#10438) 2021-11-16 11:42:25 +00:00
Rohit Gupta de7ef41fea
remove deprecated `reload_dataloaders_every_epoch` from `Trainer` (#10481) 2021-11-16 06:47:43 +00:00
Rohit Gupta 60850ef510
fix overfit_batch sampler replacement logic (#10486)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-15 22:31:45 +00:00
Carlos Mocholí 65ebfed3ae
Fix `to_torchscript()` causing false positive deprecation warnings (#10470) 2021-11-15 22:12:55 +00:00
Carlos Mocholí dcafc95f2b
Avoid deprecated `progress_bar_refresh_rate` usage (#10520)
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
2021-11-15 22:04:48 +01:00
thomas chaton 1de3539eac
Resolve instantiation problem with init_meta_context (#10493) 2021-11-15 19:13:01 +00:00
Kaushik B ae71284627
Remove deprecated `disable_validation` property from Trainer (#10450) 2021-11-15 18:42:00 +00:00
Kaushik B 01cf7a2ac5
Deprecate `DistributedType` in favor of `StrategyType` (#10505) 2021-11-15 17:10:08 +00:00
Shivam Mehta 794c4b08c0
Remove deprecated `is_overridden(model=...)` (#10507)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-15 12:56:30 +00:00
puhuk 8b0cb47cc0
Remove deprecated `hpc_load` in `CheckpointConnector` (#10525)
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2021-11-15 11:54:47 +00:00
thomas chaton ffb40060c0
shutdown workers on failure (#10463) 2021-11-15 10:03:46 +00:00
Rohit Gupta a8c2725ff8
remove deprecated signature for `transfer_batch_to_device` (#10480)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-13 19:32:30 +00:00
Kaushik B fabb364402
Remove deprecated `mode` argument from ModelSummary (#10449)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-12 19:32:43 +00:00
Carlos Mocholí 847e24011a
Squeeze the early stopping monitor (#10461) 2021-11-12 18:03:47 +00:00
Rohit Gupta fa0ed17f8a
remove deprecated train_loop (#10482)
* remove deprecated train_loop

* chlog
2021-11-12 12:42:25 +00:00
Kaushik B d577f461a4
Remove deprecated `utilities.distributed.rank_zero_{warn,deprecation}` (#10451) 2021-11-10 07:35:48 -08:00
ananthsub aad86423f7
Remove more deprecated methods from base `Accelerator` class (#10448) 2021-11-10 12:58:24 +05:30
a-gardner1 ce149f6451
Fix support for dataclasses with ClassVar/InitVar in `apply_to_collection` (#9702)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-10 04:42:27 +00:00
Carlos Mocholí d515bcac96
Remove deprecated profiler import (#10443) 2021-11-09 23:13:02 +01:00
Justus Schock eeef5a80ac
Update Changelog for v1.5.1 (#10439)
* Missing Changelogs

* Add 1.5.1 entry to changelog

Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-11-09 21:25:54 +00:00
thomas chaton 8d810d6144
Enable distributed training with CombinedDataLoader and max_size_cycle (#10374)
* solve combinedloader

* update

* update changelog

* update on comments

* resolve iterable dataset support

* update test description

* update

* update on comments

* update

* Accelerator auto

* Address review

* Refactor

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-09 20:06:10 +00:00
Carlos Mocholí c413b69240
Remove deprecated `task_idx` (#10441) 2021-11-09 18:54:38 +00:00
Carlos Mocholí ebab4be3e4
Remove deprecated `DeviceDtypeModuleMixin` import (#10442) 2021-11-09 18:35:53 +00:00
Ross Johnstone c2f25d42ab
Make `monitor` required arg of EarlyStopping callback (#10328)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-09 18:08:03 +00:00
Carlos Mocholí 069ec1005a
Do not autodetach extras (#10424)
* Do not autodetach extras

* Update CHANGELOG

* Use foo
2021-11-09 16:07:16 +00:00
thomas chaton 7fb277f260
Resolve workers being forcibly deleted with `persistent_workers=True` (#10434)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-09 14:58:31 +00:00
Carlos Mocholí edbf27430d
Remove deprecated `self.log` arguments (#10423) 2021-11-09 15:49:55 +01:00
Adrian Wälchli aaa6aa75e9
Fix converting only float type tensors in Lite (#10429)
* fix

* less code

* add test case

* add test cases

* update input

* add test cases

* add type hint

* add changelog note

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-11-09 15:21:00 +01:00
Kaushik B 5eeca87e98
Fix deadlocks for distributed training for RichProgressBar (#10428) 2021-11-09 18:30:37 +05:30
Rohit Gupta 21eafafcb0
disable step logging in epoch hooks (#10409)
* disable step logging in epoch hooks

* chlog

* Apply suggestions from code review

* chlog
2021-11-09 16:53:27 +05:30
four4fish 0ed5e3dc8a
Raise exceptions when torch distributed is not available (#10418)
* Raise exceptions when torch distributed is not available

* add changelog
2021-11-09 09:11:05 +00:00