Commit Graph

1279 Commits

Author SHA1 Message Date
thomas chaton 1702036c14
Fault Tolerant Manual: Add stateful dataloader iter (#10674) 2021-11-23 12:30:50 +00:00
thomas chaton 2036dfb5df
Fault Tolerant Manual: Add _rotate_worker_indices utility (#10647) 2021-11-22 19:52:04 +00:00
thomas chaton 6acfef680f
Fault Tolerant Manual: Add is_obj_stateful utility (#10646) 2021-11-22 18:48:32 +00:00
Andres Algaba 6fc7c54c3a
refactor slurm_job_id (#10622)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-11-22 17:41:08 +00:00
Rohit Gupta d431ce14a1
Raise an error if batch_size cannot be inferred from current batch (#10541)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-22 16:55:19 +00:00
Carlos Mocholí a6dedcf492
Fix `move_metrics_to_cpu` with evaluation (#10631) 2021-11-22 15:58:21 +00:00
thomas chaton 991cd895c6
1/n Add `FaultTolerantMode` (#10645) 2021-11-22 14:58:23 +00:00
Kaushik B ce0a977742
Moved `env_vars_connector._defaults_from_env_vars` to `utilities.argparse._defaults_from_env_vars` (#10501)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-22 08:06:35 +00:00
ananthsub a18b6409d1
Check torch.distributed availability before sharded tensor state dict hook registration (#10621)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-19 17:34:23 +00:00
Mauricio Villegas 5d748e560b
LightningCLI changes for jsonargparse>=4.0.0 (#10426)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-19 17:03:14 +00:00
Rohit Gupta ec27313be2
Fix batch size extraction when set by the user in `LightningModule.log` (#10408)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-19 16:48:26 +00:00
Biho-Kim e83e8ae305
Respect the passed dtype with `self.log` (#10076)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-19 15:16:33 +00:00
thomas chaton 94390aba56
Lite: Don't pop values if they don't exist (#10613) 2021-11-19 14:04:33 +00:00
Kaushik B 137b62d80d
Add `refresh_rate` to RichProgressBar (#10497)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-19 05:59:57 +00:00
thomas chaton 7d3ad5b76e
Don't register signal in thread (#10610) 2021-11-19 04:13:35 +01:00
four4fish 700521c7d3
1/n Move precision plugin into strategy - update reference (#10570)
* 1/n move precision plugin into strategy - update reference

* update precision plugin reference in tpu_spawn

* add missing reference in error message

* add back removed license line

* update references in tests

* update reference in trainer

* update return annotation for precision_plugin property on TTP

* simplify access to precision plugin reference in sharded plug

* add changelog

* remove precision property from ttp and add deprecation message

* fix make doc and update precision reference

* simplify a reference to precision

accidentally overridden Adrian's change, now add it back

* Update CHANGELOG.md

add Adrian's change back

* Update accelerator precision

Add Adrian's change back

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add none check for precision plugin

just to be safe

* Update ipu.py

* update precision_plugin param deprecation message

* Update accelerator.py

* Remove deprecated warning 

Tests will fail after 9940

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-19 00:39:01 +00:00
Adrian Wälchli 0f6d89422b
Control automatic resubmission on SLURM (#10601) 2021-11-18 17:48:53 +00:00
Adrian Wälchli 261ea90822
Update changelog after 1.5.2 release (#10590) 2021-11-17 23:31:09 +00:00
Adrian Wälchli d50e1696f9
Fix propagation of device and dtype properties in Lite modules (#10559) 2021-11-16 17:26:46 +00:00
Carlos Mocholí edebd8a3bc
Fix scripting causing false positive deprecation warnings (#10555)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-16 15:52:09 +00:00
Sean Naren e98ace3adc
[DeepSpeed] Do not fail if batch size could not be inferred for logging (#10438) 2021-11-16 11:42:25 +00:00
Rohit Gupta de7ef41fea
remove deprecated `reload_dataloaders_every_epoch` from `Trainer` (#10481) 2021-11-16 06:47:43 +00:00
Rohit Gupta 60850ef510
fix overfit_batch sampler replacement logic (#10486)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-15 22:31:45 +00:00
Carlos Mocholí 65ebfed3ae
Fix `to_torchscript()` causing false positive deprecation warnings (#10470) 2021-11-15 22:12:55 +00:00
Carlos Mocholí dcafc95f2b
Avoid deprecated `progress_bar_refresh_rate` usage (#10520)
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
2021-11-15 22:04:48 +01:00
thomas chaton 1de3539eac
Resolve instantiation problem with init_meta_context (#10493) 2021-11-15 19:13:01 +00:00
Kaushik B ae71284627
Remove deprecated `disable_validation` property from Trainer (#10450) 2021-11-15 18:42:00 +00:00
Kaushik B 01cf7a2ac5
Deprecate `DistributedType` in favor of `StrategyType` (#10505) 2021-11-15 17:10:08 +00:00
Shivam Mehta 794c4b08c0
Remove deprecated `is_overridden(model=...)` (#10507)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-15 12:56:30 +00:00
puhuk 8b0cb47cc0
Remove deprecated `hpc_load` in `CheckpointConnector` (#10525)
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2021-11-15 11:54:47 +00:00
thomas chaton ffb40060c0
shutdown workers on failure (#10463) 2021-11-15 10:03:46 +00:00
Rohit Gupta a8c2725ff8
remove deprecated signature for `transfer_batch_to_device` (#10480)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-13 19:32:30 +00:00
Kaushik B fabb364402
Remove deprecated `mode` argument from ModelSummary (#10449)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-12 19:32:43 +00:00
Carlos Mocholí 847e24011a
Squeeze the early stopping monitor (#10461) 2021-11-12 18:03:47 +00:00
Rohit Gupta fa0ed17f8a
remove deprecated train_loop (#10482)
* remove deprecated train_loop

* chlog
2021-11-12 12:42:25 +00:00
Kaushik B d577f461a4
Remove deprecated `utilities.distributed.rank_zero_{warn,deprecation}` (#10451) 2021-11-10 07:35:48 -08:00
ananthsub aad86423f7
Remove more deprecated methods from base `Accelerator` class (#10448) 2021-11-10 12:58:24 +05:30
a-gardner1 ce149f6451
Fix support for dataclasses with ClassVar/InitVar in `apply_to_collection` (#9702)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-10 04:42:27 +00:00
Carlos Mocholí d515bcac96
Remove deprecated profiler import (#10443) 2021-11-09 23:13:02 +01:00
Justus Schock eeef5a80ac
Update Changelog for v1.5.1 (#10439)
* Missing Changelogs

* Add 1.5.1 entry to changelog

Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-11-09 21:25:54 +00:00
thomas chaton 8d810d6144
Enable distributed training with CombinedDataLoader and max_size_cycle (#10374)
* solve combinedloader

* update

* update changelog

* update on comments

* resolve iterable dataset support

* update test description

* update

* update on comments

* update

* Accelerator auto

* Address review

* Refactor

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-09 20:06:10 +00:00
Carlos Mocholí c413b69240
Remove deprecated `task_idx` (#10441) 2021-11-09 18:54:38 +00:00
Carlos Mocholí ebab4be3e4
Remove deprecated `DeviceDtypeModuleMixin` import (#10442) 2021-11-09 18:35:53 +00:00
Ross Johnstone c2f25d42ab
Make `monitor` required arg of EarlyStopping callback (#10328)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-09 18:08:03 +00:00
Carlos Mocholí 069ec1005a
Do not autodetach extras (#10424)
* Do not autodetach extras

* Update CHANGELOG

* Use foo
2021-11-09 16:07:16 +00:00
thomas chaton 7fb277f260
Resolve workers being forcibly deleted with `persistent_workers=True` (#10434)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-09 14:58:31 +00:00
Carlos Mocholí edbf27430d
Remove deprecated `self.log` arguments (#10423) 2021-11-09 15:49:55 +01:00
Adrian Wälchli aaa6aa75e9
Fix converting only float type tensors in Lite (#10429)
* fix

* less code

* add test case

* add test cases

* update input

* add test cases

* add type hint

* add changelog note

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-11-09 15:21:00 +01:00
Kaushik B 5eeca87e98
Fix deadlocks for distributed training for RichProgressBar (#10428) 2021-11-09 18:30:37 +05:30
Rohit Gupta 21eafafcb0
disable step logging in epoch hooks (#10409)
* disable step logging in epoch hooks

* chlog

* Apply suggestions from code review

* chlog
2021-11-09 16:53:27 +05:30