6138 Commits

Author SHA1 Message Date
Carlos Mocholí a6dedcf492
Fix `move_metrics_to_cpu` with evaluation (#10631) 2021-11-22 15:58:21 +00:00
thomas chaton 991cd895c6
1/n Add `FaultTolerantMode` (#10645) 2021-11-22 14:58:23 +00:00
Carlos Mocholí 48cb38ac5d
Fix docs filterwarnings snippet (#10671) 2021-11-22 14:52:21 +00:00
Kaushik B 1284ead317
Remove metrics references from docs (#10567) 2021-11-22 14:29:06 +00:00
Rohit Gupta eb13e1df89
update bug_report model links and notebook (#10665) 2021-11-22 11:19:46 +00:00
Kaushik B ce0a977742
Moved `env_vars_connector._defaults_from_env_vars` to `utilities.argsparse._defaults_from_env_vars` (#10501)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-22 08:06:35 +00:00
Adrian Wälchli 8ea39d2c8f
LiteDataLoader code improvements and docs (#10625)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-21 02:33:13 +01:00
puhuk af0bb96f0f
Remove the "_precision" suffix from some precision plugin files (#10052) 2021-11-19 17:37:39 +00:00
ananthsub a18b6409d1
Check torch.distributed availability before sharded tensor state dict hook registration (#10621)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-19 17:34:23 +00:00
Adam Reeve 5fe0dac119
Fix misleading ModelCheckpoint documentation on every_n_epochs parameter (#10421) 2021-11-19 17:26:50 +00:00
Mauricio Villegas 5d748e560b
LightningCLI changes for jsonargparse>=4.0.0 (#10426)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-19 17:03:14 +00:00
Rohit Gupta ff8ac6e2e1
Make `_get_nvidia_gpu_stats` public (#10406) 2021-11-19 17:52:24 +01:00
Aki Nitta 17a8290ca7
Use new GitHub labels (#10552) 2021-11-19 16:49:07 +00:00
Rohit Gupta ec27313be2
Fix batch size extraction when set by the user in `LightningModule.log` (#10408)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-19 16:48:26 +00:00
Jaime Ferrando Huertas 721b8413a0
Added boring model as an ipynb so it can be updated (#10521)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-19 16:32:30 +00:00
Biho-Kim e83e8ae305
Respect the passed dtype with `self.log` (#10076)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-19 15:16:33 +00:00
Carlos Mocholí 3d2d0f2536
MANIFEST.in and setup.py clean-up (#7614) 2021-11-19 15:38:42 +01:00
thomas chaton 94390aba56
Lite: Don't pop values if they don't exist (#10613) 2021-11-19 14:04:33 +00:00
Adrian Wälchli 8950354fe4
Extract dataloader utilities from `TrainerDataLoadingMixin` (#10145) 2021-11-19 12:45:35 +00:00
Adrian Wälchli 085e82f454
Introduce `ClusterEnvironment.detect()` (#10564)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-19 12:24:10 +00:00
Adrian Wälchli c09c9c7607
Remove redundant fit call from accelerator connector test (#10626) 2021-11-19 12:19:52 +05:30
Kaushik B 137b62d80d
Add `refresh_rate` to RichProgressBar (#10497)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-19 05:59:57 +00:00
thomas chaton 7d3ad5b76e
Don't register signal in thread (#10610) 2021-11-19 04:13:35 +01:00
Carlos Mocholí 5788789f01
Move benchmarks into the test directory (#10614) 2021-11-19 03:07:33 +01:00
Carlos Mocholí 0de8ab4f2e
Fix failing master due to an interaction between PRs (#10627) 2021-11-19 02:04:53 +00:00
Carlos Mocholí 35f6cbe09f
Use `update_wrapper` in test_hooks.py (#10578) 2021-11-19 01:52:55 +01:00
four4fish 700521c7d3
1/n Move precision plugin into strategy - update reference (#10570)
* 1/n move precision plugin into strategy - update reference

* update precision plugin reference in tpu_spawn

* add missing reference in error message

* add back removed license line

* update references in tests

* update reference in trainer

* update return annotation for precision_plugin property on TTP

* simplify access to precision plugin reference in sharded plug

* add changelog

* remove precision property from ttp and add deprecation message

* fix make doc and update precision reference

* simplify a reference to precision

accidentally overrode Adrian's change, now adding it back

* Update CHANGELOG.md

add Adrian's change back

* Update accelerator precision

Add Adrian's change back

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add none check for precision plugin

just to be safe

* Update ipu.py

* update precision_plugin param deprecation message

* Update accelerator.py

* Remove deprecated warning

Tests will fail after 9940

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-19 00:39:01 +00:00
ananthsub 2c7c4aab80
Refactor progress bar initialization to avoid extra attribute set on Trainer (#10553) 2021-11-18 18:51:54 +00:00
Adrian Wälchli 0f6d89422b
Control automatic resubmission on SLURM (#10601) 2021-11-18 17:48:53 +00:00
shabie 6b728713bb
log metrics for correct dataloader only (#10522)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-18 18:29:13 +01:00
Adrian Wälchli 261ea90822
Update changelog after 1.5.2 release (#10590) 2021-11-17 23:31:09 +00:00
Adrian Wälchli 1ff35ed0f5
Improve code quality in `AcceleratorConnector._configure_slurm_ddp` (#10102) 2021-11-17 23:10:47 +00:00
Carlos Mocholí 0fa07da987
Fail the test when a `DeprecationWarning` is raised (#9940) 2021-11-17 23:41:50 +01:00
Carlos Mocholí c15b84dae7
Simplify hanging queue test (#10591) 2021-11-17 22:29:48 +00:00
Carlos Mocholí ff3443fe42
Use single quotes in action job (#10579) 2021-11-17 15:54:41 +00:00
Carlos Mocholí ba036fdeea
Support special test parametrizations (#10569) 2021-11-17 15:46:14 +00:00
Carlos Mocholí 3b2e164cab
Fix `caplog` with `logger.propagate=False` (#10577) 2021-11-17 16:25:55 +01:00
Carlos Mocholí 247f5aacc2
Tune cc-bot settings (#10544) 2021-11-16 18:23:57 +00:00
Adrian Wälchli d50e1696f9
Fix propagation of device and dtype properties in Lite modules (#10559) 2021-11-16 17:26:46 +00:00
Carlos Mocholí af4af3d73a
Mock GPU accelerator connector tests (#10554) 2021-11-16 16:13:40 +00:00
Carlos Mocholí edebd8a3bc
Fix scripting causing false positive deprecation warnings (#10555)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-16 15:52:09 +00:00
Sean Naren e98ace3adc
[DeepSpeed] Do not fail if batch size could not be inferred for logging (#10438) 2021-11-16 11:42:25 +00:00
Kaushik B 4117028400
Don't collapse Lightning API section (#10545) 2021-11-16 11:04:54 +00:00
Rohit Gupta de7ef41fea
remove deprecated `reload_dataloaders_every_epoch` from `Trainer` (#10481) 2021-11-16 06:47:43 +00:00
ananthsub 98de69b14a
Fix loop examples after Accelerator API removals (#10514) 2021-11-16 05:37:14 +00:00
Carlos Mocholí 6dfcb6afc5
Skip strategy=ddp_spawn, accelerator=cpu, python>=3.9 tests (#10550) 2021-11-16 10:06:47 +05:30
Rohit Gupta 60850ef510
fix overfit_batch sampler replacement logic (#10486)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-15 22:31:45 +00:00
Carlos Mocholí 65ebfed3ae
Fix `to_torchscript()` causing false positive deprecation warnings (#10470) 2021-11-15 22:12:55 +00:00
Carlos Mocholí dcafc95f2b
Avoid deprecated `progress_bar_refresh_rate` usage (#10520)
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
2021-11-15 22:04:48 +01:00
thomas chaton 1de3539eac
Resolve instantiation problem with init_meta_context (#10493) 2021-11-15 19:13:01 +00:00