2433 Commits

Author SHA1 Message Date
Carlos Mocholí 26977043bf
Add separate CI job for slow tests (#10830) 2021-12-01 19:58:18 +00:00
Carlos Mocholí a7aed2af7a
[CLI] Add support for `ReduceLROnPlateau` (#10860) 2021-12-01 15:41:22 +00:00
Rafał Jankowski c6478414ee
Fixed uploading best model checkpoint in NeptuneLogger (#10369) 2021-12-01 13:58:54 +00:00
Aka.Fido 72cc8b7ca9
Disable validation completely when `overfit_batches>0` (#9709)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-12-01 13:57:57 +00:00
Adrian Wälchli e6cc99ef90
Fix selection of standalone tests (#10857)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-12-01 09:48:37 +01:00
Kaushik B ec0fb2fd95
Raise exception if rich is less than 10.2.2 (#10839) 2021-12-01 06:14:19 +00:00
Andres Algaba 1a26af1519
Add job_name as a staticmethod in SLURMEnvironment class (#10698)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-12-01 00:01:44 +00:00
Mauricio Villegas f3b0a06e90
Fix `SignalConnector._has_already_handler` check for callable type (#10483)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-30 22:47:52 +00:00
Adrian Wälchli 25473acddb
Restore signals on teardown (#10611)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-30 22:07:14 +00:00
Rohit Gupta 1437be5e98
Disable batch_size extraction for torchmetric instances (#10815)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-30 20:47:05 +00:00
Carlos Mocholí 0061619e0a
Improve typing for loops (#10780) 2021-11-30 20:28:55 +00:00
Abhinav Arora f63222d966
Remove references to torchtext.legacy from PyTorch Lightning (#10724)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-30 19:32:07 +00:00
Carlos Mocholí 8e1b9b306c
Skip hanging spawn tests (#10838)
* Skip hanging spawn tests

* Docstring fix

* Add back to TPU spawn
2021-11-30 18:36:12 +00:00
Carlos Mocholí 38ed26ec5a
Do not require omegaconf to run tests (#10832) 2021-11-30 14:48:03 +00:00
Adrian Wälchli a81accb2ad
Update LiteOptimizer signature after optimizer changes in TrainingTypePlugin (#10708) 2021-11-30 15:16:59 +01:00
Carlos Mocholí 1b43e43e9f
Minor changes in preparation for saving the loops state (#10783) 2021-11-30 19:37:04 +05:30
Carlos Mocholí 4710734f14
Improve `@RunIf` docs (#10828) 2021-11-30 14:21:38 +01:00
Andres Algaba e0474f8f0f
Add test for `job_id` (#10774) 2021-11-30 11:53:55 +01:00
four4fish 1d2878523a
2/n Move Precision Plugin into strategy - move optimizer related logics (#10596)
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-30 08:31:23 +00:00
four4fish 8bf7f9cce7
1/n Move Accelerator into strategy - move batch_to_device to strategy (#10649)
* 1/n Integrate Device Specific Accelerator Logic with strategy - move batch_to_device to strategy

* add changelog

* add model is not none check

* Apply suggestions from code review

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update CHANGELOG.md

* Update test_datamodules.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_hooks.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dp.py

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-29 12:11:21 -08:00
Rohit Gupta 753cc4dfad
Fix default logging levels for train step specific hooks (#10756) 2021-11-29 19:51:17 +00:00
Carlos Mocholí d3b7492bd0
[CLI] Add support for `--key.help=class` (#10767) 2021-11-29 14:12:53 +00:00
Adrian Wälchli 97e52619ea
Fix typing in `pl.overrides.data_parallel` (#10796)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-29 10:58:23 +01:00
Carlos Mocholí 724a92b065
Mark outputs as protected in the evaluation loops (#10781)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-28 20:09:30 +00:00
Adrian Wälchli c752060712
Consolidate state when retrieving sharded state dict in Lite (#10746)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-27 04:54:45 +00:00
thomas chaton e94aff1c5b
Fault Tolerant: Add support for fault tolerant dataloader validator (#10465) 2021-11-26 19:33:47 +00:00
Carlos Mocholí 31bb6e69ca
Avoid optional instances in Loops (#10735)
* Avoid optional instances in Loops

* More cleanup
2021-11-26 18:00:18 +00:00
Carlos Mocholí 152eb57def
Rename special to standalone (#10779) 2021-11-26 17:13:14 +00:00
thomas chaton 6fe6e9e414
Delete TensorBoardLogger experiment before spawning the processes. (#10777) 2021-11-26 17:07:57 +00:00
thomas chaton 412d507a73
Fault Tolerant: move signal to SIGTERM (#10605) 2021-11-26 13:37:27 +00:00
thomas chaton 3d6262b7a9
Fault Tolerant Manual: Add support for DDP (#10638) 2021-11-25 18:31:53 +01:00
Kaushik B e0b4bb2ea3
Deprecate `DeviceType` in favor of `_AcceleratorType` (#10503)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-25 16:41:03 +01:00
Carlos Mocholí f8b2d5b128
Improve error message on `TypeError` during `DataLoader` reconstruction (#10719) 2021-11-24 21:51:11 +00:00
thomas chaton 0066ff0129
Fault Tolerant Manual: Enable the feature (#10707) 2021-11-24 17:36:08 +00:00
Adrian Wälchli 30ec4815cb
Support re-instantiation for custom DataLoader in Lightning (#10680)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-11-24 15:58:51 +01:00
thomas chaton e51a8ee7a3
Fault Tolerant Manual: utilities cleanup (#10703)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-24 15:01:55 +01:00
Rohit Gupta f36b395c4e
Update `LightningDataModule` docs (#10678) 2021-11-24 11:31:03 +00:00
thomas chaton b28ab34ff5
Fault Tolerant Manual: Add loading to reload the states (#10699)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-23 17:18:36 +00:00
Adrian Wälchli dca1776870
LiteDataLoader wrapper improvements (#10297) 2021-11-23 16:35:07 +01:00
thomas chaton 7cf6374bd0
Fault Tolerant Manual: Add support for collecting states across processes (#10639) 2021-11-23 14:27:33 +00:00
thomas chaton 1702036c14
Fault Tolerant Manual: Add stateful dataloader iter (#10674) 2021-11-23 12:30:50 +00:00
Kaushik B 48cf1adfd3
Move Colab setup to ProgressBar (#10542)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-23 06:16:31 +00:00
thomas chaton 2036dfb5df
Fault Tolerant Manual: Add _rotate_worker_indices utility (#10647) 2021-11-22 19:52:04 +00:00
Rohit Gupta 823bfa6f8a
Update `LightningModule` docs (#10637) 2021-11-23 01:02:04 +05:30
thomas chaton 6acfef680f
Fault Tolerant Manual: Add is_obj_stateful utility (#10646) 2021-11-22 18:48:32 +00:00
Andres Algaba 6fc7c54c3a
refactor slurm_job_id (#10622)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-11-22 17:41:08 +00:00
Rohit Gupta d431ce14a1
Raise an error if batch_size cannot be inferred from current batch (#10541)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-22 16:55:19 +00:00
Danielle Pintz 6810c40fc9
Small improvements to `_init_debugging_flags` (#10620) 2021-11-22 11:38:09 -05:00
Carlos Mocholí a6dedcf492
Fix `move_metrics_to_cpu` with evaluation (#10631) 2021-11-22 15:58:21 +00:00
thomas chaton 991cd895c6
1/n Add `FaultTolerantMode` (#10645) 2021-11-22 14:58:23 +00:00
puhuk af0bb96f0f
Remove the "_precision" suffix from some precision plugin files (#10052) 2021-11-19 17:37:39 +00:00
Mauricio Villegas 5d748e560b
LightningCLI changes for jsonargparse>=4.0.0 (#10426)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-19 17:03:14 +00:00
Rohit Gupta ec27313be2
Fix batch size extraction when set by the user in `LightningModule.log` (#10408)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-19 16:48:26 +00:00
Jaime Ferrando Huertas 721b8413a0
Added boring model as an ipynb so it can be updated (#10521)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-19 16:32:30 +00:00
Biho-Kim e83e8ae305
Respect the passed dtype with `self.log` (#10076)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-19 15:16:33 +00:00
Carlos Mocholí 3d2d0f2536
MANIFEST.in and setup.py clean-up (#7614) 2021-11-19 15:38:42 +01:00
Adrian Wälchli 8950354fe4
Extract dataloader utilities from `TrainerDataLoadingMixin` (#10145) 2021-11-19 12:45:35 +00:00
Adrian Wälchli 085e82f454
Introduce `ClusterEnvironment.detect()` (#10564)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-19 12:24:10 +00:00
Adrian Wälchli c09c9c7607
Remove redundant fit call from accelerator connector test (#10626) 2021-11-19 12:19:52 +05:30
Kaushik B 137b62d80d
Add `refresh_rate` to RichProgressBar (#10497)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-19 05:59:57 +00:00
thomas chaton 7d3ad5b76e
Don't register signal in thread (#10610) 2021-11-19 04:13:35 +01:00
Carlos Mocholí 5788789f01
Move benchmarks into the test directory (#10614) 2021-11-19 03:07:33 +01:00
Carlos Mocholí 0de8ab4f2e
Fix failing master due to an interaction between PRs (#10627) 2021-11-19 02:04:53 +00:00
Carlos Mocholí 35f6cbe09f
Use `update_wrapper` in test_hooks.py (#10578) 2021-11-19 01:52:55 +01:00
four4fish 700521c7d3
1/n Move precision plugin into strategy - update reference (#10570)
* 1/n move precision plugin into strategy - update reference

* update precision plugin reference in tpu_spawn

* add missing reference in error message

* add back removed license line

* update references in tests

* update reference in trainer

* update return annotation for precision_plugin property on TTP

* simplify access to precision plugin reference in sharded plug

* add changelog

* remove precision property from ttp and add deprecation message

* fix make doc and update precision reference

* simplify a reference to precision

accidentally overridden Adrian's change, now add it back

* Update CHANGELOG.md

add Adrian's change back

* Update accelerator precision

Add Adrian's change back

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add none check for precision plugin

just to be safe

* Update ipu.py

* update precision_plugin param deprecation message

* Update accelerator.py

* Remove deprecated warning 

Tests will fail after 9940

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-19 00:39:01 +00:00
Adrian Wälchli 0f6d89422b
Control automatic resubmission on SLURM (#10601) 2021-11-18 17:48:53 +00:00
shabie 6b728713bb
log metrics for correct dataloader only (#10522)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-18 18:29:13 +01:00
Adrian Wälchli 1ff35ed0f5
Improve code quality in `AcceleratorConnector._configure_slurm_ddp` (#10102) 2021-11-17 23:10:47 +00:00
Carlos Mocholí 0fa07da987
Fail the test when a `DeprecationWarning` is raised (#9940) 2021-11-17 23:41:50 +01:00
Carlos Mocholí c15b84dae7
Simplify hanging queue test (#10591) 2021-11-17 22:29:48 +00:00
Carlos Mocholí ba036fdeea
Support special test parametrizations (#10569) 2021-11-17 15:46:14 +00:00
Carlos Mocholí 3b2e164cab
Fix `caplog` with `logger.propagate=False` (#10577) 2021-11-17 16:25:55 +01:00
Adrian Wälchli d50e1696f9
Fix propagation of device and dtype properties in Lite modules (#10559) 2021-11-16 17:26:46 +00:00
Carlos Mocholí af4af3d73a
Mock GPU accelerator connector tests (#10554) 2021-11-16 16:13:40 +00:00
Sean Naren e98ace3adc
[DeepSpeed] Do not fail if batch size could not be inferred for logging (#10438) 2021-11-16 11:42:25 +00:00
Rohit Gupta de7ef41fea
remove deprecated `reload_dataloaders_every_epoch` from `Trainer` (#10481) 2021-11-16 06:47:43 +00:00
Carlos Mocholí 6dfcb6afc5
Skip strategy=ddp_spawn, accelerator=cpu, python>=3.9 tests (#10550) 2021-11-16 10:06:47 +05:30
Rohit Gupta 60850ef510
fix overfit_batch sampler replacement logic (#10486)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-15 22:31:45 +00:00
Carlos Mocholí dcafc95f2b
Avoid deprecated `progress_bar_refresh_rate` usage (#10520)
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
2021-11-15 22:04:48 +01:00
thomas chaton 1de3539eac
Resolve instantiation problem with init_meta_context (#10493) 2021-11-15 19:13:01 +00:00
Kaushik B ae71284627
Remove deprecated `disable_validation` property from Trainer (#10450) 2021-11-15 18:42:00 +00:00
Kaushik B 01cf7a2ac5
Deprecate `DistributedType` in favor of `StrategyType` (#10505) 2021-11-15 17:10:08 +00:00
Shivam Mehta 794c4b08c0
Remove deprecated `is_overridden(model=...)` (#10507)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-15 12:56:30 +00:00
puhuk 8b0cb47cc0
Remove deprecated `hpc_load` in `CheckpointConnector` (#10525)
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2021-11-15 11:54:47 +00:00
thomas chaton ffb40060c0
shutdown workers on failure (#10463) 2021-11-15 10:03:46 +00:00
Carlos Mocholí 7a9a08c5d3
Drop torch 1.6 testing (#10390)
* Drop torch 1.6 support

* Drop 1.6 support

* Update CHANGELOG

* Fixes

* Split change

* Undo change

* 1.7 -> 1.7.1

https://github.com/pytorch/pytorch/issues/47354

* Force trigger nightly

* Update .github/workflows/events-nightly.yml

Co-authored-by: Aki Nitta <nitta@akihironitta.com>

* Revert 1.7.1 change - try wildcard

* Update adjust versions and test it

* Undo test changes

* Revert "Undo test changes"

This reverts commit 3a6acadd11.

* Update CHANGELOG.md

Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2021-11-13 20:35:03 +00:00
Rohit Gupta a8c2725ff8
remove deprecated signature for `transfer_batch_to_device` (#10480)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-13 19:32:30 +00:00
Kaushik B fabb364402
Remove deprecated `mode` argument from ModelSummary (#10449)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-12 19:32:43 +00:00
Carlos Mocholí 847e24011a
Squeeze the early stopping monitor (#10461) 2021-11-12 18:03:47 +00:00
Rohit Gupta fa0ed17f8a
remove deprecated train_loop (#10482)
* remove deprecated train_loop

* chlog
2021-11-12 12:42:25 +00:00
Raahul Singh 09cf167237
Change attributes of `RichProgressBarTheme` dataclass (#10454)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-11-11 19:53:40 +00:00
Carlos Mocholí 5ba5b72473
Update tests to avoid the deprecated `weights_summary` (#10446) 2021-11-11 18:15:18 +01:00
Kaushik B d577f461a4
Remove deprecated `utilities.distributed.rank_zero_{warn,deprecation}` (#10451) 2021-11-10 07:35:48 -08:00
a-gardner1 ce149f6451
Fix support for dataclasses with ClassVar/InitVar in `apply_to_collection` (#9702)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-10 04:42:27 +00:00
Carlos Mocholí d515bcac96
Remove deprecated profiler import (#10443) 2021-11-09 23:13:02 +01:00
thomas chaton 8d810d6144
Enable distributed training with CombinedDataLoader and max_size_cycle (#10374)
* solve combinedloader

* update

* update changelog

* update on comments

* resolve iterable dataset support

* update test description

* update

* update on comments

* update

* Accelerator auto

* Address review

* Refactor

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-09 20:06:10 +00:00
Carlos Mocholí c413b69240
Remove deprecated `task_idx` (#10441) 2021-11-09 18:54:38 +00:00
Carlos Mocholí ebab4be3e4
Remove deprecated `DeviceDtypeModuleMixin` import (#10442) 2021-11-09 18:35:53 +00:00
Ross Johnstone c2f25d42ab
Make `monitor` required arg of EarlyStopping callback (#10328)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-09 18:08:03 +00:00
Carlos Mocholí 069ec1005a
Do not autodetach extras (#10424)
* Do not autodetach extras

* Update CHANGELOG

* Use foo
2021-11-09 16:07:16 +00:00
thomas chaton 7fb277f260
Resolve workers being forcibly deleted with `persistent_workers=True` (#10434)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-09 14:58:31 +00:00
Carlos Mocholí edbf27430d
Remove deprecated `self.log` arguments (#10423) 2021-11-09 15:49:55 +01:00
Adrian Wälchli aaa6aa75e9
Fix converting only float type tensors in Lite (#10429)
* fix

* less code

* add test case

* add test cases

* update input

* add test cases

* add type hint

* add changelog note

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-11-09 15:21:00 +01:00
Kaushik B 5eeca87e98
Fix deadlocks for distributed training for RichProgressBar (#10428) 2021-11-09 18:30:37 +05:30
Rohit Gupta 21eafafcb0
disable step logging in epoch hooks (#10409)
* disable step logging in epoch hooks

* chlog

* Apply suggestions from code review

* chlog
2021-11-09 16:53:27 +05:30
puhuk f9b9cdb0d1
Remove deprecated accelerator pass through functions in Accelerator (#10403)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-08 17:36:37 +00:00
Adrian Wälchli a270a79ed9
Rename "master" methods to "main" in ClusterEnvironment plugins (#10103)
* rename occurrences of master port, master address, master node, master process

* rename properties

* add property decorators

* occurrences in docs

* update changelog

* update changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add lost method

* create deprecation

* add changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo (but it was already there!!!)

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* add todo

* update more occurrences

* add types

* add missing import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-11-08 12:32:58 +00:00
Carlos Mocholí 613aa09514
Revert part of #10279 (#10376) 2021-11-08 11:28:58 +00:00
Espen Haugsdal 89e1360e75
Fix pickling error with CSVLogger (#10388)
* Don't store csv.Dictwriter in ExperimentWriter

* Add test for pickle after .save()

* Add entry in changelog
2021-11-08 10:36:35 +00:00
puhuk c58f84c176
Remove deprecated master_params attributes in PrecisionPlugin (#10372)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-08 02:42:03 +00:00
Adrian Wälchli 45f6a3b175
Fix DataLoader inspection and re-instantiation in Lite (#10334)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-05 17:31:45 +00:00
Connor Anderson 1c28f361d4
Remove `every_n_val_epochs` from ModelCheckpoint (#10366)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-05 15:19:33 +00:00
Saurav Maheshkar a9bd4fbd96
Remove deprecated property `configure_slurm_ddp` from accelerator connector (#10370)
* Remove deprecated configure_slurm_ddp

* Update CHANGELOG

* Remove deprecated tests from test suite
2021-11-05 14:11:30 +00:00
puhuk 9c4112ce1c
Remove deprecated sync_batchnorm and num_nodes attributes in DDP plugins (#10357)
* Remove deprecated sync_batchnorm and num_nodes attributes in DDPPlugin

Part of #10312

test_v1_6_0_ddp_num_nodes()
test_v1_6_0_ddp_sync_batchnorm()

* remove deprecation warnings

* apply removal to spawn plugin

* update changelog

* remove num_nodes in deepspeed

* remove unused imports

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-05 10:13:12 +00:00
four4fish 973305c6a5
Add more trainer config tests (#10319)
* Add more trainer config tests

* Add more trainer config and ttp register tests
2021-11-05 10:42:58 +01:00
Saurav Maheshkar 6b5e185d07
Remove deprecated property `is_slurm_managing_tasks` from accelerator connector (#10353)
* Remove deprecated property _slurm_managing_tasks from accelerator connector

* Update CHANGELOG

* Update Changelog

* Removed is_slurm_managing_tasks from AcceleratorConnector

* resolve merge conflict

* add back accidentally removed lines

* remove test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-05 09:38:53 +00:00
Alexandre Mayerowitz b3c0f121ca
Remove deprecated datamodule lifecycle properties (#10350)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-05 05:03:57 +00:00
Adrian Wälchli 3664659094
Remove deprecated method `ClusterEnvironment.creates_children` (#10339)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-04 17:11:32 +00:00
Peter Dudfield ce3e63262a
Fix failure when `DataLoader(batch_size=None)` is passed (#10345)
* add test, + add change to data loading batch sample method

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor and CHANGELOG

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-04 12:46:57 +01:00
puhuk 412f0a4d24
Remove deprecated dataloader arguments in Trainer methods (#10325)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-04 11:03:39 +01:00
Connor Anderson 6f00ba21c2
Remove deprecated `loaded_optimizer_states_dict` property (#10346)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-04 01:51:46 +01:00
Carlos Mocholí ba23d91320
Update recommendation on `dataloader_idx` (#10318) 2021-11-04 01:39:55 +01:00
Danielle Pintz c5d011c3cf
Remove `TrainerModelHooksMixin` (#10322)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-03 20:26:59 +00:00
Carlos Mocholí 93caa7cda9
Fix `apply_to_collection(defaultdict)` (#10316) 2021-11-03 11:18:10 +00:00
Ning f6ed0bd8ca
introduce has_len_all_ranks() to check the length of dataloader across ranks (#9827)
* introduce , update tests

* update CHANGELOG.md

* change staticmethod and hook attribute naming

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

* remove non-essential comment

* fix merge error and comment format

* try to fix test_tpu.py failure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update on comments

* chlog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* chlog

* update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try fix

* Revert back TPUSpawn changes

* Update test

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-11-02 13:22:58 -04:00
Kaushik B 34fcb87a2b
Add `leave` argument to RichProgressBar (#10301)
* Add display_every_n_epochs argument to RichProgressBar

* Add tests

* Update test

* Update test

* Update changelog

* use leave argument instead

* Update pytorch_lightning/callbacks/progress/rich_progress.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-02 13:20:52 -04:00
Adrian Wälchli 373c32e34b
Fix yielding from iterator in LiteDataLoader (#10304)
* fix yielding from iterator

* update description

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused code

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-02 11:40:35 +01:00
Adrian Wälchli 3cd65b592b
Lightning Lite Examples (#9987)
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com>
Co-authored-by: Nicki Skafte Detlefsen <skaftenicki@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Pietro Lesci <61748653+pietrolesci@users.noreply.github.com>
2021-11-02 08:04:29 +00:00
Rohit Gupta e4ee6df196
Add warning if multiple batch_sizes are found from ambiguous batch (#10247) 2021-11-01 19:50:30 +00:00
victorjoos cc0e9f96a8
Add support for empty `gpus` list to run on CPU (#10246)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-11-01 18:37:38 +00:00
thomas chaton facaff94b8
Add custom dataloader support with Lite (#10279) 2021-11-01 18:33:13 +00:00
Kaushik B c52d7ba73d
Add `configure_columns` method to RichProgressBar (#10288)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-01 17:22:53 +00:00
Rohit Gupta 6609b2e46f
enable `on_load_checkpoint` for `datamodule` for all `trainer_fn` (#10238) 2021-11-01 14:20:46 +00:00
Kaushik B 45c45dc7b0
Deprecate `ProgressBar` and rename it to `TQDMProgressBar` (#10134)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-01 11:42:21 +00:00
Kaushik B 2ee6d9fbc7
Fix `distrib_type` not being set when Plugin instances being passed to Trainer (#10251) 2021-11-01 17:11:57 +05:30
Carlos Mocholí 2b24be2e45
Simplify `LightningOptimizer` (#10224) 2021-10-30 15:56:15 +00:00
Kaushik B e0f7dbdd1c
Add support for `devices='auto'` (#10264) 2021-10-30 15:05:51 +00:00
Carlos Mocholí 9237106451
Clip before step (#10248) 2021-10-30 11:27:49 +01:00
Adrian Wälchli 9d136a9fc5
Lightning Lite core and tests (#10175) 2021-10-29 21:46:39 +00:00
Kaushik B cedaebfcbb
Add `auto_device_count` method to `Accelerators` (#10222)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-29 22:31:32 +02:00
Gili Tzabari a967b6eba0
del iterator on_run_end() (#9915) 2021-10-29 16:29:44 +00:00
Carlos Mocholí e4eb61d812
Raise exception for `strategy=ddp_cpu|tpu_spawn` (#10185) 2021-10-29 16:15:24 +00:00
Carlos Mocholí 81d15c5986
Implement double optimizer closure for hook structure consistency (#10167) 2021-10-29 13:03:04 +00:00
thomas chaton bd77f65463
Resolve batch_size in ResultCollection not being reset to 1 on epoch end (#10242) 2021-10-29 13:55:11 +01:00
thomas chaton 843bf26297
Fix `log(sync_dist=True, on_epoch=True, on_step=True)` not reducing on step (#10227)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-29 12:08:32 +00:00
Carlos Mocholí 4bc73b2b76
Avoid deprecated usage in accelerator connector tests (#10184) 2021-10-29 12:36:21 +01:00
Ning dbfadedfe7
Revert "Add support for `len(datamodule)` (#9895)" (#10072)
This reverts commit 6429de8944.
2021-10-29 13:33:51 +02:00
Rohit Gupta 6a9adf26f7
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` (#10240) 2021-10-29 10:36:02 +00:00
thomas chaton 5f4ffdee41
cleanup (#10081) 2021-10-29 08:40:43 +00:00
Adrian Wälchli 3f9dfe4949
Fix iterating over a DummyLogger when `fast_dev_run > 0` (#10232) 2021-10-29 07:22:59 +00:00
Kaushik B 762af9505b
Add missing test for testing custom registered training plugin (#10225) 2021-10-29 04:06:06 +00:00
thomas chaton 255e3edc98
resolve failing test (#10191)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-28 15:27:03 +00:00
Carlos Mocholí 03f01fb5ec
Fix gradient norm tracking and gradient clipping (#9287)
* WIP

* Progress

* Undo test change

* Fix plugin closure execution order

* Update CHANGELOG

* Fix manual optimization on AMP and skipping backward

* Fix for deepspeed

* Typo

* Hook test for manual closure

* Add skipping test with AMP

* You are hideous, apex

* Add deepspeed test

* Update CHANGELOG

* Fix for broken master

* Add RunIf

* FIXMEs

* Rename

* Fix grad norm

* add a simple test

* update test

* update  test

* update test

* fix merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sea of changes

* Undo change

* Introduce TPUPrecisionPlugin

* Undo changes

* Undo changes

* Resolve FIXME

* Undo change

* Undo change

* Undo change

* Fix FIXMEs

* Fix FIXME

* Correct value

* Bad merge

* Fix circular imports

* WIP

* Fixing clipping

* Fixes

* Bad merge

* Move optimizer step and clipping into the `PrecisionPlugin`

* Fix AMP

* Update CHANGELOG

* Fix tests

* Underscore

* Progress

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove pre_optimizer_step

* Missed one

* Progress

* Progress

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FIXMEs

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix test

* DeepSpeed warning. mypy

* Rename

* Finish tests

* Update CHANGELOG

* Dumb fixes

* accelerator=auto

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update on comments

* Use ClassifModule

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
Carlos Mocholí 5262b63dff
Pass the scaler as an input to `NativeMixedPrecisionPlugin` (#10055)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 14:13:53 +00:00
Low Weng Fei 83d74bb385
Fix `reset_seed()` converting the `PL_SEED_WORKERS` environment variable `str` read to `bool` (#10099)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-28 12:57:41 +00:00
Rohit Gupta 9af1dd7443
Deprecate `lr_sch_names` from `LearningRateMonitor` (#10066) 2021-10-28 12:57:04 +00:00
Rohit Gupta 85eb17cde5
initialize poptorch_models based on trainer_fn (#10149) 2021-10-28 11:59:52 +00:00
Carlos Mocholí dbe1662dc3
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` (#10157) 2021-10-27 13:38:39 +01:00
Kaushik B c33df2639f
Set `dataset` attribute to `MpDeviceLoader` used in TPU Spawn (#10151) 2021-10-27 01:23:01 +05:30
Carlos Mocholí 48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` (#10143) 2021-10-26 17:26:26 +02:00
Carlos Mocholí a0e45dc071
Some minor CI cleanup (#10088) 2021-10-26 13:58:20 +02:00
twsl 971281d27d
Make sure file and folder exists in Profiler (#10073)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-26 11:13:31 +00:00
Adrian Wälchli 871a96701a
Rename `master_params` to `main_params` (#10105)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-26 11:17:32 +02:00
Rohit Gupta 34d5980df6
Raise `MisconfigurationException` if `trainer.eval` is missing required methods (#10016) 2021-10-25 23:12:08 -07:00
Danielle Pintz 13d6d7bad1
Remove `optimizer_connector.py` (#10120) 2021-10-26 00:52:43 +00:00
Adrian Wälchli 21a5867dad
Rename `ClusterEnvironment.creates_processes` (#10106)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 23:15:41 +00:00
Rajat Goel 47e7a2860f
Fix Enums parsing in generated hparams yaml (#9170)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 21:23:20 +00:00
Eric Wiener 0e20119d24
Change default value of the `max_steps` Trainer argument from `None` to `-1` (#9460)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 20:21:33 +00:00
Rohit Gupta d9dfb2e920
fix tests (#10138) 2021-10-25 19:37:47 +00:00
Danielle Pintz 1f7bd6650c
Mark accelerator connector as protected (#10032) 2021-10-25 19:24:54 +00:00
jjenniferdai 6d79184ec5
Unify checkpoint load paths [redo #9693] (#10061) 2021-10-25 19:05:31 +00:00
Adrian Wälchli 76081fb846
Mark SLURM detection methods in `AcceleratorConnector` as protected (#10101)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-10-25 17:52:15 +00:00
Carlos Mocholí 2ee3127661
Use `torch.autocast` (#10053) 2021-10-25 17:33:52 +00:00
Carlos Mocholí b376799430
Minor fixes related to clipping (#10130)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
manipopopo cfb2d87765
Disable quantization aware training observers (#8540)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 15:46:09 +00:00
Adrian Wälchli 7eb2edf421
rename set_random_master_port (#10104)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:09:05 +00:00
Danielle Pintz e94dcf6936
Mark `trainer.data_connector` as protected (#10031)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Carlos Mocholí f95ba20012
Do not use the base version by default in `_compare_version` (#10051) 2021-10-25 16:41:32 +05:30
thomas chaton ed9802643c
[CI] Comment flaky tests (#10084) 2021-10-25 10:31:06 +02:00
Kaushik B c3614f1c07
Fix: skip importing DistributedOptimizer for Windows (#10071) 2021-10-21 21:01:56 +00:00
thomas chaton 454e93bace
Add support for init_meta_context, materialize_module (#9920) 2021-10-21 15:48:31 +01:00
jjenniferdai 2d9db211b5
Revert "Support serialized checkpoint loading (#9605)" (#10057)
This reverts commit f0e6f1b58a.
2021-10-21 02:51:22 +02:00
Kaushik B aa1540410f
Add XLACheckpointIO (#9972)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-21 02:39:16 +05:30
Rohit Gupta 1599c77d16
Fix `LearningRateMonitor` logging with multiple param groups optimizer with no scheduler (#10044) 2021-10-20 22:13:00 +05:30
Carlos Mocholí 6aeebf1bd3
Remove unnecessary dependency available checks (#10050) 2021-10-20 16:21:37 +00:00
Alessio Bonfiglio 2a2fa5a56a
Group all the logged gradients under the same sub-folder (#7756) 2021-10-20 15:48:36 +00:00
Kaushik B 56bc55db71
Update strategy flag in docs (#10000)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-20 21:02:53 +05:30
kingyiusuen 2ed92ecabb
Rerun flaky profiler tests on failure (#10035) 2021-10-20 18:57:04 +05:30
Carlos Mocholí f0b3e0f4de
Default to `precision=bf16` on CPU when `precision=16` is passed (#10033) 2021-10-20 13:25:13 +00:00
Adrian Wälchli 2c16f1d6b9
remove dataloader patching on the LightningModule (#9764)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-20 15:23:20 +02:00
jjenniferdai f0e6f1b58a
Support serialized checkpoint loading (#9605)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-20 09:38:35 +01:00
Carlos Mocholí 53c62f63e8
Constrain IPU precision choices (#10030) 2021-10-20 00:52:01 +00:00
Carlos Mocholí ad8d6c83da
[CLI] Shorthand notation to instantiate datamodules (#10011)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-20 00:49:48 +00:00
Carlos Mocholí e44921ee21
Fix `self.log(on_epoch=True, reduce_fx=sum)` on_batch_start (#9791) 2021-10-20 01:56:37 +02:00
Carlos Mocholí d45897d522
Rename `TPUHalfPrecisionPlugin` to `TPUBf16PrecisionPlugin` (#10026)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 21:09:37 +00:00
Ning 0b68f2abf8
Remove `reset_train_val_dataloaders` from Trainer and move data reloading logic to loop (#9671)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-10-19 21:45:52 +02:00
Carlos Mocholí e8beceb631
Add `TPUPrecisionPlugin` (#10020)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 17:48:57 +00:00
thomas chaton 1759403c8d
Add check for callable with datamodule len (#10003)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-19 14:51:08 +00:00
Rohit Gupta 0aa220b46b
Remove deprecated `distributed_backend` from `Trainer` (#10017)
* rm distributed_backend from Trainer

* unused

* chlog

* internal distributed_backend

* Docstring

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-19 13:54:37 +00:00
Danielle Pintz 203737bfce
Don't raise DeprecationWarning for `LoggerConnector.gpus_metrics` (#9959) 2021-10-18 22:51:09 +00:00
Adrian Wälchli a99b7440b5
Add unit tests for `pl.utilities.grads` (#9765)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-18 18:58:51 +05:30
Rohit Gupta 4dc32ad7db
Fix logic to check for spawn in worker_check (#9902)
* fix

* update tests

* chlog

* skip windows
2021-10-18 13:02:46 +00:00
Carlos Mocholí 3f355d0eb7
Remove manual tracking of optimizer steps (#9957) 2021-10-18 12:43:06 +00:00
Carlos Mocholí 0684e5295f
Remove deprecated `DataModule.dims` usage in tests (#9948) 2021-10-18 17:35:41 +05:30
Carlos Mocholí c69a79c86f
Fix `self.log(on_epoch=True)` on_batch_start (#9780) 2021-10-18 14:02:16 +02:00
Elad Segal 8c76cf5ae1
reset val dataloader for binsearch (#9975) 2021-10-18 12:54:26 +02:00
Carlos Mocholí 01b304ec57
Update accelerator connector messages after the addition of strategy (#9937) 2021-10-18 01:10:48 +00:00
Carlos Mocholí 788f6864d9
Fix `LightningOptimizer` step and toggling logic (#9958) 2021-10-18 00:23:51 +00:00
ronif 7b4df7bf91
Fix issue with no-init dataclass fields in move_to_device (#9963)
Co-authored-by: ronif <ronif@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-17 07:10:47 +00:00
Carlos Mocholí e5dfdf34f9
Avoid deprecation warning after #9901 (#9951) 2021-10-16 17:36:25 +01:00
Kaushik B 5e8829b97d
(1/n) tests: Use strategy flag instead of accelerator for training strategies (#9931)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-16 20:40:25 +05:30
Carlos Mocholí e973bcb76a
Use non-deprecated options in tests (#9949) 2021-10-15 16:58:07 -07:00
Carlos Mocholí db4e770004
Validate the precision input earlier (#9763) 2021-10-15 17:30:00 +00:00
kingyiusuen 6429de8944
Add support for `len(datamodule)` (#9895)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-15 14:19:50 +02:00
Danielle Pintz 16213b1635
Deprecate `log_gpu_memory`, `gpu_metrics`, and util funcs in favor of `DeviceStatsMonitor` callback (#9921)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-14 22:45:44 +02:00
Oliver Borchert afbf703684
Single-process multi-node CPU training (#9603)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-14 22:21:41 +02:00
Kaushik B af4a8f1950
Refactor tests for TPU Accelerator (#9718)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 19:45:15 +00:00
Danielle Pintz 6feda08109
Deprecate `GPUStatsMonitor` and `XLAStatsMonitor` in favor of `DeviceStatsMonitor` (#9924)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte Detlefsen <skaftenicki@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-14 15:52:45 +00:00
four4fish a002f872ea
[2/n] Directly call TrainingTypePlugin APIs instead of going through the Accelerator (#9901)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 17:38:22 +02:00
Viraj Bagal 15698698c4
Log LR using LearningRateMonitor even when LR Scheduler is not defined. (#9786)
* LR logging works even with no lr scheduler, wrote few extra tests as well

* updated changelog

* modified code as suggested by DeepSource

* added helper functions

* opt with no scheduler

* rename

* chlog

* update test

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-14 13:28:19 +00:00
Danielle Pintz 940b910d27
[2/4] Add DeviceStatsMonitor callback (#9712)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-10-13 18:29:36 +00:00
Rohit Gupta 23e8b59ae7
Add `configure_gradient_clipping` hook in `LightningModule` (#9584)
* init hook

* docs

* dep train args

* update tests

* doc

* doc

* .gitignore

* not dep

* add trainer args

* add & update tests

* fix tests

* pre-commit

* docs

* add docs

* add exception

* code review

* deepspeed

* update tests

* not

* try fix

* Apply suggestions from code review

* update deepspeed

* disable some tests

* disable some tests

* enable all tests
2021-10-13 20:15:13 +05:30
Kaushik B 05b15e63f0
Add `strategy` argument to Trainer (#8597)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-13 12:34:06 +00:00
ananthsub 28fc8d2016
Add `enable_model_summary` flag and deprecate `weights_summary` (#9699)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-10-13 17:20:54 +05:30
Rohit Gupta 0f8fd20443
Remove epoch from `trainer.logged_metrics` (#9904) 2021-10-13 11:30:27 +02:00
ananthsub 4610fddb19
Mark `Trainer.terminate_on_nan` protected and deprecate public property (#9849)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-12 20:23:22 +00:00
Danielle Pintz dd6d797e0e
Remove type error handling in _configure_checkpoint_callbacks (#9823)
* remove type error handling in _configure_checkpoint_callbacks

* rm test

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-12 20:13:02 +00:00
Adrian Wälchli b530b7afd2
update tests to not rely on patched dataloaders (#9905) 2021-10-12 12:45:28 +02:00
Rohit Gupta 98c0a110e0
Update docs for `GradientAccumulationScheduler` (#9891)
* update docs and add tests

* update docs and add tests

* Update pytorch_lightning/callbacks/gradient_accumulation_scheduler.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-12 10:37:16 +00:00
Rohit Gupta f2b0db60f1
Raise a `MisconfigurationException` when trainer functions are called with `ckpt_path="best"` but `checkpoint_callback` isn't configured (#9841)
* add check

* chlog

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Apply suggestions from code review

Co-authored-by: thomas chaton <thomas@grid.ai>

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-12 15:35:55 +05:30
Sean Naren 6da5829e53
DeepSpeed support for device IDs (#9847) 2021-10-12 09:24:46 +00:00
Rohit Gupta db322f4bbb
Deprecate `checkpoint_callback` from the `Trainer` constructor in favour of `enable_checkpointing` (#9754)
* enable_checkpointing

* update codebase

* chlog

* update tests

* fix warning

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-12 07:55:07 +00:00
Kaushik B 14fb076a30
Fix deprecation test version for accelerator collective (#9892) 2021-10-12 11:50:31 +05:30
Sean Naren 83acb8671d
Update DeepSpeed version, fix failing tests (#9898) 2021-10-11 22:35:33 +00:00
yopknopixx 173f4c8466
Deprecate `terminate_on_nan` Trainer argument in favor of `detect_anomaly` (#9175)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-11 17:17:43 +00:00
Adrian Wälchli 6a0c47a014
remove redundant accumulation normalization in manual optimization (#9769) 2021-10-11 15:26:12 +00:00
Ranuga-Disansa f915a8a283
Removed a redundant warning with `ModelCheckpoint(monitor=None)` callback (#9875)
* Update README.md

* Update README.md

* Create evaluation.py

* Update README.md

* Update evaluation.py

* Create evaluation.py

* Create evaluation.py

* Update evaluation.py

* Create nlp.py

* Update evaluation.py

* Create evaluation.py

* Update nlp.py

* Update nlp.py

* Update evaluation.py

* Create evaluation.py

* Update nlp.py

* Update nlp.py

* Update requirements.txt

* Update evaluation.py

* Create data_loader.py

* Update nlp.py

* Update evaluation.py

* Update data_loader.py

* Update nlp.py

* Update data_loader.py

* Update requirements.txt

* Update model_checkpoint.py

* Delete evaluation.py

* Delete data_loader.py

* Delete nlp.py

* Update requirements.txt

* Update model_checkpoint.py

* Update README.md

* Update pytorch_lightning/callbacks/model_checkpoint.py

* Update CHANGELOG.md

* Update test_model_checkpoint.py

* Update model_checkpoint.py

* update

* update

* chlog update

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-11 14:54:07 +00:00
Boris Dayma 2db9ea3500
feat(wandb): support media logging (#9545) 2021-10-11 10:15:36 +01:00
Rohit Gupta d71501d97f
Reset `val_dataloader` in `tuner/batch_size_scaling` (#9857)
* reset val

* chlog
2021-10-11 09:13:33 +01:00
kingyiusuen 8740c801bb
Fix typo in _validate_scheduler_optimizer() (#9886) 2021-10-11 09:16:17 +02:00
ananthsub 5206e52786
Add support for `torch.set_detect_anomaly` (#9848)
* Add support for `detect_anomaly`

* Update CHANGELOG.md
2021-10-07 16:03:56 +00:00
Rohit Gupta 4decbc0d95
Deprecate `dataloader_idx` from `on_train_batch_start/end` (#9816)
* deprecate hooks

* dep todo

* explicit

* Apply suggestions from code review

* Apply suggestions from code review

* code review

* base
2021-10-07 10:18:11 +00:00
Rohit Gupta 8a8ecb8d01
Update the logic to check for accumulation steps with deepspeed (#9826)
* support_dict

* chlog

* fix test

* epochs
2021-10-06 17:50:10 +01:00
Rohit Gupta b303b4f895
Fix restoring training state during `trainer.fit` only (#9413)
* reload state on fit

* trainer.state

* add test

* chlog

* revert

* review

* review

* rev and amend

* fix test and logic

* update

* code review

* Apply suggestions from code review

* better assertions

* better assertions

* Apply suggestions from code review

* add loop test

* Apply suggestions from code review

* Split for typing

* review comments

* review comments

* use if_else

* code review

* code review

* code review

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Remove unnecessary pieces from the test

* move test

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-06 14:57:40 +00:00
Jirka Borovec b3e9dff32d
rename callback FineTune arg `round` (#9711)
* rename CB Tune arg round

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-06 09:39:36 +01:00
Kaushik B f94faa9cd3
Enable auto parameters tying for TPUs (#9525)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-06 10:16:44 +02:00
Elad Segal 86ad941d06
Fix missing arguments when saving hyperparams from parent class only (#9800)
* Fix missing arguments when saving hyperparams from parent class only

* fix antipattern
2021-10-06 08:32:29 +01:00
Danielle Pintz 3392215ef6
Fix broken `test_cpu_amp_precision_context_manager` (#9809)
* @RunIf(min_gpus=1)

* dtype -> fast_dtype
2021-10-04 12:14:13 +00:00
kingyiusuen 6d530373c0
Add warnings regarding unsupported keys in optim config and OneCycleLR (#9666)
* Add warnings regarding unsupported keys in optim config and OneCycleLR

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix docstring

* Update CHANGELOG.md

* Split  into two parts

* Use difference operator to find extra keys

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-04 08:25:05 +00:00
thomas chaton 5841ca9782
[Feat] Add auto_restart for fault tolerant training (#9722) 2021-10-01 16:37:17 +00:00