Carlos Mocholí
26977043bf
Add separate CI job for slow tests ( #10830 )
2021-12-01 19:58:18 +00:00
Carlos Mocholí
a7aed2af7a
[CLI] Add support for `ReduceLROnPlateau` ( #10860 )
2021-12-01 15:41:22 +00:00
Rafał Jankowski
c6478414ee
Fixed uploading best model checkpoint in NeptuneLogger ( #10369 )
2021-12-01 13:58:54 +00:00
Aka.Fido
72cc8b7ca9
Disable validation completely when `overfit_batches>0` ( #9709 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-12-01 13:57:57 +00:00
Adrian Wälchli
e6cc99ef90
Fix selection of standalone tests ( #10857 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-12-01 09:48:37 +01:00
Kaushik B
ec0fb2fd95
Raise exception if rich is less than 10.2.2 ( #10839 )
2021-12-01 06:14:19 +00:00
Andres Algaba
1a26af1519
Add job_name as a staticmethod in SLURMEnvironment class ( #10698 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-12-01 00:01:44 +00:00
Mauricio Villegas
f3b0a06e90
Fix `SignalConnector._has_already_handler` check for callable type ( #10483 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-30 22:47:52 +00:00
Adrian Wälchli
25473acddb
Restore signals on teardown ( #10611 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-30 22:07:14 +00:00
Rohit Gupta
1437be5e98
Disable batch_size extraction for torchmetric instances ( #10815 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-30 20:47:05 +00:00
Carlos Mocholí
0061619e0a
Improve typing for loops ( #10780 )
2021-11-30 20:28:55 +00:00
Abhinav Arora
f63222d966
Remove references to torchtext.legacy from PyTorch Lightning ( #10724 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-30 19:32:07 +00:00
Carlos Mocholí
8e1b9b306c
Skip hanging spawn tests ( #10838 )
...
* Skip hanging spawn tests
* Docstring fix
* Add back to TPU spawn
2021-11-30 18:36:12 +00:00
Carlos Mocholí
38ed26ec5a
Do not require omegaconf to run tests ( #10832 )
2021-11-30 14:48:03 +00:00
Adrian Wälchli
a81accb2ad
Update LiteOptimizer signature after optimizer changes in TrainingTypePlugin ( #10708 )
2021-11-30 15:16:59 +01:00
Carlos Mocholí
1b43e43e9f
Minor changes in preparation for saving the loops state ( #10783 )
2021-11-30 19:37:04 +05:30
Carlos Mocholí
4710734f14
Improve `@RunIf` docs ( #10828 )
2021-11-30 14:21:38 +01:00
Andres Algaba
e0474f8f0f
Add test for `job_id` ( #10774 )
2021-11-30 11:53:55 +01:00
four4fish
1d2878523a
2/n Move Precision Plugin into strategy - move optimizer related logics ( #10596 )
...
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-30 08:31:23 +00:00
four4fish
8bf7f9cce7
1/n Move Accelerator into strategy - move batch_to_device to strategy ( #10649 )
...
* 1/n Integrate Device Specific Accelerator Logic with strategy - move batch_to_device to strategy
* add changelog
* add model is not none check
* Apply suggestions from code review
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update CHANGELOG.md
* Update test_datamodules.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update test_hooks.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update dp.py
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-29 12:11:21 -08:00
Rohit Gupta
753cc4dfad
Fix default logging levels for train step specific hooks ( #10756 )
2021-11-29 19:51:17 +00:00
Carlos Mocholí
d3b7492bd0
[CLI] Add support for `--key.help=class` ( #10767 )
2021-11-29 14:12:53 +00:00
Adrian Wälchli
97e52619ea
Fix typing in `pl.overrides.data_parallel` ( #10796 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-29 10:58:23 +01:00
Carlos Mocholí
724a92b065
Mark outputs as protected in the evaluation loops ( #10781 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-28 20:09:30 +00:00
Adrian Wälchli
c752060712
Consolidate state when retrieving sharded state dict in Lite ( #10746 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-27 04:54:45 +00:00
thomas chaton
e94aff1c5b
Fault Tolerant: Add support for fault tolerant dataloader validator ( #10465 )
2021-11-26 19:33:47 +00:00
Carlos Mocholí
31bb6e69ca
Avoid optional instances in Loops ( #10735 )
...
* Avoid optional instances in Loops
* More cleanup
2021-11-26 18:00:18 +00:00
Carlos Mocholí
152eb57def
Rename special to standalone ( #10779 )
2021-11-26 17:13:14 +00:00
thomas chaton
6fe6e9e414
Delete TensorBoardLogger experiment before spawning the processes. ( #10777 )
2021-11-26 17:07:57 +00:00
thomas chaton
412d507a73
Fault Tolerant: move signal to SIGTERM ( #10605 )
2021-11-26 13:37:27 +00:00
thomas chaton
3d6262b7a9
Fault Tolerant Manual: Add support for DDP ( #10638 )
2021-11-25 18:31:53 +01:00
Kaushik B
e0b4bb2ea3
Deprecate `DeviceType` in favor of `_AcceleratorType` ( #10503 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-25 16:41:03 +01:00
Carlos Mocholí
f8b2d5b128
Improve error message on `TypeError` during `DataLoader` reconstruction ( #10719 )
2021-11-24 21:51:11 +00:00
thomas chaton
0066ff0129
Fault Tolerant Manual: Enable the feature ( #10707 )
2021-11-24 17:36:08 +00:00
Adrian Wälchli
30ec4815cb
Support re-instantiation for custom DataLoader in Lightning ( #10680 )
...
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-11-24 15:58:51 +01:00
thomas chaton
e51a8ee7a3
Fault Tolerant Manual: utilities cleanup ( #10703 )
...
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-24 15:01:55 +01:00
Rohit Gupta
f36b395c4e
Update `LightningDataModule` docs ( #10678 )
2021-11-24 11:31:03 +00:00
thomas chaton
b28ab34ff5
Fault Tolerant Manual: Add loading to reload the states ( #10699 )
...
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-23 17:18:36 +00:00
Adrian Wälchli
dca1776870
LiteDataLoader wrapper improvements ( #10297 )
2021-11-23 16:35:07 +01:00
thomas chaton
7cf6374bd0
Fault Tolerant Manual: Add support for collecting states across processes ( #10639 )
2021-11-23 14:27:33 +00:00
thomas chaton
1702036c14
Fault Tolerant Manual: Add stateful dataloader iter ( #10674 )
2021-11-23 12:30:50 +00:00
Kaushik B
48cf1adfd3
Move Colab setup to ProgressBar ( #10542 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-23 06:16:31 +00:00
thomas chaton
2036dfb5df
Fault Tolerant Manual: Add _rotate_worker_indices utility ( #10647 )
2021-11-22 19:52:04 +00:00
Rohit Gupta
823bfa6f8a
Update `LightningModule` docs ( #10637 )
2021-11-23 01:02:04 +05:30
thomas chaton
6acfef680f
Fault Tolerant Manual: Add is_obj_stateful utility ( #10646 )
2021-11-22 18:48:32 +00:00
Andres Algaba
6fc7c54c3a
refactor slurm_job_id ( #10622 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-11-22 17:41:08 +00:00
Rohit Gupta
d431ce14a1
Raise an error if batch_size cannot be inferred from current batch ( #10541 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-22 16:55:19 +00:00
Danielle Pintz
6810c40fc9
Small improvements to `_init_debugging_flags` ( #10620 )
2021-11-22 11:38:09 -05:00
Carlos Mocholí
a6dedcf492
Fix `move_metrics_to_cpu` with evaluation ( #10631 )
2021-11-22 15:58:21 +00:00
thomas chaton
991cd895c6
1/n Add `FaultTolerantMode` ( #10645 )
2021-11-22 14:58:23 +00:00
puhuk
af0bb96f0f
Remove the "_precision" suffix from some precision plugin files ( #10052 )
2021-11-19 17:37:39 +00:00
Mauricio Villegas
5d748e560b
LightningCLI changes for jsonargparse>=4.0.0 ( #10426 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-19 17:03:14 +00:00
Rohit Gupta
ec27313be2
Fix batch size extraction when set by the user in `LightningModule.log` ( #10408 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-19 16:48:26 +00:00
Jaime Ferrando Huertas
721b8413a0
Added boring model as an ipynb so it can be updated ( #10521 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-19 16:32:30 +00:00
Biho-Kim
e83e8ae305
Respect the passed dtype with `self.log` ( #10076 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-19 15:16:33 +00:00
Carlos Mocholí
3d2d0f2536
MANIFEST.in and setup.py clean-up ( #7614 )
2021-11-19 15:38:42 +01:00
Adrian Wälchli
8950354fe4
Extract dataloader utilities from `TrainerDataLoadingMixin` ( #10145 )
2021-11-19 12:45:35 +00:00
Adrian Wälchli
085e82f454
Introduce `ClusterEnvironment.detect()` ( #10564 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-19 12:24:10 +00:00
Adrian Wälchli
c09c9c7607
Remove redundant fit call from accelerator connector test ( #10626 )
2021-11-19 12:19:52 +05:30
Kaushik B
137b62d80d
Add `refresh_rate` to RichProgressBar ( #10497 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-19 05:59:57 +00:00
thomas chaton
7d3ad5b76e
Don't register signal in thread ( #10610 )
2021-11-19 04:13:35 +01:00
Carlos Mocholí
5788789f01
Move benchmarks into the test directory ( #10614 )
2021-11-19 03:07:33 +01:00
Carlos Mocholí
0de8ab4f2e
Fix failing master due to an interaction between PRs ( #10627 )
2021-11-19 02:04:53 +00:00
Carlos Mocholí
35f6cbe09f
Use `update_wrapper` in test_hooks.py ( #10578 )
2021-11-19 01:52:55 +01:00
four4fish
700521c7d3
1/n Move precision plugin into strategy - update reference ( #10570 )
...
* 1/n move precision plugin into strategy - update reference
* update precision plugin reference in tpu_spawn
* add missing reference in error message
* add back removed license line
* update references in tests
* update reference in trainer
* update return annotation for precision_plugin property on TTP
* simplify access to precision plugin reference in sharded plug
* add changelog
* remove precision property from ttp and add deprecation message
* fix make doc and update precision reference
* simplify a reference to precision
accidentally overridden Adrian's change, now add it back
* Update CHANGELOG.md
add Adrian's change back
* Update accelerator precision
Add Adrian's change back
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add none check for precision plugin
just to be safe
* Update ipu.py
* update precision_plugin param deprecation message
* Update accelerator.py
* Remove deprecated warning
Tests will fail after 9940
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-19 00:39:01 +00:00
Adrian Wälchli
0f6d89422b
Control automatic resubmission on SLURM ( #10601 )
2021-11-18 17:48:53 +00:00
shabie
6b728713bb
log metrics for correct dataloader only ( #10522 )
...
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-18 18:29:13 +01:00
Adrian Wälchli
1ff35ed0f5
Improve code quality in `AcceleratorConnector._configure_slurm_ddp` ( #10102 )
2021-11-17 23:10:47 +00:00
Carlos Mocholí
0fa07da987
Fail the test when a `DeprecationWarning` is raised ( #9940 )
2021-11-17 23:41:50 +01:00
Carlos Mocholí
c15b84dae7
Simplify hanging queue test ( #10591 )
2021-11-17 22:29:48 +00:00
Carlos Mocholí
ba036fdeea
Support special test parametrizations ( #10569 )
2021-11-17 15:46:14 +00:00
Carlos Mocholí
3b2e164cab
Fix `caplog` with `logger.propagate=False` ( #10577 )
2021-11-17 16:25:55 +01:00
Adrian Wälchli
d50e1696f9
Fix propagation of device and dtype properties in Lite modules ( #10559 )
2021-11-16 17:26:46 +00:00
Carlos Mocholí
af4af3d73a
Mock GPU accelerator connector tests ( #10554 )
2021-11-16 16:13:40 +00:00
Sean Naren
e98ace3adc
[DeepSpeed] Do not fail if batch size could not be inferred for logging ( #10438 )
2021-11-16 11:42:25 +00:00
Rohit Gupta
de7ef41fea
remove deprecated `reload_dataloaders_every_epoch` from `Trainer` ( #10481 )
2021-11-16 06:47:43 +00:00
Carlos Mocholí
6dfcb6afc5
Skip strategy=ddp_spawn, accelerator=cpu, python>=3.9 tests ( #10550 )
2021-11-16 10:06:47 +05:30
Rohit Gupta
60850ef510
fix overfit_batch sampler replacement logic ( #10486 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-15 22:31:45 +00:00
Carlos Mocholí
dcafc95f2b
Avoid deprecated `progress_bar_refresh_rate` usage ( #10520 )
...
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
2021-11-15 22:04:48 +01:00
thomas chaton
1de3539eac
Resolve instantiation problem with init_meta_context ( #10493 )
2021-11-15 19:13:01 +00:00
Kaushik B
ae71284627
Remove deprecated `disable_validation` property from Trainer ( #10450 )
2021-11-15 18:42:00 +00:00
Kaushik B
01cf7a2ac5
Deprecate `DistributedType` in favor of `StrategyType` ( #10505 )
2021-11-15 17:10:08 +00:00
Shivam Mehta
794c4b08c0
Remove deprecated `is_overridden(model=...)` ( #10507 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-15 12:56:30 +00:00
puhuk
8b0cb47cc0
Remove deprecated `hpc_load` in `CheckpointConnector` ( #10525 )
...
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2021-11-15 11:54:47 +00:00
thomas chaton
ffb40060c0
shutdown workers on failure ( #10463 )
2021-11-15 10:03:46 +00:00
Carlos Mocholí
7a9a08c5d3
Drop torch 1.6 testing ( #10390 )
...
* Drop torch 1.6 support
* Drop 1.6 support
* Update CHANGELOG
* Fixes
* Split change
* Undo change
* 1.7 -> 1.7.1
https://github.com/pytorch/pytorch/issues/47354
* Force trigger nightly
* Update .github/workflows/events-nightly.yml
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
* Revert 1.7.1 change - try wildcard
* Update adjust versions and test it
* Undo test changes
* Revert "Undo test changes"
This reverts commit 3a6acadd11.
* Update CHANGELOG.md
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2021-11-13 20:35:03 +00:00
Rohit Gupta
a8c2725ff8
remove deprecated signature for `transfer_batch_to_device` ( #10480 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-13 19:32:30 +00:00
Kaushik B
fabb364402
Remove deprecated `mode` argument from ModelSummary ( #10449 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-12 19:32:43 +00:00
Carlos Mocholí
847e24011a
Squeeze the early stopping monitor ( #10461 )
2021-11-12 18:03:47 +00:00
Rohit Gupta
fa0ed17f8a
remove deprecated train_loop ( #10482 )
...
* remove deprecated train_loop
* chlog
2021-11-12 12:42:25 +00:00
Raahul Singh
09cf167237
Change attributes of `RichProgressBarTheme` dataclass ( #10454 )
...
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-11-11 19:53:40 +00:00
Carlos Mocholí
5ba5b72473
Update tests to avoid the deprecated `weights_summary` ( #10446 )
2021-11-11 18:15:18 +01:00
Kaushik B
d577f461a4
Remove deprecated `utilities.distributed.rank_zero_{warn,deprecation}` ( #10451 )
2021-11-10 07:35:48 -08:00
a-gardner1
ce149f6451
Fix support for dataclasses with ClassVar/InitVar in `apply_to_collection` ( #9702 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-10 04:42:27 +00:00
Carlos Mocholí
d515bcac96
Remove deprecated profiler import ( #10443 )
2021-11-09 23:13:02 +01:00
thomas chaton
8d810d6144
Enable distributed training with CombinedDataLoader and max_size_cycle ( #10374 )
...
* solve combinedloader
* update
* update changelog
* update on comments
* resolve iterable dataset support
* update test description
* update
* update on comments
* update
* Accelerator auto
* Address review
* Refactor
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-09 20:06:10 +00:00
Carlos Mocholí
c413b69240
Remove deprecated `task_idx` ( #10441 )
2021-11-09 18:54:38 +00:00
Carlos Mocholí
ebab4be3e4
Remove deprecated `DeviceDtypeModuleMixin` import ( #10442 )
2021-11-09 18:35:53 +00:00
Ross Johnstone
c2f25d42ab
Make `monitor` required arg of EarlyStopping callback ( #10328 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-09 18:08:03 +00:00
Carlos Mocholí
069ec1005a
Do not autodetach extras ( #10424 )
...
* Do not autodetach extras
* Update CHANGELOG
* Use foo
2021-11-09 16:07:16 +00:00
thomas chaton
7fb277f260
Resolve workers being forcibly deleted with `persistent_workers=True` ( #10434 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-09 14:58:31 +00:00
Carlos Mocholí
edbf27430d
Remove deprecated `self.log` arguments ( #10423 )
2021-11-09 15:49:55 +01:00
Adrian Wälchli
aaa6aa75e9
Fix converting only float type tensors in Lite ( #10429 )
...
* fix
* less code
* add test case
* add test cases
* update input
* add test cases
* add type hint
* add changelog note
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-11-09 15:21:00 +01:00
Kaushik B
5eeca87e98
Fix deadlocks for distributed training for RichProgressBar ( #10428 )
2021-11-09 18:30:37 +05:30
Rohit Gupta
21eafafcb0
disable step logging in epoch hooks ( #10409 )
...
* disable step logging in epoch hooks
* chlog
* Apply suggestions from code review
* chlog
2021-11-09 16:53:27 +05:30
puhuk
f9b9cdb0d1
Remove deprecated accelerator pass through functions in Accelerator ( #10403 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-08 17:36:37 +00:00
Adrian Wälchli
a270a79ed9
Rename "master" methods to "main" in ClusterEnvironment plugins ( #10103 )
...
* rename occurrences of master port, master address, master node, master process
* rename properties
* add property decorators
* occurrences in docs
* update changelog
* update changelog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add lost method
* create deprecation
* add changelog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix typo (but it was already there!!!)
* Apply suggestions from code review
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* add todo
* update more occurrences
* add types
* add missing import
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-11-08 12:32:58 +00:00
Carlos Mocholí
613aa09514
Revert part of #10279 ( #10376 )
2021-11-08 11:28:58 +00:00
Espen Haugsdal
89e1360e75
Fix pickling error with CSVLogger ( #10388 )
...
* Don't store csv.Dictwriter in ExperimentWriter
* Add test for pickle after .save()
* Add entry in changelog
2021-11-08 10:36:35 +00:00
puhuk
c58f84c176
Remove deprecated master_params attributes in PrecisionPlugin ( #10372 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-08 02:42:03 +00:00
Adrian Wälchli
45f6a3b175
Fix DataLoader inspection and re-instantiation in Lite ( #10334 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-05 17:31:45 +00:00
Connor Anderson
1c28f361d4
Remove `every_n_val_epochs` from ModelCheckpoint ( #10366 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-05 15:19:33 +00:00
Saurav Maheshkar
a9bd4fbd96
Remove deprecated property `configure_slurm_ddp` from accelerator connector ( #10370 )
...
* Remove deprecated configure_slurm_ddp
* Update CHANGELOG
* Remove deprecated tests from test suite
2021-11-05 14:11:30 +00:00
puhuk
9c4112ce1c
Remove deprecated sync_batchnorm and num_nodes attributes in DDP plugins ( #10357 )
...
* Remove deprecated sync_batchnorm and num_nodes attributes in DDPPlugin
Part of #10312
test_v1_6_0_ddp_num_nodes()
test_v1_6_0_ddp_sync_batchnorm()
* remove deprecation warnings
* apply removal to spawn plugin
* update changelog
* remove num_nodes in deepspeed
* remove unused imports
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-05 10:13:12 +00:00
four4fish
973305c6a5
Add more trainer config tests ( #10319 )
...
* Add more trainer config tests
* Add more trainer config and ttp register tests
* Add more trainer config and ttp register tests
2021-11-05 10:42:58 +01:00
Saurav Maheshkar
6b5e185d07
Remove deprecated property `is_slurm_managing_tasks` from accelerator connector ( #10353 )
...
* Remove deprecated property _slurm_managing_tasks from accelerator connector
* Update CHANGELOG
* Update Changelog
* Removed is_slurm_managing_tasks from AcceleratorConnector
* resolve merge conflict
* add back accidentally removed lines
* remove test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-05 09:38:53 +00:00
Alexandre Mayerowitz
b3c0f121ca
Remove deprecated datamodule lifecycle properties ( #10350 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-05 05:03:57 +00:00
Adrian Wälchli
3664659094
Remove deprecated method `ClusterEnvironment.creates_children` ( #10339 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-04 17:11:32 +00:00
Peter Dudfield
ce3e63262a
Fix failure when `DataLoader(batch_size=None)` is passed ( #10345 )
...
* add test, + add change to data loading batch sample method
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Refactor and CHANGELOG
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-04 12:46:57 +01:00
puhuk
412f0a4d24
Remove deprecated dataloader arguments in Trainer methods ( #10325 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-04 11:03:39 +01:00
Connor Anderson
6f00ba21c2
Remove deprecated `loaded_optimizer_states_dict` property ( #10346 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-04 01:51:46 +01:00
Carlos Mocholí
ba23d91320
Update recommendation on `dataloader_idx` ( #10318 )
2021-11-04 01:39:55 +01:00
Danielle Pintz
c5d011c3cf
Remove `TrainerModelHooksMixin` ( #10322 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-03 20:26:59 +00:00
Carlos Mocholí
93caa7cda9
Fix `apply_to_collection(defaultdict)` ( #10316 )
2021-11-03 11:18:10 +00:00
Ning
f6ed0bd8ca
introduce has_len_all_ranks() to check the length of dataloader across ranks ( #9827 )
...
* introduce , update tests
* update CHANGELOG.md
* change staticmethod and hook attribute naming
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix typo
* remove non-essential comment
* fix merge error and comment format
* try to fix test_tpu.py failure
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update on comments
* chlog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* chlog
* update
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* try fix
* Revert back TPUSpawn changes
* Update test
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-11-02 13:22:58 -04:00
Kaushik B
34fcb87a2b
Add `leave` argument to RichProgressBar ( #10301 )
...
* Add display_every_n_epochs argument to RichProgressBar
* Add tests
* Update test
* Update test
* Update changelog
* use leave argument instead
* Update pytorch_lightning/callbacks/progress/rich_progress.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-02 13:20:52 -04:00
Adrian Wälchli
373c32e34b
Fix yielding from iterator in LiteDataLoader ( #10304 )
...
* fix yielding from iterator
* update description
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove unused code
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-02 11:40:35 +01:00
Adrian Wälchli
3cd65b592b
Lightning Lite Examples ( #9987 )
...
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com>
Co-authored-by: Nicki Skafte Detlefsen <skaftenicki@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Pietro Lesci <61748653+pietrolesci@users.noreply.github.com>
2021-11-02 08:04:29 +00:00
Rohit Gupta
e4ee6df196
Add warning if multiple batch_sizes are found from ambiguous batch ( #10247 )
2021-11-01 19:50:30 +00:00
victorjoos
cc0e9f96a8
Add support for empty `gpus` list to run on CPU ( #10246 )
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-11-01 18:37:38 +00:00
thomas chaton
facaff94b8
Add custom dataloader support with Lite ( #10279 )
2021-11-01 18:33:13 +00:00
Kaushik B
c52d7ba73d
Add `configure_columns` method to RichProgressBar ( #10288 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-01 17:22:53 +00:00
Rohit Gupta
6609b2e46f
enable `on_load_checkpoint` for `datamodule` for all `trainer_fn` ( #10238 )
2021-11-01 14:20:46 +00:00
Kaushik B
45c45dc7b0
Deprecate `ProgressBar` and rename it to `TQDMProgressBar` ( #10134 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-01 11:42:21 +00:00
Kaushik B
2ee6d9fbc7
Fix `distrib_type` not being set when Plugin instances being passed to Trainer ( #10251 )
2021-11-01 17:11:57 +05:30
Carlos Mocholí
2b24be2e45
Simplify `LightningOptimizer` ( #10224 )
2021-10-30 15:56:15 +00:00
Kaushik B
e0f7dbdd1c
Add support for `devices='auto'` ( #10264 )
2021-10-30 15:05:51 +00:00
Carlos Mocholí
9237106451
Clip before step ( #10248 )
2021-10-30 11:27:49 +01:00
Adrian Wälchli
9d136a9fc5
Lightning Lite core and tests ( #10175 )
2021-10-29 21:46:39 +00:00
Kaushik B
cedaebfcbb
Add `auto_device_count` method to `Accelerators` ( #10222 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-29 22:31:32 +02:00
Gili Tzabari
a967b6eba0
del iterator on_run_end() ( #9915 )
2021-10-29 16:29:44 +00:00
Carlos Mocholí
e4eb61d812
Raise exception for `strategy=ddp_cpu|tpu_spawn` ( #10185 )
2021-10-29 16:15:24 +00:00
Carlos Mocholí
81d15c5986
Implement double optimizer closure for hook structure consistency ( #10167 )
2021-10-29 13:03:04 +00:00
thomas chaton
bd77f65463
Resolve batch_size in ResultCollection not being reset to 1 on epoch end ( #10242 )
2021-10-29 13:55:11 +01:00
thomas chaton
843bf26297
Fix `log(sync_dist=True, on_epoch=True, on_step=True)` not reducing on step ( #10227 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-29 12:08:32 +00:00
Carlos Mocholí
4bc73b2b76
Avoid deprecated usage in accelerator connector tests ( #10184 )
2021-10-29 12:36:21 +01:00
Ning
dbfadedfe7
Revert "Add support for `len(datamodule)` ( #9895 )" ( #10072 )
...
This reverts commit 6429de8944.
2021-10-29 13:33:51 +02:00
Rohit Gupta
6a9adf26f7
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` ( #10240 )
2021-10-29 10:36:02 +00:00
thomas chaton
5f4ffdee41
cleanup ( #10081 )
2021-10-29 08:40:43 +00:00
Adrian Wälchli
3f9dfe4949
Fix iterating over a DummyLogger when `fast_dev_run > 0` ( #10232 )
2021-10-29 07:22:59 +00:00
Kaushik B
762af9505b
Add missing test for testing custom registered training plugin ( #10225 )
2021-10-29 04:06:06 +00:00
thomas chaton
255e3edc98
resolve failing test ( #10191 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-28 15:27:03 +00:00
Carlos Mocholí
03f01fb5ec
Fix gradient norm tracking and gradient clipping ( #9287 )
...
* WIP
* Progress
* Undo test change
* Fix plugin closure execution order
* Update CHANGELOG
* Fix manual optimization on AMP and skipping backward
* Fix for deepspeed
* Typo
* Hook test for manual closure
* Add skipping test with AMP
* You are hideous, apex
* Add deepspeed test
* Update CHANGELOG
* Fix for broken master
* Add RunIf
* FIXMEs
* Rename
* Fix grad norm
* add a simple test
* update test
* update test
* update test
* fix merge conflicts
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Sea of changes
* Undo change
* Introduce TPUPrecisionPlugin
* Undo changes
* Undo changes
* Resolve FIXME
* Undo change
* Undo change
* Undo change
* Fix FIXMEs
* Fix FIXME
* Correct value
* Bad merge
* Fix circular imports
* WIP
* Fixing clipping
* Fixes
* Bad merge
* Move optimizer step and clipping into the `PrecisionPlugin`
* Fix AMP
* Update CHANGELOG
* Fix tests
* Underscore
* Progress
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove pre_optimizer_step
* Missed one
* Progress
* Progress
* Fix test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update FIXMEs
* Fix test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix test
* DeepSpeed warning. mypy
* Rename
* Finish tests
* Update CHANGELOG
* Dumb fixes
* accelerator=auto
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update on comments
* Use ClassifModule
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
Carlos Mocholí
5262b63dff
Pass the scaler as an input to `NativeMixedPrecisionPlugin` ( #10055 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 14:13:53 +00:00
Low Weng Fei
83d74bb385
Fix `reset_seed()` converting the `PL_SEED_WORKERS` environment variable `str` read to `bool` ( #10099 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-28 12:57:41 +00:00
Rohit Gupta
9af1dd7443
Deprecate `lr_sch_names` from `LearningRateMonitor` ( #10066 )
2021-10-28 12:57:04 +00:00
Rohit Gupta
85eb17cde5
initialize poptorch_models based on trainer_fn ( #10149 )
2021-10-28 11:59:52 +00:00
Carlos Mocholí
dbe1662dc3
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` ( #10157 )
2021-10-27 13:38:39 +01:00
Kaushik B
c33df2639f
Set `dataset` attribute to `MpDeviceLoader` used in TPU Spawn ( #10151 )
2021-10-27 01:23:01 +05:30
Carlos Mocholí
48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` ( #10143 )
2021-10-26 17:26:26 +02:00
Carlos Mocholí
a0e45dc071
Some minor CI cleanup ( #10088 )
2021-10-26 13:58:20 +02:00
twsl
971281d27d
Make sure file and folder exists in Profiler ( #10073 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-26 11:13:31 +00:00
Adrian Wälchli
871a96701a
Rename `master_params` to `main_params` ( #10105 )
...
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-26 11:17:32 +02:00
Rohit Gupta
34d5980df6
Raise `MisconfigurationException` if `trainer.eval` is missing required methods ( #10016 )
2021-10-25 23:12:08 -07:00
Danielle Pintz
13d6d7bad1
Remove `optimizer_connector.py` ( #10120 )
2021-10-26 00:52:43 +00:00
Adrian Wälchli
21a5867dad
Rename `ClusterEnvironment.creates_processes` ( #10106 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 23:15:41 +00:00
Rajat Goel
47e7a2860f
Fix Enums parsing in generated hparams yaml ( #9170 )
...
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 21:23:20 +00:00
Eric Wiener
0e20119d24
Change default value of the `max_steps` Trainer argument from `None` to `-1` ( #9460 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 20:21:33 +00:00
Rohit Gupta
d9dfb2e920
fix tests ( #10138 )
2021-10-25 19:37:47 +00:00
Danielle Pintz
1f7bd6650c
Mark accelerator connector as protected ( #10032 )
2021-10-25 19:24:54 +00:00
jjenniferdai
6d79184ec5
Unify checkpoint load paths [redo #9693 ] ( #10061 )
2021-10-25 19:05:31 +00:00
Adrian Wälchli
76081fb846
Mark SLURM detection methods in `AcceleratorConnector` as protected ( #10101 )
...
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-10-25 17:52:15 +00:00
Carlos Mocholí
2ee3127661
Use `torch.autocast` ( #10053 )
2021-10-25 17:33:52 +00:00
Carlos Mocholí
b376799430
Minor fixes related to clipping ( #10130 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
manipopopo
cfb2d87765
Disable quantization aware training observers ( #8540 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 15:46:09 +00:00
Adrian Wälchli
7eb2edf421
rename set_random_master_port ( #10104 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:09:05 +00:00
Danielle Pintz
e94dcf6936
Mark `trainer.data_connector` as protected ( #10031 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Carlos Mocholí
f95ba20012
Do not use the base version by default in `_compare_version` ( #10051 )
2021-10-25 16:41:32 +05:30
thomas chaton
ed9802643c
[CI] Comment flaky tests ( #10084 )
2021-10-25 10:31:06 +02:00
Kaushik B
c3614f1c07
Fix: skip importing DistributedOptimizer for Windows ( #10071 )
2021-10-21 21:01:56 +00:00
thomas chaton
454e93bace
Add support for init_meta_context, materialize_module ( #9920 )
2021-10-21 15:48:31 +01:00
jjenniferdai
2d9db211b5
Revert "Support serialized checkpoint loading ( #9605 )" ( #10057 )
...
This reverts commit f0e6f1b58a.
2021-10-21 02:51:22 +02:00
Kaushik B
aa1540410f
Add XLACheckpointIO ( #9972 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-21 02:39:16 +05:30
Rohit Gupta
1599c77d16
Fix `LearningRateMonitor` logging with multiple param groups optimizer with no scheduler ( #10044 )
2021-10-20 22:13:00 +05:30
Carlos Mocholí
6aeebf1bd3
Remove unnecessary dependency available checks ( #10050 )
2021-10-20 16:21:37 +00:00
Alessio Bonfiglio
2a2fa5a56a
Group all the logged gradients under the same sub-folder ( #7756 )
2021-10-20 15:48:36 +00:00
Kaushik B
56bc55db71
Update strategy flag in docs ( #10000 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-20 21:02:53 +05:30
kingyiusuen
2ed92ecabb
Rerun flaky profiler tests on failure ( #10035 )
2021-10-20 18:57:04 +05:30
Carlos Mocholí
f0b3e0f4de
Default to `precision=bf16` on CPU when `precision=16` is passed ( #10033 )
2021-10-20 13:25:13 +00:00
Adrian Wälchli
2c16f1d6b9
remove dataloader patching on the LightningModule ( #9764 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-20 15:23:20 +02:00
jjenniferdai
f0e6f1b58a
Support serialized checkpoint loading ( #9605 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-20 09:38:35 +01:00
Carlos Mocholí
53c62f63e8
Constrain IPU precision choices ( #10030 )
2021-10-20 00:52:01 +00:00
Carlos Mocholí
ad8d6c83da
[CLI] Shorthand notation to instantiate datamodules ( #10011 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-20 00:49:48 +00:00
Carlos Mocholí
e44921ee21
Fix `self.log(on_epoch=True, reduce_fx=sum)` on_batch_start ( #9791 )
2021-10-20 01:56:37 +02:00
Carlos Mocholí
d45897d522
Rename `TPUHalfPrecisionPlugin` to `TPUBf16PrecisionPlugin` ( #10026 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 21:09:37 +00:00
Ning
0b68f2abf8
Remove `reset_train_val_dataloaders` from Trainer and move data reloading logic to loop ( #9671 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-10-19 21:45:52 +02:00
Carlos Mocholí
e8beceb631
Add `TPUPrecisionPlugin` ( #10020 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 17:48:57 +00:00
thomas chaton
1759403c8d
Add check for callable with datamodule len ( #10003 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-19 14:51:08 +00:00
Rohit Gupta
0aa220b46b
Remove deprecated `distributed_backend` from `Trainer` ( #10017 )
...
* rm distributed_backend from Trainer
* unused
* chlog
* internal distributed_backend
* Docstring
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-19 13:54:37 +00:00
Danielle Pintz
203737bfce
Don't raise DeprecationWarning for `LoggerConnector.gpus_metrics` ( #9959 )
2021-10-18 22:51:09 +00:00
Adrian Wälchli
a99b7440b5
Add unit tests for `pl.utilities.grads` ( #9765 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-18 18:58:51 +05:30
Rohit Gupta
4dc32ad7db
Fix logic to check for spawn in worker_check ( #9902 )
...
* fix
* update tests
* chlog
* skip windows
2021-10-18 13:02:46 +00:00
Carlos Mocholí
3f355d0eb7
Remove manual tracking of optimizer steps ( #9957 )
2021-10-18 12:43:06 +00:00
Carlos Mocholí
0684e5295f
Remove deprecated `DataModule.dims` usage in tests ( #9948 )
2021-10-18 17:35:41 +05:30
Carlos Mocholí
c69a79c86f
Fix `self.log(on_epoch=True)` on_batch_start ( #9780 )
2021-10-18 14:02:16 +02:00
Elad Segal
8c76cf5ae1
reset val dataloader for binsearch ( #9975 )
2021-10-18 12:54:26 +02:00
Carlos Mocholí
01b304ec57
Update accelerator connector messages after the addition of strategy ( #9937 )
2021-10-18 01:10:48 +00:00
Carlos Mocholí
788f6864d9
Fix `LightningOptimizer` step and toggling logic ( #9958 )
2021-10-18 00:23:51 +00:00
ronif
7b4df7bf91
Fix issue with no-init dataclass fields in move_to_device ( #9963 )
...
Co-authored-by: ronif <ronif@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-17 07:10:47 +00:00
Carlos Mocholí
e5dfdf34f9
Avoid deprecation warning after #9901 ( #9951 )
2021-10-16 17:36:25 +01:00
Kaushik B
5e8829b97d
(1/n) tests: Use strategy flag instead of accelerator for training strategies ( #9931 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-16 20:40:25 +05:30
Carlos Mocholí
e973bcb76a
Use non-deprecated options in tests ( #9949 )
2021-10-15 16:58:07 -07:00
Carlos Mocholí
db4e770004
Validate the precision input earlier ( #9763 )
2021-10-15 17:30:00 +00:00
kingyiusuen
6429de8944
Add support for `len(datamodule)` ( #9895 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-15 14:19:50 +02:00
Danielle Pintz
16213b1635
Deprecate `log_gpu_memory`, `gpu_metrics`, and util funcs in favor of `DeviceStatsMonitor` callback ( #9921 )
...
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-14 22:45:44 +02:00
Oliver Borchert
afbf703684
Single-process multi-node CPU training ( #9603 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-14 22:21:41 +02:00
Kaushik B
af4a8f1950
Refactor tests for TPU Accelerator ( #9718 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 19:45:15 +00:00
Danielle Pintz
6feda08109
Deprecate `GPUStatsMonitor` and `XLAStatsMonitor` in favor of `DeviceStatsMonitor` ( #9924 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte Detlefsen <skaftenicki@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-14 15:52:45 +00:00
four4fish
a002f872ea
[2/n] Directly call TrainingTypePlugin APIs instead of going through the Accelerator ( #9901 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 17:38:22 +02:00
Viraj Bagal
15698698c4
Log LR using LearningRateMonitor even when LR Scheduler is not defined. ( #9786 )
...
* LR logging works even with no lr scheduler, wrote few extra tests as well
* updated changelog
* modified code as suggested by DeepSource
* added helper functions
* opt with no scheduler
* rename
* chlog
* update test
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-14 13:28:19 +00:00
Danielle Pintz
940b910d27
[2/4] Add DeviceStatsMonitor callback ( #9712 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-10-13 18:29:36 +00:00
Rohit Gupta
23e8b59ae7
Add `configure_gradient_clipping` hook in `LightningModule` ( #9584 )
...
* init hook
* docs
* dep train args
* update tests
* doc
* doc
* .gitignore
* not dep
* add trainer args
* add & update tests
* fix tests
* pre-commit
* docs
* add docs
* add exception
* code review
* deepspeed
* update tests
* not
* try fix
* Apply suggestions from code review
* update deepspeed
* disable some tests
* disable some tests
* enable all tests
2021-10-13 20:15:13 +05:30
Kaushik B
05b15e63f0
Add `strategy` argument to Trainer ( #8597 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-13 12:34:06 +00:00
ananthsub
28fc8d2016
Add `enable_model_summary` flag and deprecate `weights_summary` ( #9699 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-10-13 17:20:54 +05:30
Rohit Gupta
0f8fd20443
Remove epoch from `trainer.logged_metrics` ( #9904 )
2021-10-13 11:30:27 +02:00
ananthsub
4610fddb19
Mark `Trainer.terminate_on_nan` protected and deprecate public property ( #9849 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-12 20:23:22 +00:00
Danielle Pintz
dd6d797e0e
Remove type error handling in _configure_checkpoint_callbacks ( #9823 )
...
* remove type error handling in _configure_checkpoint_callbacks
* rm test
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-12 20:13:02 +00:00
Adrian Wälchli
b530b7afd2
update tests to not rely on patched dataloaders ( #9905 )
2021-10-12 12:45:28 +02:00
Rohit Gupta
98c0a110e0
Update docs for `GradientAccumulationScheduler` ( #9891 )
...
* update docs and add tests
* update docs and add tests
* Update pytorch_lightning/callbacks/gradient_accumulation_scheduler.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-12 10:37:16 +00:00
Rohit Gupta
f2b0db60f1
Raise a `MisconfigurationException` when trainer functions are called with `ckpt_path="best"` but `checkpoint_callback` isn't configured ( #9841 )
...
* add check
* chlog
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Apply suggestions from code review
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-12 15:35:55 +05:30
Sean Naren
6da5829e53
DeepSpeed support for device IDs ( #9847 )
2021-10-12 09:24:46 +00:00
Rohit Gupta
db322f4bbb
Deprecate `checkpoint_callback` from the `Trainer` constructor in favour of `enable_checkpointing` ( #9754 )
...
* enable_checkpointing
* update codebase
* chlog
* update tests
* fix warning
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-12 07:55:07 +00:00
Kaushik B
14fb076a30
Fix deprecation test version for accelerator collective ( #9892 )
2021-10-12 11:50:31 +05:30
Sean Naren
83acb8671d
Update DeepSpeed version, fix failing tests ( #9898 )
2021-10-11 22:35:33 +00:00
yopknopixx
173f4c8466
Deprecate `terminate_on_nan` Trainer argument in favor of `detect_anomaly` ( #9175 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-11 17:17:43 +00:00
Adrian Wälchli
6a0c47a014
remove redundant accumulation normalization in manual optimization ( #9769 )
2021-10-11 15:26:12 +00:00
Ranuga-Disansa
f915a8a283
Removed a redundant warning with `ModelCheckpoint(monitor=None)` callback ( #9875 )
...
* Update README.md
* Update README.md
* Create evaluation.py
* Update README.md
* Update evaluation.py
* Create evaluation.py
* Create evaluation.py
* Update evaluation.py
* Create nlp.py
* Update evaluation.py
* Create evaluation.py
* Update nlp.py
* Update nlp.py
* Update evaluation.py
* Create evaluation.py
* Update nlp.py
* Update nlp.py
* Update requirements.txt
* Update evaluation.py
* Create data_loader.py
* Update nlp.py
* Update evaluation.py
* Update data_loader.py
* Update nlp.py
* Update data_loader.py
* Update requirements.txt
* Update model_checkpoint.py
* Delete evaluation.py
* Delete data_loader.py
* Delete nlp.py
* Update requirements.txt
* Update model_checkpoint.py
* Update README.md
* Update pytorch_lightning/callbacks/model_checkpoint.py
* Update CHANGELOG.md
* Update test_model_checkpoint.py
* Update model_checkpoint.py
* update
* update
* chlog update
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-11 14:54:07 +00:00
Boris Dayma
2db9ea3500
feat(wandb): support media logging ( #9545 )
2021-10-11 10:15:36 +01:00
Rohit Gupta
d71501d97f
Reset `val_dataloader` in `tuner/batch_size_scaling` ( #9857 )
...
* reset val
* chlog
2021-10-11 09:13:33 +01:00
kingyiusuen
8740c801bb
Fix typo in _validate_scheduler_optimizer() ( #9886 )
2021-10-11 09:16:17 +02:00
ananthsub
5206e52786
Add support for `torch.set_detect_anomaly` ( #9848 )
...
* Add support for `detect_anomaly`
* Update CHANGELOG.md
2021-10-07 16:03:56 +00:00
Rohit Gupta
4decbc0d95
Deprecate `dataloader_idx` from `on_train_batch_start/end` ( #9816 )
...
* deprecate hooks
* dep todo
* explicit
* Apply suggestions from code review
* Apply suggestions from code review
* code review
* base
2021-10-07 10:18:11 +00:00
Rohit Gupta
8a8ecb8d01
Update the logic to check for accumulation steps with deepspeed ( #9826 )
...
* support_dict
* chlog
* fix test
* epochs
2021-10-06 17:50:10 +01:00
Rohit Gupta
b303b4f895
Fix restoring training state during `trainer.fit` only ( #9413 )
...
* reload state on fit
* trainer.state
* add test
* chlog
* revert
* review
* review
* rev and amend
* fix test and logic
* update
* code review
* Apply suggestions from code review
* better assertions
* better assertions
* Apply suggestions from code review
* add loop test
* Apply suggestions from code review
* Split for typing
* review comments
* review comments
* use if_else
* code review
* code review
* code review
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Remove unnecessary pieces from the test
* move test
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-06 14:57:40 +00:00
Jirka Borovec
b3e9dff32d
rename callback FineTune arg `round` ( #9711 )
...
* rename CB Tune arg round
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-06 09:39:36 +01:00
Kaushik B
f94faa9cd3
Enable auto parameters tying for TPUs ( #9525 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-06 10:16:44 +02:00
Elad Segal
86ad941d06
Fix missing arguments when saving hyperparams from parent class only ( #9800 )
...
* Fix missing arguments when saving hyperparams from parent class only
* fix antipattern
2021-10-06 08:32:29 +01:00
Danielle Pintz
3392215ef6
Fix broken `test_cpu_amp_precision_context_manager` ( #9809 )
...
* @RunIf(min_gpus=1)
* dtype -> fast_dtype
2021-10-04 12:14:13 +00:00
kingyiusuen
6d530373c0
Add warnings regarding unsupported keys in optim config and OneCycleLR ( #9666 )
...
* Add warnings regarding unsupported keys in optim config and OneCycleLR
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix docstring
* Update CHANGELOG.md
* Split into two parts
* Use difference operator to find extra keys
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-04 08:25:05 +00:00
thomas chaton
5841ca9782
[Feat] Add auto_restart for fault tolerant training ( #9722 )
2021-10-01 16:37:17 +00:00