Commit Graph

379 Commits

Author SHA1 Message Date
B. Kerim Tshimanga 07ee8fc9a0
Remove deprecated property `ModelCheckpoint.period` in favor of `ModelCheckpoint.every_n_epochs` (#9213) 2021-08-31 10:04:29 +02:00
Adrian Wälchli 0abd6e94b5
[3 / 3] improvements to saving and loading callback state (#7161)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-26 10:02:49 +02:00
Adrian Wälchli b9443a07b9
[2 / 3] improvements to saving and loading callback state (#7187)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-08-24 17:35:19 +00:00
Kaushik B 538e743f17
feat: Add Rich Progress Bar (#8929)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-24 02:40:36 +00:00
Carlos Mocholí b1a859f312
Remove deprecated `on_{save,load}_checkpoint` signature (#8697)
Co-authored-by: Yifu Wang <yifuwang2012@gmail.com>
2021-08-21 22:48:28 -07:00
Michele Sanna 9ff0c22e43
Handle the case with no queries in `GPUStatsMonitor` (#9014)
Co-authored-by: Michele Sanna <{ID}+{username}@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-21 05:22:33 +02:00
Adrian Wälchli 5329b0d113
Fix "line too long" PEP8 complaint (#8957) 2021-08-18 03:31:36 +00:00
Danielle Pintz bd13d392af
Add error handling for all trainer entry points (#8819)
* [lightning] Ensure error handling works across different trainer entry points
2021-08-18 02:04:40 +00:00
Carlos Mocholí bfeffde8f4
Smart handling of `EarlyStopping.check_on_train_epoch_end` (#8888)
* Smart handling of `EarlyStopping.check_on_train_epoch_end`

* dummy value

* Extra flag
2021-08-14 08:50:39 +02:00
Carlos Mocholí 7d87879350
Fix SWA with a list of learning rates (#8747)
* Fix swa lrs - needs test

* Add test

* Update CHANGELOG
2021-08-14 08:50:08 +02:00
christopherfish 0749c1e7d8
Remove call to deprecated fit_loop (#8873) 2021-08-13 10:06:36 +02:00
Stefan Wijnja c77cd518b5
Fix on_train_batch_end signature and call in ProgressBarBase example (#8836) 2021-08-12 12:24:12 +00:00
Carlos Mocholí 4928dc5579
Improve SWA docs (#8717) 2021-08-05 16:07:50 +00:00
Carlos Mocholí 299e289980
Remove deprecated `on_save_checkpoint` argument (#8688) 2021-08-05 16:16:30 +01:00
Sean Naren e5d9e21dea
Fix save/load/resume from checkpoint for DeepSpeed Plugin (#8397) 2021-08-02 22:31:05 +00:00
Carlos Mocholí ca96b2d23e
Delete deprecated save function (#8680) 2021-08-02 19:28:31 +02:00
Carlos Mocholí 93784da2c3
Fix pre-commit blacken-docs failures (#8624) 2021-07-30 12:10:15 +00:00
Carlos Mocholí 0dc0472e1f
Use class name in SWA info message (#8602) 2021-07-29 09:39:46 +02:00
Adrian Wälchli 8c27fa71fa
[1 / 3] improvements to saving and loading callback state (#6886)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-29 00:12:32 +02:00
Carlos Mocholí 47c47faeae
Remove `outputs` in `on_train_epoch_end` hooks (#8587) 2021-07-28 18:27:54 +02:00
Jirka Borovec 0a71fe2859
CI: black docs (#8566)
* black docs

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-28 18:08:31 +02:00
Carlos Mocholí e63968ab88
Add `pyupgrade` to `pre-commit` (#8557)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 14:38:12 +02:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Elad Segal 07635d0e86
fix restoring finetune callbacks after accelerator setup on training resume (#8501)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-23 19:49:32 +02:00
Carlos Mocholí f7027a8701
Remove `torch >= 1.6` checks (#8523) 2021-07-23 04:03:20 +00:00
Jirka Borovec b7dbcc3e13
Quant as optional step (#8464)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-22 12:44:27 +00:00
thomas chaton 063f5ba73e
[bugfix] Re-compute accumulated_grad_batches (#8493)
* resolve resolution

* update changelog

* typo

* optimize test

* update on comments

* resolve comments

* update
2021-07-21 10:46:25 +00:00
thomas chaton ea13f6021c
[bugfix] Prevent deepcopy of dataloaders / Trainer in SWA Callback (#8472)
* resolve deepcopy

* update changelog

* move private

* update on comments

* Update CHANGELOG

* Set skipped attributes to None

* Simplify test

* update

* update changelog

* update

* update on comments

* typo

* update

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-20 18:31:49 +00:00
deepsource-autofix[bot] 4bc3d70ad9
Remove unnecessary generator (#8470)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-07-19 14:02:07 +00:00
Xuehai Pan 2c5d94d98b
Fix: handle logical CUDA device IDs for GPUStatsMonitor if `CUDA_VISIBLE_DEVICES` set (#8260)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-19 11:42:43 +00:00
Carlos Mocholí 710df398c9
Remove `check_checkpoint_callback` (#7724)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-19 11:29:00 +00:00
deepsource-autofix[bot] cbf71d0a14
Remove unnecessary comprehension (#8405) 2021-07-19 08:30:24 +00:00
Adrian Wälchli 8c5042e1a8
fix internal call to deprecated train_loop (#8434) 2021-07-16 02:24:18 +02:00
Carlos Mocholí 6ce77a102b
Set minimum PyTorch version to 1.6 (#8288)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-07-13 17:12:49 +00:00
Carlos Mocholí 321689f52e
Add `ModelCheckpoint(save_on_train_epoch_end)` (#8389)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-13 14:47:59 +00:00
Carlos Mocholí 733cdbb9ad
`every_n_val_epochs` -> `every_n_epochs` (#8383) 2021-07-13 01:20:20 +02:00
Dusan Drevicky 1b06edf2f2
Add the `on_before_optimizer_step` hook (#8048)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-09 13:30:52 +02:00
thomas chaton 1c825a2a9c
Add the `on_before_backward` hook (#7865)
* Add callback to hook tests and add predict test

* Fix lambda callback test

* Simplify lambda call test

* Use LambdaCallback

* Dynamically append to called for the model

* Remove print

* Consistency

* Consistency

* Prepare args/kwargs testing

* yapf doesn't like dict literals

* Add arguments for fit no val test

* Add arguments for fit no val test

* add before_backward_hook

* add test

* resolve flake8

* resolve tests

* update changelog

* add on_before_backward to LightningModule

* update on comments

* Test arguments

* Datamodule refactor

* Fix eval test

* remove extra file

* resolve bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to hooks

* update

* resolve flake8

* update on comments

* Update full fit + val test

* Update test

* Remove FIXME

* Remove FIXME

* Undo change

* Fix

* Parametrize fit hook test

* Comment

* Parametrize fit hook test with different precision plugins

* Fix tests

* Parametrize fit hook test with manual optimization

* Unnecessary parenthesis

* WIP

* Comments

* Fix message

* Test CI error

* Revert "Test CI error"

This reverts commit 39c4a85a83.

* Add ddp training type teardown

* Update CHANGELOG

* Adrian's fix

* Use destructor

* Update CHANGELOG.md

* RPC destructor

* Update pytorch_lightning/plugins/training_type/ddp.py

* Why do you not work :(

* Missing condition

* Fix deepspeed test

* GC collect in conftest

* Do not show warnings for special tests

* Needs to run on 1.8

To avoid: "RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8"

* Run torch 1.8

* Skip test due to 'Python bus error'

* Debug NCCL

* shm size

* Disable warnings for special tests

* Remove NCCL_DEBUG statement

* Try smaller shm size

* Revert "Skip test due to 'Python bus error'"

This reverts commit e0a3e8785d.

* README and adjust versions

* Avoid self.on_gpu call

* empty cache cleanup

* More garbage collection

* Unroll parametrizations

* Do not reuse mock

* Undo changes

* Undo notebooks modification

* resolve test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete file

* Undo

* Fix test

* Revert "WIP"

This reverts commit f5828a8c42.

* Rename

* Remove optimizers

* Fix bug with LightningOptimizer

* Add optimizers

* update

* update

* Update CHANGELOG

* On after backward refactor

* Do not call super

* Fixes

* Remove should_accumulate

* pre/post backward refactor

* Call the LM backward hook

* Update tests

* Remove dev debug patch

* Fix test

* Remove optimizer arguments and typing

* Docs fixes

* Fix comment

* Undo changes

* Split manual and auto

* Undo change

* Deepsource

* Remove optimizers

* Undo changes

* Call the hook

* Docs

* Docs

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-09 06:15:57 +00:00
Jaime Ferrando Huertas 9bbca402ff
Add auto_insert_metric_name to ModelCheckpoint docstring. (#8310)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-07 23:15:21 +00:00
Carlos Mocholí 07d7c37a79
Remove magic monitor support for `ModelCheckpoint` (#8293) 2021-07-07 18:36:19 +01:00
Carlos Mocholí 9877265887
Simplify logger connector access (#8318) 2021-07-07 14:13:30 +02:00
Adrian Wälchli 1e1d1821d0
fix best score on wrong device in EarlyStopping callback (#8295) 2021-07-06 10:59:33 +02:00
Carlos Mocholí 441e16f61c
Default `EarlyStopping.check_on_train_epoch_end=True` (#8286) 2021-07-05 15:45:23 +02:00
Kaushik B 3a8322deda
Add XLAStatsMonitor Callback (#8235) 2021-07-05 17:09:46 +05:30
Yuta Hayashibe 8193bae6bd
Add periods to the documentation (#8252) 2021-07-02 16:48:55 +02:00
Adrian Wälchli 6db0fe3659
training loop refactor - move val loop (#8120)
* EvaluationDataLoaderLoop -> EvaluationLoop

* proposed rename files

* imports

* bad merge

* update init files

* glue imports together

* rename fit_loop.validation_loop to fit_loop.val_loop

* move loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Group imports

* Resolve circular import

* Comment

* fix test

* try to resolve circ import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-29 09:06:44 +00:00
Justus Schock d6435a5b73
Bugfix/swa iterable dset (#8172)
* add test

* add fix

* Update CHANGELOG.md
2021-06-28 21:18:25 +00:00
thomas chaton 1f025789fc
[bugfix] Clean Validation Sanity Checking metrics (#8171)
* resolve logging issue

* update changelog

* remove breakpoint

* resolve bugs

* remove pass
2021-06-28 13:49:56 -04:00
Ethan Harris 2a372e3682
Fix module dict in base finetuning (#8170)
* Fix module dict in base finetuning

* Update CHANGELOG.md
2021-06-28 10:55:32 +00:00
deepsource-autofix[bot] 03154eb30a
Refactor unnecessary `else` / `elif` when `if` block has a `return` statement (#8156)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-28 15:27:41 +05:30