Commit Graph

930 Commits

Author SHA1 Message Date
Carlos Mocholí ca96b2d23e
Delete deprecated save function (#8680) 2021-08-02 19:28:31 +02:00
Carlos Mocholí cf0d362658
Delete deprecated `TrainerTrainingTricksMixin` (#8679) 2021-08-02 18:00:32 +02:00
Rio H ba8053492f
Deprecate LightningModule.model_size (#8495)
Co-authored-by: Caleb Robinson <calebrob6@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-30 13:53:40 +00:00
Adrian Wälchli 529c42f848
fix collecting training_step outputs (#8613) 2021-07-30 13:03:15 +00:00
Carlos Mocholí 5789e9f5e4
Fix reference issues during epoch end result collection (#8621)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-30 12:16:47 +00:00
Carlos Mocholí 9720e264f5
Fix references for `ResultCollection.extra` and improve `str` and `repr` (#8622) 2021-07-30 12:47:34 +02:00
Adrian Wälchli b6ea6373dd
exclude mpi run from auto-detection of horovod (#8610) 2021-07-30 12:01:00 +02:00
Adrian Wälchli 7901d297d3
remove support for optimizer_idx in the training_step for manual optimization (#8576) 2021-07-29 08:30:45 +00:00
Carlos Mocholí c2199fbbee
Fix `trainer.fit_loop.split_idx` reference (#8601)
* Fix split idx reference

* Update CHANGELOG

* Add comment
2021-07-29 08:00:04 +00:00
Carlos Mocholí ebd2e87752
Delete deprecated `TrainerLoggingMixin` (#8609)
* Delete deprecated `TrainerLoggingMixin`
* Update CHANGELOG
* Delete from Trainer
2021-07-29 08:39:16 +02:00
Adrian Wälchli 8c27fa71fa
[1 / 3] improvements to saving and loading callback state (#6886)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-29 00:12:32 +02:00
Jirka Borovec 0c0b24c031
Prune deprecated metrics (#8586)
* drop metrics

* drop tests

* fix imports

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-28 16:57:31 +00:00
Carlos Mocholí 47c47faeae
Remove `outputs` in `on_train_epoch_end` hooks (#8587) 2021-07-28 18:27:54 +02:00
Sean Naren aadd2a9d9c
Load ckpt path when model provided in validate/test/predict (#8352)
* Change trainer loading behaviour for validate/test/predict

* Fix

* Fix/add tests

* remove

* Cleanups

* Space

* cleanups

* Add CHANGELOG.md

* Move after setup

* Cleanups on logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove

* fix test

* feedback

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update pytorch_lightning/trainer/properties.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Feedback

* Same fix

* Same fix

* Add test for behaviour, modify based on feedback

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Wording

* Apply suggestions from code review

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Cleanup docs

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* feedback

* Fixes to test API

* Add carlos description

* Move logic further

* Move checkpoint connector logic

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-28 10:12:46 +00:00
Jirka Borovec 6b47cbe3ca
Update version to `1.5.0dev` (#8585)
* dev + chlog

* Add placeholders

* Clean previous entry

* Add CHANGELOG fix

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-27 12:52:18 -04:00
Carlos Mocholí 7914e494dd
Replace `iteration_count` and other index attributes in the loops with progress dataclasses (#8477)
* Delete `iteration_count` and `batches_seen`

* Update CHANGELOG

* Protect should accumulate

* Update pytorch_lightning/loops/epoch/training_epoch_loop.py
2021-07-27 18:36:20 +02:00
thomas chaton c7f8c8c3c8
[bugfix] DeepSpeed with no schedulers (#8580)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-27 15:28:10 +00:00
Kaushik B 39de7fefeb
Lightning Release v1.4 (#8579)
* Update Lightning version to v1.4

* update notebooks

* Update release date in Changelog

* docs

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-27 14:00:13 +00:00
Kaushik B 4b7f78e200
Add deprecation warning & test for distributed_backend flag (#8575)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-27 18:01:48 +05:30
Max d90cb7fceb
Bugfix: Scheduler monitor for manual optimization (#7643)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-07-27 16:04:14 +05:30
Jirka Borovec 75e18a5298
v1.4.0rc2 (#8553)
* v1.4.0rc2

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 10:01:23 -04:00
Ethan Harris 52526c20b5
Add support for functions to be parsed by the Lightning CLI in addition to Types (#8400)
* Initial commit

* Update docstrings

* Update CHANGELOG.md

* Fix mypy

* Fixes

* Fixes

* Update to comments

* Fix

* mypy

* Update on comments

* Update

* Fix mypy

* protected

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-26 10:53:48 +02:00
Carlos Mocholí 6dbdf438e8
Support `DataLoader`s with missing arguments in `replace_sampler` (#8519)
* Support `DataLoader`s with missing arguments in `replace_sampler`

* Fix for multiprocessing context

* Fixes and test improvements

* Fixes and test improvements

* Fixes and test improvements

* Test any variadic name

* Update CHANGELOG

* Make sure extra attributes can be present

* Skip on old Windows

* Update pytorch_lightning/trainer/data_loading.py

* Update pytorch_lightning/trainer/data_loading.py

* Check is dataloader

* Typo
2021-07-26 10:04:21 +02:00
Elad Segal 07635d0e86
fix restoring finetune callbacks after accelerator setup on training resume (#8501)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-23 19:49:32 +02:00
Carlos Mocholí 4a64bc3fd3
Fix DeepSpeed lr scheduler logic (#8527)
* Fix deepspeed scheduler logic

* Fix tests

* Minor changes

* Improve tests

* inference fix

* CHANGELOG

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-07-23 10:08:58 +01:00
Adrian Wälchli 0ad7f3a829
Fix log_dir tracking in case of multiple Trainer instances + DDP (#7403)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-07-23 09:18:23 +02:00
Jirka Borovec b7dbcc3e13
Quant as optional step (#8464)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-22 12:44:27 +00:00
Kaushik B 5452590872
fix: Enable manual optimization for TPUs (#8458) 2021-07-22 15:33:35 +05:30
Jirka Borovec d3ed472b20
Merge pull request #8497 from PyTorchLightning/v1.4.0rc1
v1.4.0rc1 & chlog
2021-07-21 09:54:27 -04:00
thomas chaton 063f5ba73e
[bugfix] Re-compute accumulated_grad_batches (#8493)
* resolve resolution

* update changelog

* typo

* optimize test

* update on comments

* resolve comments

* update
2021-07-21 10:46:25 +00:00
thomas chaton c9af1a7aec
[bugfix] Reduce memory leaks (#8490)
* reduce memory leak

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* resolve flake8

* update on comments

* resolve bug

* update

* Undo whitespace changes

* remove bug

* resolve flake8

* revert change

* update on comments

* delete the ddp wrapper as it holds memory

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve flake8

* update on comments

* update changelog

* resolve test

* Update CHANGELOG

* Refactor teardown

* Fix comment

* Do it for non-gpu too

* remove ref when the model is not a lightning_module

* Fix import error

* move down

* resolve bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve assignment

* update

* move above

* Fix device calls to support tpu training

* Update todo

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-07-21 11:37:05 +02:00
marsggbo d0038b521c
Bugfix: horovod optimizer missing 2 required positional arguments (#7840)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-21 08:11:26 +00:00
thomas chaton ea13f6021c
[bugfix] Prevent deepcopy of dataloaders / Trainer in SWA Callback (#8472)
* resolve deepcopy

* update changelog

* move private

* update on comments

* Update CHANGELOG

* Set skipped attributes to None

* Simplify test

* update

* update changelog

* update

* update on comments

* typo

* update

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-20 18:31:49 +00:00
Sean Naren 8a9ee403be
Add Windows Support for DeepSpeed (#8488)
* Modify deepspeed distributed to support windows

* Add weak test

* Cleanups

* Capture more in tests

* Add comment

* Cleaner asserts
2021-07-20 13:55:52 +00:00
Kaushik B 556879e5cf
Add support for devices flag to Trainer (#8440)
* Support devices flag to Trainer
2021-07-20 04:33:12 +00:00
Carlos Mocholí a6fd32a708
Do not reset Loops total counters (#8475) 2021-07-19 18:22:47 +02:00
Adrian Wälchli 7d93d70110
Loop specialization (#8226)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-07-19 15:08:53 +02:00
ramonemiliani93 c9eb7e4433
Hash values in LightningEnum instead of name. (#8421)
Co-authored-by: Ramon Emiliani <ramon@mbp-de-ana.lan>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-07-19 15:02:22 +02:00
Xuehai Pan 2c5d94d98b
Fix: handle logical CUDA device IDs for GPUStatsMonitor if `CUDA_VISIBLE_DEVICES` set (#8260)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-19 11:42:43 +00:00
Sean Naren 06ac7d9649
[Fix] Remove DeepSpeed Plugin FP16 exception (#8462)
* Remove error, add mixed to check

* Add test

* Remove test

* Add changelog

* Add test for mixed

* Update tests/plugins/test_deepspeed_plugin.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add special

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-19 11:12:31 +00:00
Adrian Wälchli 1bfa29a8b0
Clear dataloader references before attaching new dataloaders to Trainer (#8442)
* regression test

* apply fix

* simplify test and docs

* update changelog
2021-07-19 10:43:39 +00:00
thomas chaton 374fae59ef
[Feat] Add utilities for CombinedLoader state dict and dataloader state dict 1/n (#8364)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-19 09:56:57 +00:00
thomas chaton 257fabd08d
Add support for missing return obj from `to` function on custom batch objects (#8433)
* resolve bug

* update

* add changelog

* Update tests/utilities/test_apply_func.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-19 08:47:14 +00:00
thomas chaton 7bb810f143
Add progress tracking on Loops - 2/n (#8362)
* resolve issues

* update

* update

* update

* add more exceptions

* resolve bug

* update

* update

* update changelog

* resolve bug

* resolve comments

* update

* update

* update changelog

* update

* update

* remove space

* update

* add progress tracking to loops

* validate json

* update

* convert to dict for better readability

* validate reload

* update

* update

* update on comments

* remove deadcode

* clean changelog

* clean changelog

* update

* update on comments

* CHANGELOG

* CHANGELOG

* Update pytorch_lightning/loops/base.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* whitespace suggestions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make fault_tolerant_enabled protected

* whitespace fixes around Args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* typo it's -> its

* fix copy-paste typo in progress docstring

* Delete classes

* Minor change

* docs

* protected get_loops_state

* merge restore_loops with restore_progress

* Fix tests after removals

* explicit save with trainer.save_checkpoint()

* handle optimization restart based on optimizer_idx

* update increments

* update val batch progress and remove iteration count

* update progress tracking for dataloader loops

* remove self.dataloader_idx from eval_epoch_loop

* add batch progress to predict loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate progress tracking for current_epoch

* Fix test

* Actually remove it

* Remove unused TrainingEpochProgress

* Fix optimization progress - missing scheduler

* Restarting changes

* Scheduler progress

* Unused property, reset on epoch

* Resolve FIXME

* Remove FIXME

* fix test_progress (wip)

* fix batch_progress.current.reset

* Hold off on split progress. Out of scope of this PR

* Unnecessary if

* fix structure in test_progress

* structure

* clean up unused variables in test_progress

* refactor naming and organization in test_progress

* Unnecessary variable

* Remove unnecessary diff

* Improve comment

* Undo typing change to avoid polluting everything with mypy fixes

* Fix and improve test_loops.py

* Fix and organize `test_loop_state_dict`

* Remove unnecessary checks in test

* Update test after disallowing updates on None attributes

* Typing

* Minor test cleanup

* Fix and move loop test

* Move test from progress to loops

* Reset the scheduler progress

* SchedulerProgress fix

* Consistent whitespace

* Fix final test

* Minor test changes

* One test to rule them all

* Formatting

* Rename and clean variables

* Shorter names

* Shorter scheduler name

* Fix optimizer step calculation for stop_batch=2

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove empty connects

* Update CHANGELOG

* Holy shit finally got the formula right

* Fix final thing!!!

* Do not check state dicts

* parametrize multiple_dataloader progress test

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
2021-07-19 08:31:45 +00:00
Jirka Borovec 17f2ae5613
prepare RC0 (#8399)
* RC0

* Update changelog

Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-15 21:25:29 +00:00
Adrian Wälchli b42efa7d86
support launching Lightning ddp with traditional command (#7480)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-07-14 11:25:36 +00:00
Carlos Mocholí 6ce77a102b
Set minimum PyTorch version to 1.6 (#8288)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-07-13 17:12:49 +00:00
Carlos Mocholí 321689f52e
Add `ModelCheckpoint(save_on_train_epoch_end)` (#8389)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-13 14:47:59 +00:00
Luis Perez 000fbe63d3
Expose `extract_batch_size` method and add corresponding tests. (#8357)
* expose extract_batch and make public

* first pass

* early return

* add changelog

* move to utilities/data.py

* add test_data.py

* tests are passing

* precommit hook

* address pep8 failure

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-13 11:35:10 +00:00
Kaushik B 9d5ad7639c
Add logger flag to save_hyperparameters (#7960)
* Add log flag to save_hyperparameters

* Fix setter

* Add test & Update changelog

* Address comments

* Fix conflicts

* Update trainer

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Fix datamodule hparams fix

* Fix datamodule hparams fix

* Update test with patch

* Update pytorch_lightning/utilities/hparams_mixin.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Move log_hyperparams to mixin

* Update hparams mixin

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-13 11:36:36 +02:00