Commit Graph

379 Commits

Author SHA1 Message Date
Sadiq Jaffer 7f91c5ebbc
Fix `unfreeze_and_add_param_group` expects `modules` rather than `module` (#6822) 2021-04-06 01:50:42 +02:00
Karthik Prasad c3da7f50bb
Sanitize `None` params during pruning (#6836)
* sanitize none params during pruning

* amend
2021-04-06 01:47:59 +02:00
Yuan-Hang Zhang 1bd5f36a5b
Fix validation progress counter with check_val_every_n_epoch > 1 (#5952)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-02 17:40:41 +09:00
Kaushik B f79a13e495
[Model Parallel] Add configure sharded model hook (#6679)
* Add base hook for model parallel

* fix callback signature

* Simplify hook

* Add hook logic

* add tests

* add property setter

* add logic for being called once

* Update changelog

* Fix

* fix return type

* fix lambda callback test

* Fix tests

* Apply code suggestions

* add logic for setup_optimizers_predispatch

* add common dummy model

* Swap call order

* Remove test that isn't needed anymore

* Update tests

* Add a bit more doc

* Few code review fixes

* Update pytorch_lightning/accelerators/accelerator.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Change hook name

* Fix test

* Test setup hook, refactor names

* Swap call order of callbacks and model initialization

* Change name of context manager

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-29 14:50:51 -06:00
Carlos Mocholí f0c5479de9
Remove legacy `Result` parameters (#6016) 2021-03-28 11:55:08 +02:00
Carlos Mocholí bc613611e2
Do not add return dict items to callback_metrics (#6682) 2021-03-26 14:05:20 +01:00
Jirka Borovec 217c12a4e7
Simplify deprecations (#6620)
* use external deprecate

* simplify

* simplify

* simplify

* flake8

* .

* others

* .
2021-03-25 15:26:38 +01:00
Rohit Gupta 9be092dbdb
Add on_epoch_start to run at the beginning of every loop irrespective of train/val/test (#6498)
* update docs

* add hook and update docs

* update tests

* chlog

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* chlog

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-25 14:20:49 +01:00
Jirka Borovec 1fae10a2dc
refactoring setup (#6590)
* refactoring setup

* .

* docs

* flake8
2021-03-22 08:39:19 -04:00
Kaushik B b190403e28
Add outputs param for `on_val/test_epoch_end` hooks (#6120)
* add outputs param for on_val/test_epoch_end hooks

* update changelog

* fix warning message

* add custom call hook

* cache logged metrics

* add args to docstrings

* use warning cache

* add utility method for param in sig check

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update docstring

* add test for eval epoch end hook

* add types and replace model ref

* add deprecation test

* fix test fx name

* add model hooks warning

* add old signature model to tests

* add clear warning cache

* sopport args param

* update tests

* add tests for model hooks

* code suggestions

* add signature utils

* fix pep8 issues

* fix pep8 issues

* fix outputs issue

* fix tests

* code fixes

* fix validate test

* test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-16 12:15:16 -04:00
thomas chaton 0544efd453
[bug] Update broadcast + reduce decision ModelCheckpoint] (#6410)
* resolve bug

* update

* update changelog

* update PR

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* add todo

* resolve issues

* resolve flake8

* update

* add coverage for reduce

* wip

* restore back to brodbact

* remove test.py

* resolve flake8

* update

* check world size

* resolve test

* update

* use pytorch version when defined

* update on comments

* update on comments

* flake8

* resolve bugs

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update

* update

* update

* update

* remove test

* update

* resolve flake8

* update

* update

* update

* proxy

* update

* update

* resolve typo

* prune

* update parallel

* update

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-14 17:14:27 +00:00
ananthsub cea170e011
[feat] Support iteration-based checkpointing in model checkpoint callback (#6146)
* Update model_checkpoint.py

* add tests

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* fix tests

* every_n_batches

* Update test_model_checkpoint.py

* defaults

* rm tests

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* Prune deprecated metrics for 1.3 (#6161)

* prune deprecated metrics for 1.3

* isort / yapf

* Update model_checkpoint.py

* add tests

* defaults

* Update CHANGELOG.md

* pre-commit

* Update model_checkpoint.py

* update defaults

* Update test_remove_1-5.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* fix tests

* Update test_model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* ckpt-callback

* Update test_model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* validation-end

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* Update test_model_checkpoint.py

* Update test_model_checkpoint.py

* Update test_model_checkpoint.py

* clarify-names

- Make names explicit as to which hooks they apply to
- Use step instead of batch for consistency with global step

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* mutual-exclusive

Make every_n_train_steps and every_n_val_epochs mutually exclusive

* fix-default-0

* Update CHANGELOG.md

* formatting

* make-private

make attributes private to the class

* rebase

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-11 14:44:29 -08:00
Max Frei 2ecda5df52
Allow user to disable the automatic formatting of checkpoint file names. (#6277)
* cleaning SWA (#6259)

* rename

* if

* test

* chlog

* Remove opt from manual_backward in docs (#6267)

* switch agents pool (#6270)

* Allow user to disable the automatic formatting of checkpoint file names.

* Added changelog entry.

* Made flake8 happy.

* Applied review suggestion: quotes for special characters in docstring

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Fixed example in docstring.

* Fixed syntax error in docstring.

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-11 16:40:23 +08:00
Elia Cereda f4cc7451a9
Add Trainer.validate(…) method to run one validation epoch (#4948)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-11 03:46:37 +01:00
Carlos Mocholí efd272a3ca
Pass {fit,validate,test,predict} to setup() and teardown() (#6386) 2021-03-08 15:27:07 +01:00
Carlos Mocholí 826375effe
Fix ModelCheckpoint(monitor=None, save_last=True) not saving checkpoints (#6136)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-03-08 00:59:14 +01:00
Elia Cereda d0596fac94
Refactor RunningStage usage in advance of implementing Trainer.validate() (#4945)
* Update code

Co-authored-by: EliaCereda

* More property updates

* Move properties. Introduce trainer._fitting

* Use trainer.fitting

* Fix reset dataloaders

* Unused code

* RunningStage.SANITY_CHECKING

* Use setters

* Fix bugs

* Fix bugs

* TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}

* Fix bugs

* Fix bugs

* Fix tests

* Update CHANGELOG. Add deprecation warning. Fix tests

* Unused imports

* Optional trainer

* More deprecation. More refactoring

* Correct version

* Use properties

* Address comments

* flake8

* Missed renamings

* Typo

* is -> ==

It is recommended to use  for Enums since they are singletons, however, since the LightningEnum subclasses str, it's not a good idea in case a user sets the state/stage with a str

* Also for tests

* Typo

* Address @tchaton's comments

* PEP8

* Correct property

* Update CHANGELOG

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Remove called sanity check

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-06 12:40:19 +00:00
Jirka Borovec e84854264f
CI: fix examples - patch download MNIST (#6357)
* patch download

* CI

* isort

* extra
2021-03-05 16:50:21 +00:00
Jirka Borovec 6166f46281
drop unused variable in API (#6308)
* drop unused pl model in ckpt

* irelevant

* on_evaluation_batch_start

* evaluation_epoch_end

* attach_datamodule
2021-03-04 10:26:54 +01:00
Carlos Mocholí 4a8422c2dc
Fix ModelPruning(make_pruning_permanent=True) buffers getting removed when saved during training (#6073)
Co-authored-by: chaton <thomas@grid.ai>
2021-03-03 13:29:58 +01:00
Adrian Wälchli bc577ca792
fix duplicate console logging bug v2 (#6275)
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-02 15:17:55 +05:30
Joseph Turian 22985d2f43
Improved EarlyStopping.patience documentation (#6278)
* Improved early stopping documentation

* Changed to 120 column format

* doc

* doc

* doc

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-02 15:01:07 +05:30
Jirka Borovec ed67490d93
cleaning SWA (#6259)
* rename

* if

* test

* chlog
2021-03-01 19:10:27 +01:00
Carlos Mocholí 3df02b880a
Add checkpoint parameter to on_save_checkpoint (#6072)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-02-25 21:18:19 +05:30
Sean Naren dd2f5a0212
Fix for multiple callbacks (#6197)
* Fix for multiple callbacks

* Add CHANGELOG.md

* Remove old params

* Skip tests on windows using ddp

* Change name of the variable to not clash with should stop, which is separate

* Apply suggestions from code review

* Fix params

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-02-25 15:44:55 +00:00
Carlos Mocholí 8b475278dd
Prune deprecated EarlyStopping(mode='auto') (#6167)
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-02-24 13:26:33 +00:00
Jirka Borovec 46617d9021
Prune deprecated checkpoint arguments (#6162)
* prune prefix

* prune mode=auto

* chlog
2021-02-24 06:58:53 -05:00
Alexander 423ecf995a
Feature/5275 clean progress bar print (#5470)
* Trainer.test should return only test metrics (#5214)

* resolve bug

* merge tests

* Fix metric state reset (#5273)

* Fix metric state reset

* Fix test

* Improve formatting

Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai>

* print() method added to ProgressBar

* printing alongside progress bar added to LightningModule.print()

* LightningModule.print() method documentation updated

* ProgressBarBase.print() stub added

* stub

* add progress bar tests

* fix isort

* Progress Callback fixes

* test_metric.py duplicate DummyList removed

* PEP and isort fixes

* CHANGELOG updated

* test_progress_bar_print win linesep fix

* test_progress_bar.py remove whitespaces

* Update CHANGELOG.md

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Tadej Svetina <tadej.svetina@gmail.com>
Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai>
Co-authored-by: Alexander Snorkin <Alexander.Snorkin@acronis.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-02-22 09:40:18 +00:00
Carlos Mocholí 57215b79a0
Avoid printing ModelCheckpoint log with monitor=None and verbose=True (#6109) 2021-02-22 08:51:13 +00:00
edenlightning 3449e2d79f
Docs for Pruning, Quantization, and SWA (#6041)
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2021-02-18 13:51:51 +00:00
Jirka Borovec 049006a59c
fix/test quant (#6040)
* fix/test quant

* ...

* ---
2021-02-18 10:47:29 +00:00
Carlos Mocholí 38ad9e0764
[ModelPruning] Add missing attribute with use_global_unstructured=False and verbose (#6045) 2021-02-18 10:40:34 +00:00
chaton c9622bafe0
[feat] Add Trainer(stochastic_weight_avg=True/False) (#6038)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-02-17 23:10:05 +00:00
Nicki Skafte 68fd3086f1
Prevent flickering progress bar (#6009)
* add padding

* fix

* fix

* Update pytorch_lightning/callbacks/progress.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* updated based on suggestion

* changelog

* add test

* fix pep8

* resolve test

* fix code format

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-02-17 19:01:51 +00:00
chaton 6bc4490d01
[HotFix] Resolve TPU Training (#6027)
* fix tpus

* update

* add back reduction in val_loss

* resolve some bugs with TPUs

* update changelog

* update on comments

* forgot status

* Fix train_bn arg

* resolve comments

* update on comments

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-02-17 16:40:13 +00:00
Carlos Mocholí 7aae589167
Add deprecation warning to ModelCheckpoint when logging val_loss with no monitor (#6012)
* Add deprecation warning when logging val_loss with no monitor

* EOF

* Update CHANGELOG

* Clear warning cache before testing

* pep8

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 10:46:58 +00:00
chaton e982800b81
Add PredictLoop (#5752)
* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* add predict_loop

* manual optimization

* clean predictloop

* update optimizer routing

* add predict loop on new accelerator

* resolve a bug

* add rank to torchelastic

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* fix memory mixed precision

* update

* setstate on trainer for pickling in ddp spawn

* add predict_loop

* clean predictloop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* clean predictloop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* resolve tests

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* remove sanetize

* rename train to run_train

* remove useless hooks

* add misconfigurationException

* remove wrong naming

* resolve some legacy

* udpate docstring

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


fix merge


fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resovle a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x


x


x


x


x


x


x


x


x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* updat utils

* revert permission change on files

* suggestions from carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typer in accelerator.py

* typo

* resolve flake8

* update code

* update

* Update pytorch_lightning/trainer/predict_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/trainer/predict_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix merge

* fix merge

* reset legacy accelerator

* add missing rename dispatch

* rename post traning

* update code

* resolved comments

* typo

* typo

* add flow description

* resolve comments

* update on comments

* update flow

* add backticks

* resolve tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 17:11:56 -05:00
Akihiro Nitta 0a2fb05aac
Document exceptions in callbacks (#5541)
* Add Raises: section to docstring

* Add Raises section to the docs

* Add raises section to the docs

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* fix

* Remove unnecessary instance check

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-02-15 10:24:36 +00:00
Justus Schock da6dbc8d1d
PoC: Accelerator refactor (#5743)
* restoring the result from subprocess

* fix queue.get() order for results

* add missing "block_backward_sync" context manager

* add missing "block_backward_sync" context manager

* fix sync_batchnorm

* fix supported gpu-ids for tuple

* fix clip gradients and inf recursion

* accelerator selection: added cluster_environment plugin

* fix torchelastic test

* fix reduce early stopping decision for DDP

* fix tests: callbacks, conversion to lightning optimizer

* fix lightning optimizer does not pickle

* fix setting benchmark and deterministic option

* fix slurm amp test

* fix prepare_data test and determine node_rank

* fix retrieving last path when testing

* remove obsolete plugin argument

* fix test: test_trainer_config

* fix torchscript tests

* fix trainer.model access

* move properties

* fix test_transfer_batch_hook

* fix auto_select_gpus

* fix omegaconf test

* fix test that needs to simulate slurm ddp

* add horovod plugin

* fix test with named arguments

* clean up whitespace

* fix datamodules test

* remove old accelerators

* fix naming

* move old plugins

* move to plugins

* create precision subpackage

* create training_type subpackage

* fix all new import errors

* fix wrong arguments order passed to test

* fix LR finder

* Added sharded training type and amp plugin

* Move clip grad to precision plugin

* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically

* Fix import issue, attempting to fix tests

* Fix initial test

* Reflect hook logic from master, should wrap model after move to device

* Optional state consolidation, since master has optimizers not wrapped

* change attribute for instance test

* reset optimizers

optimizers are not used in main process, so state would be wrong.

* legacy

* imports in accel

* legacy2

* trainer imports

* fix import errors after rebase

* move hook to new setup location

* provide unwrapping logic

* fix trainer callback system

* added ddp2 implementation

* fix imports .legacy

* move plugins

* restore legacy

* drop test.py from root

* add tpu accelerator and plugins

* fixes

* fix lightning optimizer merge

* reset bugreportmodel

* unwrapping

* step routing forward

* model access

* unwrap

* opt

* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* manual optimization

* update optimizer routing

* add rank to torchelastic

* fix memory mixed precision

* setstate on trainer for pickling in ddp spawn

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


fix merge


fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resovle a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x


x


x


x


x


x


x


x


x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* updat utils

* revert permission change on files

* suggestions from carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typer in accelerator.py

* typo

* Apply suggestions from code review

* formatting

* update on comments

* update typo

* Update pytorch_lightning/trainer/properties.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* suggestion from code review

* suggestion from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-12 15:48:56 -05:00
Adrian Wälchli 4bdf2fe55f
remove executable bit on source files (#5929)
* 644
2021-02-12 00:06:40 +01:00
Jirka Borovec e676ff96b1
Typing: callback base (#5919)
* typing for callback base
2021-02-11 14:33:10 +00:00
Jirka Borovec b434c479e7
Quantisation (#5706)
* empty

* sq

* obs


* int

* ts

* helpers

* chlog

* yapf

* avg

* dupl

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fixes

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fixes

* note

* warn

* 45

* link

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* yapf

* flake8

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-02-11 07:04:57 -05:00
Jirka Borovec 9475c845cb
Docs/fixes (#5914)
* wip

* ..

* ...

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-02-11 10:22:07 +00:00
chaton 7b00894130
[feat] Add StochasticWeightAveragingCallback (#5640)
* add swa callback

* switch back to 1.6.0

* remove optimizer_step

* move super

* update

* forgot update_parameters

* update on comments

* works for ddp

* resolve flake8

* remove set_model

* resolve flake8

* resolve cpu

* resolve flake8

* resolve flake8

* update

* update on comments
2021-02-11 00:05:59 +00:00
Carlos Mocholí a028171f26
Fix Pruning callback and add a few features (#5825)
* Remove pruning check because it was added in 1.4.0 and that is our minimal torch version

* Fixing many bugs

* Fix misconfig test

* Fix tests

* Improve error message

* Reduce whitespace

* WIP

* TODOs

* _MODULE_CONTAINERS

* Add LTH test

* Allow resampling

* Iterative pruning

* Log pruning percentage

* Properly make pruning permanent

* Fix docstring

* Minor changes

* Test loading non-permanent model

* corrent bugs

* Revert "corrent bugs"

This reverts commit ffb8d47547.

* Add beta warning

* Fix docs

* 2 verbosity levels

* OCD

Co-authored-by: Your Name <you@example.com>
2021-02-10 15:03:23 +00:00
Jirka Borovec 79d42d83e7
formatting 3/n: PL modules (#5716)
* cb

* log

* prof

* tune

* flake8
2021-02-08 14:28:38 -05:00
Rohit Gupta cb67e1d0b2 Separate epoch validation from step validation (#5208)
* Seperate epoch validaton from step validation

* update system

* test

* baked logic in callbacks

* unbake logic in callbacks

* fix the call for scheduler

* use property

* pep

* correct rebase

* gitignore

* ref

* add tests

* fix

* add early stopping test

* trigger

* chlog

* rev

* 1.3

* log

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/trainer/training_loop.py

* Update CHANGELOG.md

* Apply suggestions from code review

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

(cherry picked from commit e429f97b67)
2021-02-08 20:22:39 +01:00
Rohit Gupta 2abf4693bc Fix log_dir property (#5537)
* fix and update tests

* update with ModelCheckpoint

* chlog

* wip wandb fix

* all fixed

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-05 21:40:42 +01:00
tchaton df5cbf5368 resolve conflict
resolve failing test
2021-02-05 21:40:39 +01:00
Adrian Wälchli bb7d188318 Fix ModelCheckpoint race condition in file existence check (#5155)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-02-05 21:40:39 +01:00
chaton e425bf3ba9
[BugOnFeat] Resolve bug with Finetuning (#5744)
* resolve bug + add doc

* Update pytorch_lightning/callbacks/finetuning.py

* resolve bug

* start adding more test

* add more tests for finetuning callback functions

* rename to flatten_modules

* resolve doc

* Update pytorch_lightning/callbacks/finetuning.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* resolve comments

* remove update on BoringModel

* update on comments

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-02-04 18:36:54 +00:00
Adrian Wälchli 7b42494d0e Fix visual progress bar bug / properly reset progress bar (#4579)
* reset

* fix reset

* changelog

* update chlog

* typing

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-02-03 19:41:44 +01:00
chaton 3da28fd634
[feat] 1/2 Add trainer.predict (#5579)
* start adding predict

* add predict

* resolve test

* add predict

* remove limit_predict

* update

* add test for predict

* typo

* update on comments

* remove predict_step

* update ddp_shareded

* check ddp_sharded

* resolve on comments

* resolve isort

* update dp

* add test dp 1 gpu

* made default forward

* resolve path

* resolve bug

* update on comments

* resolve doc

* resolve bug

* update

* resolve bug

* update on comments

* resolve pep8

* update test doc

* update on comments

* solve special tests

* resolve bug

* resolve flake8

* Update pytorch_lightning/callbacks/progress.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* add predict to LightningModule

* missing predict

* typo

* rename is_prediction to _predicting

* add

* update

* update

* update doc

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-01-27 11:38:14 -05:00
chaton d0aaf983b9
[Feat] Adding PruningCallback (#5618)
* wip

* add pruning callback

* add condition for duplicated weights

* update on comments

* update on comments

* update on comments

* add more tests

* resolve flake8

* resolve on comments

* update changelog

* update on comments

* update on comments

* change order

* remove ddp_spawn skip

* update

* typo

* Update pytorch_lightning/callbacks/pruning.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/pruning.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* forgot platform

* update on comments

* remove     @rank_zero_only

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-01-27 01:00:42 -05:00
Carlos Mocholí 9d165f6f56
Start version suffixes at 1 (#5008)
* Rename original filepath to v0

* Clean-up

* Suggestions from code review

* Revert renaming. Start version number at 1

* Add ModelCheckpoint.STARTING_VERSION

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Add note about class attributes

* Update CHANGELOG

* Fix doc

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-01-26 17:29:34 -05:00
Jirka Borovec dee5553b2b
move to Pages dir (#4869)
* folders

* common / advanced / extensions

* paths

* flake8

* isort

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-01-26 15:07:07 -05:00
Jirka Borovec 2846322f60
fix docs render (#5610) 2021-01-25 20:21:00 -05:00
Akihiro Nitta 30f31d32c8
docs: Add BackboneLambdaFinetuningCallback (#5553)
* Add and fix the docs of BackboneLambdaFinetuningCallback

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-01-25 12:07:02 +00:00
Arnaud Gelas 6386b8d36b
Fix isort a few failures (#5504)
Remove from skipped module in pyproject.toml and fix failures on:
- pytorch_lightning/callbacks/*.py
- pytorch_lightning/cluster_environments/*.py
- pytorch_lightning/profiler/*.py
- pytorch_lightning/tuner/*.py

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-01-15 17:44:27 -05:00
Wansoo Kim 61f415f2ac
Add LambdaCallback (#5347)
* Add LambdaCallback

* docs

* add pr link

# Conflicts:
#	CHANGELOG.md

* convention

* Fix Callback Typo

* Update pytorch_lightning/callbacks/lambda_cb.py

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Update pytorch_lightning/callbacks/lambda_cb.py

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Update pytorch_lightning/callbacks/lambda_cb.py

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* use Misconfigureation

* update docs

* sort export

* use inspect

* string fill

* use fast dev run

* isort

* remove unused import

* sort

* hilightning

* highlighting

* highlighting

* remove debug log

* eq

* res

* results

* add misconfig exception test

* use pytest raises

* fix

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/callbacks/lambda_cb.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* hc

* rm pt

* fix

* try fix

* whitespace

* new hook

* add raise

* fix

* remove unused

* rename

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2021-01-13 04:42:49 -05:00
Jirka Borovec 54d20dc596
Refactor: clean trainer device & distrib getters (#5300)
* warnings

* .

* .

* flake8

* .

* .

* .

* use_tpu

* use_dp

* .

* use_ddp

* .

* use_horovod

* .

* .

* .
2021-01-12 05:22:37 -05:00
Alan Du f6dc354349
Throw MisconfigurationError on unknown mode (#5255)
* Throw MisconfigurationError on unknown mode

* Add tests

* Add match condition for deprecation message
2021-01-12 02:31:26 -05:00
chaton 48718d7ce7
Feat: Add BackboneLambdaFinetunningCallback (#5377)
* Feat: Add BackboneLambdaFinetunningCallback

* update changelog

* resolve pep8 and update changelog

* add finetunning example

* resolve example

* iremove milestones from model

* iupdate

* update

* Update pytorch_lightning/callbacks/__init__.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Update pytorch_lightning/callbacks/__init__.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* update

* add comments

* resolve test

* Update pytorch_lightning/callbacks/finetuning.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update tests/trainer/logging/test_logger_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update on comments

* resolve merge

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-01-08 16:33:05 -05:00
chaton 5f94900361
[Feat] Cleanup ModelCheckpoint / EarlyStopping by moving logic to LoggerConnector (#5218)
* [bug-fix] Metric reduction with Logging (#5150)

* add test

* resolve bug

* udpate test

* wrongly copy / paste

* update test

* resolve a second bug

Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>

* iupdate

* resolve bugs

* add test back

* correct flake8

* resolve flake8

* update on comments

* update tests

* add a test

* add test

* update to Callable

Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
2021-01-07 10:57:26 -05:00
chaton 56437e98a6 [bug-fix] Trainer.test points to latest best_model_path (#5161)
* resolve bug

* update code

* add set -e

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update test

* Update tests/checkpointing/test_trainer_checkpoint.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Update tests/checkpointing/test_trainer_checkpoint.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update on comments

* resolve test

* convert to set

* update

* add error triggering

* update

* update on comments

* update

* resolve import

* update

* update

* Update pytorch_lightning/plugins/rpc_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

(cherry picked from commit d5b367871f)
2021-01-06 15:14:10 +01:00
Rohit Gupta f08c025c10 Allow log_momentum for adaptive optimizers (#5333)
* fix

* fix

* chlog

* no momentum warning

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* ref

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

(cherry picked from commit 371daea594)
2021-01-06 12:58:34 +01:00
Rohit Gupta 9cfbf8d609 Disable checkpointing, earlystopping and logging with fast_dev_run (#5277)
* Disable checkpointing, earlystopping and logger with fast_dev_run

* docs

* chlog

* disable callbacks and enable DummyLogger

* add log

* use dummy logger method

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

(cherry picked from commit f740245521)
2021-01-06 12:57:24 +01:00
Rohit Gupta fd5322d3e7 Update warning if ckpt directory is not empty (#5209) 2021-01-05 09:58:37 +01:00
chaton 6b19198aae [bug-fix] Metric reduction with Logging (#5150)
* add test

* resolve bug

* udpate test

* wrongly copy / paste

* update test

* resolve a second bug

Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
2021-01-05 09:58:37 +01:00
Rohit Gupta 81e9d4260e Fix saved filename in ModelCheckpoint if it already exists (#4861)
* disable version if not required

* disable version if not required

* pep

* chlog

* improve test

* improve test

* parametrize test and update del_list

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* try appending version to already saved ckpt_file

* Revert "try appending version to already saved ckpt_file"

This reverts commit 710e05e01f738d982aabf1f36c09fa59293e5c0c.

* add more assertions

* use BoringModel

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2021-01-05 09:57:37 +01:00
Jirka Borovec fb90eec515
drop deprecated checkpoint filepath (#5321)
* drop deprecated checkpoint filepath

* tests
2021-01-02 00:08:29 +01:00
Jirka Borovec 0f36525e8f
fix/enable - check F401 (#5201)
* refactor - check F401

* missed

* fix
2020-12-21 10:15:04 +01:00
Jirka Borovec 2d54116baa
annotat unused vars (#5017)
* annotate all unused vars

* rank_zero_warn

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* f1 fixed

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-12-19 13:53:06 +01:00
Jirka Borovec 059eaecbb4
set xxx_AVAILABLE as protected (#5082)
* sett xxx_AVAILABLE as protected

* docs
2020-12-14 20:19:05 +05:30
Carlos Mocholí 0327f6b4c2
Do not warn when the name key is used in the lr_scheduler dict (#5057)
* Do not warn when the name key is used

* Missing line

* Consistency

* Update pytorch_lightning/callbacks/lr_monitor.py

* Update docs

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update CHANGELOG

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-14 08:38:10 +01:00
Sean Naren ee9b3fe574
[feat] pp 1/n (#5016)
* Added changes for RPC plugin

* Add missing kwargs

* Fix code format

* Loading refactors by introducing is_distributed var, fix optimizer step flow

* Add rpc guard

* Added docstrings and typing

* resolve comments

* Add additional rpc hook, refactor name of exit process hook for clarity

* remove annotation

* Modify behaviour to allow optional return, add test for rpc plugin

* resolve tests

* rename is_ddp_based

* update

* update for windows

* update

* resolve test

* code smell

* Revert back to init_ddp_connection for backwards compat

* Swap to explicit name for property

* Add missing speed parity increase for CI variability, fix call counts for child process

Co-authored-by: tchaton <thomas@grid.ai>
2020-12-08 22:02:10 +00:00
chaton 02152c1729
Simplify optimization Logic (#4984)
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization

* debug

* Revert "debug"

This reverts commit ccca6b6b

* Expose manual reduce for automatic optimization

* Add input arguments

* Enable parity test

* clean imports

* Expose hook after to ensure we reset

* Fix naming

* add

* fix test

* uniformize optimizer logic

* resolve test

* resovle flake8

* resolve amp bug

* update tests

* remove bug

* remove optimizer_step in accelerators

* typo

* update lightning optimizer

* set doesn't work with ddp_spawn

* resolve flake8

* update threshold

* ignore pyright

* correct codeFactor

* remove useless if

* remove zer_grad function

* simplify step

* remove typo

* resolve bug

* Apply suggestions from code review

* update on comments

* resolve bugs

* remove tests

* Update pytorch_lightning/trainer/configuration_validator.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* simplify testing

* add more tests

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-07 12:55:49 +00:00
Jan-Henrik Lambrechts b00991efd8
Added changeable extension variable for model checkpoints (#4977)
* Added changeable extension variable for model checkpoints

* Removed whitespace

* Removed the last bit of whitespace

* Wrote tests for FILE_EXTENSION

* Fixed formatting issues

* More formatting issues

* Simplify test by just using defaults

* Formatting to PEP8

* Added dummy class that inherits ModelCheckpoint; run only one batch instead of epoch for integration test

* Fixed too much whitespace formatting

* some changes

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-12-06 22:58:50 +05:30
Rohit Gupta 342a2b6f25
Deprecate auto mode from ModelCheckpoint and EarlyStopping (#4695)
* remove auto mode from callbacks

* chlog

* remove auto mode from callbacks

* mode

* mode

* move back

* update docs

* update docstrings

* docstring warning

* fix syntax

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* isort

* default to 'auto'

* syntax

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-04 16:11:58 +01:00
Jirka Borovec 442d57f1e9
simplify imports xla / TPU (#4872)
* xla

* tpu

* fix

* fix

* flake8
2020-11-27 00:37:48 +01:00
Teddy Koker 5b74effb1a
Update lr_monitor.py (#4826) 2020-11-24 02:07:33 -05:00
Adrian Wälchli 89e8796e2a
fix incomplete progress bar when refresh_rate > num batches (#4577)
* fix progress bar overshoot

* fix updates for partially incomplete main  progress bar when val loop starts

* add tests

* chlog
2020-11-24 00:01:33 +01:00
Simon-Martin Schröder 8601268c70
Fix #4375: Always use trainer.global_step for step (#4376)
* Fix #4375: Always use trainer.global_step for step

* Changelog

* Remove superflous use "epoch"

* Update Changelog

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-22 13:02:06 +01:00
Rohit Gupta db69d169e8
Deprecate prefix argument in ModelCheckpoint (#4765)
* Deprecate prefix in ModelCheckpoint

* chlog

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-21 18:08:42 +05:30
Carlos Mocholí 396a46f55f
Add current_score to ModelCheckpoint.on_save_checkpoint (#4721)
* Add current_score to ModelCheckpoint.on_save_checkpoint

* Update CHANGELOG

[ci skip]

* fix

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* fix2

* Add test for NaN

* Fix failing tests

* Simplify line

* Add test docstrings

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-11-18 08:09:44 +00:00
Jeff Yang baa8558cc0
logger docs and api docs (#3950)
* logger and api docs

* remove gpu_usage_logger, lr_logger

* update docstring

* fix wandb example

* remove step result

* charts

* add some charts info

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-13 20:35:54 +05:30
Kai Zhang 30ad3e2ad3
Replace a MisconfigurationException with warning in ModelCheckpoint callback (#4560)
* replace MisconfigurationException with warning

* update test

* check raising UserWarning

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-10 10:44:43 +01:00
William Falcon 09a51697ed
Adds shortcut for path to log (#4573)
* added log_dir shortcut to trainer properties for writing logs

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut
2020-11-08 12:16:22 -05:00
Rohit Gupta db3cf2e1f4
Update default filename (#4500)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-04 15:37:42 +05:30
Jirka Borovec ef03c39ab7
Add step index in checkpoint name (#3807)
* true final value of global step

* ch check

* tests

* save each validation interval

* wip

* add test

* add test

* wip

* fix tests, revert old edits, fix merge conflicts, update doctests

* test + bugfix

* sort files

* format test

* suggestion by ananth

* added changelog

* naming

* docs

* example

* suggestion

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix test

* pep

* pep

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-11-02 15:05:58 +01:00
Adrian Wälchli 6ae4c6ec85
update docs on checkpoint_callback Trainer argument (#4461)
* docs update

* update callbacks docs

* docs

* notebook examples

* warning

* line lenght

* update deprecation

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Roger Shieh <55400948+s-rog@users.noreply.github.com>
2020-11-02 06:18:20 +01:00
Jeff Yang 20a8eaa335
fix ModelCheckpoint docs (#4426) 2020-10-30 16:42:02 +06:30
Jeremy Jordan 1e1a42260a
add option to log momentum (#4384)
* add option to log momentum

* add docstring

* refactor for cleanliness

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-28 21:56:58 +05:30
Rohit Gupta b26c71eadf
Add optimizer hooks in callbacks (#4379)
* Add optimizer hooks in callbacks

* optimizer param

* update test

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-28 13:15:22 +01:00
Carlos Mocholí 00cc69aed7
Add "monitor" to saved ModelCheckpoints (#4383)
* Add key

* Remove unused variables

* Update CHANGELOG [skip ci]

* best_model_monitor -> monitor

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-28 15:21:08 +05:30
Rohit Gupta 4c7ebdc32b
Add dirpath and filename parameter in ModelCheckpoint (#4213)
* Add dirpath and filename parameter in ModelCheckpoint

* remove old function

* chlog

* codefactor

* update tests

* docs

* fix doctest and added tests

* pathlib dirpath

* dep version and docs

* try fix doctest

* pep

* suggestions
Co-authored-by: carmocca <carlossmocholi@gmail.com>

* suggestions

* fix test

* pep

* trigger tests

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* suggestions

* try fix windows test

* add and update some tests

* trigger tests

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-23 09:59:12 +05:30
Rohit Gupta af449310aa
limit monitor callback with log_every_n_steps (#3881)
* limit monitor callback with row_log_interval

* try fix gpu test

* log_every_n_steps

* Apply suggestions from code review

* Apply suggestions from code review

* rebase and staticmethod

* suggestions

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-22 16:38:03 +05:30
Dusan Drevicky 2ffad4c89f
Fix info message when EarlyStopping 'mode' not provided [ci skip] (#4282)
* Fix info message when EarlyStopping 'mode' not provided

* fixup! Fix info message when EarlyStopping 'mode' not provided

* Apply suggestions from code review

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-21 23:44:13 +05:30
William Falcon 8a20d6af51
make save fx part of model checkpoint cb (#4284) 2020-10-21 10:06:42 -04:00
Jirka Borovec f37444fa3e
CI: add flake8 (#4239) 2020-10-19 21:20:17 +01:00
William Falcon 09c2020a93
notices (#4118) 2020-10-13 07:18:07 -04:00
Jirka Borovec 36f0c8a61d
remove deprecated callbacks (#3979) 2020-10-08 06:37:14 -04:00
William Falcon 048a816be3
added tests for the training epoch end (#3967) 2020-10-07 22:27:36 -04:00
William Falcon 4c0d063c86
outputs in __batch_end hooks (#3966)
* train_batch_end outputs

* added tests for the output hooks
2020-10-07 21:48:38 -04:00
William Falcon 65b6a6a497
0.10.0 (#3965) 2020-10-07 20:41:56 -04:00
Jeff Yang fe5b943965
Callback docs with autosummary (#3908)
* callback docs with autosummary

* do not show private methods

* callback base docstring
2020-10-06 17:28:45 -04:00
Jirka Borovec 064ae53d63
nb steps in early stop (#3909)
* nb steps

* if

* skip

* rev

* seed

* seed
2020-10-06 15:20:08 -04:00
Lezwon Castelino 69833dad5b
Added check to verify xla device is TPU (#3274)
* tpu device check

* replaced with xmp spawn

* Revert "replaced with xmp spawn"

This reverts commit 6835380f

* replaced all instances of XLA_AVAILABLE

* moved inner_f to global scope

* made refactors

* added changelog

* added TPU_AVAILABLE variable

* fix codefactor issues

* removed form trainer and early stopping

* add TORCHXLA_AVAILABLE check

* added tests

* refactoring

* Update pytorch_lightning/utilities/xla_device_utils.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* updated function names

* fixed bug

* updated CHANGELOG.md

* added todo

* added type hints

* isort and black

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-06 19:54:37 +02:00
Jirka Borovec 6ac0958166
fix init nan for checkpointing (#3863)
* add test for checkpoint nan

* fix

* pep
2020-10-05 07:36:12 -04:00
William Falcon d9656d166c
fixed model checkpoint frequency (#3852)
* fixed model checkpoint frequency

* fixed model checkpoint frequency

* fixed model checkpoint frequency

* fixed model checkpoint frequency

* merged
2020-10-04 21:49:20 -04:00
Adrian Wälchli cc9781a0ad
Deprecate early_stop_callback Trainer argument (part 2) (#3845)
* update tests with EarlyStopping default

* imports

* revert legacy tests

* fix test

* revert

* revert
2020-10-04 17:36:47 -04:00
Adrian Wälchli 1906867fd4
deprecation warning (#3844) 2020-10-04 13:17:09 -04:00
Rohit Gupta a628d181ee
Fix val_progress_bar total with num_sanity_val_steps (#3751)
* Fix val_progress_bar total with num_sanity_val_steps

* chlog

* Fix val_progress_bar total with num_sanity_val_steps

* move test

* replaced with sanity flag and suggestions
2020-10-04 08:32:18 -04:00
William Falcon d9bc95f83e
ref: bug fix with logging val epoch end + monitor (#3812)
* ref: fix metric err

* ref: fix metric err

* ref: fix metric err

* ref: merge

* ref: merge

* ref: merge

* ref: merge

* ref: decoupled ddp2

* ref: decoupled ddp2

* ref: decoupled ddp2

* ref: decoupled ddp2

* ref: decoupled ddp2

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix
2020-10-03 12:33:29 -04:00
ananthsub 192fc018f3
Update model_checkpoint.py (#3801) 2020-10-02 14:49:46 -04:00
William Falcon 7c61fc7c27
ref: fixes logging for eval steps (#3763)
* fixes logging for eval steps
2020-10-01 02:31:11 -04:00
William Falcon a38d108a68
add dist lib to enable syncing anything across devices (#3762)
* add dist lib to enable syncing anything across devices
2020-10-01 01:21:38 -04:00
William Falcon cf182e80fc
Finish Allow on_save_checkpoint... (#3688)
* Finish #3562

* Apply suggestions from code review

* Apply suggestions from code review

* fix tests

* Finish #3562

* Apply suggestions from code review

* Apply suggestions from code review

* fix tests

* fix structure

* fix structure

* make save_last test pass

* unnecessary global rank check

* fix test

* update test

* update test

* test

* test

* run save on all

* remove assert

* tracking saves

* check if fails

* test

* clean up

* adjust horovod test

* clean up

* remove unnecessary makdirs

* change

* undo

* debug

* debug

* debug

* debug

* mock

* undo debug code

* add extra assertions

* test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-30 16:15:29 -04:00
Adrian Wälchli c73032e39d
Make ModelCheckpoint(save_top_k=-1) track the best models (#3735)
* fix topk=-1 tracking best

* update test

* clean up

* add changelog

* enable loading best topk in trainer.test()

* make trivial

* return right away

* make windows test path happy
2020-09-30 08:34:02 -04:00
Carlos Mocholí 3b2efe5b2a
Fix ModelCheckpoint period (#3630)
* Fix ModelCheckpoint period

* Remove comma

* Minor changes

* skip check

* Revert "skip check"

Already pushed to master

This reverts commit 00d9e77b81.

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-29 15:36:45 +02:00
William Falcon 931995b55b
remove flake 8 (#3687) 2020-09-27 20:40:02 -04:00
Adrian Wälchli d15fd751c7
change default save_top_k, save_last to None (#3680)
* topk default

* fix test that doesn't have best available

* remove print

* #3680 changes

* fix backward

* temp revert

te

* add warning by carmocca

* format docstring for test

* specify monitor in ES test with top k

* improve docstring for save_last

* remove commented lines

* revert passing model to test

* undo regex mistake

* changelog

* fix test covering case monitor=None and savetopk=-1

* docstring

* fix test for saving all checkpoints

* don't save checkpoints for save_top_k=0

* add test for savetopk=0

Co-authored-by @carmocca

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-09-27 20:05:02 -04:00
Pariente Manuel 3d76f604bd
Add ModelCheckpoint.to_yaml method (#3048)
* Add ModelCheckpoint.to_json()

* Add ModelCheckpoint.to_json() test

* Fix W292: Add new line at end of file

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Fixed tests

* Update pytorch_lightning/callbacks/model_checkpoint.py

* Apply suggestions from code review

* fix test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-27 14:39:40 +02:00
William Falcon d79bce1dff
enable None model checkpoint default (#3669)
* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default
2020-09-26 23:14:04 -04:00
Carlos Mocholí e70aea7642
Allow ModelCheckpoint monitor to be None (#3633)
* Fix ModelCheckpoint period

* Test for less epochs
2020-09-25 15:54:04 +02:00
Carlos Mocholí 908382f196
Split GPUStatsMonitor function (#3644)
* Split function

* Add docstrings

* Add typing annotations

* Minor refactor

* Make static to add a test
2020-09-25 07:30:30 +02:00
ananthsub c61e1e697d
Add stronger typing to gradient accumulation scheduler callback (#3558)
* Update gradient_accumulation_scheduler.py

add types for gradient accumulation scheduler callback

* Update gradient_accumulation_scheduler.py
2020-09-23 20:22:10 +02:00
William Falcon c591013708
enable any logged metric to be accessible in callbacks (#3598)
* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* clarify forward

* clarify forward

* clarify forward

* clarify forward
2020-09-22 18:00:23 -04:00
Carlos Mocholí 1223cdbaa1
Add missing line. Add a test (#3594) 2020-09-21 22:17:51 -04:00
William Falcon 21cfdf6874
ref: result 1/n (make monitor default to checkpoint_on to simplify re… (#3571)
* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* force crash when max_epochs < epochs in a checkpoint

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2020-09-20 22:58:43 -04:00
ananthsub 31388eda1a
Add type hints to model checkpoint callback (#3560)
* Update gradient_accumulation_scheduler.py

add types for gradient accumulation scheduler callback

* Apply black formatting to model checkpoint callback

auto-format, no other changes

* Update gradient_accumulation_scheduler.py

drop other changes

* Add type hints to model checkpoint callback

* Update model_checkpoint.py

remove trainer/lightning modules types to avoid circular import
2020-09-19 18:26:49 -04:00
Jirka Borovec 8eb77cd06a
drop v0.10 deprecated (#3454)
* drop v0.10 deprecated

* import

* missed
2020-09-19 11:47:26 -04:00
Carlos Mocholí 580b04b490
Fix ModelCheckpoints name formatting (#3163)
* Fix ModelCheckpoint's name formatting

* Fix failing tests

* Add dot to CHECKPOINT_SUFFIX

* Set variables to their default values at the end of tests

* Fix logic for filepath='' and filename=None. Add test

* Fix Windows tests

* Fix typo. Remove leading line break and zeroes

* Remove CHECKPOINT_SUFFIX

* Fix typos. Use appropriate f-string format

* Apply suggestions from code review

* Fix broken tests after #3320

* Finish changes suggested by Borda

* Use explicit test var names

* Apply suggestions

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Apply suggestions

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update CHANGELOG

* Apply suggestions from code review

* for

* prepend whitespace in warn msg

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-18 23:09:11 +02:00
Lucas Steinmann 197acd535f
Fix early stopping with training step's return dict (#3347)
* Fixes the test for early stopping without val step.

The expression which checked, if early stopping was triggered, had an off-by-one error and hence was true even if early stopping was not triggered.

Furthermore set patience to 0 and max epochs to 10, to ensure loss has enough time to flatten.

* Fixes early stopping without val step.

The issue has been, that only `early_stop_on` key was checked and not an arbitrary monitor key.

* Fixes branch, which checks whether early stopping is done during validation.

Before only `val_early_stop_on` was checked. Since arbitrary keys can be used, the set of possible validation keys cannot be exhaustive. Hence this disables "early stopping on_train_epoch_end" via an instance attribute if early stopping was executed in on_validation_epoch_end.
Furthermore adds a test, which ensures arbitrary keys work.

* Improve check whether eval results are used.

Only disable early checking with train results if eval results are actually used. Before they were always disabled in ``on_validation_epoch_end``.
Rename and document instance variable, to make it more clear.

* Remove wrong documentation on behaviour of early stopping with train result' dict.

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-09-18 23:08:04 +02:00
William Falcon 1d7c615d82
cleaning up stale logger tests + flake8 (#3490)
* cleaning up stale logger tests

* cleaning up stale logger tests

* cleaning up stale logger tests

* cleaning up stale logger tests

* cleaning up stale logger tests

* cleaning up stale logger tests
2020-09-14 00:06:48 -04:00
Jeff Yang 951048a81e
docs: use ref for anchor links, fix a few typo (#3486) 2020-09-13 21:04:21 -04:00
William Falcon ef20310873
ref: move specific accelerator code x/n (#3457)
* ref: organize args x/n

* ref: move specific accelerator code x/n

* ref: move specific accelerator code x/n

* ref: move specific accelerator code x/n
2020-09-11 10:56:21 -04:00
William Falcon 7e874d70d6
ref: inner train loop (intermediate step) 19/n (#3385)
* ref: inner train loop (intermediate step) 19/n

* Update debugging.py

* ref: inner train loop (intermediate step) 19/n
2020-09-07 11:55:14 -04:00
William Falcon 0b5b70d6c9
ref: inner train loop (intermediate step) 17/n (#3376)
* ref: inner train loop (intermediate step) 17/n

* ref: inner train loop (intermediate step) 17/n

* ref: inner train loop (intermediate step) 17/n
2020-09-07 09:31:42 -04:00
Rohit Gupta 24809b0b26
Refactor GPUStatsMonitor to improve training speed (#3257)
* Refactor GPUMonitor to improve training speed

* added gpu ids to monitor

* update tests

* added deprecation warning

* pep

* fix test

* fix docs

* fix log_gpu_memory

* move deprecation check

* chlog

* Update CHANGELOG.md

* suggestions and fix

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-09-04 06:02:16 -04:00
Rohit Gupta 4a22fca524
Changed LearningRateLogger to LearningRateMonitor (#3251)
* Change LearningRateLogger to LearningRateMonitor

* file rename

* docs

* add LearningRateLogger with deprecation warning

* deprecated LearningRateLogger

* move deprecation check

* chlog

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-03 18:17:15 +00:00
Brendan Fahy 2d8c1b7c54
use fsspec instead of gfile for all IO (#3320)
* use fsspec instead of gfile for all IO

This better supports remote (and local) file operations with a dedicated package

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* chlog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-03 14:19:20 +02:00
Lezwon Castelino 3910ad0330
bugfix/3185 transpose (#3252)
* change t() to transpose() as xla devices do not support .t() on 1-dim tensor

* detach tensor before copying

* Revert "detach tensor before copying"

This reverts commit 37cc7bbe

* changed dims

* added test_result_obj_on_tpu

* detach before copying

* detach before copying

* detach before copying

* replace torch.cat with sum
2020-09-01 09:17:52 -04:00
Jeremy Jordan a5d1176cf6
callback method for on_save_checkpoint (#2501)
* initial draft

* fix test

* Update pytorch_lightning/trainer/callback_hook.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* fix tests

* remove old code

* untested upgrade script

* document limitations

* clean up and add tests

* Update pytorch_lightning/trainer/training_io.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect PR comments

* fix formatting

* Update docs/source/callbacks.rst

* clarify docs

* revert change for loading checkpoints

* small edits

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-28 16:50:52 +02:00
Rohit Gupta f03943ee94
Fix GpuUsageLogger to work on different platforms (#3008)
* Fix GpuUsageLogger

* docstrings

* misconfigexception

* add basic tests

* skip doctest

* fix parameter and docstring

* rm cl

* skip doctest

* cleanup

* chlog

* add suggestions from review

* add test from suggestions

* fix import

* fix test

* fix test

* fix test

* fix test

* rename GpuUsageLogger to GPUStatsMonitor

* doc fix

* Apply suggestions from code review

* update docs format

* update docs

* miss

* merge

* fix title formatting

* unindent

* punctuation

* simplify if statements

* fix test

* suggestions

* pep

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* fix on_train_batch_*

* use AttributeDict

* usage

* rank zero

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* import

* minor changes

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-08-27 19:50:32 +02:00
William Falcon f3c63f7746
tests to ensure correct dataloader calls (#3221)
* tests to ensure correct dataloading interval and sequence

* tests to ensure correct dataloading interval and sequence

* tests to ensure correct dataloading interval and sequence

* tests to ensure correct dataloading interval and sequence

* tests to ensure correct dataloading interval and sequence
2020-08-27 09:49:46 -04:00
Duc Pham 4d98419bb8
Fix potential typo in early stopping `monitor` keys (#3213)
* Fix typo

* ref: group prepare data hook (6) (#3212)

* group prepare data hook

* group prepare data hook

* group prepare data hook

* group prepare data hook

* group prepare data hook

* group prepare data hook

* group prepare data hook

* Fix typo

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-26 22:21:30 -04:00
William Falcon a1705441a9
ref: remove _evaluate fx (#3197)
* remove _evaluate

* remove _evaluate

* remove _evaluate

* remove _evaluate

* remove _evaluate

* remove _evaluate

* remove _evaluate

* remove _evaluate
2020-08-26 12:28:14 -04:00
William Falcon bda1400225
ref: restore on_eval_start hook (#3183)
* restore eval loop hook
2020-08-26 00:45:43 -04:00
William Falcon 2f6d82e0e6
ref: remove on_eval_start hook (#3176)
* remove on_eval_start hook

* remove on_eval_start hook
2020-08-25 22:28:00 -04:00