Commit Graph

452 Commits

Author SHA1 Message Date
Ethan Harris b9bc77293b
Fix inconsistent outputs in `on_*_end` and `*_end` (#6969)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 15:16:21 +01:00
ananthsub e891ceb836
Remove evaluation loop legacy dict returns for `*_epoch_end` hooks (#6973)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-13 12:37:54 +01:00
Roger Shieh e35192dfcd
Update `DataLoader.persistent_workers` warnings in ddp_spawn (#6762)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-09 11:38:13 +02:00
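
The entry above updates the warnings Lightning emits about `DataLoader.persistent_workers` under the `ddp_spawn` accelerator. A minimal sketch of the relevant configuration, assuming a multi-GPU machine; the `accelerator="ddp_spawn"` flag reflects the Trainer API of this era and the dataset/model are placeholders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))

# persistent_workers keeps worker processes alive between epochs (torch >= 1.7),
# which matters under ddp_spawn because workers are otherwise re-created each epoch.
train_loader = DataLoader(dataset, batch_size=8, num_workers=2, persistent_workers=True)

trainer = pl.Trainer(gpus=2, accelerator="ddp_spawn", max_epochs=1)
# trainer.fit(model, train_loader)  # model: any LightningModule
```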
Akihiro Nitta 5e4dfd75d2
[RFC] Add `self.lr_schedulers()` to LightningModule for manual optimization (#6567)
* Add test for lr_schedulers()

* Add lr_schedulers to LightningModule

* Update test comment

* Update CHANGELOG
2021-04-09 10:32:14 +01:00
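
The entry above adds `self.lr_schedulers()` for manual optimization. A minimal sketch of how the accessor pairs with `self.optimizers()` in a manually optimized module; layer sizes and the loss are placeholders:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class ManualOptModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # opt into manual optimization
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        sch = self.lr_schedulers()  # accessor introduced by the PR above

        x, y = batch
        loss = nn.functional.cross_entropy(self.layer(x), y)

        opt.zero_grad()
        self.manual_backward(loss)
        opt.step()
        sch.step()  # in manual optimization, the user decides when to step the scheduler

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
        return [optimizer], [scheduler]
```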
Adrian Wälchli 9c9e2a0325
fix gpus default for Trainer.add_argparse_args (#6898) 2021-04-09 11:20:43 +02:00
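
The fix above concerns the default of `--gpus` when a parser is built via `Trainer.add_argparse_args`. A minimal sketch of that argparse integration:

```python
from argparse import ArgumentParser
from pytorch_lightning import Trainer

parser = ArgumentParser()
# Adds every Trainer constructor argument (including --gpus) to the parser.
parser = Trainer.add_argparse_args(parser)
args = parser.parse_args([])  # empty argv: exercise the defaults the fix above targets

trainer = Trainer.from_argparse_args(args)
```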
ananthsub 851f9e3997
Move NaN/Inf detection to a separate utilities file (#6834)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-04-09 01:47:02 +02:00
Sean Naren 742c48e994
[Fix] Ensure we set the eval/train flag correctly on accelerator model (#6877)
* Ensure we move the model to eval mode before running evaluation

* Ensure we set the flag appropriately across all stages

* Add test, move hooks logic

* Apply same fix to the validate loop

* Update pytorch_lightning/trainer/trainer.py

* Fix function name

* Fix order, add predict

* Shorten the name

* Fix input dm, drop duplicate on predict start hook call, as it's called in the setup function

* Use hook, remove double call
2021-04-08 14:04:26 -04:00
Anthony Kim 7f6154fcad
Add `Trainer(gradient_clip_algorithm='value'|'norm')` (#6123)
* add changelog

* add clip by value

* fix bug in training tricks.rst

* fix bug in trainer.rst

* Update trainer.rst

* Update trainer.rst

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/precision/deepspeed_precision.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/utilities/enums.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* yapf formatting

* update training tricks

* update based on comment

* update based on comment

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* update based on comment

* pep8

* mypy

* mypy

* Update docs/source/advanced/training_tricks.rst

Co-authored-by: thomas chaton <thomas@grid.ai>

* Update sharded_native_amp.py

* Update test_sharded_parity.py

* update test codes

* Update test_tpu.py

* Update pytorch_lightning/trainer/connectors/training_trick_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update test_trainer.py

* Update enums.py

* Update enums.py

* add super-class initialization to precision plugins.

* add clip_grad horovod cpu test

* add clip_grad horovod cpu test

* use subprocess check_call

* change order of horovod tests

* set max_epochs 2 in horovod test

* remove clip_grad_val test from horovod-cpu

* remove "type: ignore"

* divide clip grad val test in horovod

* update based on comments

* add super-class initialization to precision plugins.

* bugfix

* bugfix

* revert some changes

* revert some changes

* Update tests/models/test_horovod.py

* merge master

* Delete signature test

No point in testing a signature

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-04-06 08:27:37 -05:00
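
The entry above introduces `gradient_clip_algorithm`. A minimal sketch of the two clipping modes as exposed on the Trainer; the clip values are placeholders:

```python
from pytorch_lightning import Trainer

# Clip the gradient norm (the pre-existing default behaviour).
trainer_norm = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="norm")

# Clip each gradient element by value, enabled by the PR above.
trainer_value = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")
```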
Adrian Wälchli 127c52af74
Fix EarlyStopping logic when min_epochs not met (#6705)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-06 12:41:07 +01:00
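
The fix above makes EarlyStopping respect `min_epochs`. A minimal sketch of the interaction; the monitored key and thresholds are placeholders:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss", patience=3, mode="min")

# Even if `val_loss` stops improving immediately, training should not end
# before `min_epochs` have run; early stopping only takes effect afterwards.
trainer = Trainer(min_epochs=10, max_epochs=100, callbacks=[early_stop])
```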
Adrian Wälchli 264aa689de
fix boolean check on iterable dataset when len not defined (#6828)
* fix iterable dataset len check

* update predict and validate

* add validate to test

* add changelog

* add predict
2021-04-05 17:47:21 +01:00
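
The fix above concerns how the loops check dataloader length when an `IterableDataset` defines no `__len__`. A minimal sketch of such a dataset, where the Trainer cannot rely on `len(dataloader)`; sizes are placeholders:

```python
import torch
from torch.utils.data import DataLoader, IterableDataset

class StreamingDataset(IterableDataset):
    """An iterable dataset that intentionally defines no __len__."""

    def __iter__(self):
        for _ in range(100):
            yield torch.randn(32), torch.randint(0, 2, (1,)).item()

loader = DataLoader(StreamingDataset(), batch_size=8)

# len(loader) raises TypeError here, so Lightning must detect the missing
# length with a boolean check rather than assuming a length exists.
```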
Yuan-Hang Zhang 1bd5f36a5b
Fix validation progress counter with check_val_every_n_epoch > 1 (#5952)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-02 17:40:41 +09:00
Carlos Mocholí 0dd2deebea
Remove legacy support for the magic `log`/`progress_bar` keys in dict returns (#6734) 2021-03-31 00:28:04 +02:00
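
The removal above drops the old magic `{'log': ..., 'progress_bar': ...}` dict returns from step methods. A minimal sketch of the replacement pattern using `self.log`; model and loss are placeholders:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.layer(x), y)
        # Instead of `return {"loss": loss, "log": {"train_loss": loss}}`,
        # log explicitly and return the loss (or a dict without magic keys).
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```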
thomas chaton 1302766f83
DeepSpeed ZeRO Update (#6546)
* Add context to call hook to handle all modules defined within the hook

* Expose some additional parameters

* Added docs, exposed parameters

* Make sure we only configure if necessary

* Setup activation checkpointing regardless, saves the user having to do it manually

* Add some tests that fail currently

* update

* update

* update

* add tests

* change docstring

* resolve accumulate_grad_batches

* resolve flake8

* Update DeepSpeed to use latest version, add some comments

* add metrics

* update

* Small formatting fixes, clean up some code

* Few cleanups

* No need for default state

* Fix tests, add some boilerplate that should move eventually

* Add hook removal

* Add a context manager to handle hook

* Small naming cleanup

* wip

* move save_checkpoint responsibility to accelerator

* resolve flake8

* add BC

* Change recommended scale to 16

* resolve flake8

* update test

* update install

* update

* update test

* update

* update

* update test

* resolve flake8

* update

* update

* update on comments

* Push

* pull

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* Apply suggestions from code review

* Swap to using world size defined by plugin

* update

* update todo

* Remove deepspeed from extra, keep it in the base cuda docker install

* Push

* pull

* update

* update

* update

* update

* Minor changes

* duplicate

* format

* format2

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-30 13:39:02 -04:00
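
The update above reworks the DeepSpeed ZeRO integration. A minimal sketch of selecting the plugin; it assumes a multi-GPU machine, and the explicit constructor arguments (ZeRO stage, offloading flag) are assumptions that may vary by version:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DeepSpeedPlugin

# Shorthand: pick the plugin by name and rely on its defaults.
trainer = Trainer(gpus=2, precision=16, plugins="deepspeed")

# Or configure it explicitly, e.g. ZeRO stage 2 with CPU offloading.
trainer = Trainer(gpus=2, precision=16, plugins=DeepSpeedPlugin(stage=2, cpu_offload=True))
```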
Carlos Mocholí 90444706b2
Remove logger_connector legacy code (#6733) 2021-03-30 12:33:33 +02:00
Kaushik B f79a13e495
[Model Parallel] Add configure sharded model hook (#6679)
* Add base hook for model parallel

* fix callback signature

* Simplify hook

* Add hook logic

* add tests

* add property setter

* add logic for being called once

* Update changelog

* Fix

* fix return type

* fix lambda callback test

* Fix tests

* Apply code suggestions

* add logic for setup_optimizers_predispatch

* add common dummy model

* Swap call order

* Remove test that isn't needed anymore

* Update tests

* Add a bit more doc

* Few code review fixes

* Update pytorch_lightning/accelerators/accelerator.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Change hook name

* Fix test

* Test setup hook, refactor names

* Swap call order of callbacks and model initialization

* Change name of context manager

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-29 14:50:51 -06:00
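
The entry above adds the `configure_sharded_model` hook so large layers can be instantiated once the sharded context is active. A minimal sketch; the layer sizes are placeholders:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class ShardedModel(pl.LightningModule):
    def configure_sharded_model(self):
        # Called by model-parallel plugins inside their sharding context, so this
        # large block is created in sharded form instead of being fully
        # materialized in __init__.
        self.block = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

    def forward(self, x):
        return self.block(x)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```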
Łukasz Zalewski cca0eca5f3
More explicit exception message when testing with fast_dev_run=True (#6667)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-29 13:29:54 +00:00
Carlos Mocholí f0c5479de9
Remove legacy `Result` parameters (#6016) 2021-03-28 11:55:08 +02:00
thomas chaton 0e45220263
[warning] Add warning when values are not being reduced (#6417)
* add warning non reduced

* add test

* update test

* update changelog

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* update

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-03-26 18:33:11 +00:00
Carlos Mocholí bc613611e2
Do not add return dict items to callback_metrics (#6682) 2021-03-26 14:05:20 +01:00
Jirka Borovec 217c12a4e7
Simplify deprecations (#6620)
* use external deprecate

* simplify

* simplify

* simplify

* flake8

* .

* others

* .
2021-03-25 15:26:38 +01:00
Rohit Gupta 9be092dbdb
Add on_epoch_start to run at the beginning of every loop irrespective of train/val/test (#6498)
* update docs

* add hook and update docs

* update tests

* chlog

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* chlog

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-25 14:20:49 +01:00
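
The change above makes `on_epoch_start` fire at the start of every loop (train, validation, and test), not just training. A minimal sketch of a callback relying on that behaviour:

```python
import pytorch_lightning as pl

class EpochAnnouncer(pl.Callback):
    def on_epoch_start(self, trainer, pl_module):
        # After the change above, this runs for train, validation and test epochs
        # alike; branch on the trainer's running stage if loop-specific behaviour
        # is needed.
        print(f"epoch {trainer.current_epoch} starting")
```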
Carlos Mocholí 2dd6f9e09d
`MetricsHolder` clean-up + typing (#6645)
* Metrics holder cleanup and better error message

* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

* _VALUE -> _METRIC_TYPE
2021-03-24 20:34:46 +01:00
Ethan Harris 741c452551
Fix disabled grads after call to predict (#6657) 2021-03-23 23:07:48 +01:00
Carlos Mocholí 51b10f78f4
Refactor PyTorch profiler 4/5 (#6349)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-03-23 18:13:29 +01:00
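
The refactor above (part 4/5) reworks the PyTorch profiler integration. A minimal sketch of enabling it from the Trainer; constructor arguments of the class form are omitted because they differ between versions:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.profiler import PyTorchProfiler

# Shorthand: select the profiler by name.
trainer = Trainer(profiler="pytorch")

# Or pass an instance for finer control over its options.
trainer = Trainer(profiler=PyTorchProfiler())
```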
thomas chaton 0995d30fab
Flash predict step (#6577)
* add predict_step

* Update predict_loop.py

* Update trainer.py

* Update trainer.py

* resolve bugs

* update

* update

* update

* resolve bug

* resolve some failing tests

* update tests

* update

* resolve tests

* add a test

* remove typo

* add a test for attachment

* update

* changed to on_train_dataloader

* remove __flash_special_attr__

* resolve tests

* update

* update

* update

* update on comments

* Update pytorch_lightning/trainer/data_loading.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-23 11:13:13 -04:00
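
The entry above wires up a dedicated `predict_step`. A minimal sketch of overriding it and invoking `trainer.predict`; signature details may differ slightly across versions, and the model and data are placeholders:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitPredictor(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def predict_step(self, batch, batch_idx, dataloader_idx=None):
        # Called by trainer.predict(); without an override it falls back to forward().
        x, _ = batch
        return torch.softmax(self(x), dim=-1)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

predict_loader = DataLoader(TensorDataset(torch.randn(16, 32), torch.zeros(16)), batch_size=4)
trainer = pl.Trainer()
predictions = trainer.predict(LitPredictor(), dataloaders=predict_loader)
```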
Carlos Mocholí 36d180e532
Refactor base profilers 3/5 (#6621)
Co-authored-by: tchaton <thomas@grid.ai>
2021-03-23 10:07:35 +00:00
Carlos Mocholí 51c9260fad
Move profiler tests (#6619) 2021-03-21 23:39:55 +00:00
Kaushik B 37f22c99ff
Add trainer.predict config validation (#6543)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-21 21:07:54 +00:00
Kaushik B b190403e28
Add outputs param for `on_val/test_epoch_end` hooks (#6120)
* add outputs param for on_val/test_epoch_end hooks

* update changelog

* fix warning message

* add custom call hook

* cache logged metrics

* add args to docstrings

* use warning cache

* add utility method for param in sig check

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update docstring

* add test for eval epoch end hook

* add types and replace model ref

* add deprecation test

* fix test fx name

* add model hooks warning

* add old signature model to tests

* add clear warning cache

* support args param

* update tests

* add tests for model hooks

* code suggestions

* add signature utils

* fix pep8 issues

* fix pep8 issues

* fix outputs issue

* fix tests

* code fixes

* fix validate test

* test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-16 12:15:16 -04:00
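
The entry above threads the collected step outputs through to the `on_validation/test_epoch_end` hooks. For illustration, a minimal sketch of aggregating step outputs at epoch end via the long-standing `validation_epoch_end` module hook; the exact signature the PR gives the `on_*_epoch_end` hooks is not reproduced here, and the model is a placeholder:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.layer(x), y)
        return {"val_loss": loss}

    def validation_epoch_end(self, outputs):
        # `outputs` is the list of dicts returned by validation_step above.
        avg_loss = torch.stack([o["val_loss"] for o in outputs]).mean()
        self.log("avg_val_loss", avg_loss)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```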
Jirka Borovec a312219d42
Prune metric: helpers and inputs 3/n (#6547)
* _basic_input_validation

* _check_shape_and_type_consistency

* _check_num_classes_binary

* _check_num_classes_mc

* _check_num_classes_ml

* _check_top_k

* _check_classification_inputs

* _input_format_classification

* _reduce_stat_scores

* DataType

* rest

* flake8

* chlog
2021-03-16 13:54:06 +01:00
Jirka Borovec 0f07eaf51a
refactor reading env defaults (#6510)
* change tests

* fix

* test

* _defaults_from_env_vars

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-16 10:10:17 +00:00
Roger Shieh c48fc6a2ce
[test] lr_find with bs_scale (#6422)
* init test: test_lr_find_with_bs_scale

* Update test_lr_finder.py

* remove gpu req

* try boring model

* custom boring model

* pep8

* fix typo

* Update test_lr_finder.py

* typo

* typo
2021-03-15 22:43:35 +05:30
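
The test above exercises the learning-rate finder together with the batch-size scaler. A minimal sketch of enabling both tuners; it assumes the model exposes `lr` and `batch_size` attributes (or matching hparams) that the tuners can overwrite:

```python
from pytorch_lightning import Trainer

trainer = Trainer(auto_lr_find=True, auto_scale_batch_size=True, max_epochs=3)
# trainer.tune(model) runs the batch-size scaler and the lr finder before fitting.
```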
Jirka Borovec b341b53f70
deprecate metrics pkg (#6505)
* deprecate metrics

* examples

* req

* docs

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* pep8

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-03-15 14:39:38 +00:00
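
The deprecation above points users to the standalone torchmetrics package. A minimal sketch of the migration:

```python
import torch

# Before: from pytorch_lightning.metrics import Accuracy  (now deprecated)
from torchmetrics import Accuracy

accuracy = Accuracy()
preds = torch.tensor([0, 1, 1, 0])
target = torch.tensor([0, 1, 0, 0])
print(accuracy(preds, target))  # tensor(0.7500)
```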
Eric Cousineau e886d55ac1
argparse: Add use_argument_group=True (#6088)
* argparse: Add inplace option

Replicate in GAN model

* datamodule: Deduplicate logic w/ argparser utilities

* Update pl_examples/domain_templates/generative_adversarial_net.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Keep docstrings

* Correct name

* Whitespace

* Consistency

* fix weird type stuff

* try alt - use_argument_group

* fix syntax + lint

* fix ci errs

* fix ci

* change examples... still failing w/ "unrecognized arguments: --batch_size"

* address review

* mnist_datamodule: add some docstrings

* argparse: check cls or cls.__init__ for param

didn't capture issue, but meh

* fix lint

* fix no-doc edge case

* address review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-03-11 10:50:49 -05:00
Elia Cereda f4cc7451a9
Add Trainer.validate(…) method to run one validation epoch (#4948)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-11 03:46:37 +01:00
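
The entry above adds `Trainer.validate(...)` for running a single validation epoch outside of `fit`. A minimal sketch where the model supplies its own `val_dataloader`; names and sizes are placeholders:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log("val_loss", nn.functional.cross_entropy(self.layer(x), y))

    def val_dataloader(self):
        data = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
        return DataLoader(data, batch_size=8)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

trainer = pl.Trainer()
# Runs one validation epoch without any training and returns the logged metrics.
results = trainer.validate(LitModel())
print(results)
```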
Kaushik B 74d79e7e0e
Raise an exception if check_val_every_n_epoch is not an integer (#6411)
* raise an exception if check_val_every_n_epoch is not an integer

* remove unused object

* add type hints

* add return type

* update exception message

* update exception message
2021-03-10 12:08:53 +05:30
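
The change above validates the type of `check_val_every_n_epoch`. A minimal sketch of valid and invalid configurations; the exact exception type is an assumption based on Lightning's usual misconfiguration errors:

```python
from pytorch_lightning import Trainer

# Valid: run validation every second training epoch.
trainer = Trainer(max_epochs=10, check_val_every_n_epoch=2)

# Invalid after the change above: a non-integer value now raises immediately
# (presumably a MisconfigurationException) instead of failing later.
# Trainer(check_val_every_n_epoch=1.5)
```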
Adrian Wälchli 615b2f7363
Improve DummyLogger (#6398)
* fix dummy logger

* docs

* update docs

* add changelog

* add none return annotation

* return empty string for name, version
2021-03-09 23:18:38 +00:00
Jirka Borovec 55dd3a4c64
Typing for tests 1/n (#6313)
* typing

* yapf

* typing
2021-03-09 11:27:15 +00:00
Adrian Wälchli fc6d402733
fix logger creating directory structure too early in DDP (#6380)
* fix

* add simple test

* fix imports

* add changelog

* tighter test with on_fit_start hook closer to the dispatch call

* move class inside test function

* add a comment
2021-03-09 09:49:59 +00:00
Adrian Wälchli 718074b99a
Fix trainer not resetting lightning_optimizers (#6372)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-08 09:58:03 +08:00
Rohit Gupta 38a5fe7af1
Remove optimizer_idx arg in manual optimization (#6093)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2021-03-07 08:48:50 +01:00
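
The removal above drops the `optimizer_idx` argument from `training_step` under manual optimization; with `automatic_optimization = False`, all optimizers are instead fetched from `self.optimizers()`. A minimal two-optimizer sketch; the networks and losses are placeholders:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class TwoOptimizerModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False
        self.net_a = nn.Linear(32, 2)
        self.net_b = nn.Linear(32, 2)

    # No optimizer_idx argument: the step owns both optimizers explicitly.
    def training_step(self, batch, batch_idx):
        opt_a, opt_b = self.optimizers()
        x, y = batch

        loss_a = nn.functional.cross_entropy(self.net_a(x), y)
        opt_a.zero_grad()
        self.manual_backward(loss_a)
        opt_a.step()

        loss_b = nn.functional.cross_entropy(self.net_b(x), y)
        opt_b.zero_grad()
        self.manual_backward(loss_b)
        opt_b.step()

    def configure_optimizers(self):
        return torch.optim.Adam(self.net_a.parameters()), torch.optim.Adam(self.net_b.parameters())
```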
Rohit Gupta facfda85f1
Remove no return warning from val/test step (#6139)
* remove warning

* auto_opt

* chlog

* auto_opt

* no_warning_call

* rm old code

* add warning for predict

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-06 17:15:21 +00:00
Elia Cereda d0596fac94
Refactor RunningStage usage in advance of implementing Trainer.validate() (#4945)
* Update code

Co-authored-by: EliaCereda

* More property updates

* Move properties. Introduce trainer._fitting

* Use trainer.fitting

* Fix reset dataloaders

* Unused code

* RunningStage.SANITY_CHECKING

* Use setters

* Fix bugs

* Fix bugs

* TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}

* Fix bugs

* Fix bugs

* Fix tests

* Update CHANGELOG. Add deprecation warning. Fix tests

* Unused imports

* Optional trainer

* More deprecation. More refactoring

* Correct version

* Use properties

* Address comments

* flake8

* Missed renamings

* Typo

* is -> ==

It is recommended to use `is` for Enums since they are singletons; however, since LightningEnum subclasses str, it's not a good idea in case a user sets the state/stage with a str

* Also for tests

* Typo

* Address @tchaton's comments

* PEP8

* Correct property

* Update CHANGELOG

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Remove called sanity check

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-06 12:40:19 +00:00
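
The `is -> ==` note in the entry above reflects that LightningEnum subclasses `str`, so comparisons against plain strings must use `==`. A small standalone illustration; the enum here is a stand-in, not the real TrainerState definition:

```python
from enum import Enum

class Stage(str, Enum):  # mimics a str-backed LightningEnum
    FITTING = "fitting"
    VALIDATING = "validating"

user_value = "fitting"  # e.g. a user setting the stage with a plain string

# `==` compares by value, so it works for members and raw strings alike.
assert Stage.FITTING == Stage.FITTING
assert Stage.FITTING == user_value

# `is` is identity: fine between singleton members, but False against a raw string.
assert Stage.FITTING is Stage.FITTING
assert Stage.FITTING is not user_value
```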
thomas chaton 2ec67a48b3
[bug] Fix PyTorch profiler with emit_nvtx (#6260)
* resolve bug

* update changelog

* Update tests/trainer/test_trainer.py

* Update pytorch_lightning/profiler/profilers.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* resolve comments

* resolve flake8

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-05 21:12:03 +01:00
thomas chaton 46540ee260
[bugfix] Resolve memory leak for evaluation (#6326)
* resolve bug

* resolve flake8

* revert name
2021-03-05 16:52:56 +09:00
Jirka Borovec b9cf1223b9
missing tests default_root_dir=tmpdir (#6314)
* default_root_dir=tmpdir

* miss
2021-03-04 19:23:12 +00:00
Jirka Borovec bf6ba83aef
prune duplicate test in optim (#6312) 2021-03-03 15:41:00 +09:00
Jirka Borovec d1a03153f3
Refactor: runif for spec 6/6 (#6307)
* special

* rpc
2021-03-02 18:57:13 +00:00
Jirka Borovec ac583781db
Refactor: Runif for TPU and Horovod 5/n (#6301)
* TPU

* horovod

* extra

* fix

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* doc

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-03-02 16:21:20 +00:00
Nicki Skafte 24c3a3fc3e
Add possibility for custom naming when using multiple dataloaders (#6274) 2021-03-02 17:03:36 +01:00