54 Commits

Author SHA1 Message Date
Carlos Mocholí aea96e45a4
Integrate global step with progress tracking (#11805) 2022-03-07 19:21:37 +00:00
Akash Kwatra 7e2f9fbad5
Refactor codebase to use `trainer.loggers` over `trainer.logger` when needed (#11920) 2022-02-25 16:01:04 -08:00
Carlos Mocholí 789fae828d
Fix `current_epoch` value on training end (#8578)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-02-10 17:55:59 +01:00
ananthsub a64438c897
Centralize rank_zero_only utilities into their own module (#11747)
* Centralize rank_zero_only utilities into their own module

Fixes #11746

* PossibleUserWarning

* Update test_warnings.py

* update imports

* more imports

* Update CHANGELOG.md

* Update mlflow.py

* Update cli.py

* Update api_references.rst

* Update meta.py

* add deprecation tests

* debug standalone

* fix standalone tests

* Update CHANGELOG.md
2022-02-07 08:09:55 +00:00
Krishna Kalyan 6586dd23b7
Mark `CheckpointConnector` as protected (#11550)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-03 02:26:08 +00:00
Carlos Mocholí a44881cd90
Changes in preparation to #8578 (#11562) 2022-02-02 19:57:08 +00:00
Carlos Mocholí 075b8801c9
Fix checkpoint values when saving and resetting the tuner state (#11518) 2022-01-20 18:54:40 +00:00
Carlos Mocholí 62818dbace
Use a dataclass as the scheduler config (#11443) 2022-01-18 20:23:32 +01:00
four4fish cf5ef32f7b
Deprecate Trainer.training_type_plugin in favor of trainer.strategy (#11141)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 02:11:43 +00:00
Carlos Mocholí fa6d17c96f
Fix typing for utilities.warnings (#11115) 2021-12-17 15:07:27 +01:00
Carlos Mocholí 5ba5b72473
Update tests to avoid the deprecated `weights_summary` (#10446) 2021-11-11 18:15:18 +01:00
Ning f6ed0bd8ca
introduce has_len_all_ranks() to check the length of dataloader across ranks (#9827)
* introduce has_len_all_ranks(), update tests

* update CHANGELOG.md

* change staticmethod and hook attribute naming

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

* remove non-essential comment

* fix merge error and comment format

* try to fix test_tpu.py failure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update on comments

* chlog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* chlog

* update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try fix

* Revert back TPUSpawn changes

* Update test

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-11-02 13:22:58 -04:00
Danielle Pintz e94dcf6936
Mark `trainer.data_connector` as protected (#10031)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Adrian Wälchli 2c16f1d6b9
remove dataloader patching on the LightningModule (#9764)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-20 15:23:20 +02:00
Elad Segal 8c76cf5ae1
reset val dataloader for binsearch (#9975) 2021-10-18 12:54:26 +02:00
Rohit Gupta 46fa703853
disable_logger (#9837) 2021-10-11 16:36:59 +05:30
Rohit Gupta d71501d97f
Reset `val_dataloader` in `tuner/batch_size_scaling` (#9857)
* reset val

* chlog
2021-10-11 09:13:33 +01:00
Rohit Gupta 83d83abc9d
Fix `lr_find` to generate same results on multiple calls (#9704)
* dump global_step

* add test

* chlog
2021-09-26 19:20:42 +00:00
Rohit Gupta a3def9d228
Use a unique filename to save temp ckpt in tuner (#9682)
* unique filename

* chlog

* update tests
2021-09-25 11:28:51 +00:00
Jirka Borovec 6e124e7207
CI: precommit - docformatter (#8584)
* CI: precommit - docformatter
* fix deprecated

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
Elad Segal 413f7b2894
fix batch auto scaling when `init_val` causes OOM (#8954)
* fix batch auto scaling when `init_val` causes OOM

* Update CHANGELOG.md

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-08-18 09:56:16 +02:00
Carlos Mocholí a1264a6850
Automatic string fixes (#8886) 2021-08-13 14:28:14 +00:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Carlos Mocholí 4a64bc3fd3
Fix DeepSpeed lr scheduler logic (#8527)
* Fix deepspeed scheduler logic

* Fix tests

* Minor changes

* Improve tests

* inference fix

* CHANGELOG

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-07-23 10:08:58 +01:00
Adrian Wälchli 4becd1cf31
rename old `Trainer.train_loop` -> `Trainer.fit_loop` (#8025) 2021-06-22 11:49:32 +02:00
Adrian Wälchli 8c32bf2dd4
refactor on_gpu handling in checkpoint connector (#7860) 2021-06-07 11:30:22 +02:00
Xinyao(Alvin) Sun 0c958c5a1f
Fix dataloaders are not reset when tuning the model (#7566)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-24 10:21:45 +02:00
Adrian Wälchli ad9118f04a
remove trainer hidden state | sanity refactor [1 / n] (#7437)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-11 11:09:08 +02:00
ramonemiliani93 5db832f181
Fix auto scaling mode when calling tune method on trainer. (#7321)
* Add test for non-existing mode, the test should fail if something different from `power` or `binsearch` is passed.

* Add newline.

* Apply fix

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update tests/tuner/test_scale_batch_size.py

* Update pytorch_lightning/tuner/batch_size_scaling.py

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-04 12:03:51 +00:00
Carlos Mocholí 5af086ab9f
Attach data refactor and tuner bugs [4/n] (#7258)
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 13:54:58 +00:00
ananthsub 969e857690
Rename `trainer._launch` to `trainer._run` (#7265)
* rename-run

* fix
2021-04-30 13:39:02 +01:00
Carlos Mocholí a5ac3f8a16
Code cleaning in preparation for #7258 [3/n] (#7262) 2021-04-29 14:40:51 +02:00
Carlos Mocholí bdc4272e99
`_launch` refactor and types [1/n] (#7232) 2021-04-28 17:41:08 +02:00
Adrian Wälchli bc577ca792
fix duplicate console logging bug v2 (#6275)
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-02 15:17:55 +05:30
Kunal Mundada 3371d32664
docstring changes in tuner (#6264)
* docstring changes in tuner

* added full stop
2021-03-02 09:22:44 +08:00
Adrian Wälchli 02ac4b0b6a
Replace .get_model() with explicit .lightning_module (#6035)
* rename get_model -> lightning_module

* update references to get_model

* pep8

* add proper deprecation

* remove outdated _get_reference_model

* fix cyclic import
2021-02-18 15:59:54 +01:00
Rohit Gupta 99da0d92a5
update lr_finder to check for attribute if not running fast_dev_run (#5990)
* ref lr_finder a bit

* chlog

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-17 07:15:29 -05:00
Jirka Borovec 79d42d83e7
formatting 3/n: PL modules (#5716)
* cb

* log

* prof

* tune

* flake8
2021-02-08 14:28:38 -05:00
Arnaud Gelas 6386b8d36b
Fix isort a few failures (#5504)
Remove from skipped module in pyproject.toml and fix failures on:
- pytorch_lightning/callbacks/*.py
- pytorch_lightning/cluster_environments/*.py
- pytorch_lightning/profiler/*.py
- pytorch_lightning/tuner/*.py

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-01-15 17:44:27 -05:00
Jirka Borovec 54d20dc596
Refactor: clean trainer device & distrib getters (#5300)
* warnings

* .

* .

* flake8

* .

* .

* .

* use_tpu

* use_dp

* .

* use_ddp

* .

* use_horovod

* .

* .

* .
2021-01-12 05:22:37 -05:00
Rohit Gupta 6d2aeff26a
fast_dev_run can be int (#4629)
* fast_dev_run can be int

* pep

* chlog

* add check and update docs

* logging with fdr

* update docs

* suggestions

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fdr flush logs

* update trainer.fast_dev_run

* codefactor and pre-commit isort

* tmp

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
2020-12-09 01:37:53 +05:30
Mohamed Al Salti cd90dd429b
Fix batch_arg_name bug (#4812)
Add `batch_arg_name` to all calls to `_adjust_batch_size`
2020-11-23 11:34:11 +05:30
Nicki Skafte 4f3160ba2e
Skip tuner algorithms on fast dev (#3903)
* skip on fast dev

* fix error

* changelog

* fix recursive issue

* combine tests

* pep8

* move logic to base funcs

* fix mistake

* Update pytorch_lightning/tuner/lr_finder.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* pep

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-11-10 00:34:42 +01:00
Rohit Gupta 1396321b4d
Add fsspec to tuner (#4458)
* Add fsspec to tuner

* suggestions

* pathlib

* pep

* missed pep
2020-11-03 15:09:40 +05:30
Adrian Wälchli d1234c592d
deprecate passing ModelCheckpoint instance to Trainer(checkpoint_callback=...) (#4336)
* first attempt

* update tests

* support multiple

* test bugfix

* changelog

* pep

* pep

* import order

* import

* improve test for resuming

* test

* update test

* add references test

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* docstring suggestion deprecation

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>

* paramref

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-30 04:47:37 +01:00
Jirka Borovec f37444fa3e
CI: add flake8 (#4239) 2020-10-19 21:20:17 +01:00
Akihiro Nitta b45b57cc58
Use `Optional` for arguments set to `None` by default (#4164)
* Use `Optional` for variables set to `None` by default

* Use `Optional` instead of `Union[None, ...]` for consistency
2020-10-15 23:02:50 +02:00
William Falcon 05e0b4e5a1
Revert "Remove limitation of batch scaler (#4006)" (#4040)
This reverts commit 7e756ca11f.
2020-10-09 21:03:23 -04:00
Nicki Skafte 7e756ca11f
Remove limitation of batch scaler (#4006)
* working code

* add tests

* fix scaling

* move patch dataloader to utils

* renaming

* fix tests

* add changelog

* update docs

* pep8
2020-10-09 14:53:01 -04:00
Jirka Borovec 8873750cf0
remove deprecated early_stop_callback (#3982) 2020-10-08 06:30:33 -04:00