226 Commits

Author SHA1 Message Date
jjenniferdai 89d37569d8
add `accelerator.is_available()` check (#12104)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2022-03-02 10:07:49 +00:00
Jv Kyle Eclarin 9c067c2a3e
Update `tests/plugins/*.py` to use `devices` instead of `gpus` or `ipus` (#11872) 2022-02-21 22:57:21 +01:00
ananthsub 1b107c5892
Add `Accelerator.is_available()` interface requirement (#11797) 2022-02-09 15:11:27 -08:00
ananthsub a64438c897
Centralize rank_zero_only utilities into their own module (#11747)
* Centralize rank_zero_only utilities into their own module

Fixes #11746

* PossibleUserWarning

* Update test_warnings.py

* update imports

* more imports

* Update CHANGELOG.md

* Update mlflow.py

* Update cli.py

* Update api_references.rst

* Update meta.py

* add deprecation tests

* debug standalone

* fix standalone tests

* Update CHANGELOG.md
2022-02-07 08:09:55 +00:00
Carlos Mocholí 5914fb748f
Add typing to accelerators/gpu.py (#11333) 2022-01-12 19:44:51 +00:00
Andrew Tritt dbf1acd5a5
Modify LSFEnvironment to use more reliable environment variable (#10825)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-01-05 12:45:25 +00:00
Kaushik B 93223ff5ce
Introduce StrategyRegistry (#11233)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-01-05 17:14:18 +05:30
Kaushik B 650c710efa
Rename training plugin test files & names to strategy (#11303) 2022-01-04 14:32:45 +01:00
Adrian Wälchli c210e338ef
Update strategy import statements (#11231) 2021-12-23 08:26:28 +01:00
Kaushik B 576a5d62a0
Introduce strategies directory for Training Strategies (#11226)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 20:23:30 +00:00
Carlos Mocholí eb5b350f9a
Remove explicit isinstance checks in strategies for checkpoint io (#11177)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 04:41:45 +00:00
Adrian Wälchli ba8e7cd787
Fix BF16 teardown for TPU precision plugin (#10990)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-12-22 03:47:14 +00:00
four4fish cf5ef32f7b
Deprecate Trainer.training_type_plugin in favor of trainer.strategy (#11141)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 02:11:43 +00:00
Adrian Wälchli 17ad1a4c00
Rename `ParallelPlugin` to `ParallelStrategy` (#11123) 2021-12-22 01:09:17 +00:00
four4fish 4bfe5bda0f
Rename the DDPSpawnShardedPlugin to DDPSpawnShardedStrategy (#11210)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 00:27:36 +00:00
Aki Nitta 28ce9105e4
Rename `SingleDevicePlugin` to `SingleDeviceStrategy` (#11181)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 23:56:14 +00:00
four4fish f98cd78e9e
Renamed the `DDPSpawnPlugin` to `DDPSpawnStrategy` (#11145) 2021-12-21 23:06:14 +00:00
four4fish 1c5a5c3dfe
Renamed the DDP2Plugin to DDP2Strategy (#11185)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 19:21:00 +00:00
four4fish caab69aabb
Renamed DDPShardPlugin to DDPShardStrategy (#11187)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 17:18:25 +00:00
Aki Nitta 9da78a94bd
Rename `TPUSpawnPlugin` to `TPUSpawnStrategy` (#11190)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 16:36:16 +00:00
Kaushik B 283bdece0a
Rename DeepSpeedPlugin to DeepSpeedStrategy (#11194) 2021-12-21 15:18:01 +00:00
four4fish b64dea9dc3
Rename `DDPPlugin` to `DDPStrategy` (#11142)
* Rename DDPPlugin to DDPStrategy

* Change ddp_plugin to ddp_strategy

* update changelog

* rename occurrences in docs

* rename more occurrences

* fix line too long

* more fixes

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-21 08:55:51 +00:00
Carlos Mocholí e8169bbd46
Fix setter usage for checkpoint io and precision in TTP (#11071)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-20 17:45:32 +01:00
Adrian Wälchli f5c2881b68
3/n Simplify spawn plugins: Merge `pre_dispatch` and `setup` logic (#11137) 2021-12-20 17:41:22 +01:00
four4fish 0ee78e96ef
Rename `DDPFullyShardedPlugin` to `DDPFullyShardedStrategy` (#11143)
* Rename DDPFullyShardedPlugin to DDPFullyShardedStrategy

* update fsdp_plugin to fsdp_strategy

* update changelog

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-20 17:11:20 +01:00
Kaushik B 2a5d05b562
Fix tpu spawn plugin test (#11131) 2021-12-18 02:53:37 +00:00
Sean Naren c66cd12445
Remove partitioning of model in ZeRO 3 (#10655) 2021-12-17 12:36:53 +00:00
Adrian Wälchli 1a7084634a
Remove leftover `clean_logger` call in tests (#11080) 2021-12-17 00:23:32 +00:00
four4fish cec2d7946b
3/n Move accelerator into Strategy (#11022)
* remove training_step() from accelerator

* remove test, val, predict step

* move

* wip

* accelerator references

* cpu training

* rename occurrences in tests

* update tests

* pull from adrian's commit

* fix changelog merge pro

* fix accelerator_connector and other updates

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix doc build and some mypy

* fix lite

* fix gpu setup environment

* support customized ttp and accelerator

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tpu error check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix precision_plugin initialization to recognize customized plugin

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update bug_report_model.py

* Update accelerator_connector.py

* update changelog

* allow shorthand typing references to pl.Accelerator

* rename helper method and add docstring

* fix typing

* Update pytorch_lightning/trainer/connectors/accelerator_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update tests/accelerators/test_cpu.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix pre commit complaint

* update typing to long ugly path

* spacing in flow diagram

* remove todo comments

* docformatter

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

* revert test changes

* improve custom plugin examples

* remove redundant call to ttp attribute

it is no longer a property

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-16 04:41:34 +00:00
Adrian Wälchli ffb1a754af
Standardize model attribute access in training type plugins (#11072) 2021-12-15 16:37:21 +01:00
jona-0 7aee00c679
[DeepSpeed] fix flag forwarding in DeepSpeedPlugin (#10899)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-14 15:56:08 +00:00
Adrian Wälchli a4083df586
2/n Simplify spawn plugins: Spawn immediately (#10896) 2021-12-09 18:56:24 +00:00
Adrian Wälchli 46f718d2ba
Fix typing in `pl.plugins.environments` (#10943) 2021-12-07 02:14:02 +00:00
Adrian Wälchli 6c79b2e969
Change temporary spawn checkpoint name (#10934) 2021-12-06 16:08:55 +00:00
Adrian Wälchli d92ab96f17
Simplify some ddp-spawn tests (#10921) 2021-12-03 17:37:40 +01:00
Rohit Gupta 8ba3b383c0
Fix filtration logic for eval results with multiple dataloaders (#10810)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-12-03 14:34:46 +00:00
Adrian Wälchli 98cb7e8790
1/n Simplify spawn plugins: Simplify handling of multiprocessing queue (#10034)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-12-02 10:30:44 +00:00
Andres Algaba 1a26af1519
Add job_name as a staticmethod in SLURMEnvironment class (#10698)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-12-01 00:01:44 +00:00
Carlos Mocholí 8e1b9b306c
Skip hanging spawn tests (#10838)
* Skip hanging spawn tests

* Docstring fix

* Add back to TPU spawn
2021-11-30 18:36:12 +00:00
Andres Algaba e0474f8f0f
Add test for `job_id` (#10774) 2021-11-30 11:53:55 +01:00
Adrian Wälchli 97e52619ea
Fix typing in `pl.overrides.data_parallel` (#10796)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-29 10:58:23 +01:00
Carlos Mocholí 152eb57def
Rename special to standalone (#10779) 2021-11-26 17:13:14 +00:00
puhuk af0bb96f0f
Remove the "_precision" suffix from some precision plugin files (#10052) 2021-11-19 17:37:39 +00:00
Adrian Wälchli 085e82f454
Introduce `ClusterEnvironment.detect()` (#10564)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-19 12:24:10 +00:00
four4fish 700521c7d3
1/n Move precision plugin into strategy - update reference (#10570)
* 1/n move precision plugin into strategy - update reference

* update precision plugin reference in tpu_spawn

* add missing reference in error message

* add back removed license line

* update references in tests

* update reference in trainer

* update return annotation for precision_plugin property on TTP

* simplify access to precision plugin reference in sharded plug

* add changelog

* remove precision property from ttp and add deprecation message

* fix make doc and update precision reference

* simplify a reference to precision

accidentally overridden Adrian's change, now add it back

* Update CHANGELOG.md

add Adrian's change back

* Update accelerator precision

Add Adrian's change back

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add none check for precision plugin

just to be safe

* Update ipu.py

* update precision_plugin param deprecation message

* Update accelerator.py

* Remove deprecated warning

Tests will fail after 9940

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-19 00:39:01 +00:00
Adrian Wälchli 0f6d89422b
Control automatic resubmission on SLURM (#10601) 2021-11-18 17:48:53 +00:00
Carlos Mocholí 0fa07da987
Fail the test when a `DeprecationWarning` is raised (#9940) 2021-11-17 23:41:50 +01:00
Sean Naren e98ace3adc
[DeepSpeed] Do not fail if batch size could not be inferred for logging (#10438) 2021-11-16 11:42:25 +00:00
Carlos Mocholí 6dfcb6afc5
Skip strategy=ddp_spawn, accelerator=cpu, python>=3.9 tests (#10550) 2021-11-16 10:06:47 +05:30
Carlos Mocholí 7a9a08c5d3
Drop torch 1.6 testing (#10390)
* Drop torch 1.6 support

* Drop 1.6 support

* Update CHANGELOG

* Fixes

* Split change

* Undo change

* 1.7 -> 1.7.1

https://github.com/pytorch/pytorch/issues/47354

* Force trigger nightly

* Update .github/workflows/events-nightly.yml

Co-authored-by: Aki Nitta <nitta@akihironitta.com>

* Revert 1.7.1 change - try wildcard

* Update adjust versions and test it

* Undo test changes

* Revert "Undo test changes"

This reverts commit 3a6acadd11.

* Update CHANGELOG.md

Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2021-11-13 20:35:03 +00:00