2522 Commits

Author SHA1 Message Date
Akash Kwatra bc1c8b926c
Deprecate `BaseProfiler` in favor of `Profiler` (#12150)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2022-03-21 20:17:03 +00:00
DuYicong515 31c68d107e
Remove `AcceleratorConnector.num_gpus` and deprecate `Trainer.num_gpus` (#12384) 2022-03-21 18:06:39 +01:00
Danielle Pintz caed77f155
Refactor `TorchElasticEnvironment.detect` to use `torch.distributed.is_torchelastic_launched` (#12376)
* Refactor TorchElasticEnvironment.detect to use native utility from torch.distributed

* fix version and tests

* fix version

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 16:51:24 +01:00
four4fish 1eff3b53c1
Update fairscale version (#11567)
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 11:38:55 +00:00
Rohit Gupta 865c54f308
Fix deepspeed scheduler initialization (#12031) 2022-03-21 10:31:00 +00:00
DuYicong515 523200971d
Remove `AcceleratorConnector.root_gpu` and deprecate `Trainer.root_gpu` (#12262) 2022-03-19 23:53:50 +00:00
jjenniferdai 6ba66789ae
[2/n] add `Stateful` functionality support for Callbacks (#12232) 2022-03-19 20:20:50 +00:00
DuYicong515 ed2bcc5ab3
Deprecate `Trainer.devices` in favor of `Trainer.num_devices` and `Trainer.device_ids` (#12151) 2022-03-18 12:38:57 -07:00
ananthsub 4277845fa7
Add support for specifying process group backend to relevant distributed strategies (#11745) 2022-03-17 23:38:03 -07:00
Danielle Pintz 601948a4bf
Deprecate `Trainer.use_amp` (#12312) 2022-03-18 06:14:35 +00:00
Danielle Pintz 2360049744
Deprecate `LightningModule.use_amp` (#12315) 2022-03-18 03:49:18 +01:00
Danielle Pintz f8e50f9cf5
Fix the case where logger=None is passed to Trainer (#12249) 2022-03-18 02:18:28 +00:00
edward-io 90a9da5abb
check trainerfn == FITTING before configuring sync_batchnorm (#11919)
Co-authored-by: edward-io <me@edward.io>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-12 03:52:59 +00:00
four4fish 4d74f379a5
Only allow one value for each plugin type in `plugins` flag (#12083) 2022-03-11 19:36:23 +00:00
Jirka Borovec c90174ca31
unify logger testing (#9081)
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-11 14:24:30 +00:00
Jirka Borovec 8577ef7bba
Skip horovod 0.24.0 only (#12248)
* try skip horovod 0.24.0 only
* HOROVOD_BUILD_CUDA_CC_LIST
* fix test

Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-10 16:01:08 +00:00
jjenniferdai d31126c331
Support passing `storage_options` in `trainer.save_checkpoint()` API (#11891) 2022-03-09 18:35:50 +00:00
Carlos Mocholí 49a4a36ad4
Have the outputs match the loops format (#12182)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-08 18:10:18 +00:00
Carlos Mocholí 8fa156948a
Add `LightningCLI(auto_registry)` (#12108) 2022-03-08 12:26:10 -05:00
jjenniferdai f3253070c4
Deprecate `LightningDataModule.on_save/load_checkpoint` (#11893) 2022-03-07 18:21:46 -08:00
Carlos Mocholí aea96e45a4
Integrate global step with progress tracking (#11805) 2022-03-07 19:21:37 +00:00
Rohit Gupta fc499bf56f
Disable tuner with distributed strategies (#12179)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-07 08:45:07 +00:00
four4fish 91052dc6d5
Move ipu precision flag check to IPUPrecisionPlugin init (#12148) 2022-03-05 09:03:24 +00:00
ananthsub 9c3d6b8fc7
Deprecate `LightningModule.on_pretrain_routine_{start/end}` (#12122) 2022-03-04 22:17:08 -08:00
Akash Kwatra eff67d7a02
Deprecate `AbstractProfiler` in favor of `BaseProfiler` (#12106) 2022-03-05 02:35:57 +00:00
Danielle Pintz 0b682b807a
Mark `logger_connector` as protected (#12195) 2022-03-05 02:33:42 +00:00
Louis Taylor 73bda54e63
CI: update poplar sdk version (#12226) 2022-03-04 23:49:30 +00:00
Ethan Harris ac735db0a0
Remove `data_pipeline` attribute patch (#12204) 2022-03-04 23:09:37 +00:00
Akash Kwatra 1f7298d326
Deprecate `LoggerCollection` in favor of `trainer.loggers` (#12147)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-04 23:01:43 +00:00
jjenniferdai 5d2a3eab69
add `state_dict`/`load_state_dict` to base `Callback` (#11998) 2022-03-04 02:41:48 +00:00
four4fish 15364c18c8
Check `parallel_devices` passed through `strategy` is consistent with the `accelerator` flag (#12105) 2022-03-03 10:30:24 -08:00
jjenniferdai d923dff627
Deprecate `PrecisionPlugin.on_save/load_checkpoint` (#11978) 2022-03-02 10:14:55 -08:00
jjenniferdai 89d37569d8
add `accelerator.is_available()` check (#12104)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2022-03-02 10:07:49 +00:00
Adrian Wälchli 0e24140fe4
Improve mechanism to reset the seed after sanity check (#11870)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-03-01 23:27:30 +00:00
Adrian Wälchli d4d197070f
Add `SyncBatchNormPlugin` (#11754)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-03-01 19:41:40 +05:30
Danielle Pintz 0fe3379fa4
Deprecate `weights_save_path` from the Trainer constructor (#12084)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-02-28 22:45:26 +00:00
Carlos Mocholí 6309a59c3c
Do not prefetch when possible (#12101) 2022-02-28 18:31:18 +00:00
Kushashwa Ravi Shrimali 02ccd874b9
Stop loading a few properties if checkpoint's `dirpath` has changed (#12045)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-28 16:42:09 +00:00
Kaushik B a52a6ea030
Add support for pluggable Accelerators (#12030)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-02-28 21:36:23 +05:30
Carlos Mocholí a9024ce870
[CLI] Fix `SaveConfigCallback` with DDP spawn (#12011) 2022-02-28 13:27:42 +00:00
Cai Q.T 01c31ae434
Fix `LightningModule.{un,}toggle_model` when only 1 optimizer is used (#12088)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-28 12:41:51 +00:00
Rohit Gupta 17bb815d01
Add `estimated_stepping_batches` property to `Trainer` (#11599) 2022-02-28 12:40:48 +00:00
Rohit Gupta 5b342f14a6
fix to avoid common hook warning if no hook is overridden (#12131) 2022-02-28 18:07:05 +05:30
Carlos Mocholí db1c709519
Clean loop fetching usage (#12103) 2022-02-28 10:51:33 +00:00
Carlos Mocholí 5f920dc088
Refactor Horovod NCCL check (#11948) 2022-02-28 10:45:32 +00:00
Mauricio Villegas 54b9a85227
Unit test for CLI with subcommands and a common default config file (#12061)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-28 10:17:49 +00:00
DuYicong515 c9af112801
Remove `AcceleratorConnector.num_nodes` (#12107)
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-02-28 09:53:38 +00:00
Carlos Mocholí 8fd17f2edf
[IPU] Support manually instantiating the `poptorch.DataLoader` (#12116) 2022-02-28 09:36:26 +00:00
DuYicong515 0b677ecf2b
Remove `AcceleratorConnector.has_tpu` (#12109) 2022-02-27 14:16:03 +00:00
DuYicong515 b2932337bc
Remove `AcceleratorConnector.has_ipu` (#12111)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-02-27 13:36:36 +00:00