Commit Graph

4112 Commits

Author SHA1 Message Date
Carlos Mocholí 939d56c6d6
Drop PyTorch 1.7 support () 2022-03-27 21:31:20 +00:00
Bruno Cabado e618a331fd
Allow log to an existing run ID in MLflow with MLFlowLogger ()
Co-authored-by: bruno.cabado <bruno.cabado@cinfo.es>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-27 17:28:57 +00:00
Adam Reeve 7c7a4ba233
Fix SWA LR scheduler not being stepped ()
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-27 15:49:41 +00:00
DuYicong515 01d817cb9f
Deprecate `Trainer.gpus` ()
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-27 16:53:28 +02:00
Kushashwa Ravi Shrimali 92a2a6e951
Preparing for 1.6.0rc1 () 2022-03-25 18:23:47 +01:00
Rohit Gupta 48f171006d
Avoid fallback on CPU if no devices are provided () 2022-03-25 15:59:06 +00:00
Rohit Gupta e631a66530
Update TQDM progress bar tracking with multiple dataloaders () 2022-03-25 15:13:35 +00:00
Kaushik B 28dac0c8d9
Update tpu_cores flag with accelerator and devices flag () 2022-03-25 11:57:02 +00:00
Ivan Švogor 25b771ca08
Create the loss accumulator directly on the device ()
Co-authored-by: Ivan Svogor <ivan.svogor@iarai.ac.at>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-25 12:46:17 +01:00
ananthsub 9ac636335e
Update fit_loop.py () 2022-03-25 11:38:09 +01:00
Jerome Anand 812c2dc3d3
Add support for Habana accelerator (HPU) ()
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: jjenniferdai <89552168+jjenniferdai@users.noreply.github.com>
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

2022-03-25 10:24:52 +00:00
DuYicong515 cfc971700d
Remove AcceleratorConnector.parallel_devices () 2022-03-25 01:45:40 +00:00
DuYicong515 b5b951b05a
Remove AcceleratorConnector.devices () 2022-03-24 17:35:46 -07:00
Danielle Pintz 6329be60be
Replace PostLocalSGDOptimizer with a dedicated model averaging component () 2022-03-24 17:33:19 -07:00
jjenniferdai d4a4b77906
[3/3] Update lightning callbacks to `Stateful`, deprecations for old `on_save/load_checkpoint` signatures () 2022-03-25 00:06:10 +00:00
Carlos Mocholí 71e0ddb62f
`ModelCheckpoint`'s `save_last` now ignores `every_n_epochs` ()
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-24 20:06:52 +01:00
Kaushik B dcc973e019
Add `AcceleratorRegistry` ()
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2022-03-24 18:29:32 +00:00
Carlos Mocholí 45400be921
Do not print empty evaluation result tables () 2022-03-24 15:26:35 +05:30
ananthsub d418cf23b2
Do not configure launcher if processes are launched externally () 2022-03-24 09:40:34 +00:00
Carlos Mocholí 51575dcf60
Remove manual optimization `find_unused_parameters` override () 2022-03-24 00:17:18 +00:00
Ning a9bfcc7407
Call `Strategy.process_dataloader` in `data_connector.py` () 2022-03-23 22:57:56 +00:00
DuYicong515 923174147d
Remove Accelerator.parallel_device_ids and deprecate Trainer.data_parallel_device_ids () 2022-03-23 22:18:30 +00:00
ananthsub ebbe938dc1
Use debug instead of detail logging for per-iteration hooks () 2022-03-23 21:59:09 +00:00
Kaushik B 7b0d1183db
Update `gpus` flag with `accelerator` and `devices` flag ()
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-23 19:52:12 +00:00
DuYicong515 7a6efb38b2
fix merge issue () 2022-03-23 11:17:17 -07:00
Carlos Mocholí 1c18d5ecbc
Update version for rc0 release () 2022-03-23 15:15:16 +00:00
Carlos Mocholí cf3bc728b1
Add docs and message for DDP static graph () 2022-03-23 14:16:20 +00:00
Rohit Gupta 0a53e15759
Fix deepspeed keeping old sub-folders in same ckpt path ()
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-23 13:36:13 +00:00
Rohit Gupta c822a6ac2d
fix returning logged metrics instead of callback metrics during evaluation ()
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-03-23 12:56:11 +00:00
Rohit Gupta 312c5a5af1
Raise a warning when `nn.Module` instance is saved with `save_hyperparameters()` () 2022-03-23 12:49:42 +00:00
Adrian Wälchli 94fe322533
Do not mark LightningModule methods as abstract ()
* do not mark LightningModule methods as abstract

* add concrete test
2022-03-23 08:55:12 +00:00
DuYicong515 491fa02aa3
Remove `AccleratorConnector.num_ipus` and deprecate `Trainer.ipus` () 2022-03-23 07:00:14 +00:00
Danielle Pintz 905a4d8c6a
Add profiling for `on_load_checkpoint`/`on_save_checkpoint` callback and LM hooks () 2022-03-22 10:24:06 -07:00
DuYicong515 5d156f4ff6
Remove `AcceleratorConnector.tpu_id` ()
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-03-22 15:51:38 +05:30
DuYicong515 5fbe467168
Remove `AcceleratorConnector.num_processes` and deprecate `Trainer.num_processes` () 2022-03-22 10:11:27 +00:00
Akash Kwatra bc1c8b926c
Deprecate `BaseProfiler` in favor of `Profiler` ()
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2022-03-21 20:17:03 +00:00
ananthsub d99625fc8d
Reduce number of times optimizers are instantiated with FSDP () 2022-03-21 18:18:59 +01:00
DuYicong515 31c68d107e
Remove `AcceleratorConnector.num_gpus` and deprecate `Trainer.num_gpus` () 2022-03-21 18:06:39 +01:00
Danielle Pintz caed77f155
Refactor `TorchElasticEnvironment.detect` to use `torch.distributed.is_torchelastic_launched` ()
* Refactor TorchElasticEnvironment.detect to use native utility from torch.distributed

* fix version and tests

* fix version

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 16:51:24 +01:00
Rohit Gupta 865c54f308
Fix deepspeed scheduler initialization () 2022-03-21 10:31:00 +00:00
DuYicong515 523200971d
Remove `AcceleratorConnector.root_gpu` and deprecate `Trainer.root_gpu` () 2022-03-19 23:53:50 +00:00
jjenniferdai 6ba66789ae
[2/n] add `Stateful` functionality support for Callbacks () 2022-03-19 20:20:50 +00:00
Adrian Wälchli eda53d70c3
update docs for ModelCheckpoint save_last () 2022-03-19 20:15:54 +00:00
DuYicong515 ed2bcc5ab3
Deprecate `Trainer.devices` in favor of `Trainer.num_devices` and `Trainer.device_ids` () 2022-03-18 12:38:57 -07:00
ananthsub 4277845fa7
Add support for specifying process group backend to relevant distributed strategies () 2022-03-17 23:38:03 -07:00
Danielle Pintz 601948a4bf
Deprecate `Trainer.use_amp` () 2022-03-18 06:14:35 +00:00
Danielle Pintz 2360049744
Deprecate `LightningModule.use_amp` () 2022-03-18 03:49:18 +01:00
Danielle Pintz f8e50f9cf5
Fix the case where logger=None is passed to Trainer () 2022-03-18 02:18:28 +00:00
edward-io 90a9da5abb
check trainerfn == FITTING before configuring sync_batchnorm ()
Co-authored-by: edward-io <me@edward.io>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-12 03:52:59 +00:00
four4fish 4d74f379a5
Only allow one value for each plugin type in `plugins` flag () 2022-03-11 19:36:23 +00:00