Carlos Mocholí
939d56c6d6
Drop PyTorch 1.7 support ( #12432 )
2022-03-27 21:31:20 +00:00
Bruno Cabado
e618a331fd
Allow log to an existing run ID in MLflow with MLFlowLogger ( #12290 )
...
Co-authored-by: bruno.cabado <bruno.cabado@cinfo.es>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-27 17:28:57 +00:00
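#12290 above adds a `run_id` argument so `MLFlowLogger` can keep logging into an existing MLflow run instead of creating a new one. A minimal sketch; the run ID is a placeholder for one obtained from your tracking server:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import MLFlowLogger

# Attach to an already-existing MLflow run; "abc123" is a placeholder.
logger = MLFlowLogger(run_id="abc123")
trainer = Trainer(logger=logger)
```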
Adam Reeve
7c7a4ba233
Fix SWA LR scheduler not being stepped ( #12446 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-27 15:49:41 +00:00
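For context on #12446: SWA is driven by the `StochasticWeightAveraging` callback, and the fix ensures its internal `SWALR` scheduler is actually stepped during the SWA phase. A hedged usage sketch:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import StochasticWeightAveraging

# swa_lrs is the learning rate the (now correctly stepped) SWALR
# scheduler anneals to once the SWA phase begins.
trainer = Trainer(callbacks=[StochasticWeightAveraging(swa_lrs=1e-2)])
```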
DuYicong515
01d817cb9f
Deprecate `Trainer.gpus` ( #12436 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-27 16:53:28 +02:00
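After #12436, reading the `Trainer.gpus` property emits a deprecation warning; the newer accessors carry the same information. A sketch, assuming a 2-GPU machine:

```python
from pytorch_lightning import Trainer

trainer = Trainer(accelerator="gpu", devices=2)
# Instead of the deprecated trainer.gpus:
print(trainer.num_devices)  # 2
print(trainer.device_ids)   # [0, 1]
```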
Kushashwa Ravi Shrimali
92a2a6e951
Preparing for 1.6.0rc1 ( #12453 )
2022-03-25 18:23:47 +01:00
Rohit Gupta
48f171006d
Avoid fallback on CPU if no devices are provided ( #12410 )
2022-03-25 15:59:06 +00:00
Rohit Gupta
e631a66530
Update TQDM progress bar tracking with multiple dataloaders ( #11657 )
2022-03-25 15:13:35 +00:00
Kaushik B
28dac0c8d9
Update `tpu_cores` flag with `accelerator` and `devices` flag ( #12158 )
2022-03-25 11:57:02 +00:00
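#12158 migrates `tpu_cores` usage to the generic accelerator API. The new spelling, assuming a TPU host:

```python
from pytorch_lightning import Trainer

# Old: Trainer(tpu_cores=8)
trainer = Trainer(accelerator="tpu", devices=8)
```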
Ivan Švogor
25b771ca08
Create the loss accumulator directly on the device ( #12430 )
...
Co-authored-by: Ivan Svogor <ivan.svogor@iarai.ac.at>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-25 12:46:17 +01:00
ananthsub
9ac636335e
Update fit_loop.py ( #12450 )
2022-03-25 11:38:09 +01:00
Jerome Anand
812c2dc3d3
Add support for Habana accelerator (HPU) ( #11808 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: jjenniferdai <89552168+jjenniferdai@users.noreply.github.com>
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-25 10:24:52 +00:00
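#11808 plugs Habana Gaudi devices into the accelerator API. A minimal sketch, assuming a machine with HPUs and the Habana software stack installed:

```python
from pytorch_lightning import Trainer

# Select the new Habana accelerator added by this PR.
trainer = Trainer(accelerator="hpu", devices=1)
```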
DuYicong515
cfc971700d
Remove AcceleratorConnector.parallel_devices ( #12075 )
2022-03-25 01:45:40 +00:00
DuYicong515
b5b951b05a
Remove AcceleratorConnector.devices ( #12435 )
2022-03-24 17:35:46 -07:00
Danielle Pintz
6329be60be
Replace PostLocalSGDOptimizer with a dedicated model averaging component ( #12378 )
2022-03-24 17:33:19 -07:00
jjenniferdai
d4a4b77906
[3/3] Update lightning callbacks to `Stateful`, deprecations for old `on_save/load_checkpoint` signatures ( #11887 )
2022-03-25 00:06:10 +00:00
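With #11887 (and #12232 below), callbacks persist state through `state_dict`/`load_state_dict` instead of the old `on_save_checkpoint`/`on_load_checkpoint` return-value signatures. A sketch of the new protocol with a hypothetical counter callback:

```python
from pytorch_lightning import Callback

class CounterCallback(Callback):
    """Counts training batches and survives checkpoint save/load."""

    def __init__(self) -> None:
        self.batches_seen = 0

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx) -> None:
        self.batches_seen += 1

    # Stateful-style hooks replacing the deprecated signatures:
    def state_dict(self) -> dict:
        return {"batches_seen": self.batches_seen}

    def load_state_dict(self, state_dict: dict) -> None:
        self.batches_seen = state_dict["batches_seen"]
```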
Carlos Mocholí
71e0ddb62f
`ModelCheckpoint`'s `save_last` now ignores `every_n_epochs` ( #12418 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-24 20:06:52 +01:00
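After #12418, `save_last=True` refreshes `last.ckpt` every epoch even when `every_n_epochs` throttles the regular checkpoints. A sketch:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# Regular checkpoints every 5 epochs; last.ckpt is still updated each
# epoch because save_last no longer respects every_n_epochs.
checkpoint = ModelCheckpoint(every_n_epochs=5, save_last=True)
```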
Kaushik B
dcc973e019
Add `AcceleratorRegistry` ( #12180 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2022-03-24 18:29:32 +00:00
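#12180 adds a registry through which accelerators are looked up by name. A sketch of querying it, assuming the `available_accelerators` accessor from the PR:

```python
from pytorch_lightning.accelerators import AcceleratorRegistry

# List the accelerator names known to the registry (cpu, gpu, tpu, ...).
print(AcceleratorRegistry.available_accelerators())
```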
Carlos Mocholí
45400be921
Do not print empty evaluation result tables ( #12427 )
2022-03-24 15:26:35 +05:30
ananthsub
d418cf23b2
Do not configure launcher if processes are launched externally ( #12431 )
2022-03-24 09:40:34 +00:00
Carlos Mocholí
51575dcf60
Remove manual optimization `find_unused_parameters` override ( #12425 )
2022-03-24 00:17:18 +00:00
Ning
a9bfcc7407
Call `Strategy.process_dataloader` in `data_connector.py` ( #12251 )
2022-03-23 22:57:56 +00:00
DuYicong515
923174147d
Remove AcceleratorConnector.parallel_device_ids and deprecate Trainer.data_parallel_device_ids ( #12072 )
2022-03-23 22:18:30 +00:00
ananthsub
ebbe938dc1
Use debug instead of detail logging for per-iteration hooks ( #12281 )
2022-03-23 21:59:09 +00:00
Kaushik B
7b0d1183db
Update `gpus` flag with `accelerator` and `devices` flag ( #12156 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-23 19:52:12 +00:00
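#12156 replaces the `gpus` flag with the `accelerator`/`devices` pair throughout the docs. The migration in one line:

```python
from pytorch_lightning import Trainer

# Old: Trainer(gpus=4)
trainer = Trainer(accelerator="gpu", devices=4)
```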
DuYicong515
7a6efb38b2
Fix merge issue ( #12420 )
2022-03-23 11:17:17 -07:00
Carlos Mocholí
1c18d5ecbc
Update version for rc0 release ( #12423 )
2022-03-23 15:15:16 +00:00
Carlos Mocholí
cf3bc728b1
Add docs and message for DDP static graph ( #12411 )
2022-03-23 14:16:20 +00:00
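On #12411: a static-graph DDP setup can be requested through the strategy. A sketch, assuming a torch version whose `DistributedDataParallel` accepts `static_graph` (extra `DDPStrategy` kwargs are forwarded to the DDP wrapper):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DDPStrategy

trainer = Trainer(
    accelerator="gpu",
    devices=2,
    strategy=DDPStrategy(static_graph=True),
)
```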
Rohit Gupta
0a53e15759
Fix deepspeed keeping old sub-folders in same ckpt path ( #12194 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-23 13:36:13 +00:00
Rohit Gupta
c822a6ac2d
Fix returning logged metrics instead of callback metrics during evaluation ( #12224 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-03-23 12:56:11 +00:00
Rohit Gupta
312c5a5af1
Raise a warning when `nn.Module` instance is saved with `save_hyperparameters()` ( #12068 )
2022-03-23 12:49:42 +00:00
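Regarding #12068: passing an `nn.Module` to `save_hyperparameters()` now warns, because the module would be pickled into the hparams. A sketch of the pattern that avoids the warning via `ignore`:

```python
import torch.nn as nn
from pytorch_lightning import LightningModule

class LitClassifier(LightningModule):  # hypothetical module for illustration
    def __init__(self, backbone: nn.Module, lr: float = 1e-3):
        super().__init__()
        # Saving `backbone` itself would trigger the new warning;
        # keep only the plain hyperparameters.
        self.save_hyperparameters(ignore=["backbone"])
        self.backbone = backbone
```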
Adrian Wälchli
94fe322533
Do not mark LightningModule methods as abstract ( #12381 )
...
* do not mark LightningModule methods as abstract
* add concrete test
2022-03-23 08:55:12 +00:00
DuYicong515
491fa02aa3
Remove `AcceleratorConnector.num_ipus` and deprecate `Trainer.ipus` ( #12386 )
2022-03-23 07:00:14 +00:00
Danielle Pintz
905a4d8c6a
Add profiling for `on_load_checkpoint`/`on_save_checkpoint` callback and LM hooks ( #12149 )
2022-03-22 10:24:06 -07:00
DuYicong515
5d156f4ff6
Remove `AcceleratorConnector.tpu_id` ( #12387 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-03-22 15:51:38 +05:30
DuYicong515
5fbe467168
Remove `AcceleratorConnector.num_processes` and deprecate `Trainer.num_processes` ( #12388 )
2022-03-22 10:11:27 +00:00
Akash Kwatra
bc1c8b926c
Deprecate `BaseProfiler` in favor of `Profiler` ( #12150 )
...
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2022-03-21 20:17:03 +00:00
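After #12150, custom profilers should subclass `Profiler`; importing `BaseProfiler` now warns. A minimal sketch of the new base class, assuming only `start`/`stop` must be implemented:

```python
import time

from pytorch_lightning.profiler import Profiler

class WallClockProfiler(Profiler):
    """Toy profiler accumulating wall-clock time per action."""

    def __init__(self) -> None:
        super().__init__()
        self._starts: dict = {}
        self.totals: dict = {}

    def start(self, action_name: str) -> None:
        self._starts[action_name] = time.monotonic()

    def stop(self, action_name: str) -> None:
        started = self._starts.pop(action_name, None)
        if started is not None:
            elapsed = time.monotonic() - started
            self.totals[action_name] = self.totals.get(action_name, 0.0) + elapsed
```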
ananthsub
d99625fc8d
Reduce number of times optimizers are instantiated with FSDP ( #12267 )
2022-03-21 18:18:59 +01:00
DuYicong515
31c68d107e
Remove `AcceleratorConnector.num_gpus` and deprecate `Trainer.num_gpus` ( #12384 )
2022-03-21 18:06:39 +01:00
Danielle Pintz
caed77f155
Refactor `TorchElasticEnvironment.detect` to use `torch.distributed.is_torchelastic_launched` ( #12376 )
...
* Refactor TorchElasticEnvironment.detect to use native utility from torch.distributed
* fix version and tests
* fix version
* Update tests/accelerators/test_accelerator_connector.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 16:51:24 +01:00
Rohit Gupta
865c54f308
Fix deepspeed scheduler initialization ( #12031 )
2022-03-21 10:31:00 +00:00
DuYicong515
523200971d
Remove `AcceleratorConnector.root_gpu` and deprecate `Trainer.root_gpu` ( #12262 )
2022-03-19 23:53:50 +00:00
jjenniferdai
6ba66789ae
[2/n] Add `Stateful` functionality support for Callbacks ( #12232 )
2022-03-19 20:20:50 +00:00
Adrian Wälchli
eda53d70c3
Update docs for `ModelCheckpoint` save_last ( #12332 )
2022-03-19 20:15:54 +00:00
DuYicong515
ed2bcc5ab3
Deprecate `Trainer.devices` in favor of `Trainer.num_devices` and `Trainer.device_ids` ( #12151 )
2022-03-18 12:38:57 -07:00
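Per #12151, `trainer.devices` now warns; the replacements split the information into a count and the concrete IDs. A sketch:

```python
from pytorch_lightning import Trainer

trainer = Trainer(accelerator="cpu", devices=2)
print(trainer.num_devices)  # how many devices are used: 2
print(trainer.device_ids)   # which ones: [0, 1]
```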
ananthsub
4277845fa7
Add support for specifying process group backend to relevant distributed strategies ( #11745 )
2022-03-17 23:38:03 -07:00
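#11745 exposes the process group backend on the distributed strategies instead of it being chosen implicitly. A sketch with `DDPStrategy`:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DDPStrategy

# Force gloo instead of the default backend (nccl on GPU).
trainer = Trainer(
    accelerator="gpu",
    devices=2,
    strategy=DDPStrategy(process_group_backend="gloo"),
)
```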
Danielle Pintz
601948a4bf
Deprecate `Trainer.use_amp` ( #12312 )
2022-03-18 06:14:35 +00:00
Danielle Pintz
2360049744
Deprecate `LightningModule.use_amp` ( #12315 )
2022-03-18 03:49:18 +01:00
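For #12312/#12315 above, the `use_amp` attributes are deprecated; mixed precision is enabled and inspected through the `precision` setting instead. A sketch, assuming a GPU machine:

```python
from pytorch_lightning import Trainer

trainer = Trainer(accelerator="gpu", devices=1, precision=16)
# Instead of the deprecated trainer.use_amp / LightningModule.use_amp:
print(trainer.precision)
```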
Danielle Pintz
f8e50f9cf5
Fix the case where logger=None is passed to Trainer ( #12249 )
2022-03-18 02:18:28 +00:00
edward-io
90a9da5abb
Check `trainer_fn == FITTING` before configuring sync_batchnorm ( #11919 )
...
Co-authored-by: edward-io <me@edward.io>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-12 03:52:59 +00:00
four4fish
4d74f379a5
Only allow one value for each plugin type in `plugins` flag ( #12083 )
2022-03-11 19:36:23 +00:00