Commit Graph

2075 Commits

Author SHA1 Message Date
Kaushik B c33df2639f
Set `dataset` attribute to `MpDeviceLoader` used in TPU Spawn (#10151) 2021-10-27 01:23:01 +05:30
Carlos Mocholí 48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` (#10143) 2021-10-26 17:26:26 +02:00
Carlos Mocholí a0e45dc071
Some minor CI cleanup (#10088) 2021-10-26 13:58:20 +02:00
twsl 971281d27d
Make sure file and folder exists in Profiler (#10073)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-26 11:13:31 +00:00
Adrian Wälchli 871a96701a
Rename `master_params` to `main_params` (#10105)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-26 11:17:32 +02:00
Rohit Gupta 34d5980df6
Raise `MisconfigurationException` if `trainer.eval` is missing required methods (#10016) 2021-10-25 23:12:08 -07:00
Danielle Pintz 13d6d7bad1
Remove `optimizer_connector.py` (#10120) 2021-10-26 00:52:43 +00:00
Adrian Wälchli 21a5867dad
Rename `ClusterEnvironment.creates_processes` (#10106)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 23:15:41 +00:00
Rajat Goel 47e7a2860f
Fix Enums parsing in generated hparams yaml (#9170)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 21:23:20 +00:00
Eric Wiener 0e20119d24
Change default value of the `max_steps` Trainer argument from `None` to `-1` (#9460)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 20:21:33 +00:00
Rohit Gupta d9dfb2e920
fix tests (#10138) 2021-10-25 19:37:47 +00:00
Danielle Pintz 1f7bd6650c
Mark accelerator connector as protected (#10032) 2021-10-25 19:24:54 +00:00
jjenniferdai 6d79184ec5
Unify checkpoint load paths [redo #9693] (#10061) 2021-10-25 19:05:31 +00:00
Adrian Wälchli 76081fb846
Mark SLURM detection methods in `AcceleratorConnector` as protected (#10101)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-10-25 17:52:15 +00:00
Carlos Mocholí 2ee3127661
Use `torch.autocast` (#10053) 2021-10-25 17:33:52 +00:00
Carlos Mocholí b376799430
Minor fixes related to clipping (#10130)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
manipopopo cfb2d87765
Disable quantization aware training observers (#8540)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 15:46:09 +00:00
Adrian Wälchli 7eb2edf421
rename `set_random_master_port` (#10104)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:09:05 +00:00
Danielle Pintz e94dcf6936
Mark `trainer.data_connector` as protected (#10031)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Carlos Mocholí f95ba20012
Do not use the base version by default in `_compare_version` (#10051) 2021-10-25 16:41:32 +05:30
thomas chaton ed9802643c
[CI] Comment flaky tests (#10084) 2021-10-25 10:31:06 +02:00
Kaushik B c3614f1c07
Fix: skip importing DistributedOptimizer for Windows (#10071) 2021-10-21 21:01:56 +00:00
thomas chaton 454e93bace
Add support for init_meta_context, materialize_module (#9920) 2021-10-21 15:48:31 +01:00
jjenniferdai 2d9db211b5
Revert "Support serialized checkpoint loading (#9605)" (#10057)
This reverts commit f0e6f1b58a.
2021-10-21 02:51:22 +02:00
Kaushik B aa1540410f
Add XLACheckpointIO (#9972)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-21 02:39:16 +05:30
Rohit Gupta 1599c77d16
Fix `LearningRateMonitor` logging with multiple param groups optimizer with no scheduler (#10044) 2021-10-20 22:13:00 +05:30
Carlos Mocholí 6aeebf1bd3
Remove unnecessary dependency available checks (#10050) 2021-10-20 16:21:37 +00:00
Alessio Bonfiglio 2a2fa5a56a
Group all the logged gradients under the same sub-folder (#7756) 2021-10-20 15:48:36 +00:00
Kaushik B 56bc55db71
Update strategy flag in docs (#10000)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-20 21:02:53 +05:30
kingyiusuen 2ed92ecabb
Rerun flaky profiler tests on failure (#10035) 2021-10-20 18:57:04 +05:30
Carlos Mocholí f0b3e0f4de
Default to `precision=bf16` on CPU when `precision=16` is passed (#10033) 2021-10-20 13:25:13 +00:00
Adrian Wälchli 2c16f1d6b9
remove dataloader patching on the LightningModule (#9764)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-20 15:23:20 +02:00
jjenniferdai f0e6f1b58a
Support serialized checkpoint loading (#9605)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-20 09:38:35 +01:00
Carlos Mocholí 53c62f63e8
Constrain IPU precision choices (#10030) 2021-10-20 00:52:01 +00:00
Carlos Mocholí ad8d6c83da
[CLI] Shorthand notation to instantiate datamodules (#10011)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-20 00:49:48 +00:00
Carlos Mocholí e44921ee21
Fix `self.log(on_epoch=True, reduce_fx=sum)` on_batch_start (#9791) 2021-10-20 01:56:37 +02:00
Carlos Mocholí d45897d522
Rename `TPUHalfPrecisionPlugin` to `TPUBf16PrecisionPlugin` (#10026)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 21:09:37 +00:00
Ning 0b68f2abf8
Remove `reset_train_val_dataloaders` from Trainer and move data reloading logic to loop (#9671)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-10-19 21:45:52 +02:00
Carlos Mocholí e8beceb631
Add `TPUPrecisionPlugin` (#10020)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 17:48:57 +00:00
thomas chaton 1759403c8d
Add check for callable with datamodule len (#10003)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-19 14:51:08 +00:00
Rohit Gupta 0aa220b46b
Remove deprecated `distributed_backend` from `Trainer` (#10017)
* rm distributed_backend from Trainer
* unused
* chlog
* internal distributed_backend
* Docstring

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-19 13:54:37 +00:00
Danielle Pintz 203737bfce
Don't raise DeprecationWarning for `LoggerConnector.gpus_metrics` (#9959) 2021-10-18 22:51:09 +00:00
Adrian Wälchli a99b7440b5
Add unit tests for `pl.utilities.grads` (#9765)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-18 18:58:51 +05:30
Rohit Gupta 4dc32ad7db
Fix logic to check for spawn in worker_check (#9902)
* fix
* update tests
* chlog
* skip windows
2021-10-18 13:02:46 +00:00
Carlos Mocholí 3f355d0eb7
Remove manual tracking of optimizer steps (#9957) 2021-10-18 12:43:06 +00:00
Carlos Mocholí 0684e5295f
Remove deprecated `DataModule.dims` usage in tests (#9948) 2021-10-18 17:35:41 +05:30
Carlos Mocholí c69a79c86f
Fix `self.log(on_epoch=True)` on_batch_start (#9780) 2021-10-18 14:02:16 +02:00
Elad Segal 8c76cf5ae1
reset val dataloader for binsearch (#9975) 2021-10-18 12:54:26 +02:00
Carlos Mocholí 01b304ec57
Update accelerator connector messages after the addition of strategy (#9937) 2021-10-18 01:10:48 +00:00
Carlos Mocholí 788f6864d9
Fix `LightningOptimizer` step and toggling logic (#9958) 2021-10-18 00:23:51 +00:00