Kaushik B
c33df2639f
Set `dataset` attribute to `MpDeviceLoader` used in TPU Spawn ( #10151 )
2021-10-27 01:23:01 +05:30
Carlos Mocholí
48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` ( #10143 )
2021-10-26 17:26:26 +02:00
Carlos Mocholí
a0e45dc071
Some minor CI cleanup ( #10088 )
2021-10-26 13:58:20 +02:00
twsl
971281d27d
Make sure file and folder exists in Profiler ( #10073 )
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-26 11:13:31 +00:00
Adrian Wälchli
871a96701a
Rename `master_params` to `main_params` ( #10105 )
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-26 11:17:32 +02:00
Rohit Gupta
34d5980df6
Raise `MisconfigurationException` if `trainer.eval` is missing required methods ( #10016 )
2021-10-25 23:12:08 -07:00
Danielle Pintz
13d6d7bad1
Remove `optimizer_connector.py` ( #10120 )
2021-10-26 00:52:43 +00:00
Adrian Wälchli
21a5867dad
Rename `ClusterEnvironment.creates_processes` ( #10106 )
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 23:15:41 +00:00
Rajat Goel
47e7a2860f
Fix Enums parsing in generated hparms yaml ( #9170 )
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 21:23:20 +00:00
Eric Wiener
0e20119d24
Change default value of the `max_steps` Trainer argument from `None` to `-1` ( #9460 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 20:21:33 +00:00
Rohit Gupta
d9dfb2e920
fix tests ( #10138 )
2021-10-25 19:37:47 +00:00
Danielle Pintz
1f7bd6650c
Mark accelerator connector as protected ( #10032 )
2021-10-25 19:24:54 +00:00
jjenniferdai
6d79184ec5
Unify checkpoint load paths [redo #9693 ] ( #10061 )
2021-10-25 19:05:31 +00:00
Adrian Wälchli
76081fb846
Mark SLURM detection methods in `AcceleratorConnector` as protected ( #10101 )
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-10-25 17:52:15 +00:00
Carlos Mocholí
2ee3127661
Use `torch.autocast` ( #10053 )
2021-10-25 17:33:52 +00:00
Carlos Mocholí
b376799430
Minor fixes related to clipping ( #10130 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
manipopopo
cfb2d87765
Disable quantization aware training observers ( #8540 )
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 15:46:09 +00:00
Adrian Wälchli
7eb2edf421
rename set_random_master_port ( #10104 )
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:09:05 +00:00
Danielle Pintz
e94dcf6936
Mark `trainer.data_connector` as protected ( #10031 )
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Carlos Mocholí
f95ba20012
Do not use the base version by default in `_compare_version` ( #10051 )
2021-10-25 16:41:32 +05:30
thomas chaton
ed9802643c
[CI] Comment flaky tests ( #10084 )
2021-10-25 10:31:06 +02:00
Kaushik B
c3614f1c07
Fix: skip importing DistributedOptimizer for Windows ( #10071 )
2021-10-21 21:01:56 +00:00
thomas chaton
454e93bace
Add support for init_meta_context, materialize_module ( #9920 )
2021-10-21 15:48:31 +01:00
jjenniferdai
2d9db211b5
Revert "Support serialized checkpoint loading ( #9605 )" ( #10057 )
This reverts commit f0e6f1b58a.
2021-10-21 02:51:22 +02:00
Kaushik B
aa1540410f
Add XLACheckpointIO ( #9972 )
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-21 02:39:16 +05:30
Rohit Gupta
1599c77d16
Fix `LearningRateMonitor` logging with multiple param groups optimizer with no scheduler ( #10044 )
2021-10-20 22:13:00 +05:30
Carlos Mocholí
6aeebf1bd3
Remove unnecessary dependency available checks ( #10050 )
2021-10-20 16:21:37 +00:00
Alessio Bonfiglio
2a2fa5a56a
Group all the logged gradients under the same sub-folder ( #7756 )
2021-10-20 15:48:36 +00:00
Kaushik B
56bc55db71
Update strategy flag in docs ( #10000 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-20 21:02:53 +05:30
kingyiusuen
2ed92ecabb
Rerun flaky profiler tests on failure ( #10035 )
2021-10-20 18:57:04 +05:30
Carlos Mocholí
f0b3e0f4de
Default to `precision=bf16` on CPU when `precision=16` is passed ( #10033 )
2021-10-20 13:25:13 +00:00
Adrian Wälchli
2c16f1d6b9
remove dataloader patching on the LightningModule ( #9764 )
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-20 15:23:20 +02:00
jjenniferdai
f0e6f1b58a
Support serialized checkpoint loading ( #9605 )
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-20 09:38:35 +01:00
Carlos Mocholí
53c62f63e8
Constrain IPU precision choices ( #10030 )
2021-10-20 00:52:01 +00:00
Carlos Mocholí
ad8d6c83da
[CLI] Shorthand notation to instantiate datamodules ( #10011 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-20 00:49:48 +00:00
Carlos Mocholí
e44921ee21
Fix `self.log(on_epoch=True, reduce_fx=sum)` on_batch_start ( #9791 )
2021-10-20 01:56:37 +02:00
Carlos Mocholí
d45897d522
Rename `TPUHalfPrecisionPlugin` to `TPUBf16PrecisionPlugin` ( #10026 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 21:09:37 +00:00
Ning
0b68f2abf8
Remove `reset_train_val_dataloaders` from Trainer and move data reloading logic to loop ( #9671 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-10-19 21:45:52 +02:00
Carlos Mocholí
e8beceb631
Add `TPUPrecisionPlugin` ( #10020 )
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 17:48:57 +00:00
thomas chaton
1759403c8d
Add check for callable with datamodule len ( #10003 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-19 14:51:08 +00:00
Rohit Gupta
0aa220b46b
Remove deprecated `distributed_backend` from `Trainer` ( #10017 )
* rm distributed_backend from Trainer
* unused
* chlog
* internal distributed_backend
* Docstring
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-19 13:54:37 +00:00
Danielle Pintz
203737bfce
Don't raise DeprecationWarning for `LoggerConnector.gpus_metrics` ( #9959 )
2021-10-18 22:51:09 +00:00
Adrian Wälchli
a99b7440b5
Add unit tests for `pl.utilities.grads` ( #9765 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-18 18:58:51 +05:30
Rohit Gupta
4dc32ad7db
Fix logic to check for spawn in worker_check ( #9902 )
* fix
* update tests
* chlog
* skip windows
2021-10-18 13:02:46 +00:00
Carlos Mocholí
3f355d0eb7
Remove manual tracking of optimizer steps ( #9957 )
2021-10-18 12:43:06 +00:00
Carlos Mocholí
0684e5295f
Remove deprecated `DataModule.dims` usage in tests ( #9948 )
2021-10-18 17:35:41 +05:30
Carlos Mocholí
c69a79c86f
Fix `self.log(on_epoch=True)` on_batch_start ( #9780 )
2021-10-18 14:02:16 +02:00
Elad Segal
8c76cf5ae1
reset val dataloader for binsearch ( #9975 )
2021-10-18 12:54:26 +02:00
Carlos Mocholí
01b304ec57
Update accelerator connector messages after the addition of strategy ( #9937 )
2021-10-18 01:10:48 +00:00
Carlos Mocholí
788f6864d9
Fix `LightningOptimizer` step and toggling logic ( #9958 )
2021-10-18 00:23:51 +00:00