thomas chaton
1702036c14
Fault Tolerant Manual: Add stateful dataloader iter ( #10674 )
2021-11-23 12:30:50 +00:00
thomas chaton
2036dfb5df
Fault Tolerant Manual: Add _rotate_worker_indices utility ( #10647 )
2021-11-22 19:52:04 +00:00
thomas chaton
6acfef680f
Fault Tolerant Manual: Add is_obj_stateful utility ( #10646 )
2021-11-22 18:48:32 +00:00
Andres Algaba
6fc7c54c3a
refactor slurm_job_id ( #10622 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-11-22 17:41:08 +00:00
Rohit Gupta
d431ce14a1
Raise an error if batch_size cannot be inferred from current batch ( #10541 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-22 16:55:19 +00:00
Carlos Mocholí
a6dedcf492
Fix `move_metrics_to_cpu` with evaluation ( #10631 )
2021-11-22 15:58:21 +00:00
thomas chaton
991cd895c6
1/n Add `FaultTolerantMode` ( #10645 )
2021-11-22 14:58:23 +00:00
Kaushik B
ce0a977742
Moved `env_vars_connector._defaults_from_env_vars` to `utilities.argparse._defaults_from_env_vars` ( #10501 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-22 08:06:35 +00:00
ananthsub
a18b6409d1
Check torch.distributed availability before sharded tensor state dict hook registration ( #10621 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-19 17:34:23 +00:00
Mauricio Villegas
5d748e560b
LightningCLI changes for jsonargparse>=4.0.0 ( #10426 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-19 17:03:14 +00:00
Rohit Gupta
ec27313be2
Fix batch size extraction when set by the user in `LightningModule.log` ( #10408 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-19 16:48:26 +00:00
Biho-Kim
e83e8ae305
Respect the passed dtype with `self.log` ( #10076 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-19 15:16:33 +00:00
thomas chaton
94390aba56
Lite: Don't pop value if they don't exist ( #10613 )
2021-11-19 14:04:33 +00:00
Kaushik B
137b62d80d
Add `refresh_rate` to RichProgressBar ( #10497 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-19 05:59:57 +00:00
thomas chaton
7d3ad5b76e
Don't register signal in thread ( #10610 )
2021-11-19 04:13:35 +01:00
four4fish
700521c7d3
1/n Move precision plugin into strategy - update reference ( #10570 )
...
* 1/n move precision plugin into strategy - update reference
* update precision plugin reference in tpu_spawn
* add missing reference in error message
* add back removed license line
* update references in tests
* update reference in trainer
* update return annotation for precision_plugin property on TTP
* simplify access to precision plugin reference in sharded plug
* add changelog
* remove precision property from ttp and add deprecation message
* fix make doc and update precision reference
* simplify a reference to precision
accidentally overrode Adrian's change, now adding it back
* Update CHANGELOG.md
add Adrian's change back
* Update accelerator precision
Add Adrian's change back
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add none check for precision plugin
just to be safe
* Update ipu.py
* update precision_plugin param deprecation message
* Update accelerator.py
* Remove deprecated warning
Tests will fail after 9940
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-19 00:39:01 +00:00
Adrian Wälchli
0f6d89422b
Control automatic resubmission on SLURM ( #10601 )
2021-11-18 17:48:53 +00:00
Adrian Wälchli
261ea90822
Update changelog after 1.5.2 release ( #10590 )
2021-11-17 23:31:09 +00:00
Adrian Wälchli
d50e1696f9
Fix propagation of device and dtype properties in Lite modules ( #10559 )
2021-11-16 17:26:46 +00:00
Carlos Mocholí
edebd8a3bc
Fix scripting causing false positive deprecation warnings ( #10555 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-16 15:52:09 +00:00
Sean Naren
e98ace3adc
[DeepSpeed] Do not fail if batch size could not be inferred for logging ( #10438 )
2021-11-16 11:42:25 +00:00
Rohit Gupta
de7ef41fea
remove deprecated `reload_dataloaders_every_epoch` from `Trainer` ( #10481 )
2021-11-16 06:47:43 +00:00
Rohit Gupta
60850ef510
fix overfit_batch sampler replacement logic ( #10486 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-11-15 22:31:45 +00:00
Carlos Mocholí
65ebfed3ae
Fix `to_torchscript()` causing false positive deprecation warnings ( #10470 )
2021-11-15 22:12:55 +00:00
Carlos Mocholí
dcafc95f2b
Avoid deprecated `progress_bar_refresh_rate` usage ( #10520 )
...
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
2021-11-15 22:04:48 +01:00
thomas chaton
1de3539eac
Resolve instantiation problem with init_meta_context ( #10493 )
2021-11-15 19:13:01 +00:00
Kaushik B
ae71284627
Remove deprecated `disable_validation` property from Trainer ( #10450 )
2021-11-15 18:42:00 +00:00
Kaushik B
01cf7a2ac5
Deprecate `DistributedType` in favor of `StrategyType` ( #10505 )
2021-11-15 17:10:08 +00:00
Shivam Mehta
794c4b08c0
Remove deprecated `is_overridden(model=...)` ( #10507 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-15 12:56:30 +00:00
puhuk
8b0cb47cc0
Remove deprecated `hpc_load` in `CheckpointConnector` ( #10525 )
...
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2021-11-15 11:54:47 +00:00
thomas chaton
ffb40060c0
shutdown workers on failure ( #10463 )
2021-11-15 10:03:46 +00:00
Rohit Gupta
a8c2725ff8
remove deprecated signature for `transfer_batch_to_device` ( #10480 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-13 19:32:30 +00:00
Kaushik B
fabb364402
Remove deprecated `mode` argument from ModelSummary ( #10449 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-12 19:32:43 +00:00
Carlos Mocholí
847e24011a
Squeeze the early stopping monitor ( #10461 )
2021-11-12 18:03:47 +00:00
Rohit Gupta
fa0ed17f8a
remove deprecated train_loop ( #10482 )
...
* remove deprecated train_loop
* chlog
2021-11-12 12:42:25 +00:00
Kaushik B
d577f461a4
Remove deprecated `utilities.distributed.rank_zero_{warn,deprecation}` ( #10451 )
2021-11-10 07:35:48 -08:00
ananthsub
aad86423f7
Remove more deprecated methods from base `Accelerator` class ( #10448 )
2021-11-10 12:58:24 +05:30
a-gardner1
ce149f6451
Fix support for dataclasses with ClassVar/InitVar in `apply_to_collection` ( #9702 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-10 04:42:27 +00:00
Carlos Mocholí
d515bcac96
Remove deprecated profiler import ( #10443 )
2021-11-09 23:13:02 +01:00
Justus Schock
eeef5a80ac
Update Changelog for v1.5.1 ( #10439 )
...
* Missing Changelogs
* Add 1.5.1 entry to changelog
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-11-09 21:25:54 +00:00
thomas chaton
8d810d6144
Enable distributed training with CombinedDataLoader and max_size_cycle ( #10374 )
...
* solve combinedloader
* update
* update changelog
* update on comments
* resolve iterable dataset support
* update test description
* update
* update on comments
* update
* Accelerator auto
* Address review
* Refactor
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-09 20:06:10 +00:00
Carlos Mocholí
c413b69240
Remove deprecated `task_idx` ( #10441 )
2021-11-09 18:54:38 +00:00
Carlos Mocholí
ebab4be3e4
Remove deprecated `DeviceDtypeModuleMixin` import ( #10442 )
2021-11-09 18:35:53 +00:00
Ross Johnstone
c2f25d42ab
Make `monitor` required arg of EarlyStopping callback ( #10328 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-09 18:08:03 +00:00
Carlos Mocholí
069ec1005a
Do not autodetach extras ( #10424 )
...
* Do not autodetach extras
* Update CHANGELOG
* Use foo
2021-11-09 16:07:16 +00:00
thomas chaton
7fb277f260
Resolve workers being forcibly deleted with `persistent_workers=True` ( #10434 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-09 14:58:31 +00:00
Carlos Mocholí
edbf27430d
Remove deprecated `self.log` arguments ( #10423 )
2021-11-09 15:49:55 +01:00
Adrian Wälchli
aaa6aa75e9
Fix converting only float type tensors in Lite ( #10429 )
...
* fix
* less code
* add test case
* add test cases
* update input
* add test cases
* add type hint
* add changelog note
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-11-09 15:21:00 +01:00
Kaushik B
5eeca87e98
Fix deadlocks for distributed training for RichProgressBar ( #10428 )
2021-11-09 18:30:37 +05:30
Rohit Gupta
21eafafcb0
disable step logging in epoch hooks ( #10409 )
...
* disable step logging in epoch hooks
* chlog
* Apply suggestions from code review
* chlog
2021-11-09 16:53:27 +05:30