DuYicong515
7a6efb38b2
fix merge issue ( #12420 )
2022-03-23 11:17:17 -07:00
Carlos Mocholí
1c18d5ecbc
Update version for rc0 release ( #12423 )
2022-03-23 15:15:16 +00:00
Carlos Mocholí
cf3bc728b1
Add docs and message for DDP static graph ( #12411 )
2022-03-23 14:16:20 +00:00
Rohit Gupta
0a53e15759
Fix deepspeed keeping old sub-folders in same ckpt path ( #12194 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-23 13:36:13 +00:00
Rohit Gupta
c822a6ac2d
fix returning logged metrics instead of callback metrics during evaluation ( #12224 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-03-23 12:56:11 +00:00
Aki Nitta
16c41bd8ae
Label new issues as `needs triage` by default ( #12403 )
...
* Add needs triage to issues by default
2022-03-23 12:51:37 +00:00
Rohit Gupta
312c5a5af1
Raise a warning when `nn.Module` instance is saved with `save_hyperparameters()` ( #12068 )
2022-03-23 12:49:42 +00:00
Aki Nitta
b876e9f04c
Update Slack link ( #12421 )
2022-03-23 11:57:19 +00:00
Aki Nitta
4589f2b4ee
Update hash for caching ( #12405 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-03-23 09:14:29 +00:00
Kaushik B
cf65ca25b0
Update Trainer config tests to use accelerator and devices ( #12152 )
2022-03-23 08:59:10 +00:00
Adrian Wälchli
94fe322533
Do not mark LightningModule methods as abstract ( #12381 )
...
* do not mark LightningModule methods as abstract
* add concrete test
2022-03-23 08:55:12 +00:00
Seth Vargo
ea7f444167
Pin setup-gcloud to v0 instead of master ( #12375 )
...
* Pin setup-gcloud to v0 instead of master.
setup-gcloud will be updating the branch name from master to main in
a future release. Even though GitHub will establish redirects, this
will break any GitHub Actions workflows that pin to master. This PR
updates your GitHub Actions workflows to pin to v0, which is the
recommended best practice.
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-03-23 07:13:19 +00:00
DuYicong515
491fa02aa3
Remove `AcceleratorConnector.num_ipus` and deprecate `Trainer.ipus` ( #12386 )
2022-03-23 07:00:14 +00:00
Kaushik B
bd035af78a
Fix TPU CI ( #12419 )
2022-03-23 11:35:38 +05:30
Danielle Pintz
905a4d8c6a
Add profiling for `on_load_checkpoint`/`on_save_checkpoint` callback and LM hooks ( #12149 )
2022-03-22 10:24:06 -07:00
DuYicong515
5d156f4ff6
Remove `AcceleratorConnector.tpu_id` ( #12387 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-03-22 15:51:38 +05:30
DuYicong515
5fbe467168
Remove `AcceleratorConnector.num_processes` and deprecate `Trainer.num_processes` ( #12388 )
2022-03-22 10:11:27 +00:00
Jirka Borovec
5bbad8bb1e
mergify: drop ready if conflicts ( #12396 )
2022-03-22 10:06:36 +00:00
Akash Kwatra
bc1c8b926c
Deprecate `BaseProfiler` in favor of `Profiler` ( #12150 )
...
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2022-03-21 20:17:03 +00:00
Carlos Mocholí
5d190eabd2
Clarify what's the PyTorch profiler used in docs ( #12392 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-21 17:33:48 +00:00
ananthsub
d99625fc8d
Reduce number of times optimizers are instantiated with FSDP ( #12267 )
2022-03-21 18:18:59 +01:00
Aki Nitta
fa7aa0babe
Update nightly GPU benchmark pool ( #12366 )
2022-03-21 18:17:34 +01:00
DuYicong515
31c68d107e
Remove `AcceleratorConnector.num_gpus` and deprecate `Trainer.num_gpus` ( #12384 )
2022-03-21 18:06:39 +01:00
Danielle Pintz
caed77f155
Refactor `TorchElasticEnvironment.detect` to use `torch.distributed.is_torchelastic_launched` ( #12376 )
...
* Refactor TorchElasticEnvironment.detect to use native utility from torch.distributed
* fix version and tests
* fix version
* Update tests/accelerators/test_accelerator_connector.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 16:51:24 +01:00
Jirka Borovec
fe940e195d
CI: update prune_pkgs ( #12382 )
2022-03-21 12:50:50 +00:00
four4fish
1eff3b53c1
Update fairscale version ( #11567 )
...
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 11:38:55 +00:00
Rohit Gupta
865c54f308
Fix deepspeed scheduler initialization ( #12031 )
2022-03-21 10:31:00 +00:00
DuYicong515
523200971d
Remove `AcceleratorConnector.root_gpu` and deprecate `Trainer.root_gpu` ( #12262 )
2022-03-19 23:53:50 +00:00
jjenniferdai
6ba66789ae
[2/n] add `Stateful` functionality support for Callbacks ( #12232 )
2022-03-19 20:20:50 +00:00
Adrian Wälchli
eda53d70c3
update docs for ModelCheckpoint save_last ( #12332 )
2022-03-19 20:15:54 +00:00
William Falcon
f2d6a855af
Update README.md
2022-03-18 16:41:19 -04:00
DuYicong515
ed2bcc5ab3
Deprecate `Trainer.devices` in favor of `Trainer.num_devices` and `Trainer.device_ids` ( #12151 )
2022-03-18 12:38:57 -07:00
Aki Nitta
09d1296040
Avoid `rich` 10.15.0 and 10.15.1 ( #12293 )
...
* Update rich version
* Update requirements/extra.txt
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-03-18 19:14:43 +00:00
Jirka Borovec
efa870eebc
Docker: fix NCCL building Horovod ( #12318 )
...
* Horovod w. MPI
* nccl_built
* fix
2022-03-18 14:23:19 +00:00
ananthsub
4277845fa7
Add support for specifying process group backend to relevant distributed strategies ( #11745 )
2022-03-17 23:38:03 -07:00
Danielle Pintz
601948a4bf
Deprecate `Trainer.use_amp` ( #12312 )
2022-03-18 06:14:35 +00:00
Danielle Pintz
2360049744
Deprecate `LightningModule.use_amp` ( #12315 )
2022-03-18 03:49:18 +01:00
Danielle Pintz
f8e50f9cf5
Fix the case where logger=None is passed to Trainer ( #12249 )
2022-03-18 02:18:28 +00:00
Aki Nitta
b8b855d411
Pin Docker image for testing on GPUs ( #12368 )
...
* Pin docker image sha
2022-03-18 01:16:54 +00:00
Carlos Mocholí
bc812077c4
Fix CLI snippet in the docs ( #12275 )
2022-03-16 14:58:28 +05:30
Jirka Borovec
7ee690758c
CI: fix running PT 1.11 ( #12304 )
...
* fix fire
* horovod
* assistant
* cmake
* u20
* cuda
* -j2
* fix mypy
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-12 09:00:20 +00:00
edward-io
90a9da5abb
check trainerfn == FITTING before configuring sync_batchnorm ( #11919 )
...
Co-authored-by: edward-io <me@edward.io>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-12 03:52:59 +00:00
four4fish
4d74f379a5
Only allow one value for each plugin type in `plugins` flag ( #12083 )
2022-03-11 19:36:23 +00:00
Jirka Borovec
c90174ca31
unify logger testing ( #9081 )
...
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-11 14:24:30 +00:00
Jirka Borovec
bc8172856f
aggregate multiple helper scripts to single CLI ( #11147 )
...
* nightly release
* min version
* fire
2022-03-11 11:13:43 +00:00
Jirka Borovec
1144673cd9
CI: sanity check for req. pkgs ( #11819 )
...
* CI: sanity check for req. pkgs
* scripts
* rename
* gcsfs ?
* rich !
* install extra
* move
* set -e
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-11 09:20:47 +00:00
Jirka Borovec
3b4061f39a
CI: enable testing for PT 1.11 ( #11792 )
...
* enable PT 1.11
* horovod
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-10 18:38:47 +00:00
Jirka Borovec
8577ef7bba
Skip horovod 0.24.0 only ( #12248 )
...
* try skip horovod 0.24.0 only
* HOROVOD_BUILD_CUDA_CC_LIST
* fix test
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-10 16:01:08 +00:00
jjenniferdai
d31126c331
Support passing `storage_options` in `trainer.save_checkpoint()` API ( #11891 )
2022-03-09 18:35:50 +00:00
Carlos Mocholí
49a4a36ad4
Have the outputs match the loops format ( #12182 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-08 18:10:18 +00:00