Commit Graph

6694 Commits

Author SHA1 Message Date
DuYicong515 7a6efb38b2
fix merge issue (#12420) 2022-03-23 11:17:17 -07:00
Carlos Mocholí 1c18d5ecbc
Update version for rc0 release (#12423) 2022-03-23 15:15:16 +00:00
Carlos Mocholí cf3bc728b1
Add docs and message for DDP static graph (#12411) 2022-03-23 14:16:20 +00:00
Rohit Gupta 0a53e15759
Fix deepspeed keeping old sub-folders in same ckpt path (#12194)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-23 13:36:13 +00:00
Rohit Gupta c822a6ac2d
fix returning logged metrics instead of callback metrics during evaluation (#12224)
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-03-23 12:56:11 +00:00
Aki Nitta 16c41bd8ae
Label new issues as `needs triage` by default (#12403)
* Add needs triage to issues by default
2022-03-23 12:51:37 +00:00
Rohit Gupta 312c5a5af1
Raise a warning when `nn.Module` instance is saved with `save_hyperparameters()` (#12068) 2022-03-23 12:49:42 +00:00
Aki Nitta b876e9f04c
Update Slack link (#12421) 2022-03-23 11:57:19 +00:00
Aki Nitta 4589f2b4ee
Update hash for caching (#12405)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-03-23 09:14:29 +00:00
Kaushik B cf65ca25b0
Update Trainer config tests to use accelerator and devices (#12152) 2022-03-23 08:59:10 +00:00
Adrian Wälchli 94fe322533
Do not mark LightningModule methods as abstract (#12381)
* do not mark LightningModule methods as abstract

* add concrete test
2022-03-23 08:55:12 +00:00
Seth Vargo ea7f444167
Pin setup-gcloud to v0 instead of master (#12375)
* Pin setup-gcloud to v0 instead of master.

setup-gcloud will be updating the branch name from master to main in
a future release. Even though GitHub will establish redirects, this
will break any GitHub Actions workflows that pin to master. This PR
updates your GitHub Actions workflows to pin to v0, which is the
recommended best practice.

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-03-23 07:13:19 +00:00
DuYicong515 491fa02aa3
Remove `AcceleratorConnector.num_ipus` and deprecate `Trainer.ipus` (#12386) 2022-03-23 07:00:14 +00:00
Kaushik B bd035af78a
Fix TPU CI (#12419) 2022-03-23 11:35:38 +05:30
Danielle Pintz 905a4d8c6a
Add profiling for `on_load_checkpoint`/`on_save_checkpoint` callback and LM hooks (#12149) 2022-03-22 10:24:06 -07:00
DuYicong515 5d156f4ff6
Remove `AcceleratorConnector.tpu_id` (#12387)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-03-22 15:51:38 +05:30
DuYicong515 5fbe467168
Remove `AcceleratorConnector.num_processes` and deprecate `Trainer.num_processes` (#12388) 2022-03-22 10:11:27 +00:00
Jirka Borovec 5bbad8bb1e
mergify: drop ready if conflicts (#12396) 2022-03-22 10:06:36 +00:00
Akash Kwatra bc1c8b926c
Deprecate `BaseProfiler` in favor of `Profiler` (#12150)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2022-03-21 20:17:03 +00:00
Carlos Mocholí 5d190eabd2
Clarify what's the PyTorch profiler used in docs (#12392)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-21 17:33:48 +00:00
ananthsub d99625fc8d
Reduce number of times optimizers are instantiated with FSDP (#12267) 2022-03-21 18:18:59 +01:00
Aki Nitta fa7aa0babe
Update nightly GPU benchmark pool (#12366) 2022-03-21 18:17:34 +01:00
DuYicong515 31c68d107e
Remove `AcceleratorConnector.num_gpus` and deprecate `Trainer.num_gpus` (#12384) 2022-03-21 18:06:39 +01:00
Danielle Pintz caed77f155
Refactor `TorchElasticEnvironment.detect` to use `torch.distributed.is_torchelastic_launched` (#12376)
* Refactor TorchElasticEnvironment.detect to use native utility from torch.distributed

* fix version and tests

* fix version

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 16:51:24 +01:00
Jirka Borovec fe940e195d
CI: update prune_pkgs (#12382) 2022-03-21 12:50:50 +00:00
four4fish 1eff3b53c1
Update fairscale version (#11567)
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 11:38:55 +00:00
Rohit Gupta 865c54f308
Fix deepspeed scheduler initialization (#12031) 2022-03-21 10:31:00 +00:00
DuYicong515 523200971d
Remove `AcceleratorConnector.root_gpu` and deprecate `Trainer.root_gpu` (#12262) 2022-03-19 23:53:50 +00:00
jjenniferdai 6ba66789ae
[2/n] add `Stateful` functionality support for Callbacks (#12232) 2022-03-19 20:20:50 +00:00
Adrian Wälchli eda53d70c3
update docs for ModelCheckpoint save_last (#12332) 2022-03-19 20:15:54 +00:00
William Falcon f2d6a855af
Update README.md 2022-03-18 16:41:19 -04:00
DuYicong515 ed2bcc5ab3
Deprecate `Trainer.devices` in favor of `Trainer.num_devices` and `Trainer.device_ids` (#12151) 2022-03-18 12:38:57 -07:00
Aki Nitta 09d1296040
Avoid `rich` 10.15.0 and 10.15.1 (#12293)
* Update rich version

* Update requirements/extra.txt

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-03-18 19:14:43 +00:00
Jirka Borovec efa870eebc
Docker: fix NCCL building Horovod (#12318)
* Horovod w. MPI
* nccl_built
* fix
2022-03-18 14:23:19 +00:00
ananthsub 4277845fa7
Add support for specifying process group backend to relevant distributed strategies (#11745) 2022-03-17 23:38:03 -07:00
Danielle Pintz 601948a4bf
Deprecate `Trainer.use_amp` (#12312) 2022-03-18 06:14:35 +00:00
Danielle Pintz 2360049744
Deprecate `LightningModule.use_amp` (#12315) 2022-03-18 03:49:18 +01:00
Danielle Pintz f8e50f9cf5
Fix the case where logger=None is passed to Trainer (#12249) 2022-03-18 02:18:28 +00:00
Aki Nitta b8b855d411
Pin Docker image for testing on GPUs (#12368)
* Pin docker image sha
2022-03-18 01:16:54 +00:00
Carlos Mocholí bc812077c4
Fix CLI snippet in the docs (#12275) 2022-03-16 14:58:28 +05:30
Jirka Borovec 7ee690758c
CI: fix running PT 1.11 (#12304)
* fix fire
* horovod
* assistant
* cmake
* u20
* cuda
* -j2
* fix mypy

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-12 09:00:20 +00:00
edward-io 90a9da5abb
check trainerfn == FITTING before configuring sync_batchnorm (#11919)
Co-authored-by: edward-io <me@edward.io>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-12 03:52:59 +00:00
four4fish 4d74f379a5
Only allow one value for each plugin type in `plugins` flag (#12083) 2022-03-11 19:36:23 +00:00
Jirka Borovec c90174ca31
unify logger testing (#9081)
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-11 14:24:30 +00:00
Jirka Borovec bc8172856f
aggregate multiple helper scripts to single CLI (#11147)
* nightly release
* min version
* fire
2022-03-11 11:13:43 +00:00
Jirka Borovec 1144673cd9
CI: sanity check for req. pkgs (#11819)
* CI: sanity check for req. pkgs
* scripts
* rename
* gcsfs ?
* rich !
* install extra
* move
* set -e

Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-11 09:20:47 +00:00
Jirka Borovec 3b4061f39a
CI: enable testing for PT 1.11 (#11792)
* enable PT 1.11
* horovod
* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-10 18:38:47 +00:00
Jirka Borovec 8577ef7bba
Skip horovod 0.24.0 only (#12248)
* try skip horovod 0.24.0 only
* HOROVOD_BUILD_CUDA_CC_LIST
* fix test

Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-10 16:01:08 +00:00
jjenniferdai d31126c331
Support passing `storage_options` in `trainer.save_checkpoint()` API (#11891) 2022-03-09 18:35:50 +00:00
Carlos Mocholí 49a4a36ad4
Have the outputs match the loops format (#12182)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-08 18:10:18 +00:00