Commit Graph

6675 Commits

Author SHA1 Message Date
Carlos Mocholí 5d190eabd2
Clarify what's the PyTorch profiler used in docs (#12392)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-21 17:33:48 +00:00
ananthsub d99625fc8d
Reduce number of times optimizers are instantiated with FSDP (#12267) 2022-03-21 18:18:59 +01:00
Aki Nitta fa7aa0babe
Update nightly GPU benchmark pool (#12366) 2022-03-21 18:17:34 +01:00
DuYicong515 31c68d107e
Remove `AcceleratorConnector.num_gpus` and deprecate `Trainer.num_gpus` (#12384) 2022-03-21 18:06:39 +01:00
Danielle Pintz caed77f155
Refactor `TorchElasticEnvironment.detect` to use `torch.distributed.is_torchelastic_launched` (#12376)
* Refactor TorchElasticEnvironment.detect to use native utility from torch.distributed

* fix version and tests

* fix version

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 16:51:24 +01:00
Jirka Borovec fe940e195d
CI: update prune_pkgs (#12382) 2022-03-21 12:50:50 +00:00
four4fish 1eff3b53c1
Update fairscale version (#11567)
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 11:38:55 +00:00
Rohit Gupta 865c54f308
Fix deepspeed scheduler initialization (#12031) 2022-03-21 10:31:00 +00:00
DuYicong515 523200971d
Remove `AcceleratorConnector.root_gpu` and deprecate `Trainer.root_gpu` (#12262) 2022-03-19 23:53:50 +00:00
jjenniferdai 6ba66789ae
[2/n] add `Stateful` functionality support for Callbacks (#12232) 2022-03-19 20:20:50 +00:00
Adrian Wälchli eda53d70c3
update docs for ModelCheckpoint save_last (#12332) 2022-03-19 20:15:54 +00:00
William Falcon f2d6a855af
Update README.md 2022-03-18 16:41:19 -04:00
DuYicong515 ed2bcc5ab3
Deprecate `Trainer.devices` in favor of `Trainer.num_devices` and `Trainer.device_ids` (#12151) 2022-03-18 12:38:57 -07:00
Aki Nitta 09d1296040
Avoid `rich` 10.15.0 and 10.15.1 (#12293)
* Update rich version

* Update requirements/extra.txt

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-03-18 19:14:43 +00:00
Jirka Borovec efa870eebc
Docker: fix NCCL building Horovod (#12318)
* Horovod w. MPI
* nccl_built
* fix
2022-03-18 14:23:19 +00:00
ananthsub 4277845fa7
Add support for specifying process group backend to relevant distributed strategies (#11745) 2022-03-17 23:38:03 -07:00
Danielle Pintz 601948a4bf
Deprecate `Trainer.use_amp` (#12312) 2022-03-18 06:14:35 +00:00
Danielle Pintz 2360049744
Deprecate `LightningModule.use_amp` (#12315) 2022-03-18 03:49:18 +01:00
Danielle Pintz f8e50f9cf5
Fix the case where logger=None is passed to Trainer (#12249) 2022-03-18 02:18:28 +00:00
Aki Nitta b8b855d411
Pin Docker image for testing on GPUs (#12368)
* Pin docker image sha
2022-03-18 01:16:54 +00:00
Carlos Mocholí bc812077c4
Fix CLI snippet in the docs (#12275) 2022-03-16 14:58:28 +05:30
Jirka Borovec 7ee690758c
CI: fix running PT 1.11 (#12304)
* fix fire
* horovod
* assistant
* cmake
* u20
* cuda
* -j2
* fix mypy

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-12 09:00:20 +00:00
edward-io 90a9da5abb
check trainerfn == FITTING before configuring sync_batchnorm (#11919)
Co-authored-by: edward-io <me@edward.io>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-12 03:52:59 +00:00
four4fish 4d74f379a5
Only allow one value for each plugin type in `plugins` flag (#12083) 2022-03-11 19:36:23 +00:00
Jirka Borovec c90174ca31
unify logger testing (#9081)
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-11 14:24:30 +00:00
Jirka Borovec bc8172856f
aggregate multiple helper scripts to single CLI (#11147)
* nightly release
* min version
* fire
2022-03-11 11:13:43 +00:00
Jirka Borovec 1144673cd9
CI: sanity check for req. pkgs (#11819)
* CI: sanity check for req. pkgs
* scripts
* rename
* gcsfs ?
* rich !
* install extra
* move
* set -e

Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-11 09:20:47 +00:00
Jirka Borovec 3b4061f39a
CI: enable testing for PT 1.11 (#11792)
* enable PT 1.11
* horovod
* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-10 18:38:47 +00:00
Jirka Borovec 8577ef7bba
Skip horovod 0.24.0 only (#12248)
* try skip horovod 0.24.0 only
* HOROVOD_BUILD_CUDA_CC_LIST
* fix test

Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-10 16:01:08 +00:00
jjenniferdai d31126c331
Support passing `storage_options` in `trainer.save_checkpoint()` API (#11891) 2022-03-09 18:35:50 +00:00
Carlos Mocholí 49a4a36ad4
Have the outputs match the loops format (#12182)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-08 18:10:18 +00:00
Kushashwa Ravi Shrimali 821ca7e85d
Drop PyTorch 1.7 testing from the CI (#12191)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-08 19:02:32 +01:00
Carlos Mocholí 8fa156948a
Add `LightningCLI(auto_registry)` (#12108) 2022-03-08 12:26:10 -05:00
Jirka Borovec cadcc67386
add Azure HPU agent (#12258) 2022-03-08 19:20:43 +04:00
jjenniferdai f3253070c4
Deprecate `LightningDataModule.on_save/load_checkpoint` (#11893) 2022-03-07 18:21:46 -08:00
Carlos Mocholí aea96e45a4
Integrate global step with progress tracking (#11805) 2022-03-07 19:21:37 +00:00
Kaushik B 9b011606f3
Add callout items to the Docs landing page (#12196)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-03-07 12:46:22 +04:00
Rohit Gupta fc499bf56f
Disable tuner with distributed strategies (#12179)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-07 08:45:07 +00:00
Adrian Wälchli d0d41737ab
fix str to_device section in converting.rst (#12243) 2022-03-07 09:04:21 +01:00
ananthsub 5df519572f
Remove accelerator hooks from being called in `call_hook` (#12237) 2022-03-06 11:35:35 +05:30
Kaushik B a14783ea8c
Add Strategy page to docs (#11441)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-06 00:37:48 +00:00
Aki Nitta ce956af4f2
Fix "Get Started" at the top being 404 (#12210) 2022-03-06 00:24:15 +01:00
ftorres16 a690fb5167
Remove "Optional" hint from non-None arguments (#12214) 2022-03-05 19:20:04 +00:00
whokilleddb 8b7a12c52e
Replace `eval()` with `ast.literal_eval()` for security (#12212)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-03-05 18:38:16 +00:00
four4fish 91052dc6d5
Move ipu precision flag check to IPUPrecisionPlugin init (#12148) 2022-03-05 09:03:24 +00:00
ananthsub b5fe056765
Update configuration_validator.py (#12123) 2022-03-04 22:19:58 -08:00
ananthsub 9c3d6b8fc7
Deprecate `LightningModule.on_pretrain_routine_{start/end}` (#12122) 2022-03-04 22:17:08 -08:00
Akash Kwatra eff67d7a02
Deprecate `AbstractProfiler` in favor of `BaseProfiler` (#12106) 2022-03-05 02:35:57 +00:00
Danielle Pintz 0b682b807a
Mark `logger_connector` as protected (#12195) 2022-03-05 02:33:42 +00:00
Louis Taylor 73bda54e63
CI: update poplar sdk version (#12226) 2022-03-04 23:49:30 +00:00