Carlos Mocholí
5d190eabd2
Clarify what's the PyTorch profiler used in docs ( #12392 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-21 17:33:48 +00:00
ananthsub
d99625fc8d
Reduce number of times optimizers are instantiated with FSDP ( #12267 )
2022-03-21 18:18:59 +01:00
Aki Nitta
fa7aa0babe
Update nightly GPU benchmark pool ( #12366 )
2022-03-21 18:17:34 +01:00
DuYicong515
31c68d107e
Remove `AcceleratorConnector.num_gpus` and deprecate `Trainer.num_gpus` ( #12384 )
2022-03-21 18:06:39 +01:00
Danielle Pintz
caed77f155
Refactor `TorchElasticEnvironment.detect` to use `torch.distributed.is_torchelastic_launched` ( #12376 )
...
* Refactor TorchElasticEnvironment.detect to use native utility from torch.distributed
* fix version and tests
* fix version
* Update tests/accelerators/test_accelerator_connector.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 16:51:24 +01:00
Jirka Borovec
fe940e195d
CI: update prune_pkgs ( #12382 )
2022-03-21 12:50:50 +00:00
four4fish
1eff3b53c1
Update fairscale version ( #11567 )
...
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 11:38:55 +00:00
Rohit Gupta
865c54f308
Fix deepspeed scheduler initialization ( #12031 )
2022-03-21 10:31:00 +00:00
DuYicong515
523200971d
Remove `AcceleratorConnector.root_gpu` and deprecate `Trainer.root_gpu` ( #12262 )
2022-03-19 23:53:50 +00:00
jjenniferdai
6ba66789ae
[2/n] add `Stateful` functionality support for Callbacks ( #12232 )
2022-03-19 20:20:50 +00:00
Adrian Wälchli
eda53d70c3
update docs for ModelCheckpoint save_last ( #12332 )
2022-03-19 20:15:54 +00:00
William Falcon
f2d6a855af
Update README.md
2022-03-18 16:41:19 -04:00
DuYicong515
ed2bcc5ab3
Deprecate `Trainer.devices` in favor of `Trainer.num_devices` and `Trainer.device_ids` ( #12151 )
2022-03-18 12:38:57 -07:00
Aki Nitta
09d1296040
Avoid `rich` 10.15.0 and 10.15.1 ( #12293 )
...
* Update rich version
* Update requirements/extra.txt
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-03-18 19:14:43 +00:00
Jirka Borovec
efa870eebc
Docker: fix NCCL building Horovod ( #12318 )
...
* Horovod w. MPI
* nccl_built
* fix
2022-03-18 14:23:19 +00:00
ananthsub
4277845fa7
Add support for specifying process group backend to relevant distributed strategies ( #11745 )
2022-03-17 23:38:03 -07:00
Danielle Pintz
601948a4bf
Deprecate `Trainer.use_amp` ( #12312 )
2022-03-18 06:14:35 +00:00
Danielle Pintz
2360049744
Deprecate `LightningModule.use_amp` ( #12315 )
2022-03-18 03:49:18 +01:00
Danielle Pintz
f8e50f9cf5
Fix the case where logger=None is passed to Trainer ( #12249 )
2022-03-18 02:18:28 +00:00
Aki Nitta
b8b855d411
Pin Docker image for testing on GPUs ( #12368 )
...
* Pin docker image sha
2022-03-18 01:16:54 +00:00
Carlos Mocholí
bc812077c4
Fix CLI snippet in the docs ( #12275 )
2022-03-16 14:58:28 +05:30
Jirka Borovec
7ee690758c
CI: fix running PT 1.11 ( #12304 )
...
* fix fire
* horovod
* assistant
* cmake
* u20
* cuda
* -j2
* fix mypy
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-12 09:00:20 +00:00
edward-io
90a9da5abb
check trainerfn == FITTING before configuring sync_batchnorm ( #11919 )
...
Co-authored-by: edward-io <me@edward.io>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-12 03:52:59 +00:00
four4fish
4d74f379a5
Only allow one value for each plugin type in `plugins` flag ( #12083 )
2022-03-11 19:36:23 +00:00
Jirka Borovec
c90174ca31
unify logger testing ( #9081 )
...
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-11 14:24:30 +00:00
Jirka Borovec
bc8172856f
aggregate multiple helper scripts to single CLI ( #11147 )
...
* nightly release
* min version
* fire
2022-03-11 11:13:43 +00:00
Jirka Borovec
1144673cd9
CI: sanity check for req. pkgs ( #11819 )
...
* CI: sanity check for req. pkgs
* scripts
* rename
* gcsfs ?
* rich !
* install extra
* move
* set -e
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-11 09:20:47 +00:00
Jirka Borovec
3b4061f39a
CI: enable testing for PT 1.11 ( #11792 )
...
* enable PT 1.11
* horovod
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-10 18:38:47 +00:00
Jirka Borovec
8577ef7bba
Skip horovod 0.24.0 only ( #12248 )
...
* try skip horovod 0.24.0 only
* HOROVOD_BUILD_CUDA_CC_LIST
* fix test
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-10 16:01:08 +00:00
jjenniferdai
d31126c331
Support passing `storage_options` in `trainer.save_checkpoint()` API ( #11891 )
2022-03-09 18:35:50 +00:00
Carlos Mocholí
49a4a36ad4
Have the outputs match the loops format ( #12182 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-08 18:10:18 +00:00
Kushashwa Ravi Shrimali
821ca7e85d
Drop PyTorch 1.7 testing from the CI ( #12191 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2022-03-08 19:02:32 +01:00
Carlos Mocholí
8fa156948a
Add `LightningCLI(auto_registry)` ( #12108 )
2022-03-08 12:26:10 -05:00
Jirka Borovec
cadcc67386
add Azure HPU agent ( #12258 )
2022-03-08 19:20:43 +04:00
jjenniferdai
f3253070c4
Deprecate `LightningDataModule.on_save/load_checkpoint` ( #11893 )
2022-03-07 18:21:46 -08:00
Carlos Mocholí
aea96e45a4
Integrate global step with progress tracking ( #11805 )
2022-03-07 19:21:37 +00:00
Kaushik B
9b011606f3
Add callout items to the Docs landing page ( #12196 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-03-07 12:46:22 +04:00
Rohit Gupta
fc499bf56f
Disable tuner with distributed strategies ( #12179 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-07 08:45:07 +00:00
Adrian Wälchli
d0d41737ab
fix str to_device section in converting.rst ( #12243 )
2022-03-07 09:04:21 +01:00
ananthsub
5df519572f
Remove accelerator hooks from being called in `call_hook` ( #12237 )
2022-03-06 11:35:35 +05:30
Kaushik B
a14783ea8c
Add Strategy page to docs ( #11441 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-06 00:37:48 +00:00
Aki Nitta
ce956af4f2
Fix "Get Started" at the top being 404 ( #12210 )
2022-03-06 00:24:15 +01:00
ftorres16
a690fb5167
Remove "Optional" hint from non-None arguments ( #12214 )
2022-03-05 19:20:04 +00:00
whokilleddb
8b7a12c52e
Replace `eval()` with `ast.literal_eval()` for security ( #12212 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-03-05 18:38:16 +00:00
four4fish
91052dc6d5
Move ipu precision flag check to IPUPrecisionPlugin init ( #12148 )
2022-03-05 09:03:24 +00:00
ananthsub
b5fe056765
Update configuration_validator.py ( #12123 )
2022-03-04 22:19:58 -08:00
ananthsub
9c3d6b8fc7
Deprecate `LightningModule.on_pretrain_routine_{start/end}` ( #12122 )
2022-03-04 22:17:08 -08:00
Akash Kwatra
eff67d7a02
Deprecate `AbstractProfiler` in favor of `BaseProfiler` ( #12106 )
2022-03-05 02:35:57 +00:00
Danielle Pintz
0b682b807a
Mark `logger_connector` as protected ( #12195 )
2022-03-05 02:33:42 +00:00
Louis Taylor
73bda54e63
CI: update poplar sdk version ( #12226 )
2022-03-04 23:49:30 +00:00