lightning

Commit Graph

Author	SHA1	Message	Date
Adrian Wälchli	b42efa7d86	support launching Lightning ddp with traditional command (#7480 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-07-14 11:25:36 +00:00
Carlos Mocholí	6ce77a102b	Set minimum PyTorch version to 1.6 (#8288 ) Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2021-07-13 17:12:49 +00:00
Carlos Mocholí	321689f52e	Add `ModelCheckpoint(save_on_train_epoch_end)` (#8389 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-13 14:47:59 +00:00
Luis Perez	000fbe63d3	Expose `extract_batch_size` method and add corresponding tests. (#8357 ) * expose extract_batch and make public * first pass * early return * add changelog * move to utilities/data.py * add test_data.py * tests are passing * precommit hook * address pep8 failure Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-07-13 11:35:10 +00:00
Kaushik B	9d5ad7639c	Add logger flag to save_hyperparameters (#7960 ) * Add log flag to save_hyperparameters * FIx setter * Add test & Update changelog * Address comments * Fix conflicts * Update trainer * Update CHANGELOG.md Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Fix datamodule hparams fix * Fix datamodule hparams fix * Update test with patch * Update pytorch_lightning/utilities/hparams_mixin.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Move log_hyperparams to mixin * Update hparams mixin Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-13 11:36:36 +02:00
Carlos Mocholí	733cdbb9ad	`every_n_val_epochs` -> `every_n_epochs` (#8383 )	2021-07-13 01:20:20 +02:00
thomas chaton	370fa67004	[Refactor] Improve loops API 1/n (#8334 ) * resolve issues * update * update * update * add more exceptions * resolve bug * update * update * update changelog * resolve bug * resolve comments * update * update * update changelog * update * update * remove space * update * re-order protected trainer attr * move public method up * add docs to state dict methods * combine __load with load_state_dict * rename shadowed variable * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move changelog entry to refactor section * refactor loop_progress property for test helper function * update trainer setter docstring * Update CHANGELOG.md * Update pytorch_lightning/loops/base.py * remove trainer check Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>	2021-07-12 11:13:50 +00:00
Kaushik B	825c5dbe8c	Add support for (accelerator='cpu'|'gpu'|'tpu'|'ipu'|'auto') (#7808 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: SeanNaren <sean@grid.ai>	2021-07-09 15:28:54 +00:00
Tilman Krokotsch	09ff295177	Hyperparameters for datamodule (#3792 ) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk> Co-authored-by: Tilman Krokotsch <tilman.krokotsch@iav.de> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ethan Harris <ethanwharris@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com>	2021-07-09 15:10:00 +00:00
Andrew Tritt	3102922647	Add LSF support (#5102 ) * add ClusterEnvironment for LSF systems * update init file * add available cluster environments * clean up LSFEnvironment * add ddp_hpc as a distributed backend * clean up SLURMEnvironment * remove extra blank line * init device for DDPHPCAccelerator We need to do this so we don't send the model to the same device from multiple ranks * committing current state * add additional methods to ClusterEnvironments * add NVIDIA mixin for setting up CUDA envars * remove troubleshooting prints * cleanup SLURMEnvironment * fix docstring * cleanup TorchElasticEnvironment and add documentation * PEP8 puts a cork in it * add set_ranks_to_trainer * remove unused import * move to new location * update LSF environment * remove mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * changelog * reset slurm env * add tests * add licence * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test node_rank * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add lsf env to docs * add auto detection for lsf environment * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix is_using_lsf() and test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-07-09 16:14:26 +02:00
Dusan Drevicky	1b06edf2f2	Add the `on_before_optimizer_step` hook (#8048 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-07-09 13:30:52 +02:00
thomas chaton	1c825a2a9c	Add the `on_before_backward` hook (#7865 ) * Add callback to hook tests and add predict test * Fix lambda callback test * Simplify lambda call test * Use LambdaCallback * Dynamically append to called for the model * Remove print * Consistency * Consistency * Prepare args/kwargs testing * yapf doesn't like dict literals * Add arguments for fit no val test * Add arguments for fit no val test * add before_backward_hook * add test * resolve flake8 * resolve tests * update changelog * add on_before_backward to LightningModule * update on comments * Test arguments * Datamodule refactor * Fix eval test * remove extra file * resolve bug * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to hooks * update * resolve flake8 * update on comments * Update full fit + val test * Update test * Remove FIXME * Remove FIXME * Undo change * Fix * Parametrize fit hook test * Comment * Parametrize fit hook test with different precision plugins * Fix tests * Parametrize fit hook test with manual optimization * Unnecessary parenthesis * WIP * Comments * Fix message * Test CI error * Revert "Test CI error" This reverts commit `39c4a85a83`. * Add ddp training type teardown * Update CHANGELOG * Adrian's fix * Use destructor * Update CHANGELOG.md * RPC destructor * Update pytorch_lightning/plugins/training_type/ddp.py * Why do you not work :( * Missing condition * Fix deepspeed test * GC collect in conftest * Do not show warnings for special tests * Needs to run on 1.8 To avoid: "RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8" * Run torch 1.8 * Skip test due to 'Python bus error' * Debug NCCL * shm size * Disable warnings for special tests * Remove NCCL_DEBUG statement * Try smaller shm size * Revert "Skip test due to 'Python bus error'" This reverts commit `e0a3e8785d`. * README and adjust versions * Avoid self.on_gpu call * empty cache cleanup * More garbage collection * Unroll parametrizations * Do not reuse mock * Undo changes * Undo notebooks modification * resolve test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * delete file * Undo * Fix test * Revert "WIP" This reverts commit `f5828a8c42`. * Rename * Remove optimizers * Fix bug with LightningOptimizer * Add optimizers * update * update * Update CHANGELOG * On after backward refactor * Do not call super * Fixes * Remove should_accumulate * pre/post backward refactor * Call the LM backward hook * Update tests * Remove dev debug patch * Fix test * Remove optimizer arguments and typing * Docs fixes * Fix comment * Undo changes * Split manual and auto * Undo change * Deepsource * Remove optimizers * Undo changes * Call the hook * Docs * Docs Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-09 06:15:57 +00:00
Carlos Mocholí	eb6d991218	Refactor plugins backward (#8328 )	2021-07-08 16:02:09 +02:00
thomas chaton	7956c6bd4b	[Feat] Add FastForwardSampler 2/n - Fault Tolerant Training (#8307 ) * wip * update * resolve bug * wip * wip * wip * resolved tests * update on comments * update * update * Update pytorch_lightning/utilities/auto_restart.py Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> * update on comments * Update pytorch_lightning/utilities/auto_restart.py Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> * resolve bug * update * move properties to top * update docs for fast forward sampler * move public attribute to top * add missing super call * update docs for state_dict * fix merge conflict * add missing super() call * move property to top * update on comments * update * resolve bug * update * update on comments * activate coverage for CaptureIterableDataset * update on comments Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-07 20:21:21 +00:00
Carlos Mocholí	368ac1c622	[CLI] Drop `ArgumentParser` when pickling and save before spawning (#8017 ) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-07 17:56:13 +00:00
Carlos Mocholí	07d7c37a79	Remove magic monitor support for `ModelCheckpoint` (#8293 )	2021-07-07 18:36:19 +01:00
Carlos Mocholí	398eed508f	Fix `self.optimizers()` not returning a single `LightningOptimizer` (#8326 )	2021-07-07 18:57:45 +02:00
Sidhant Sundrani	20df24d2a2	Enables reload of dataloaders on every n epochs from every epoch (#5043 ) * edit arg to reload_dataloaders_every_n_epoch * init reload_dataloaders_every_n_epoch * edit logic to reload dl * update arg to test datamodule * update arg test dataloader * edit reload dl logic in eval loop * fix var name in reset_train_val_dataloaders * fix error, use current_epoch attribute * edit every_n_epoch to every_n_epochs * edit every_n_epoch to every_n_epochs * edit every_n_epoch to every_n_epochs * edit every_n_epoch to every_n_epochs * edit every_n_epoch to every_n_epochs * edit every_n_epoch to every_n_epochs * assert reload_dataloaders_every_n_epochs positive * assert reload_dataloaders_every_n_epochs positive * add trainer property should reload dl * update should reload dl in train loop * condition on should reload dl in eval loop * pep8 * fix update should reload dl in train loop * add test case * replace assertion with misconfig exception * remove unused variable * remove unnecessary checks * replace to BoringModel * remove unrequired comment * deprecate _every_epoch * add deprecated argument to trainer * test case for deprecated arg * remove unrequired assertion in train loop Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * modify misconfig exception for int Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * conv bool to int of depreciated _every_epoch Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * update description of deprecated param Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * update deprecation warning Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * modify argument to int only * fix deprecated test function name Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * merge tests for reload dls * add propery should reload dl * removed and added to trainer property * use property in train loop * remove deprecated test * add deprecated test to new file * test case for exception * update test datamodule every_n_epochs * update trainer docs * update hooks with every_n_epochs * edit format if statement Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update CHANGELOG.md * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * typo in exception * pytest check only misconfig exception * remove unnecessary code in test * remove unnecessary code in deprec test * added match in test * typo in comment * revert to prev, keep only req in context manager * Apply suggestions from code review * docs * rebase * Apply suggestions from code review * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix import: model_helpers instead of model_utils * fix, add reload_dataloaders_every_n_epochs argument to data connector * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add required imports * move deprecated log * add missing import rank_zero_warn * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update varname in should_reload_dl_epoch suggestion from code review * Fix CHANGELOG. Update deprecation versions * Minor change * change property name, mark protected * update property name * update property name * Remove deprecated _loop.py files Rename test func * Update CHANGELOG.md * use rank_zero_deprecation * update deprecation message in trainer api docs * test deprecation with real arg name in message * fix typo in trainer docs Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-07 13:10:08 +02:00
Kaushik B	2b6edae205	Decouple device parsing logic from Acc connector to Trainer (#8180 )	2021-07-07 15:05:26 +05:30
Carlos Mocholí	8fead58273	Add `functools.wraps` support for `is_overridden` (#8296 )	2021-07-06 10:40:54 +02:00
Adrian Wälchli	f1341a555e	Remove deprecated optimizer argument from `manual_backward` (#8287 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-07-06 08:18:08 +00:00
Carlos Mocholí	7ddcdb26d8	Deprecate `trainer.disable_validation` (#8291 )	2021-07-05 16:52:49 +02:00
Carlos Mocholí	441e16f61c	Default `EarlyStopping.check_on_train_epoch_end=True` (#8286 )	2021-07-05 15:45:23 +02:00
Adrian Wälchli	ced2c94a3e	fix missing call to untoggle_optimizer when accumulating gradients (#8284 ) * add fix * toggle test * re-structure * update changelog * update comment * remove debugging assertion	2021-07-05 11:59:04 +00:00
Kaushik B	3a8322deda	Add XLAStatsMonitor Callback (#8235 )	2021-07-05 17:09:46 +05:30
Carlos Mocholí	3379477242	Connect progress tracking dataclasses to loops (#8244 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-05 13:33:12 +02:00
Adrian Wälchli	ea5cfd2005	move batch to device before sending it to hooks (#7378 ) * update train step * test * x * limits * val * typeo * x * x * step * min gpus * run all loops * x * limit test * profiler * clean up accelerator code * move files * rename * move tests * changelog * reorder callbacks and model hooks * add test description * replace unneccessary method * fix chlog * adjust batch_to_device for DP Plugin * update tests for dataloader idx * unused imports * hook change * switch None * clear memory * change to None * None * None * memory savings * remove redundant todo * hack * cheat * Revert "cheat" This reverts commit `a8433bd0b4`. * Revert "hack" This reverts commit `43a6d1edeb`. * update new epoch loop * remove from old loop code * update chlog * update hook test * changelog * teardown * integrate changes in new eval loop * fix hook calls * add prediction step * bad merge * Revert "bad merge" This reverts commit `488080863c`. * fix train batch hook test * rm -rf _notebooks * update chlog * release memory * fix type * notebooks mess * debug * Revert "debug" This reverts commit `eec4ee2f77`. * teardown * fix teardown bug * debug * x * debug * Revert "debug" This reverts commit `a6e6101946`. Revert "debug" This reverts commit `5ddeaec069`. debug debug Revert "debug" This reverts commit 605be746f7daedf265b2c05a1c153ce543394435. Revert "Revert "debug"" This reverts commit a7612d5410409ed886cfb609457349ecf44cbfa8. debug x x x s tol x tol * Fix changelog Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-07-05 09:31:39 +01:00
Carlos Mocholí	0e19d16ca6	Move result teardown to loops (#8245 ) * Move result teardown to loops * Update CHANGELOG * Remove teardown from run * Move previous teardown to on_run_end * Add comment * Merge 8250 * Remove stage set to None where it shouldnt	2021-07-02 14:36:14 +01:00
thomas chaton	f3e74abad0	[feat] Add restore to base loop (#8247 ) * add loop restart * update	2021-07-02 13:40:31 +01:00
Adrian Wälchli	e7139ab9f7	Support `DDPPlugin` to be used on CPU (#6208 ) * Skip test due to 'Python bus error' * Debug NCCL * Remove NCCL_DEBUG statement * Revert "Skip test due to 'Python bus error'" This reverts commit `e0a3e8785d`. * fix * add test * changelog * yapf * patch os environ * make a special test * destroy pg * debug * revert * revert * problematic test * skip * try the fixture * test * update sensitive test * update changelog * remove comment * update wrong test * update test name * parameterization * Revert "parameterization" This reverts commit b0542f43f59c5ce66800883b5e2f0c66a97408cc. * remove conftest * ignore test * teardown * fix merge * deep speed parameterization * uncomment test * update chlog * update changelog * split tests * update test update test update test update test * update test comments * unroll test * unroll test * unroll test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * increase shm * sudo * unroll ipu * Revert "sudo" This reverts commit `6cc68c1478`. * Revert "increase shm" This reverts commit `8c27163483`. * x * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * find guilty test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * POPTORCH_WAIT_FOR_IPU=1 * move test * redo parameterize for ipu * de-comment test * move chlog * Update tests/accelerators/test_accelerator_connector.py Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> * Update tests/accelerators/test_accelerator_connector.py Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-07-02 12:00:24 +01:00
Adrian Wälchli	af52de1198	update changelog after 1.3.8 patch release (#8239 ) Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2021-07-01 21:49:06 +00:00
Guillaume Tauzin	baa7de2d9e	Fix truncated_bptt_steps hiddens detach() and improve docs (#8145 ) * Fix truncated_bptt_steps hiddens detach() * Improve truncated_bptt_docs * Add missing import * Improve documentation wordings * pep8 * detach typo * Update test * Implement comments * parametrize test * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com> * Remove import Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-07-01 22:16:14 +01:00
ananthsub	8b0aec8565	Deprecate `LightningModule.loaded_optimizer_states_dict` (#8229 )	2021-07-01 23:02:29 +02:00
thomas chaton	d51b0ae7fc	Add `state_dict` to loops (#8197 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-07-01 15:54:37 +00:00
Palermo	36b893c43e	Add `ModelSummary.max_depth` (#8062 ) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-01 12:08:16 +02:00
Mauricio Villegas	3c74502919	Add support for optimizers and learning rate schedulers to LightningCLI (#8093 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-01 12:04:11 +02:00
thomas chaton	acb6f26006	[Refactor] Remove should_raise_exception (#8202 ) Co-authored-by: Ethan Harris <ethanwharris@gmail.com>	2021-06-30 17:02:10 +00:00
Ethan Harris	57dce7244c	Fix double precision casting complex buffers (#8208 ) * Fix double precision casting complex buffers * Update CHANGELOG.md * Fixes * Fixes * Fix Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-06-30 10:57:42 +01:00
Carlos Mocholí	2e537b75e3	Deprecate `DDPPlugin.task_idx` (#8203 )	2021-06-30 01:02:55 +02:00
thomas chaton	bae08514d1	[refactor] Add should_raise_exception for gpus / tpus utilities (#8194 ) * add should_raise * update changelog * Update pytorch_lightning/utilities/device_parser.py Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> * add to tpu_cores parser * add should_raise description * update on comments * update changelog Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-06-29 10:00:06 -04:00
Carlos Mocholí	571a810a7c	Improvements and changes to progress tracking dataclasses (#8140 ) * Improvements to progress dataclasses * Update CHANGELOG * Rename function * Undo CODEOWNERS update	2021-06-29 13:47:41 +01:00
Justus Schock	d6435a5b73	Bugfix/swa iterable dset (#8172 ) * add test * add fix * Update CHANGELOG.md	2021-06-28 21:18:25 +00:00
Ethan Harris	b1d8840fd8	Fix metric attribute lookup (#8181 ) * Fix metric attribute lookup * Update CHANGELOG.md * Split tests	2021-06-28 20:17:43 +00:00
Adrian Wälchli	bf54ac1cad	fix NCCL error with non-consecutive trainer gpus (#8165 ) * device ids in barrier x x s same fix for spawn fix non-nccl x * add changelog * get nccl backend * get backend Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-06-28 22:08:10 +02:00
Kaushik B	2f3c65e57b	XLA Profiler integration (#8014 )	2021-06-29 00:58:05 +05:30
thomas chaton	c521624a92	[bugfix] Add mechanism to prevent deadlock for DDP on Exception Trigger (#8167 ) * add mechanism to prevent deadlock * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolve flake8 + update changelog * update on comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update * remove space * resolve bugs * overwrite config * update on comments * update on comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update * update * update test with comments * Update pytorch_lightning/plugins/training_type/parallel.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update on comments Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-06-28 19:26:03 +00:00
thomas chaton	1f025789fc	[bugfix] Clean Validation Sanity Checking metrics (#8171 ) * resolve logging issue * update changelog * remove breakpoint * resolve bugs * remove pass	2021-06-28 13:49:56 -04:00
thomas chaton	c4492ad6aa	Merge pull request #8174 from PyTorchLightning/bugfix/8159_log_gpu_memory_on_step [bugfix] Resolve memory not logged when missing metrics	2021-06-28 09:39:17 -04:00
Ethan Harris	2a372e3682	Fix module dict in base finetuning (#8170 ) * Fix module dict in base finetuning * Update CHANGELOG.md	2021-06-28 10:55:32 +00:00
Adrian Wälchli	51ea84222b	resurface lost ddp info message (#8111 )	2021-06-27 21:51:15 +02:00


885 Commits