* add ClusterEnvironment for LSF systems
* update init file
* add available cluster environments
* clean up LSFEnvironment
* add ddp_hpc as a distributed backend
* clean up SLURMEnvironment
* remove extra blank line
* init device for DDPHPCAccelerator
We need to do this so that multiple ranks do not all send the model to the same device
* committing current state
* add additional methods to ClusterEnvironments
* add NVIDIA mixin for setting up CUDA environment variables
* remove troubleshooting prints
* cleanup SLURMEnvironment
* fix docstring
* cleanup TorchElasticEnvironment and add documentation
* PEP8 puts a cork in it
* add set_ranks_to_trainer
* remove unused import
* move to new location
* update LSF environment
* remove mixin
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* changelog
* reset slurm env
* add tests
* add licence
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* test node_rank
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add lsf env to docs
* add auto-detection for the LSF environment (see the detection sketch after this section)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix is_using_lsf() and test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
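The commits above add an LSF `ClusterEnvironment` with auto-detection and per-rank device initialization (so the model is not moved to the same device from multiple ranks). Below is a minimal sketch of that idea, assuming the standard LSF/jsrun variables `LSB_JOBID` and `JSM_NAMESPACE_*`; the real `LSFEnvironment` in PyTorch Lightning may read different variables and implements the full `ClusterEnvironment` interface.

```python
import os


def is_using_lsf() -> bool:
    # Sketch of auto-detection: treat the job as LSF-launched when the
    # scheduler's variables are present (assumed names, not Lightning's exact check).
    required = ("LSB_JOBID", "JSM_NAMESPACE_SIZE", "JSM_NAMESPACE_RANK", "JSM_NAMESPACE_LOCAL_RANK")
    return all(var in os.environ for var in required)


class LSFEnvironmentSketch:
    """Hypothetical, minimal stand-in for an LSF cluster environment."""

    def world_size(self) -> int:
        return int(os.environ["JSM_NAMESPACE_SIZE"])

    def global_rank(self) -> int:
        return int(os.environ["JSM_NAMESPACE_RANK"])

    def local_rank(self) -> int:
        # Each rank can use this to pick its own CUDA device, e.g.
        # torch.cuda.set_device(self.local_rank()), which is why the device
        # is initialized per rank in the commits above.
        return int(os.environ["JSM_NAMESPACE_LOCAL_RANK"])
```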
* Add callback to hook tests and add predict test
* Fix lambda callback test
* Simplify lambda call test
* Use LambdaCallback
* Dynamically append to `called` for the model
* Remove print
* Consistency
* Prepare args/kwargs testing
* yapf doesn't like dict literals
* Add arguments for fit no val test
* add before_backward_hook
* add test
* resolve flake8
* resolve tests
* update changelog
* add on_before_backward to LightningModule (see the hook sketch after this section)
* update based on review comments
* Test arguments
* Datamodule refactor
* Fix eval test
* remove extra file
* resolve bug
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* move to hooks
* update
* resolve flake8
* update based on review comments
* Update full fit + val test
* Update test
* Remove FIXME
* Undo change
* Fix
* Parametrize fit hook test
* Comment
* Parametrize fit hook test with different precision plugins
* Fix tests
* Parametrize fit hook test with manual optimization
* Unnecessary parenthesis
* WIP
* Comments
* Fix message
* Test CI error
* Revert "Test CI error"
This reverts commit 39c4a85a83.
* Add ddp training type teardown
* Update CHANGELOG
* Adrian's fix
* Use destructor
* Update CHANGELOG.md
* RPC destructor
* Update pytorch_lightning/plugins/training_type/ddp.py
* Why do you not work :(
* Missing condition
* Fix deepspeed test
* GC collect in conftest
* Do not show warnings for special tests
* Needs to run on 1.8
To avoid: "RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8"
* Run torch 1.8
* Skip test due to 'Python bus error'
* Debug NCCL
* shm size
* Disable warnings for special tests
* Remove NCCL_DEBUG statement
* Try smaller shm size
* Revert "Skip test due to 'Python bus error'"
This reverts commit e0a3e8785d.
* README and adjust versions
* Avoid self.on_gpu call
* empty cache cleanup
* More garbage collection
* Unroll parametrizations
* Do not reuse mock
* Undo changes
* Undo notebooks modification
* resolve test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* update
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* delete file
* Undo
* Fix test
* Revert "WIP"
This reverts commit f5828a8c42.
* Rename
* Remove optimizers
* Fix bug with LightningOptimizer
* Add optimizers
* update
* Update CHANGELOG
* On after backward refactor
* Do not call super
* Fixes
* Remove should_accumulate
* pre/post backward refactor
* Call the LM backward hook
* Update tests
* Remove dev debug patch
* Fix test
* Remove optimizer arguments and typing
* Docs fixes
* Fix comment
* Undo changes
* Split manual and auto
* Undo change
* Deepsource
* Remove optimizers
* Undo changes
* Call the hook
* Docs
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
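Several commits in this batch add and wire up the `on_before_backward` hook on `LightningModule` alongside the pre/post backward refactor. As a hedged illustration only, a user module might override it roughly as below, assuming the hook receives the loss tensor right before `backward()` is called; the release's hooks documentation defines the exact signature.

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def on_before_backward(self, loss: torch.Tensor) -> None:
        # Called just before loss.backward(); a convenient place to inspect
        # the loss about to be backpropagated, e.g. to catch non-finite values early.
        if not torch.isfinite(loss):
            raise RuntimeError(f"non-finite loss before backward: {loss.item()}")
```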
* edit arg to reload_dataloaders_every_n_epoch (see the Trainer usage sketch after this section)
* init reload_dataloaders_every_n_epoch
* edit logic to reload dl
* update arg to test datamodule
* update arg test dataloader
* edit reload dl logic in eval loop
* fix var name in reset_train_val_dataloaders
* fix error, use current_epoch attribute
* edit every_n_epoch to every_n_epochs
* assert reload_dataloaders_every_n_epochs positive
* add trainer property should reload dl
* update should reload dl in train loop
* condition on should reload dl in eval loop
* pep8
* fix update should reload dl in train loop
* add test case
* replace assertion with misconfig exception
* remove unused variable
* remove unnecessary checks
* replace to BoringModel
* remove unnecessary comment
* deprecate _every_epoch
* add deprecated argument to trainer
* test case for deprecated arg
* remove unnecessary assertion in train loop
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* modify misconfig exception for int
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* convert bool to int for deprecated _every_epoch
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* update description of deprecated param
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* update deprecation warning
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* modify argument to int only
* fix deprecated test function name
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* merge tests for reload dls
* add property should reload dl
* removed and added to trainer property
* use property in train loop
* remove deprecated test
* add deprecated test to new file
* test case for exception
* update test datamodule every_n_epochs
* update trainer docs
* update hooks with every_n_epochs
* edit if-statement formatting
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update CHANGELOG.md
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* typo in exception
* pytest check only misconfig exception
* remove unnecessary code in test
* remove unnecessary code in deprec test
* added match in test
* typo in comment
* revert to previous, keep only what is required in the context manager
* Apply suggestions from code review
* docs
* rebase
* Apply suggestions from code review
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix import: model_helpers instead of model_utils
* fix, add reload_dataloaders_every_n_epochs argument to data connector
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add required imports
* move deprecated log
* add missing import rank_zero_warn
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* update varname in should_reload_dl_epoch
suggestion from code review
* Fix CHANGELOG. Update deprecation versions
* Minor change
* change property name, mark protected
* update property name
* Remove deprecated *_loop.py files
* Rename test func
* Update CHANGELOG.md
* use rank_zero_deprecation
* update deprecation message in trainer api docs
* test deprecation with real arg name in message
* fix typo in trainer docs
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
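The block of commits above replaces the boolean `reload_dataloaders_every_epoch` flag with the integer-valued `reload_dataloaders_every_n_epochs` Trainer argument and deprecates the old name. A small usage sketch of the migration (argument names as described in the commits; see the CHANGELOG for the exact deprecation versions):

```python
import pytorch_lightning as pl

# New integer-valued argument: reload train/val dataloaders every 2 epochs.
trainer = pl.Trainer(max_epochs=10, reload_dataloaders_every_n_epochs=2)

# Previous boolean flag, now deprecated in favour of the argument above:
# trainer = pl.Trainer(max_epochs=10, reload_dataloaders_every_epoch=True)
```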
* Fix mypy for utilities.device_parser
* Fix remaining mypy issues + disable ignoring mypy errors
* Return one Optional type annotation back
* Fix annotation for the parse_tpu_cores method
* Remove unused import
* Include carmocca's suggestion and fix mypy issue
* include carmocca's suggestion
* add `else` statement to `parse_gpu_ids` to inform mypy that `gpus` is of type `List[int]` (see the narrowing sketch below)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
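The final commit adds an explicit `else` branch so mypy can narrow `gpus` to `List[int]` inside `parse_gpu_ids`. The snippet below is a generic, hypothetical sketch of that narrowing pattern, not the actual `device_parser` code:

```python
from typing import List, Optional, Union


def parse_gpu_ids_sketch(gpus: Optional[Union[int, str, List[int]]]) -> Optional[List[int]]:
    if gpus is None:
        return None
    if isinstance(gpus, int):
        gpus = list(range(gpus))
    elif isinstance(gpus, str):
        gpus = [int(index) for index in gpus.split(",") if index.strip()]
    else:
        # Explicit else branch: after ruling out None, int and str, the remaining
        # case is a List[int], which both the reader and mypy can now see, so the
        # declared return type checks cleanly.
        assert isinstance(gpus, list)
    return gpus
```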