lightning

Commit Graph

Author	SHA1	Message	Date
Ning	f6ed0bd8ca	introduce has_len_all_ranks() to check the length of dataloader across ranks (#9827 ) * introduce , udpate tests * update CHANGELOG.md * change staticmethod and hook attribute naming * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * remove non-essential comment * fix merge error and comment format * try to fix test_tpu.py failure * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update on comments * chlog * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * chlog * update * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try fix * Revert back TPUSpawn changes * Update test Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com>	2021-11-02 13:22:58 -04:00
Adrian Wälchli	9d136a9fc5	Lightning Lite core and tests (#10175 )	2021-10-29 21:46:39 +00:00
Ning	0b68f2abf8	Remove `reset_train_val_dataloaders` from Trainer and move data reloading logic to loop (#9671 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>	2021-10-19 21:45:52 +02:00
Kaushik B	5e8829b97d	(1/n) tests: Use strategy flag instead of accelerator for training strategies (#9931 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-10-16 20:40:25 +05:30
ananthsub	28fc8d2016	Add `enable_model_summary` flag and deprecate `weights_summary` (#9699 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com>	2021-10-13 17:20:54 +05:30
Adrian Wälchli	b530b7afd2	update tests to not rely on patched dataloaders (#9905 )	2021-10-12 12:45:28 +02:00
Rohit Gupta	db322f4bbb	Deprecate `checkpoint_callback` from the `Trainer` constructor in favour of `enable_checkpointing` (#9754 ) * enable_chekpointing * update codebase * chlog * update tests * fix warning * Apply suggestions from code review Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestions from code review Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * Apply suggestions from code review Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-10-12 07:55:07 +00:00
Rohit Gupta	4decbc0d95	Deprecate `dataloader_idx` from `on_train_batch_start/end` (#9816 ) * deprecate hooks * dep todo * explicit * Apply suggestions from code review * Apply suggestions from code review * code review * base	2021-10-07 10:18:11 +00:00
Carlos Mocholí	7f95fd04d7	Remove unnecessary `pytest.param` usage (#9760 )	2021-09-30 02:42:11 +00:00
thomas chaton	fa44dbcd9e	[Refactor] Simplify data loading logic around replacing sampler to prevent confusion (#9721 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-09-28 17:04:02 +00:00
Adrian Wälchli	d67aff7494	remove `InternalDebugger.track_load_dataloader_call` (#9675 ) * wip * reset _notebooks * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * reset _notebooks * testing with mock * update test with mock * update test * update tests * update test * remove track_load_dataloader_calls * update last test * remove unused imports * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-09-24 15:37:36 +02:00
Adrian Wälchli	5a846d48ce	mark several methods in evaluation loops as protected (#9516 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-09-15 14:12:27 +00:00
Jirka Borovec	6e124e7207	CI: precommit - docformatter (#8584 ) * CI: precommit - docformatter * fix deprecated Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-09-06 12:49:09 +00:00
Carlos Mocholí	05ff1b2085	Remove unnecessary `TrainingEpochLoop` return (#9298 )	2021-09-06 13:54:33 +02:00
Adrian Wälchli	c0bd658354	Remove calls to internal dev debugger in training- and eval loop (#9188 )	2021-08-30 17:16:59 +02:00
Carlos Mocholí	93ab24d1ee	Replace DataLoader sampler once for IPUs (#8858 )	2021-08-16 11:28:05 +02:00
Carlos Mocholí	ed13040729	Connect the model to the training type plugin at the start of run (#8536 )	2021-08-04 17:43:34 +02:00
thomas chaton	567e905ead	update logic to inject FastForwardSampler / CaptureIterableDataset 2/n (#8366 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Justus Schock <justus.schock@posteo.de> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-08-02 20:52:06 +00:00
Sean Naren	aadd2a9d9c	Load ckpt path when model provided in validate/test/predict (#8352 ) * Change trainer loading behaviour for validate/test/predict * Fix * Fix/add tests * remove * Cleanups * Space * cleanups * Add CHANGELOG.md * Move after setup * Cleanups on logic * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remve * fix test * feedback * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update pytorch_lightning/trainer/properties.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Feedback * Same fix * Same fix * Add test for behaviour, modify based on feedback * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Wording * Apply suggestions from code review Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Cleanup docs * Update pytorch_lightning/trainer/trainer.py Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> * feedback * Fixes to test API * Add carlos description * Move logic further * Move checkpoint connector logic Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-07-28 10:12:46 +00:00
Carlos Mocholí	a64cc37394	Replace `yapf` with `black` (#7783 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-07-26 13:37:35 +02:00
Carlos Mocholí	6dbdf438e8	Support `DataLoader`s with missing arguments in `replace_sampler` (#8519 ) * Support `DataLoader`s with missing arguments in `replace_sampler` * Fix for multiprocessing context * Fixes and test improvements * Fixes and test improvements * Fixes and test improvements * Test any variadic name * Update CHANGELOG * Make sure extra attributes can be present * Skip on old Windows * Update pytorch_lightning/trainer/data_loading.py * Update pytorch_lightning/trainer/data_loading.py * Check is dataloader * Typo	2021-07-26 10:04:21 +02:00
Carlos Mocholí	f7027a8701	Remove `torch >= 1.6` checks (#8523 )	2021-07-23 04:03:20 +00:00
Adrian Wälchli	1bfa29a8b0	Clear dataloader references before attaching new dataloaders to Trainer (#8442 ) * regression test * apply fix * simplify test and docs * update changlog	2021-07-19 10:43:39 +00:00
Sidhant Sundrani	20df24d2a2	Enables reload of dataloaders on every n epochs from every epoch (#5043 ) * edit arg to reload_dataloaders_every_n_epoch * init reload_dataloaders_every_n_epoch * edit logic to reload dl * update arg to test datamodule * update arg test dataloader * edit reload dl logic in eval loop * fix var name in reset_train_val_dataloaders * fix error, use current_epoch attribute * edit every_n_epoch to every_n_epochs * edit every_n_epoch to every_n_epochs * edit every_n_epoch to every_n_epochs * edit every_n_epoch to every_n_epochs * edit every_n_epoch to every_n_epochs * edit every_n_epoch to every_n_epochs * assert reload_dataloaders_every_n_epochs positive * assert reload_dataloaders_every_n_epochs positive * add trainer property should reload dl * update should reload dl in train loop * condition on should reload dl in eval loop * pep8 * fix update should reload dl in train loop * add test case * replace assertion with misconfig exception * remove unused variable * remove unnecessary checks * replace to BoringModel * remove unrequired comment * deprecate _every_epoch * add deprecated argument to trainer * test case for deprecated arg * remove unrequired assertion in train loop Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * modify misconfig exception for int Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * conv bool to int of depreciated _every_epoch Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * update description of deprecated param Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * update deprecation warning Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * modify argument to int only * fix deprecated test function name Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * merge tests for reload dls * add propery should reload dl * removed and added to trainer property * use property in train loop * remove deprecated test * add deprecated test to new file * test case for exception * update test datamodule 
every_n_epochs * update trainer docs * update hooks with every_n_epochs * edit format if statement Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update CHANGELOG.md * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * typo in exception * pytest check only misconfig exception * remove unnecessary code in test * remove unnecessary code in deprec test * added match in test * typo in comment * revert to prev, keep only req in context manager * Apply suggestions from code review * docs * rebase * Apply suggestions from code review * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix import: model_helpers instead of model_utils * fix, add reload_dataloaders_every_n_epochs argument to data connector * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add required imports * move deprecated log * add missing import rank_zero_warn * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update varname in should_reload_dl_epoch suggestion from code review * Fix CHANGELOG. Update deprecation versions * Minor change * change property name, mark protected * update property name * update property name * Remove deprecated _loop.py files Rename test func * Update CHANGELOG.md * use rank_zero_deprecation * update deprecation message in trainer api docs * test deprecation with real arg name in message * fix typo in trainer docs Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-07 13:10:08 +02:00
Carlos Mocholí	7ddcdb26d8	Deprecate `trainer.disable_validation` (#8291 )	2021-07-05 16:52:49 +02:00
Adrian Wälchli	ea5cfd2005	move batch to device before sending it to hooks (#7378 ) * update train step * test * x * limits * val * typeo * x * x * step * min gpus * run all loops * x * limit test * profiler * clean up accelerator code * move files * rename * move tests * changelog * reorder callbacks and model hooks * add test description * replace unneccessary method * fix chlog * adjust batch_to_device for DP Plugin * update tests for dataloader idx * unused imports * hook change * switch None * clear memory * change to None * None * None * memory savings * remove redundant todo * hack * cheat * Revert "cheat" This reverts commit `a8433bd0b4`. * Revert "hack" This reverts commit `43a6d1edeb`. * update new epoch loop * remove from old loop code * update chlog * update hook test * changelog * teardown * integrate changes in new eval loop * fix hook calls * add prediction step * bad merge * Revert "bad merge" This reverts commit `488080863c`. * fix train batch hook test * rm -rf _notebooks * update chlog * release memory * fix type * notebooks mess * debug * Revert "debug" This reverts commit `eec4ee2f77`. * teardown * fix teardown bug * debug * x * debug * Revert "debug" This reverts commit `a6e6101946`. Revert "debug" This reverts commit `5ddeaec069`. debug debug Revert "debug" This reverts commit 605be746f7daedf265b2c05a1c153ce543394435. Revert "Revert "debug"" This reverts commit a7612d5410409ed886cfb609457349ecf44cbfa8. debug x x x s tol x tol * Fix changelog Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-07-05 09:31:39 +01:00
deepsource-autofix[bot]	e11fe19673	Remove unnecessary use of comprehension (#8149 ) Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>	2021-06-27 10:00:02 +01:00
Adrian Wälchli	20f37b85b6	add warning when Trainer(log_every_n_steps) not well chosen (#7734 ) * add warning * update changelog * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * logger check * add docstring for test Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>	2021-06-07 12:40:43 +00:00
Rohit Gupta	7ca41734da	Add `dataloader_idx` to batch transfer hooks (#6241 ) * replace with kwargs * chlog * fix * add test * fix * device * deepspeed * pep * optional * docs * bc * comments * pep * mypy * pep * Apply suggestions from code review * kwargs * docs * . * . * 1.3 -> 1.4 * kwargs -> step_kwargs	2021-05-13 23:03:55 +05:30
Justus Schock	7b283e3c46	Bugfix/Multiple dataloaders (#7433 ) * Update supporters.py * Update apply_func.py * Update supporters.py * Update model_train_dataloaders.py * Update model_train_steps.py * Update test_dataloaders.py * Update CHANGELOG.md * Update model_train_steps.py * Update test_dataloaders.py * Update test_dataloaders.py * Update supporters.py * Update test_supporters.py * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update tests/trainer/test_dataloaders.py Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> * Apply suggestions from code review Co-authored-by: Edgar Riba <edgar.riba@gmail.com> * Update supporters.py * Update supporters.py * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Edgar Riba <edgar.riba@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-05-11 16:33:29 +02:00
Adrian Wälchli	1af42d7d1e	fix 1.9 test (#7441 )	2021-05-08 20:03:51 +02:00
Carlos Mocholí	8c0ea92af2	`TrainerState` refactor [5/5] (#7173 ) * `TrainerState` refactor * flake8 * Update finished check * Test cleanup * Fix tests * Fixes * Reorder * flake8 * Update CHANGELOG * Better docs * Better docs * Remove default * Update tests * Bad merge	2021-05-04 12:50:56 +02:00
ananthsub	14c552bb92	[bugfix] Fix dataloading for iterable datasets and limit_train_batches (#7306 ) * bugfix-dataloading * rm-logs * Update CHANGELOG.md * Update test_dataloaders.py * Update test_dataloaders.py * Update training_loop.py * Update test_dataloaders.py * Update CHANGELOG.md * Update CHANGELOG.md * Update test_dataloaders.py * Update training_loop.py * Update training_loop.py * comments * address comments * more tests * Update progress.py * Update test_dataloaders.py * Update test_dataloaders.py * Update training_loop.py * Update training_loop.py * test ckpt fix? * update again	2021-05-03 19:50:26 +01:00
ananthsub	e407edba36	[fix] Attach train+val dataloaders to trainer in trainer loop (#7207 ) * Update training_loop.py * Update test_dataloaders.py * changelog * delay reload * go back * comments * Update training_loop.py * Update test_dataloaders.py * Update tests/trainer/test_dataloaders.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-04-30 09:01:31 -07:00
Adrian Wälchli	b9b3fa371f	fix case where an IterableDataset doesn't produce a batch for an epoch (#7294 ) * wip * fix * add test * refactor + test * rm * formatting * update changelog * doc * docstring * remove unused import * Update CHANGELOG.md Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-04-30 12:45:55 +00:00
Carlos Mocholí	40f80230fe	Remove `trainer.fit` return value [2/n] (#7237 ) * `_fit_impl` refactor and types * Fix return * Remove return docstring * Fixes * Fixes * Remove `trainer.fit` return value * Update CHANGELOG * flake8 * Undo results change * Fix test * Revert changes for a separate PR * flake8	2021-04-28 19:11:32 +01:00
thomas chaton	e147127c0e	[feat] Add better support for predict + ddp 2/3 (#7215 ) * wip * update * update * update * update * update * typo * update on comments * update * update * update * update * update changelog * update * Fix merge * Fix merge * move code * resolve test * add extra test * add an extra test * update on comments * add typing * resolve flake8 * Refactor and Docs * Fix tests * Fix tests * Fix tests * Duplicate * Fix tests * resolve bug * update * update on comments * update * update changelog * update * update * remove tpu * resolve flake8 * update on comments * update on comments * update on comment * resolve flake8 * add a cpu test for predict * add None test * update * Update CHANGELOG.md Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * resolve tests Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-04-27 08:46:45 -04:00
Adrian Wälchli	60c1c8fe83	Auto-set `DataLoader.worker_init_fn` with `seed_everything` (#6960 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>	2021-04-19 16:28:37 +02:00
Adrian Wälchli	e9fca760ac	Set `DistributedSampler` seed if `seed_everything` was called (#7024 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-04-19 14:50:31 +01:00
Adrian Wälchli	264aa689de	fix boolean check on iterable dataset when len not defined (#6828 ) * fix iterable dataset len check * update predict and validate * add validate to test * add changelog * add predict	2021-04-05 17:47:21 +01:00
thomas chaton	0995d30fab	Flash predict step (#6577 ) * add predict_step * Update predict_loop.py * Update trainer.py * Update trainer.py * resolve bugs * update * update * update * resolve bug * resolve some failing tests * udpate tests * update * resolve tests * add a test * remove typo * add a test for attachement * update * changed to on_train_dataloader * remove __flash_special_attr__ * resolve tests * update * update * update * update on comments * Update pytorch_lightning/trainer/data_loading.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-03-23 11:13:13 -04:00
Jirka Borovec	b341b53f70	deprecate metrics pkg (#6505 ) * deprecate metrics * examples * req * docs * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * pep8 Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>	2021-03-15 14:39:38 +00:00
Elia Cereda	f4cc7451a9	Add Trainer.validate(…) method to run one validation epoch (#4948 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-03-11 03:46:37 +01:00
Elia Cereda	d0596fac94	Refactor RunningStage usage in advance of implementing Trainer.validate() (#4945 ) * Update code Co-authored-by: EliaCereda * More property updates * Move properties. Introduce trainer._fitting * Use trainer.fitting * Fix reset dataloaders * Unused code * RunningStage.SANITY_CHECKING * Use setters * Fix bugs * Fix bugs * TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING} * Fix bugs * Fix bugs * Fix tests * Update CHANGELOG. Add deprecation warning. Fix tests * Unused imports * Optional trainer * More deprecation. More refactoring * Correct version * Use properties * Address comments * flake8 * Missed renamings * Typo * is -> == It is recommended to use for Enums since they are singletons, however, since the LightningEnum subclasses str, it's not a good idea in case a user sets the state/stage with a str * Also for tests * Typo * Address @tchaton's comments * PEP8 * Correct property * Update CHANGELOG * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update pytorch_lightning/trainer/trainer.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Remove called sanity check Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-03-06 12:40:19 +00:00
Jirka Borovec	0f9134e043	Refactor: skipif for Windows 2/n (#6268 ) * win * isort * flake8	2021-03-02 09:36:01 +00:00
Jirka Borovec	eb815000f6	Refactor: skipif for multi - gpus 1/n (#6266 ) * ngpus * gpu * isort * pt * flake8	2021-03-02 09:03:32 +01:00
Jirka Borovec	1c851b89e1	fixing miss-leading tested acc values (#5876 ) * fixing tested values * . * tests * yapf * softmax * hvd * rename * lr * duplicate * drop * classif * rm EvalModel * Revert "rm EvalModel" This reverts commit `6c3fb39ebe`. * update tests * fix * azure * azure * self * cpu * Apply suggestions from code review Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>	2021-02-23 22:08:46 +00:00
Carlos Mocholí	0815e2a8c5	Remove torch<=1.4.0 checks (#5998 ) * Remove torch<=1.4.0 checks * Update pytorch_lightning/utilities/data.py Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-02-16 17:53:40 -05:00
Justus Schock	da6dbc8d1d	PoC: Accelerator refactor (#5743 ) * restoring the result from subprocess * fix queue.get() order for results * add missing "block_backward_sync" context manager * add missing "block_backward_sync" context manager * fix sync_batchnorm * fix supported gpu-ids for tuple * fix clip gradients and inf recursion * accelerator selection: added cluster_environment plugin * fix torchelastic test * fix reduce early stopping decision for DDP * fix tests: callbacks, conversion to lightning optimizer * fix lightning optimizer does not pickle * fix setting benchmark and deterministic option * fix slurm amp test * fix prepare_data test and determine node_rank * fix retrieving last path when testing * remove obsolete plugin argument * fix test: test_trainer_config * fix torchscript tests * fix trainer.model access * move properties * fix test_transfer_batch_hook * fix auto_select_gpus * fix omegaconf test * fix test that needs to simulate slurm ddp * add horovod plugin * fix test with named arguments * clean up whitespace * fix datamodules test * remove old accelerators * fix naming * move old plugins * move to plugins * create precision subpackage * create training_type subpackage * fix all new import errors * fix wrong arguments order passed to test * fix LR finder * Added sharded training type and amp plugin * Move clip grad to precision plugin * Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically * Fix import issue, attempting to fix tests * Fix initial test * Reflect hook logic from master, should wrap model after move to device * Optional state consolidation, since master has optimizers not wrapped * change attribute for instance test * reset optimizers optimizers are not used in main process, so state would be wrong. 
* legacy * imports in accel * legacy2 * trainer imports * fix import errors after rebase * move hook to new setup location * provide unwrapping logic * fix trainer callback system * added ddp2 implementation * fix imports .legacy * move plugins * restore legacy * drop test.py from root * add tpu accelerator and plugins * fixes * fix lightning optimizer merge * reset bugreportmodel * unwrapping * step routing forward * model access * unwrap * opt * integrate distrib_type * sync changes * sync * fixes * add forgotten generators * add missing logic * update * import * missed imports * import fixes * isort * mv f * changelog * format * move helper to parallel plugin * d * add world size * clean up * duplicate * activate ddp_sharded and tpu * set nvidia flags * remove unused colab var * use_tpu <-> on_tpu attrs * make some ddp_cpu and clusterplugin tests pass * Ref/accelerator connector (#5742) * final cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * connector cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * trainer cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * accelerator cleanup + missing logic in accelerator connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add missing changes to callbacks Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * reflect accelerator changes to lightning module Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * clean cluster envs Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * cleanup plugins Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add broadcasting Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * yapf * remove plugin connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * plugins * manual optimization * update optimizer routing * add rank to torchelastic * fix memory mixed precision * setstate on trainer for pickling in ddp spawn * add predict method * add back commented accelerator code * adapt 
test for sync_batch_norm to new plugin * fix deprecated tests * fix ddp cpu choice when no num_processes are given * yapf format * skip a memory test that cannot pass anymore * fix pickle error in spawn plugin * x * avoid * x * fix cyclic import in docs build * add support for sharded * update typing * add sharded and sharded_spawn to distributed types * make unwrap model default * refactor LightningShardedDataParallel similar to LightningDistributedDataParallel * update sharded spawn to reflect changes * update sharded to reflect changes * Merge 1.1.5 changes * fix merge * fix merge * yapf isort * fix merge * yapf isort * fix indentation in test * copy over reinit scheduler implementation from dev1.2 * fix apex tracking calls with dev_debugger * reduce diff to dev1.2, clean up * fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu * sort plugin tests legacy/new * fix error handling for amp on cpu * fix merge fix merge fix merge * [Feat] Resolve manual_backward (#5837) * resolve manual_backward * resolve flake8 * update * resolve for ddp_spawn * resolve flake8 * resolve flake8 * resolve flake8 Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * fix tests/accelerator tests on cpu * [BugFix] Resolve manual optimization (#5852) * resolve manual_optimization * update * update Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856) * resovle a bug * Accelerator refactor sharded rpc (#5854) * rpc branch * merge * update handling of rpc * make devices etc. Optional in RPC * set devices etc. 
later if necessary * remove devices from sequential * make devices optional in rpc * fix import * uncomment everything * fix cluster selection Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * resolve bug * fix assert in rpc test * resolve a test * fix docs compilation * accelerator refactor - fix for sharded parity test (#5866) * fix memory issue with ddp_spawn * x x x x x x x x x * x * Remove DDP2 as this does not apply * Add missing pre optimizer hook to ensure lambda closure is called * fix apex docstring * [accelerator][BugFix] Resolve some test for 1 gpu (#5863) * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * update * resolve flake8 * update * update * update * update * update * all_gather * update * make plugins work, add misconfig for RPC * update * update * remove breaking test * resolve some tests * resolve flake8 * revert to ddp_spawn Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de> * yapf isort * resolve flake8 * fix apex doctests * fix apex doctests 2 * resolve docs * update drone * clean env * update * update * update * update * merge * Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881) * Fix RPC related tests, clean out old API, update for new accelerator API * Move tests out of legacy folder, update paths and names * Update test_remove_1-4.py * Expose properties for tpu cores/gpus/num_gpus * Add root GPU property * Move properties to properties.py * move tests that were previously in drone * Fix root GPU property (#5908) * Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly 
to set GPU accelerator * Add missing tests back * fix best model path transfer when no checkpoint callback available * Fix setup hook order [wip] (#5858) * Call trainer setup hook before accelerator setup * Add test case * add new test * typo * fix callback order in test Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * rename ddp sequential -> rpc sequential for special test * revert * fix stupid merge problem * Use property in connector for sampler (#5913) * merge the import conflicts * fix spawning of processes in slurm * [wip] Fix some bugs for TPU [skip ci] (#5878) * fixed for single tpu * fixed spawn * fixed spawn * update * update * wip * resolve bugs * resolve bug * update on comment * removed decorator * resolve comments * set to 4 * update * update * need cleaning * update * update * update * resolve flake8 * resolve bugs * exclude broadcast * resolve bugs * change test * update * update * skip if meet fails * properly raise trace * update * add catch * wrap test * resolve typo * update * typo Co-authored-by: Lezwon Castelino <lezwon@gmail.com> Co-authored-by: Your Name <you@example.com> * resolve some tests * update * fix imports * update * resolve flake8 * update azure pipeline * skip a sharded test on cpu that requires a gpu * resolve tpus * resolve bug * resolve flake8 * update * updat utils * revert permission change on files * suggestions from carlos Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * remove unrelated formatting changes * remove incomplete comment * Update pytorch_lightning/accelerators/__init__.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * remove unrelated formatting change * add types * warn 1.7 ddp manual backward only if ddp kwarg unset * yapf + isort * pep8 unused imports * fix cyclic import in docs * Apply suggestions from code review * typer in accelerator.py * typo * Apply suggestions from code review * formatting * update on comments * update typo * 
Update pytorch_lightning/trainer/properties.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * update * suggestion from code review * suggestion from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Lezwon Castelino <lezwon@gmail.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2021-02-12 15:48:56 -05:00
Jirka Borovec	bd920b4102	Refactor simplify tests (#5861 ) * add new * restructure * yapf * move * fix	2021-02-08 11:52:02 +01:00


118 Commits