lightning

Commit Graph

Author	SHA1	Message	Date
Jirka Borovec	1d9c553b86	prune deprecated Trainer arg `enable_pl_optimizer` (#6163 ) * prune enable_pl_optimizer * prune automatic_optimization	2021-02-24 10:01:24 +00:00
Adrian Wälchli	0456b4598f	mini refactor for _running_stage access (#5724 ) * running stage * circular import * running stage cleanup * fix unused import * fix running stage access * add return type * Revert "add return type" This reverts commit `65b0fe269c`. * try fix typing	2021-02-22 12:01:54 +01:00
Adrian Wälchli	6cc1a06078	rename accelerator_backend -> accelerator (#6034 ) * rename accelerator backend * rename new additions from master * add proper deprecation * pep8 * warning match * add missing warning type	2021-02-18 15:54:12 +00:00
Adrian Wälchli	02ac4b0b6a	Replace .get_model() with explicit .lightning_module (#6035 ) * rename get_model -> lightning_module * update references to get_model * pep8 * add proper deprecation * remove outdated _get_reference_model * fix cyclic import	2021-02-18 15:59:54 +01:00
Jirka Borovec	bac617ff93	drop deprecated result object 1/n (#5005 ) * ro1 * ro2	2021-02-17 18:58:28 -05:00
chaton	a121fd3c99	[Bugfix] Apply untoggle_optimizer when result is None (#5983 ) * update changelog * apply untoggle_optimizer when result is None * update tests * still return loss sometimes * Update CHANGELOG.md Co-authored-by: deng-cy <dcy1996@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-02-17 19:55:09 +00:00
chaton	6bc4490d01	[HotFix] Resolve TPU Training (#6027 ) * fix tpus * update * add back reduction in val_loss * resolve some bugs with TPUs * update changelog * update on comments * forgot status * Fix train_bn arg * resolve comments * update on comments Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-02-17 16:40:13 +00:00
chaton	e982800b81	Add PredictLoop (#5752 ) * integrate distrib_type * sync changes * sync * fixes * add forgotten generators * add missing logic * update * import * missed imports * import fixes * isort * mv f * changelog * format * move helper to parallel plugin * d * add world size * clean up * duplicate * activate ddp_sharded and tpu * set nvidia flags * remove unused colab var * use_tpu <-> on_tpu attrs * make some ddp_cpu and clusterplugin tests pass * Ref/accelerator connector (#5742) * final cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * connector cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * trainer cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * accelerator cleanup + missing logic in accelerator connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add missing changes to callbacks Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * reflect accelerator changes to lightning module Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * clean cluster envs Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * cleanup plugins Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add broadcasting Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * yapf * remove plugin connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * plugins * add predict_loop * manual optimization * clean predictloop * update optimizer routing * add predict loop on new accelerator * resolve a bug * add rank to torchelastic * add predict_loop * add predict loop on new accelerator * resolve a bug * fix memory mixed precision * update * setstate on trainer for pickling in ddp spawn * add predict_loop * clean predictloop * add predict loop on new accelerator * resolve a bug * add predict_loop * add predict loop on new accelerator * resolve a bug * add predict_loop * add predict loop on new accelerator * resolve a bug * add predict_loop * add predict loop on new accelerator * resolve a bug * add predict_loop * clean predictloop * add predict loop on new accelerator * resolve a bug * add predict_loop * add predict loop on new accelerator * resolve a bug * resolve tests * add predict method * add back commented accelerator code * adapt test for sync_batch_norm to new plugin * fix deprecated tests * fix ddp cpu choice when no num_processes are given * yapf format * skip a memory test that cannot pass anymore * remove sanetize * rename train to run_train * remove useless hooks * add misconfigurationException * remove wrong naming * resolve some legacy * udpate docstring * fix pickle error in spawn plugin * x * avoid * x * fix cyclic import in docs build * add support for sharded * update typing * add sharded and sharded_spawn to distributed types * make unwrap model default * refactor LightningShardedDataParallel similar to LightningDistributedDataParallel * update sharded spawn to reflect changes * update sharded to reflect changes * Merge 1.1.5 changes * fix merge * fix merge * yapf isort * fix merge * yapf isort * fix indentation in test * copy over reinit scheduler implementation from dev1.2 * fix apex tracking calls with dev_debugger * reduce diff to dev1.2, clean up * fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu * sort plugin tests legacy/new * fix error handling for amp on cpu * fix merge fix merge fix merge * [Feat] Resolve manual_backward (#5837) * resolve manual_backward * resolve flake8 * update * resolve for ddp_spawn * resolve flake8 * resolve flake8 * resolve flake8 Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * fix tests/accelerator tests on cpu * [BugFix] Resolve manual optimization (#5852) * resolve manual_optimization * update * update Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856) * resovle a bug * Accelerator refactor sharded rpc (#5854) * rpc branch * merge * update handling of rpc * make devices etc. Optional in RPC * set devices etc. later if necessary * remove devices from sequential * make devices optional in rpc * fix import * uncomment everything * fix cluster selection Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * resolve bug * fix assert in rpc test * resolve a test * fix docs compilation * accelerator refactor - fix for sharded parity test (#5866) * fix memory issue with ddp_spawn * x x x x x x x x x * x * Remove DDP2 as this does not apply * Add missing pre optimizer hook to ensure lambda closure is called * fix apex docstring * [accelerator][BugFix] Resolve some test for 1 gpu (#5863) * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * update * resolve flake8 * update * update * update * update * update * all_gather * update * make plugins work, add misconfig for RPC * update * update * remove breaking test * resolve some tests * resolve flake8 * revert to ddp_spawn Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de> * yapf isort * resolve flake8 * fix apex doctests * fix apex doctests 2 * resolve docs * update drone * clean env * update * update * update * update * merge * Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881) * Fix RPC related tests, clean out old API, update for new accelerator API * Move tests out of legacy folder, update paths and names * Update test_remove_1-4.py * Expose properties for tpu cores/gpus/num_gpus * Add root GPU property * Move properties to properties.py * move tests that were previously in drone * Fix root GPU property (#5908) * Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator * Add missing tests back * fix best model path transfer when no checkpoint callback available * Fix setup hook order [wip] (#5858) * Call trainer setup hook before accelerator setup * Add test case * add new test * typo * fix callback order in test Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * rename ddp sequential -> rpc sequential for special test * revert * fix stupid merge problem * Use property in connector for sampler (#5913) * merge the import conflicts * fix spawning of processes in slurm * [wip] Fix some bugs for TPU [skip ci] (#5878) * fixed for single tpu * fixed spawn * fixed spawn * update * update * wip * resolve bugs * resolve bug * update on comment * removed decorator * resolve comments * set to 4 * update * update * need cleaning * update * update * update * resolve flake8 * resolve bugs * exclude broadcast * resolve bugs * change test * update * update * skip if meet fails * properly raise trace * update * add catch * wrap test * resolve typo * update * typo Co-authored-by: Lezwon Castelino <lezwon@gmail.com> Co-authored-by: Your Name <you@example.com> * resolve some tests * update * fix imports * update * resolve flake8 * update azure pipeline * skip a sharded test on cpu that requires a gpu * resolve tpus * resolve bug * resolve flake8 * update * updat utils * revert permission change on files * suggestions from carlos Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * remove unrelated formatting changes * remove incomplete comment * Update pytorch_lightning/accelerators/__init__.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * remove unrelated formatting change * add types * warn 1.7 ddp manual backward only if ddp kwarg unset * yapf + isort * pep8 unused imports * fix cyclic import in docs * Apply suggestions from code review * typer in accelerator.py * typo * resolve flake8 * update code * update * Update pytorch_lightning/trainer/predict_loop.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Update pytorch_lightning/trainer/predict_loop.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * fix merge * fix merge * reset legacy accelerator * add missing rename dispatch * rename post traning * update code * resolved comments * typo * typo * add flow description * resolve comments * update on comments * update flow * add backticks * resolve tpu Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: justusschock <justus.schock@posteo.de> Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Lezwon Castelino <lezwon@gmail.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2021-02-16 17:11:56 -05:00
chaton	a52be5bb07	[Hot Fix] Ensure process_dataloader is called when tpu_cores > 1 to use Parallel DataLoader (#6015 ) * hotfix for tpu * update changelog * Update CHANGELOG.md Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>	2021-02-16 17:02:25 -05:00
Adrian Wälchli	aa60c08641	move device-specific teardown logic from training loop to accelerator (#5973 ) * on train end * switch order	2021-02-15 17:38:03 -05:00
Kaushik B	42dc5d2af1	Fix: Repeated .fit() calls ignore max_steps iteration bound (#5936 ) * fix repeated fit calls ignoring max_steps * fix fast dev progress bar	2021-02-13 07:36:22 +00:00
Justus Schock	da6dbc8d1d	PoC: Accelerator refactor (#5743 ) * restoring the result from subprocess * fix queue.get() order for results * add missing "block_backward_sync" context manager * add missing "block_backward_sync" context manager * fix sync_batchnorm * fix supported gpu-ids for tuple * fix clip gradients and inf recursion * accelerator selection: added cluster_environment plugin * fix torchelastic test * fix reduce early stopping decision for DDP * fix tests: callbacks, conversion to lightning optimizer * fix lightning optimizer does not pickle * fix setting benchmark and deterministic option * fix slurm amp test * fix prepare_data test and determine node_rank * fix retrieving last path when testing * remove obsolete plugin argument * fix test: test_trainer_config * fix torchscript tests * fix trainer.model access * move properties * fix test_transfer_batch_hook * fix auto_select_gpus * fix omegaconf test * fix test that needs to simulate slurm ddp * add horovod plugin * fix test with named arguments * clean up whitespace * fix datamodules test * remove old accelerators * fix naming * move old plugins * move to plugins * create precision subpackage * create training_type subpackage * fix all new import errors * fix wrong arguments order passed to test * fix LR finder * Added sharded training type and amp plugin * Move clip grad to precision plugin * Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically * Fix import issue, attempting to fix tests * Fix initial test * Reflect hook logic from master, should wrap model after move to device * Optional state consolidation, since master has optimizers not wrapped * change attribute for instance test * reset optimizers optimizers are not used in main process, so state would be wrong. * legacy * imports in accel * legacy2 * trainer imports * fix import errors after rebase * move hook to new setup location * provide unwrapping logic * fix trainer callback system * added ddp2 implementation * fix imports .legacy * move plugins * restore legacy * drop test.py from root * add tpu accelerator and plugins * fixes * fix lightning optimizer merge * reset bugreportmodel * unwrapping * step routing forward * model access * unwrap * opt * integrate distrib_type * sync changes * sync * fixes * add forgotten generators * add missing logic * update * import * missed imports * import fixes * isort * mv f * changelog * format * move helper to parallel plugin * d * add world size * clean up * duplicate * activate ddp_sharded and tpu * set nvidia flags * remove unused colab var * use_tpu <-> on_tpu attrs * make some ddp_cpu and clusterplugin tests pass * Ref/accelerator connector (#5742) * final cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * connector cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * trainer cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * accelerator cleanup + missing logic in accelerator connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add missing changes to callbacks Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * reflect accelerator changes to lightning module Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * clean cluster envs Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * cleanup plugins Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add broadcasting Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * yapf * remove plugin connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * plugins * manual optimization * update optimizer routing * add rank to torchelastic * fix memory mixed precision * setstate on trainer for pickling in ddp spawn * add predict method * add back commented accelerator code * adapt test for sync_batch_norm to new plugin * fix deprecated tests * fix ddp cpu choice when no num_processes are given * yapf format * skip a memory test that cannot pass anymore * fix pickle error in spawn plugin * x * avoid * x * fix cyclic import in docs build * add support for sharded * update typing * add sharded and sharded_spawn to distributed types * make unwrap model default * refactor LightningShardedDataParallel similar to LightningDistributedDataParallel * update sharded spawn to reflect changes * update sharded to reflect changes * Merge 1.1.5 changes * fix merge * fix merge * yapf isort * fix merge * yapf isort * fix indentation in test * copy over reinit scheduler implementation from dev1.2 * fix apex tracking calls with dev_debugger * reduce diff to dev1.2, clean up * fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu * sort plugin tests legacy/new * fix error handling for amp on cpu * fix merge fix merge fix merge * [Feat] Resolve manual_backward (#5837) * resolve manual_backward * resolve flake8 * update * resolve for ddp_spawn * resolve flake8 * resolve flake8 * resolve flake8 Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * fix tests/accelerator tests on cpu * [BugFix] Resolve manual optimization (#5852) * resolve manual_optimization * update * update Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856) * resovle a bug * Accelerator refactor sharded rpc (#5854) * rpc branch * merge * update handling of rpc * make devices etc. Optional in RPC * set devices etc. later if necessary * remove devices from sequential * make devices optional in rpc * fix import * uncomment everything * fix cluster selection Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * resolve bug * fix assert in rpc test * resolve a test * fix docs compilation * accelerator refactor - fix for sharded parity test (#5866) * fix memory issue with ddp_spawn * x x x x x x x x x * x * Remove DDP2 as this does not apply * Add missing pre optimizer hook to ensure lambda closure is called * fix apex docstring * [accelerator][BugFix] Resolve some test for 1 gpu (#5863) * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * update * resolve flake8 * update * update * update * update * update * all_gather * update * make plugins work, add misconfig for RPC * update * update * remove breaking test * resolve some tests * resolve flake8 * revert to ddp_spawn Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de> * yapf isort * resolve flake8 * fix apex doctests * fix apex doctests 2 * resolve docs * update drone * clean env * update * update * update * update * merge * Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881) * Fix RPC related tests, clean out old API, update for new accelerator API * Move tests out of legacy folder, update paths and names * Update test_remove_1-4.py * Expose properties for tpu cores/gpus/num_gpus * Add root GPU property * Move properties to properties.py * move tests that were previously in drone * Fix root GPU property (#5908) * Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator * Add missing tests back * fix best model path transfer when no checkpoint callback available * Fix setup hook order [wip] (#5858) * Call trainer setup hook before accelerator setup * Add test case * add new test * typo * fix callback order in test Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * rename ddp sequential -> rpc sequential for special test * revert * fix stupid merge problem * Use property in connector for sampler (#5913) * merge the import conflicts * fix spawning of processes in slurm * [wip] Fix some bugs for TPU [skip ci] (#5878) * fixed for single tpu * fixed spawn * fixed spawn * update * update * wip * resolve bugs * resolve bug * update on comment * removed decorator * resolve comments * set to 4 * update * update * need cleaning * update * update * update * resolve flake8 * resolve bugs * exclude broadcast * resolve bugs * change test * update * update * skip if meet fails * properly raise trace * update * add catch * wrap test * resolve typo * update * typo Co-authored-by: Lezwon Castelino <lezwon@gmail.com> Co-authored-by: Your Name <you@example.com> * resolve some tests * update * fix imports * update * resolve flake8 * update azure pipeline * skip a sharded test on cpu that requires a gpu * resolve tpus * resolve bug * resolve flake8 * update * updat utils * revert permission change on files * suggestions from carlos Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * remove unrelated formatting changes * remove incomplete comment * Update pytorch_lightning/accelerators/__init__.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * remove unrelated formatting change * add types * warn 1.7 ddp manual backward only if ddp kwarg unset * yapf + isort * pep8 unused imports * fix cyclic import in docs * Apply suggestions from code review * typer in accelerator.py * typo * Apply suggestions from code review * formatting * update on comments * update typo * Update pytorch_lightning/trainer/properties.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * update * suggestion from code review * suggestion from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Lezwon Castelino <lezwon@gmail.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2021-02-12 15:48:56 -05:00
chaton	7b00894130	[feat] Add StochasticWeightAveragingCallback (#5640 ) * add swa callback * switch back to 1.6.0 * remove optimizer_step * move super * update * forgot update_parameters * update on comments * works for ddp * resolve flake8 * remove set_model * resolve flake8 * resolve cpu * resolve flake8 * resolve flake8 * update * update on comments	2021-02-11 00:05:59 +00:00
ananthsub	d26702bd66	Enable purely iteration-based training (#5726 ) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Kaushik Bokka <kaushikbokka@gmail.com>	2021-02-10 08:51:08 +00:00
rohitgr7	bcb6ee5d51	sync	2021-02-08 20:22:39 +01:00
Rohit Gupta	cb67e1d0b2	Separate epoch validation from step validation (#5208 ) * Seperate epoch validaton from step validation * update system * test * baked logic in callbacks * unbake logic in callbacks * fix the call for scheduler * use property * pep * correct rebase * gitignore * ref * add tests * fix * add early stopping test * trigger * chlog * rev * 1.3 * log * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Update pytorch_lightning/trainer/training_loop.py * Update CHANGELOG.md * Apply suggestions from code review Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> (cherry picked from commit `e429f97b67`)	2021-02-08 20:22:39 +01:00
Kaushik B	5dfd62c09e	Disable training with zero num_training_batches when insufficient limit_train_batches (#5703 ) * disable training when zero num_train_batches with limit_train_batches * refactor train skip condition * fix formatting issues * fix formatting issues * ref: test error msg * fix tests for data loader calls * fix train dataloader condition * update limit_train_batches upper range in test comment * remove model state check test Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2021-02-05 21:40:42 +01:00
Alex Parinov	ad9b188b78	Fix typo in LightningOptimizer (#5736 )	2021-02-05 21:40:40 +01:00
Jirka Borovec	e633787a3d	flake8 + yapf	2021-02-04 20:55:58 +01:00
Swetha Mandava	c62f68c7cd	passing batch outputs to on_train_batch_end (#4369 ) * passing batch outputs to on_train_batch_end * styling * updating epoch end logic * also condition on on_train_epoch_end hooks * more readable * pep8 * pep8 * readability suggestion accepted Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * adding test_training_epoch_end_metrics_collection_on_override test * fix formatting * fix formatting Co-authored-by: Swetha Mandava <smandava@nvidia.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: Roger Shieh <sh.rog@protonmail.ch> (cherry picked from commit `5fcca4e43b`)	2021-02-04 20:55:41 +01:00
chaton	cc38d4c342	[bugfix] Resolve bug with multiple optimizers and toggle. (#5574 ) * fix toggle_optimizer * update doc * resolve bug * update * Update pytorch_lightning/core/lightning.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * update on comments * update on comments * update Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> (cherry picked from commit `c76cc23b3d`)	2021-02-04 20:55:41 +01:00
Lexie Troiano	d05cdf83f1	Merge remote-tracking branch 'carmocca/sync-1.1.5' into release/1.2-dev	2021-02-04 09:42:59 -05:00
Kaushik B	26cc3b5357	Change the seq of on_train_batch_end, on_batch_end & on_train_epoch_end, on_epoch_end hooks (#5688 )	2021-02-04 18:30:20 +05:30
rohitgr7	a37416843b	Fix sync resolve wrong merge tpu yapf	2021-02-03 20:11:35 +01:00
Rohit Gupta	10c7dbe6a1	Refactor setup_training and remove test_mode (#5388 ) * ref and fix call for on_pretrained_routine * avoid failing tests * unnecessary_call * unnecessary call in accelerators * tmpdir * rm test_mode * pep * updates * more ref * Revert "more ref" This reverts commit `5d9e95f873`. * more refac Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-02-03 19:41:42 +01:00
Jirka Borovec	aba212341a	formatting 4/n: Trainer (#5720 ) * yapf trainer * Apply suggestions from code review Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * . * fix Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-02-03 09:25:42 +00:00
chaton	3da28fd634	[feat] 1/2 Add trainer.predict (#5579 ) * start adding predict * add predict * resolve test * add predict * remove limit_predict * update * add test for predict * typo * update on comments * remove predict_step * update ddp_shareded * check ddp_sharded * resolve on comments * resolve isort * update dp * add test dp 1 gpu * made default forward * resolve path * resolve bug * update on comments * resolve doc * resolve bug * update * resolve bug * update on comments * resolve pep8 * update test doc * update on comments * solve special tests * resolve bug * resolve flake8 * Update pytorch_lightning/callbacks/progress.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Update pytorch_lightning/trainer/trainer.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * add predict to LightningModule * missing predict * typo * rename is_prediction to _predicting * add * update * update * update doc Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2021-01-27 11:38:14 -05:00
Jirka Borovec	53b0ae49b9	fix imports / isort / flake8	2021-01-26 14:57:34 +01:00
SeanNaren	df3c170b2c	Fix imports & issues in lightning optimizer refactor merge	2021-01-26 14:29:47 +01:00
chaton	0435e23a64	deprecate enable_pl_optimizer as it is not restored properly (#5244 ) * update * clean test * still in progress * udpdate test * update * update * resolve flake * add test for zero_grad * update * works without accumulated_grad * update * update * resolve amp * revert back to True * update * clean tests * cleaned out * typo * update test * git repare bug * remove print * udpate * Fix formatting/optimizer imports * Refactor the test for cleanliness * Add vanilla model to the test, better var names * Fixed var names, let's clean up these mock tests * repare test * update test * resolve flake8 * add manual_optimization * update tests * resolve flake8 * add random accumulate_grad_batches * improve test * Update tests/trainer/optimization/test_parity_automatic_optimization.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update tests/trainer/optimization/test_parity_automatic_optimization.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update * clean tests * correct bug * Apply suggestions from code review * format * adress comments * update on comments * wip * typo * depreceate enable_pl_optimizer * resolve latest bugs * update * resolve merge * add comment * Update pytorch_lightning/core/lightning.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update tests/deprecated_api/test_remove_1-3.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/trainer/connectors/optimizer_connector.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/trainer/trainer.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/trainer/trainer.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update tests/trainer/optimization/test_parity_automatic_optimization.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update on comments * update restore * add a property * remove setstate as not needed anymore * update test * provide optimizer to on_before_zero_grad * update on comments * update on comments * Update pytorch_lightning/trainer/trainer.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update tests/trainer/optimization/test_parity_automatic_optimization.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update tests/trainer/optimization/test_parity_automatic_optimization.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update tests/trainer/optimization/test_parity_automatic_optimization.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * mofidy import * update changelog * resolve flake8 * update * update * clean doc Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> (cherry picked from commit `f2e99d617f`)	2021-01-26 14:29:46 +01:00
chaton	5f3372871a	[feat] Add PyTorch Profiler. (#5560 ) * add profiler * add profiler * update * resolve flake8 * update doc * update changelog * clean doc * delete prof file * merge pr codebase * update * update doc * update doc * update doc * update on comments * update docstring * update docstring * try * update test * Update pytorch_lightning/profiler/__init__.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/profiler/__init__.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update on comments * remove old code * add support for ddp * resolve flake8 * Update pytorch_lightning/profiler/__init__.py Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> * resolve tests * resolve flake8 Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>	2021-01-26 06:48:54 -05:00
Jirka Borovec	c3587d39da	prune deprecated EvalResult (#5633 ) * prune EvalResult * drop tests * drop usage * drop class * prune	2021-01-26 03:09:39 -05:00
Jirka Borovec	2846322f60	fix docs render (#5610 )	2021-01-25 20:21:00 -05:00
Arnaud Gelas	a9d9f33a86	Fix isort failures in trainer (#5529 ) Remove from skipped module in pyproject.toml and fix failures on: - pytorch_lightning/trainer/*.py	2021-01-18 13:42:50 -05:00
Adrian Wälchli	e806bb77fa	Refactor LightningDistributedDataParallel (#5185 ) * add wrapper * add squeeze * replace LightningDistributedDP * update import * module access * inputs * refactor warning * update * resolve flake8 * remove old class * set find unused params to False * update docstrings * update docs * update docs * add changelog * deprecation * rename wrapper -> module * rename pl_module * add unit tests * Revert "add changelog" This reverts commit 02ec0a6864f4ba2ace3bb6fc6ebc364e1a80ffd7. * Revert "set find unused params to False" This reverts commit 8e451515e6ba3227d00f4a5cb63f332cfedb7b30. Co-authored-by: Ubuntu <thomas@grid.ai>	2021-01-13 14:35:42 -05:00
Jirka Borovec	54d20dc596	Refactor: clean trainer device & distrib getters (#5300 ) * warnings * . * . * flake8 * . * . * . * use_tpu * use_dp * . * use_ddp * . * use_horovod * . * . * .	2021-01-12 05:22:37 -05:00
chaton	56437e98a6	[bug-fix] Trainer.test points to latest best_model_path (#5161 ) * resolve bug * update code * add set -e * Update pytorch_lightning/callbacks/model_checkpoint.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * update test * Update tests/checkpointing/test_trainer_checkpoint.py Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> * Update tests/checkpointing/test_trainer_checkpoint.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * update on comments * resolve test * convert to set * update * add error triggering * update * update on comments * update * resolve import * update * update * Update pytorch_lightning/plugins/rpc_plugin.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> (cherry picked from commit `d5b367871f`)	2021-01-06 15:14:10 +01:00
Rohit Gupta	704e00ee7f	Fix invalid value for weights_summary (#5296 ) * Fix weights_summary * use mode * fix * optional * what was I thinking (cherry picked from commit `062800aa99`)	2021-01-06 12:59:32 +01:00
Rohit Gupta	9cfbf8d609	Disable checkpointing, earlystopping and logging with fast_dev_run (#5277 ) * Disable checkpointing, earlystopping and logger with fast_dev_run * docs * chlog * disable callbacks and enable DummyLogger * add log * use dummy logger method * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> (cherry picked from commit `f740245521`)	2021-01-06 12:57:24 +01:00
Justus Schock	d88cf4a652	Add Support for multiple train loaders (#1959 ) * add support for wrong dtype in apply_func * apply loader resetting to possible collection of loaders * add combined loader iter class * integrate combined loader iter to training loop * fix imports * fix imports * finish supporters * add tests for supporters * add test for model with multiple loaders * fix trainer integration * fix instance check * Train loaders (#4032) * patch for issues discussed in #1959, encapsulating underlying datastructures returned from train_dataloader * update data_loading.py to it uses patch discussed in #1959 * rename class * Separate CombinedLoaderIterator into two classes, and update related tests. (#4606) * Fix the bugs after rebasing. * Add custom get_len for apply_to_collection * Refactor MultiIterator to be as CombinedLoaderIterator * To get the right num_training_batches. Call the wrapper for multi trainloader in data_loading.py, instead of training_loop.py * Reload _loader_iters when calling __iter__ * Don't transform DataLoader to CombinedLoaderIterator when it's along * Updates test_fit_multiple_train_loaders for testing num_training_batches * Seperate CombinedLoaderIterator into CombinedLoaderIterator and CombinedDataLoader. Add CombinedDataset for unified DataLoader format. * Initialize CombinedDataLoader before calculating num_training_batches. Also updating self._worker_check for multiple loaders * Update tests for supporters * Update tests for multiple trainloaders. Add tests about few_workers for multiple loaders. * Fix pep8 issues * Add tests for train_loader_patch.py * Add descriptions to multiple_trainloader_mode * Remove unused variables * Add docstrings and typing * Add more tests for better converage * Remove unused commented codes * Add sampler property * Remove extract_dataset * Update typing * pep8 * Update train_loader_patch.py * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/trainer/supporters.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * reviewer comments * fix stupid import * add docs * add back line separator * fix line sep * pep8 * Apply suggestions from code review * fix * fix * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Apply suggestions from code review Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * flake8 Co-authored-by: Justus Schock <justusschock@justuss-mbp.fritz.box> Co-authored-by: Christofer Fransson <christofer_fransson@yahoo.com> Co-authored-by: YI-LIN SUNG <r06942076@ntu.edu.tw> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>	2021-01-04 19:57:53 +00:00
Jirka Borovec	957583544a	mark todo exceptions (#5320 ) * mark todo exceptions * . * . * . * . * . * . * . * . * try * .	2021-01-04 09:07:56 +01:00
Jirka Borovec	a884866ff0	Unify names in Utils (#5199 ) * warnings * argparse * mutils * xla device * deprecated * tests * simple * flake8 * fix * flake8 * 1.4	2020-12-22 00:23:33 +01:00
Jirka Borovec	0f36525e8f	fix/enable - check F401 (#5201 ) * refactor - check F401 * missed * fix	2020-12-21 10:15:04 +01:00
Jirka Borovec	2d54116baa	annotat unused vars (#5017 ) * annotate all unused vars * rank_zero_warn * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * f1 fixed Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2020-12-19 13:53:06 +01:00
chaton	f3748ba808	[feat] Enable self.log in callbacks (#5094 ) * enable to use self.log in callbacks * update * revert back to assert	2020-12-16 16:08:39 -05:00
Jirka Borovec	059eaecbb4	set xxx_AVAILABLE as protected (#5082 ) * sett xxx_AVAILABLE as protected * docs	2020-12-14 20:19:05 +05:30
chaton	1a970b2d8d	[hotfix] Extend Optimizer + update doc (#5095 ) * resolve urgent bug * update pr * update doc * update * remove typo * add defaults * Update pytorch_lightning/__init__.py * Update setup.py * update doc * Update docs/source/optimizers.rst Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update * resolve doc * debug test * update test * Update docs/source/optimizers.rst Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update docs/source/optimizers.rst Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update docs/source/optimizers.rst Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * remove useless import * Update docs/source/optimizers.rst Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2020-12-11 14:24:59 -05:00
Rohit Gupta	6d2aeff26a	fast_dev_run can be int (#4629 ) * fast_dev_run can be int * pep * chlog * add check and update docs * logging with fdr * update docs * suggestions Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * fdr flush logs * update trainer.fast_dev_run * codefactor and pre-commit isort * tmp Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Roger Shieh <sh.rog@protonmail.ch> Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>	2020-12-09 01:37:53 +05:30
chaton	2393474350	[hotfix] ddp + manual_optimisation (#4976 ) * Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization * debug * Revert "debug" This reverts commit `ccca6b6b` * Expose manual reduce for automatic optimization * Add input arguments * Enable parity test * clean imports * Expose hook after to ensure we reset * Fix naming * add * fix test * resolve on comments * typo * Update tests/trainer/optimization/test_manual_optimization.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update tests/trainer/optimization/test_manual_optimization.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update on comments * resolve comments Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-12-07 19:31:54 +00:00
chaton	02152c1729	Simplify optimization Logic (#4984 ) * Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization * debug * Revert "debug" This reverts commit `ccca6b6b` * Expose manual reduce for automatic optimization * Add input arguments * Enable parity test * clean imports * Expose hook after to ensure we reset * Fix naming * add * fix test * uniformize optimizer logic * resolve test * resovle flake8 * resolve amp bug * update tests * remove bug * remove optimizer_step in accelerators * typo * update lightning optimizer * set doesn't work with ddp_spawn * resolve flake8 * update threshold * ignore pyright * correct codeFactor * remove useless if * remove zer_grad function * simplify step * remove typo * resolve bug * Apply suggestions from code review * update on comments * resolve bugs * remove tests * Update pytorch_lightning/trainer/configuration_validator.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * simplify testing * add more tests Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2020-12-07 12:55:49 +00:00

1 2 3 4 5 ...

288 Commits