Commit Graph

111 Commits

Author | SHA1 | Message | Date
Jirka Borovec 58a6d59784
simplify skip-if tests >> 0/n (#5920)
* skipif + yapf + isort

* tests

* docs

* pp
2021-03-01 12:17:09 +00:00
Justus Schock 0647340f3b
Add mypy typing to precision plugins. (#6149)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-02-26 14:27:16 +01:00
Kaushik B e7298b5d38
fix parallel devices return type & add copyright (#6215) 2021-02-26 11:09:08 +01:00
Justus Schock 3ed8ef8af9
type accelerators (#6148) 2021-02-25 06:42:23 +00:00
Sean Naren 432e5637d6
Expose DeepSpeed FP16 parameters due to loss instability (#6115)
* Expose deepspeed config parameters to init function due to instability in parameters

* See if tests can run on normal CI, without special tests

* Add changelog

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-02-21 21:43:11 +01:00
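
A rough sketch of what #6115 enables: passing DeepSpeed's FP16 loss-scaling knobs straight to the plugin constructor instead of through a raw config dict. The argument names below mirror DeepSpeed's fp16 config keys and are assumptions, not taken from the commit text.

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins import DeepSpeedPlugin

    # Hypothetical usage sketch for #6115; argument names are assumed.
    trainer = Trainer(
        gpus=4,
        precision=16,
        plugins=DeepSpeedPlugin(
            loss_scale=0,            # 0 requests dynamic loss scaling
            initial_scale_power=32,  # initial scale = 2 ** 32
            loss_scale_window=1000,
            hysteresis=2,
            min_loss_scale=1,
        ),
    )
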
Adrian Wälchli 3bdc0673ea
consistent behavior for reduce method across all Plugins (#6011)
* reduction docs

* docs for abstract base method

* make mean the default

* add preliminary chlog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-02-20 13:30:21 +01:00
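
Illustrative sketch of the behaviour unified by #6011 above: every training-type plugin exposes a reduce method and mean is the default reduction. The exact signature is an assumption drawn from the commit bullets, not from the code.

    import torch

    def reduce_metric(training_type_plugin, value: torch.Tensor) -> torch.Tensor:
        # Sketch for #6011: with no reduce_op given, plugins average the value
        # across processes; other reductions (e.g. "sum") would be requested
        # explicitly via the reduce_op argument.
        return training_type_plugin.reduce(value)
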
Adrian Wälchli 4b7c0fae00
Fix amp autocast (#6080)
* precision fixes

* add amp test model

* fix test

* revert

* move assert to training step

* fix test

* fix test

* remove unrelated changes

* add changelog

* remove unused import
2021-02-19 17:00:27 +00:00
Rohit Gupta 5d6a091531
default sched (#6062) 2021-02-18 21:02:50 +01:00
Adrian Wälchli fc9bb53e13
fix flake8 for new plugins (#5951)
* flake8

* fix cyclic import

* isort
2021-02-18 18:28:23 +00:00
Adrian Wälchli 6cc1a06078
rename accelerator_backend -> accelerator (#6034)
* rename accelerator backend

* rename new additions from master

* add proper deprecation

* pep8

* warning match

* add missing warning type
2021-02-18 15:54:12 +00:00
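
In user code, the rename in #6034 above amounts to the sketch below; the old trainer.accelerator_backend attribute is kept only as a deprecated alias (the exact warning wording and removal version are not stated in the commit).

    from pytorch_lightning import Trainer

    trainer = Trainer()

    # Deprecated after #6034:
    # accelerator = trainer.accelerator_backend
    # Renamed property:
    accelerator = trainer.accelerator
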
Sean Naren ffdcb62e8f
Make parallel devices optional across all plugins (#6051)
* Make parallel devices optional across all plugins so that they can be instantiated

* Add any to types to capture vars passed in
2021-02-18 12:09:53 +00:00
Sean Naren 8d7ac8f0f8
Address code review for deepspeed (#6042) 2021-02-17 22:53:20 +00:00
Sean Naren 7189d673f6
DeepSpeed Integration (#5954)
* Add initial deepspeed changes

* Address code review

* Move static method outside of function

* Fixes

* Add missing annotation

* Remove seed setting

* Doc changes

* Doc changes, add address reviews

* Fix docs

* Try fixing issue by moving to torch adam

* Clean up check

* Changes, better APIs!

* Add wrapper, swap to git install revision

* Add special test

* Add warning

* Address review

* Add better disclaimer

* Turn off ZeRO for testing due to compilation

* Add description on modifying parameters via the plugin

* Doc strings clear

* Small doc fixes

* Fix hash, reduce test

* Added CI change

* Move to azure pipeline

* Fix test name

* Add missing flag

* Remove sudo...

* Try conda instead

* Swap to conda base

* Try suggested install

* Apply suggestions from code review

* Apply suggestions from code review

* Revert "Apply suggestions from code review"

This reverts commit 41cca05a

* Revert "Apply suggestions from code review"

This reverts commit e06ec29e

* Remove setter

* Address most review

* Move out function, remove DeepSpeed from requirements

* Install deepspeed/mpi4py within container

* Use special tests, move to master commit for deepspeed

* Export path

* Force compile to happen first

* Remove!

* Debugging ninja

* Fix error in optimizer step logic

* Attempt to fix symbolic link

* Reverse to aid debugging

* Export path again

* Clean up mess

* var

* Revert "var"

This reverts commit 3450eaca

* Address review, add todo

* Add note about unsupported functionality

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 15:23:42 -05:00
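
A minimal sketch of driving the new integration from #5954 above, assuming deepspeed is installed, a multi-GPU environment, and that the shorthand plugin string is accepted in this release; a configured DeepSpeedPlugin instance can be passed instead.

    from pytorch_lightning import Trainer

    # Minimal DeepSpeed sketch for #5954 (assumes the `deepspeed` package
    # is installed and several GPUs are available).
    trainer = Trainer(
        gpus=4,
        precision=16,        # DeepSpeed is normally run with FP16
        plugins="deepspeed",
    )
    # trainer.fit(model, datamodule=dm)  # model / dm defined elsewhere
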
chaton 6bc4490d01
[HotFix] Resolve TPU Training (#6027)
* fix tpus

* update

* add back reduction in val_loss

* resolve some bugs with TPUs

* update changelog

* update on comments

* forgot status

* Fix train_bn arg

* resolve comments

* update on comments

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-02-17 16:40:13 +00:00
chaton 5700fd091f
[Feat] Add TORCH_DISTRIBUTED_BACKEND env variable (#5981)
* add backend support

* resolve flake8

* update changelog

* update

* Apply suggestions from code review

* Update docs/source/advanced/multi_gpu.rst

* add patch as context manager

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-02-17 16:37:39 +00:00
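
Sketch of how the variable from #5981 above is meant to be used: it overrides the torch.distributed backend (for example gloo instead of the default nccl). The variable name follows the commit title; some releases expose it with a PL_ prefix, so treat it as an assumption.

    import os

    from pytorch_lightning import Trainer

    # Assumed variable name per the #5981 title; later releases may use
    # PL_TORCH_DISTRIBUTED_BACKEND. Set it before distributed init happens.
    os.environ["TORCH_DISTRIBUTED_BACKEND"] = "gloo"

    trainer = Trainer(accelerator="ddp_cpu", num_processes=2)
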
Sean Naren ba6290029a
Re-introduce fix for Hydra directory sync with multiple process (#5993)
* Add hydra fix that was missing from master

* Remove error commas

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-17 11:24:57 +00:00
chaton e982800b81
Add PredictLoop (#5752)
* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* add predict_loop

* manual optimization

* clean predictloop

* update optimizer routing

* add predict loop on new accelerator

* resolve a bug

* add rank to torchelastic

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* fix memory mixed precision

* update

* setstate on trainer for pickling in ddp spawn

* add predict_loop

* clean predictloop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* clean predictloop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* resolve tests

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* remove sanitize

* rename train to run_train

* remove useless hooks

* add misconfigurationException

* remove wrong naming

* resolve some legacy

* update docstring

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


fix merge


fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resolve a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x


x


x


x


x


x


x


x


x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* update utils

* revert permission change on files

* suggestions from carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typer in accelerator.py

* typo

* resolve flake8

* update code

* update

* Update pytorch_lightning/trainer/predict_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/trainer/predict_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix merge

* fix merge

* reset legacy accelerator

* add missing rename dispatch

* rename post training

* update code

* resolved comments

* typo

* typo

* add flow description

* resolve comments

* update on comments

* update flow

* add backticks

* resolve tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 17:11:56 -05:00
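
The user-facing outcome of the PredictLoop work in #5752 above is a predict entry point on the Trainer. A rough sketch, assuming batches are routed to the module's forward and that dataloaders are passed positionally:

    import torch
    from torch.utils.data import DataLoader
    from pytorch_lightning import LightningModule, Trainer


    class TinyModel(LightningModule):
        """Toy module for illustration only."""

        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(8, 1)

        def forward(self, x):
            return self.layer(x)


    # Sketch of the predict entry point around #5752; hook routing details
    # are assumptions, not taken from the commit text.
    model = TinyModel()
    loader = DataLoader(torch.randn(16, 8), batch_size=4)
    predictions = Trainer(max_epochs=1).predict(model, loader)
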
chaton a52be5bb07
[Hot Fix] Ensure process_dataloader is called when tpu_cores > 1 to use Parallel DataLoader (#6015)
* hotfix for tpu

* update changelog

* Update CHANGELOG.md

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2021-02-16 17:02:25 -05:00
chaton 6e79bef996
[accelerator][FeatBugFix] Improve manual optimization API (#5771)
* fix trainer.model access

* move properties

* fix test_transfer_batch_hook

* fix auto_select_gpus

* fix omegaconf test

* fix test that needs to simulate slurm ddp

* add horovod plugin

* fix test with named arguments

* clean up whitespace

* fix datamodules test

* remove old accelerators

* fix naming

* move old plugins

* move to plugins

* create precision subpackage

* create training_type subpackage

* fix all new import errors

* fix wrong arguments order passed to test

* fix LR finder

* Added sharded training type and amp plugin

* Move clip grad to precision plugin

* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically

* Fix import issue, attempting to fix tests

* Fix initial test

* Reflect hook logic from master, should wrap model after move to device

* Optional state consolidation, since master has optimizers not wrapped

* change attribute for instance test

* reset optimizers

optimizers are not used in main process, so state would be wrong.

* legacy

* imports in accel

* legacy2

* trainer imports

* fix import errors after rebase

* move hook to new setup location

* provide unwrapping logic

* fix trainer callback system

* added ddp2 implementation

* fix imports .legacy

* move plugins

* restore legacy

* drop test.py from root

* add tpu accelerator and plugins

* fixes

* fix lightning optimizer merge

* reset bugreportmodel

* unwrapping

* step routing forward

* model access

* unwrap

* opt

* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* manual optimization

* update optimizer routing

* add rank to torchelastic

* fix memory mixed precision

* setstate on trainer for pickling in ddp spawn

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* update on comments

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


fix merge


fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resolve a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x


x


x


x


x


x


x


x


x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* update utils

* revert permission change on files

* suggestions from carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typer in accelerator.py

* typo

* Apply suggestions from code review

* formatting

* update on comments

* update typo

* Update pytorch_lightning/trainer/properties.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* update on comments

* resolve some comments

* update on comments

* resolve test

* add toggle_model

* update

* update on comments

* update doc

* typo

* update

* typo

* remove space

* update

* update on comments

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 16:00:35 -05:00
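
For orientation, a sketch of the manual-optimization flow that #5771 above reworks: the module opts out of automatic optimization, fetches its optimizer via self.optimizers(), and drives backward/step itself. Property and hook names reflect the 1.2-era API as best recalled and should be treated as assumptions.

    import torch
    from pytorch_lightning import LightningModule


    class ManualOptModel(LightningModule):
        """Manual optimization sketch for #5771 (names assumed)."""

        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(8, 1)
            self.automatic_optimization = False  # opt out of automatic optimization

        def forward(self, x):
            return self.layer(x)

        def training_step(self, batch, batch_idx):
            opt = self.optimizers()            # LightningOptimizer wrapper
            x, y = batch
            loss = torch.nn.functional.mse_loss(self(x), y)
            opt.zero_grad()
            self.manual_backward(loss)         # replaces loss.backward()
            opt.step()

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)
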
Jirka Borovec 1c87f1f6cd
remove legacy plugins (#5950)
* remove legacy plugins

* imports

* formatting

* fix docs references

* fix cluster environment inheritance

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-02-16 19:20:58 +00:00
Sean Naren b40d414463
Fix error in pre-optim logic (#5995) 2021-02-16 19:05:44 +00:00
Justus Schock da6dbc8d1d
PoC: Accelerator refactor (#5743)
* restoring the result from subprocess

* fix queue.get() order for results

* add missing "block_backward_sync" context manager

* add missing "block_backward_sync" context manager

* fix sync_batchnorm

* fix supported gpu-ids for tuple

* fix clip gradients and inf recursion

* accelerator selection: added cluster_environment plugin

* fix torchelastic test

* fix reduce early stopping decision for DDP

* fix tests: callbacks, conversion to lightning optimizer

* fix lightning optimizer does not pickle

* fix setting benchmark and deterministic option

* fix slurm amp test

* fix prepare_data test and determine node_rank

* fix retrieving last path when testing

* remove obsolete plugin argument

* fix test: test_trainer_config

* fix torchscript tests

* fix trainer.model access

* move properties

* fix test_transfer_batch_hook

* fix auto_select_gpus

* fix omegaconf test

* fix test that needs to simulate slurm ddp

* add horovod plugin

* fix test with named arguments

* clean up whitespace

* fix datamodules test

* remove old accelerators

* fix naming

* move old plugins

* move to plugins

* create precision subpackage

* create training_type subpackage

* fix all new import errors

* fix wrong arguments order passed to test

* fix LR finder

* Added sharded training type and amp plugin

* Move clip grad to precision plugin

* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically

* Fix import issue, attempting to fix tests

* Fix initial test

* Reflect hook logic from master, should wrap model after move to device

* Optional state consolidation, since master has optimizers not wrapped

* change attribute for instance test

* reset optimizers

optimizers are not used in main process, so state would be wrong.

* legacy

* imports in accel

* legacy2

* trainer imports

* fix import errors after rebase

* move hook to new setup location

* provide unwrapping logic

* fix trainer callback system

* added ddp2 implementation

* fix imports .legacy

* move plugins

* restore legacy

* drop test.py from root

* add tpu accelerator and plugins

* fixes

* fix lightning optimizer merge

* reset bugreportmodel

* unwrapping

* step routing forward

* model access

* unwrap

* opt

* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* manual optimization

* update optimizer routing

* add rank to torchelastic

* fix memory mixed precision

* setstate on trainer for pickling in ddp spawn

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


fix merge


fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resolve a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x


x


x


x


x


x


x


x


x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* update utils

* revert permission change on files

* suggestions from carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typer in accelerator.py

* typo

* Apply suggestions from code review

* formatting

* update on comments

* update typo

* Update pytorch_lightning/trainer/properties.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* suggestion from code review

* suggestion from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-12 15:48:56 -05:00
Gyeongjae Choi f20c1d1cdb Fix typo in docstring (#5835)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
(cherry picked from commit 0a50bb406f)
2021-02-08 20:22:39 +01:00
Tobias Maier 1d99530c66 Fix root node resolution in slurm environment
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2021-02-03 19:41:46 +01:00
Adrian Wälchli 8943d8bca0
add missing logic to new plugins and accelerator (#5734)
* add missing logic

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d
2021-02-01 13:23:53 -05:00
Justus Schock 9064b83caf
Accelerator Refactor/RPC + Sharded (#5732)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-01 14:49:15 +01:00
Justus Schock b3ebc18bcb
Hardware specific parts of Accelerator Refactoring (#5719)
* add basic accelerator class.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* add cpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add gpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single device training

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu spawn

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* make on_colab_kaggle utility func

* add basic accelerator class.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* add cpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add gpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single device training

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu spawn

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* make on_colab_kaggle utility func

* fixes

* move

* yapf

* .

* .

* .

* flake8

* sync accelerator connector changes from dev1.2

* changelog

* fix tpu handling

* tpu

* aval

* yapf

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: chaton <thomas@grid.ai>

* Update pytorch_lightning/accelerators/accelerator_connector.py

Co-authored-by: chaton <thomas@grid.ai>

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: chaton <thomas@grid.ai>

* Update tpu_spawn.py

* Update pytorch_lightning/accelerators/accelerator_connector.py

Co-authored-by: chaton <thomas@grid.ai>

* indentation

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
2021-02-01 08:34:59 -05:00
Justus Schock 069ae27cef
Accelerator Refactor: Precision Plugins (#5718)
* add basic accelerator class.
Co-Authored with @awaelchi

* add basic training type plugin.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* update copyright

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add apex_amp

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add mixed base class

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add native amp

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add native amp sharded

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu bfloat

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add inits

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update precision_plugin.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-01-31 13:12:02 -05:00
Adrian Wälchli 3bacac7734
accelerator refactor - add parallel plugins (#5714)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-01-31 07:48:14 -05:00
Adrian Wälchli 692f77b8a7
Refactor LightningDataParallel (#5670)
* module

* fix model access

* scalar conversion

* refactor

* kwargs

* auto unsqueeze

* refactor code duplication

* clean up

* docs

* update dp docs

* changelog

* generalize test

* test

* rename

* warning cache

* isort

* unsqueezing test

* device

* device

* scalar test

* device

* device

* include coverage of overrides

* clear

* add deprecation test

* docs

* improve coverage

* increase coverage

* fix merge

* extend test

* rename base class

* mention the predict method in docs

* combine iteration over collection

* remove override

* move

* line

* Apply suggestions from code review

* fix running stage

* f401

* fix cyclic import

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-01-31 06:08:16 -05:00
Justus Schock 5d239ccd70
Base classes for accelerator refactoring (#5715)
* add basic accelerator class.
Co-Authored with @awaelchi

* Add base plugin class.
Co-authored with @awaelchi

* add basic training type plugin.
Co-Authored with @awaelchi

* add basic precision plugin.
Co-Authored with @awaelchi

* Add missing inits.
Co-authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* ignore  flake8

* coverage omit

* imports in init

* lost

* imports

* flake8

* .

* .

* chlog

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-01-30 14:55:28 -05:00
Arnaud Gelas 8dfcc078c4
Fix isort failures in core (#5526)
Remove from skipped module in pyproject.toml and fix failures on:
- pytorch_lightning/core/*.py
2021-01-27 05:02:16 -05:00
Jirka Borovec 7e2e874d95
Refactor: legacy accelerators and plugins (#5645)
* tests: legacy

* legacy: accel

* legacy: plug

* fix imports

* mypy

* flake8
2021-01-26 20:04:36 -05:00
Jirka Borovec 53b0ae49b9 fix imports / isort / flake8 2021-01-26 14:57:34 +01:00
chaton 0435e23a64 deprecate enable_pl_optimizer as it is not restored properly (#5244)
* update

* clean test

* still in progress

* update test

* update

* update

* resolve flake

* add test for zero_grad

* update

* works without accumulated_grad

* update

* update

* resolve amp

* revert back to True

* update

* clean tests

* cleaned out

* typo

* update test

* git repare bug

* remove print

* update

* Fix formatting/optimizer imports

* Refactor the test for cleanliness

* Add vanilla model to the test, better var names

* Fixed var names, let's clean up these mock tests

* repare test

* update test

* resolve flake8

* add manual_optimization

* update tests

* resolve flake8

* add random accumulate_grad_batches

* improve test

* Update tests/trainer/optimization/test_parity_automatic_optimization.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/optimization/test_parity_automatic_optimization.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update

* clean tests

* correct bug

* Apply suggestions from code review

* format

* address comments

* update on comments

* wip

* typo

* deprecate enable_pl_optimizer

* resolve latest bugs

* update

* resolve merge

* add comment

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/deprecated_api/test_remove_1-3.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/connectors/optimizer_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/optimization/test_parity_automatic_optimization.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* update restore

* add a property

* remove setstate as not needed anymore

* update test

* provide optimizer to on_before_zero_grad

* update on comments

* update on comments

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update tests/trainer/optimization/test_parity_automatic_optimization.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update tests/trainer/optimization/test_parity_automatic_optimization.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update tests/trainer/optimization/test_parity_automatic_optimization.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* modify import

* update changelog

* resolve flake8

* update

* update

* clean doc

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

(cherry picked from commit f2e99d617f)
2021-01-26 14:29:46 +01:00
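
In practice the deprecation in #5244 above means the flag should simply be dropped, as sketched below; whether a given release still accepts it with a warning is not stated in the commit.

    from pytorch_lightning import Trainer

    # Deprecated after #5244 -- previously toggled explicit wrapping of
    # optimizers in LightningOptimizer:
    # trainer = Trainer(enable_pl_optimizer=True)

    # Preferred: omit the flag and let Lightning manage the wrapping.
    trainer = Trainer()
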
chaton f2f4a49271 [bug-fix] Call transfer_batch_to_device in DDPPlugin (#5195)
* hacking out

* update

* remove useless on_before_forward

* update

* remove overriden

* iremove os

* use on_before_forward

* resolve flake8

* add test

* update

* add single_process_per_device

* resolve flake8

* update

* resolve

* update

* update

* update

* add comment

* resolve bug with sharded

* update

* remove property

* update

* resolve test

* resolve bug

* update on comments

* update doc

* Update pytorch_lightning/core/hooks.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* update on comments

* Update pytorch_lightning/plugins/ddp_plugin.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/plugins/ddp_plugin.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* resolve pep8

* add device_ids to pipe

* update on comments

* update

* resolve

* update

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

(cherry picked from commit d510707bc9)
2021-01-26 14:28:45 +01:00
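
For context on #5195 above, a sketch of the hook that the DDP plugin now routes batches through; overriding it lets custom batch structures be moved to the right device. The two-argument signature matches the hook of that era as best recalled and is an assumption.

    from pytorch_lightning import LightningModule


    class CustomTransferModel(LightningModule):
        """Sketch: custom device transfer honoured under DDP after #5195."""

        def transfer_batch_to_device(self, batch, device):
            # Move a dict-of-tensors batch by hand; defer to the default
            # implementation for everything else.
            if isinstance(batch, dict):
                return {k: v.to(device) for k, v in batch.items()}
            return super().transfer_batch_to_device(batch, device)
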
Jirka Borovec f782230412
docs cleaning - testcode (#5595)
* testcode - python

* revert

* simple

* testcode @rst

* pl

* fix

* pip

* update

* conf

* conf

* nn.

* typo
2021-01-26 04:44:54 -05:00
Jirka Borovec 2846322f60
fix docs render (#5610) 2021-01-25 20:21:00 -05:00
Jirka Borovec 6386f45de7
clarify Trainer running state attributes (#5589)
* update Trainer is_ attributes

* tests

* more

* isort

* split

* rename

* check

* fix
2021-01-24 10:45:42 +00:00
Adrian Wälchli e806bb77fa
Refactor LightningDistributedDataParallel (#5185)
* add wrapper

* add squeeze

* replace LightningDistributedDP

* update import

* module access

* inputs

* refactor warning

* update

* resolve flake8

* remove old class

* set find unused params to False

* update docstrings

* update docs

* update docs

* add changelog

* deprecation

* rename wrapper -> module

* rename pl_module

* add unit tests

* Revert "add changelog"

This reverts commit 02ec0a6864f4ba2ace3bb6fc6ebc364e1a80ffd7.

* Revert "set find unused params to False"

This reverts commit 8e451515e6ba3227d00f4a5cb63f332cfedb7b30.

Co-authored-by: Ubuntu <thomas@grid.ai>
2021-01-13 14:35:42 -05:00
Adrian Wälchli 61308138c3
set find_unused_parameters=False in DDP as in pytorch (#5435)
* set find unused params to False

* add changelog

* fix changelog

* fix test

* update docs

* update changelog

Co-authored-by: chaton <thomas@grid.ai>
2021-01-13 10:13:40 -05:00
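
The user-visible effect of #5435 above, sketched under the assumption that DDPPlugin forwards the keyword to torch's DistributedDataParallel: the default is now False, so models with genuinely unused parameters must opt back in.

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins import DDPPlugin

    # After #5435 Lightning matches PyTorch's default of
    # find_unused_parameters=False; opt back in only when the model really
    # has parameters that receive no gradient.
    trainer = Trainer(
        gpus=2,
        accelerator="ddp",
        plugins=DDPPlugin(find_unused_parameters=True),
    )
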
Jirka Borovec 54d20dc596
Refactor: clean trainer device & distrib getters (#5300)
* warnings

* .

* .

* flake8

* .

* .

* .

* use_tpu

* use_dp

* .

* use_ddp

* .

* use_horovod

* .

* .

* .
2021-01-12 05:22:37 -05:00
Carlos Mocholí 3ee3c42035 Prepare 1.1.3 release (#5365)
* Prepare 1.1.3 release

* Fix flake8 error

* suppress

* Remove 1.1.4 section

* Add missing commits to CHANGELOG

* Update PR template

* Add missing commit

* fix

* Update CHANGELOG.md

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

(cherry picked from commit 4d9db866a1)
2021-01-06 15:17:27 +01:00
chaton 56437e98a6 [bug-fix] Trainer.test points to latest best_model_path (#5161)
* resolve bug

* update code

* add set -e

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update test

* Update tests/checkpointing/test_trainer_checkpoint.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Update tests/checkpointing/test_trainer_checkpoint.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update on comments

* resolve test

* convert to set

* update

* add error triggering

* update

* update on comments

* update

* resolve import

* update

* update

* Update pytorch_lightning/plugins/rpc_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

(cherry picked from commit d5b367871f)
2021-01-06 15:14:10 +01:00
Heewon Jeon(gogamza) c0e9a78db4 supports --num-nodes on DDPSequentialPlugin() (#5327)
(cherry picked from commit d20fd8e5ab)
2021-01-06 12:48:17 +01:00
Jirka Borovec b72ed71d4e
Refactor: clean trainer device & distrib setters (#5297)
* naive replace

* simplify

* clean

* .

* fix

* .

* fix

* fix
2021-01-04 17:10:13 +00:00
Jirka Borovec 0f36525e8f
fix/enable - check F401 (#5201)
* refactor - check F401

* missed

* fix
2020-12-21 10:15:04 +01:00
Jirka Borovec 2d54116baa
annotat unused vars (#5017)
* annotate all unused vars

* rank_zero_warn

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* f1 fixed

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-12-19 13:53:06 +01:00
Jirka Borovec 059eaecbb4
set xxx_AVAILABLE as protected (#5082)
* set xxx_AVAILABLE as protected

* docs
2020-12-14 20:19:05 +05:30
chaton ef8ef12fd0
[feat] pp 2/n (#5026)
* Added changes for RPC plugin

* Add missing kwargs

* Fix code format

* Loading refactors by introducing is_distributed var, fix optimizer step flow

* Add rpc guard

* Added docstrings and typing

* resolve comments

* Add additional rpc hook, refactor name of exit process hook for clarity

* remove annotation

* Modify behaviour to allow optional return, add test for rpc plugin

* resolve tests

* rename is_ddp_based

* update

* update for windows

* update

* resolve test

* code smell

* Added sequential plugin

* resolve bug

* update

* cleanup

* add Exception

* resolve docs

* Remove ddp support

* Revert distributed -> ddp

* Update pl_examples/basic_examples/conv_sequential_example.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pl_examples/basic_examples/conv_sequential_example.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/ddp_sequential_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Address code review points

* Update pytorch_lightning/plugins/ddp_sequential_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/ddp_sequential_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Add missing return

* Fix formatting, add datamodule args

* add small comment

* resolve comments

* resolve comments

* update source for fairscale

* update extras

* remove staticmethod

* resolve flake8

* Skip tests that are failing due to bug upstream with multiple optimizers and shard

* update

* update on comments

* clean test

* latest comments

* remove old comments

* add todo

* Update version

* update

* resolve bugs

* resolve bugs

* update test

* remove hanging test

* Update pytorch_lightning/plugins/ddp_sequential_plugin.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* resolve on comments

* Update pytorch_lightning/plugins/ddp_sequential_plugin.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* resolve on comments

* Update pytorch_lightning/plugins/ddp_sequential_plugin.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/plugins/ddp_sequential_plugin.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/plugins/ddp_sequential_plugin.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/plugins/ddp_sequential_plugin.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove ImportError

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-12-09 12:56:51 +00:00