lightning

Commit Graph

Author	SHA1	Message	Date
Adrian Wälchli	c55bc433ce	Fix retrieval of batch indices when dataloader num_workers > 0 (#10870) Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-12-02 10:36:10 +00:00
Adrian Wälchli	97e52619ea	Fix typing in `pl.overrides.data_parallel` (#10796) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-11-29 10:58:23 +01:00
Danielle Pintz	1f7bd6650c	Mark accelerator connector as protected (#10032)	2021-10-25 19:24:54 +00:00
ananthsub	0d3325ea20	Add support for `torch.use_deterministic_algorithms` (#9121) * re-add changes * Update test_data_parallel.py * Update CHANGELOG.md * Update test_legacy_checkpoints.py * Update test_horovod.py * Update test_horovod.py * Update accelerator_connector.py * update tests	2021-09-30 04:40:09 +00:00
B. Kerim Tshimanga	f0788b3bbc	scheduled removal of auto_move_data decorator (#9231) * scheduled removal of auto_move_data decorator * update CHANGELOG.md * remove unused import * remove test_decorators.py * fix missed merge conflict Co-authored-by: thomas chaton <thomas@grid.ai>	2021-09-03 00:54:36 +02:00
Carlos Mocholí	a64cc37394	Replace `yapf` with `black` (#7783) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-07-26 13:37:35 +02:00
deepsource-autofix[bot]	e11fe19673	Remove unnecessary use of comprehension (#8149) Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>	2021-06-27 10:00:02 +01:00
Ethan Harris	03bb389b21	Fix double precision + ddp_spawn (#6924) * Initial fix * Initial fix * Initial fix * Updates * Updates * Update typing and docs * Undo accidental refactor * Remove unused imports * Add DDP double precision test * Remove unused variable * Update CHANGELOG.md * Fix test * Update tests * Formatting * Revert bad change * Add back changes * Correct wrapping order * Improve unwrapping * Correct wrapping order * Fix... finally * Respond to comments * Drop ddp test * Simplify ddp spawn test * Simplify ddp spawn test Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-06-01 15:21:17 +00:00
Kaushik B	04dcb1786d	Add `__len__` method to IndexBatchSamplerWrapper (#7681) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-05-26 18:20:13 +02:00
Carlos Mocholí	8c0ea92af2	`TrainerState` refactor [5/5] (#7173) * `TrainerState` refactor * flake8 * Update finished check * Test cleanup * Fix tests * Fixes * Reorder * flake8 * Update CHANGELOG * Better docs * Better docs * Remove default * Update tests * Bad merge	2021-05-04 12:50:56 +02:00
thomas chaton	e147127c0e	[feat] Add better support for predict + ddp 2/3 (#7215 ) * wip * update * update * update * update * update * typo * update on comments * update * update * update * update * update changelog * update * Fix merge * Fix merge * move code * resolve test * add extra test * add an extra test * update on comments * add typing * resolve flake8 * Refactor and Docs * Fix tests * Fix tests * Fix tests * Duplicate * Fix tests * resolve bug * update * update on comments * update * update changelog * update * update * remove tpu * resolve flake8 * update on comments * update on comments * update on comment * resolve flake8 * add a cpu test for predict * add None test * update * Update CHANGELOG.md Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * resolve tests Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-04-27 08:46:45 -04:00
Adrian Wälchli	80c5293514	fix self.device access in DataParallel (#6414)	2021-04-13 02:03:24 +02:00
thomas chaton	0995d30fab	Flash predict step (#6577 ) * add predict_step * Update predict_loop.py * Update trainer.py * Update trainer.py * resolve bugs * update * update * update * resolve bug * resolve some failing tests * udpate tests * update * resolve tests * add a test * remove typo * add a test for attachement * update * changed to on_train_dataloader * remove __flash_special_attr__ * resolve tests * update * update * update * update on comments * Update pytorch_lightning/trainer/data_loading.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-03-23 11:13:13 -04:00
Rohit Gupta	facfda85f1	Remove no return warning from val/test step (#6139) * remove warning * auto_opt * chlog * auto_opt * no_warning_call * rm old code * add warning for predict * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-03-06 17:15:21 +00:00
Elia Cereda	d0596fac94	Refactor RunningStage usage in advance of implementing Trainer.validate() (#4945 ) * Update code Co-authored-by: EliaCereda * More property updates * Move properties. Introduce trainer._fitting * Use trainer.fitting * Fix reset dataloaders * Unused code * RunningStage.SANITY_CHECKING * Use setters * Fix bugs * Fix bugs * TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING} * Fix bugs * Fix bugs * Fix tests * Update CHANGELOG. Add deprecation warning. Fix tests * Unused imports * Optional trainer * More deprecation. More refactoring * Correct version * Use properties * Address comments * flake8 * Missed renamings * Typo * is -> == It is recommended to use for Enums since they are singletons, however, since the LightningEnum subclasses str, it's not a good idea in case a user sets the state/stage with a str * Also for tests * Typo * Address @tchaton's comments * PEP8 * Correct property * Update CHANGELOG * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update pytorch_lightning/trainer/trainer.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Remove called sanity check Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-03-06 12:40:19 +00:00
Jirka Borovec	0f9134e043	Refactor: skipif for Windows 2/n (#6268) * win * isort * flake8	2021-03-02 09:36:01 +00:00
Jirka Borovec	eb815000f6	Refactor: skipif for multi-gpus 1/n (#6266) * ngpus * gpu * isort * pt * flake8	2021-03-02 09:03:32 +01:00
David Völgyes	651c25feb6	Fix for incorrect usage of detach(), cpu(), to() (#6216 ) * Fix for incorrect detach/cpu calls (#6214) * Fix incorrect use of detach(), to(), and cpu(), #6214 * Fix incorrect use of detach() and cpu(), #6214 * update pr * add typing * chlog * more... * revert on module * update on comments * revert changes on model Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>	2021-03-01 15:15:52 +00:00
Adrian Wälchli	0456b4598f	mini refactor for _running_stage access (#5724) * running stage * circular import * running stage cleanup * fix unused import * fix running stage access * add return type * Revert "add return type" This reverts commit `65b0fe269c`. * try fix typing	2021-02-22 12:01:54 +01:00
chaton	e982800b81	Add PredictLoop (#5752 ) * integrate distrib_type * sync changes * sync * fixes * add forgotten generators * add missing logic * update * import * missed imports * import fixes * isort * mv f * changelog * format * move helper to parallel plugin * d * add world size * clean up * duplicate * activate ddp_sharded and tpu * set nvidia flags * remove unused colab var * use_tpu <-> on_tpu attrs * make some ddp_cpu and clusterplugin tests pass * Ref/accelerator connector (#5742) * final cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * connector cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * trainer cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * accelerator cleanup + missing logic in accelerator connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add missing changes to callbacks Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * reflect accelerator changes to lightning module Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * clean cluster envs Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * cleanup plugins Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add broadcasting Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * yapf * remove plugin connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * plugins * add predict_loop * manual optimization * clean predictloop * update optimizer routing * add predict loop on new accelerator * resolve a bug * add rank to torchelastic * add predict_loop * add predict loop on new accelerator * resolve a bug * fix memory mixed precision * update * setstate on trainer for pickling in ddp spawn * add predict_loop * clean predictloop * add predict loop on new accelerator * resolve a bug * add predict_loop * add predict loop on new accelerator * resolve a bug * add predict_loop * add predict loop on new accelerator * resolve a bug * add predict_loop * add predict loop on new accelerator * 
resolve a bug * add predict_loop * clean predictloop * add predict loop on new accelerator * resolve a bug * add predict_loop * add predict loop on new accelerator * resolve a bug * resolve tests * add predict method * add back commented accelerator code * adapt test for sync_batch_norm to new plugin * fix deprecated tests * fix ddp cpu choice when no num_processes are given * yapf format * skip a memory test that cannot pass anymore * remove sanetize * rename train to run_train * remove useless hooks * add misconfigurationException * remove wrong naming * resolve some legacy * udpate docstring * fix pickle error in spawn plugin * x * avoid * x * fix cyclic import in docs build * add support for sharded * update typing * add sharded and sharded_spawn to distributed types * make unwrap model default * refactor LightningShardedDataParallel similar to LightningDistributedDataParallel * update sharded spawn to reflect changes * update sharded to reflect changes * Merge 1.1.5 changes * fix merge * fix merge * yapf isort * fix merge * yapf isort * fix indentation in test * copy over reinit scheduler implementation from dev1.2 * fix apex tracking calls with dev_debugger * reduce diff to dev1.2, clean up * fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu * sort plugin tests legacy/new * fix error handling for amp on cpu * fix merge fix merge fix merge * [Feat] Resolve manual_backward (#5837) * resolve manual_backward * resolve flake8 * update * resolve for ddp_spawn * resolve flake8 * resolve flake8 * resolve flake8 Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * fix tests/accelerator tests on cpu * [BugFix] Resolve manual optimization (#5852) * resolve manual_optimization * update * update Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856) * resovle a bug * Accelerator refactor sharded rpc (#5854) * rpc branch * merge 
* update handling of rpc * make devices etc. Optional in RPC * set devices etc. later if necessary * remove devices from sequential * make devices optional in rpc * fix import * uncomment everything * fix cluster selection Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * resolve bug * fix assert in rpc test * resolve a test * fix docs compilation * accelerator refactor - fix for sharded parity test (#5866) * fix memory issue with ddp_spawn * x x x x x x x x x * x * Remove DDP2 as this does not apply * Add missing pre optimizer hook to ensure lambda closure is called * fix apex docstring * [accelerator][BugFix] Resolve some test for 1 gpu (#5863) * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * update * resolve flake8 * update * update * update * update * update * all_gather * update * make plugins work, add misconfig for RPC * update * update * remove breaking test * resolve some tests * resolve flake8 * revert to ddp_spawn Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de> * yapf isort * resolve flake8 * fix apex doctests * fix apex doctests 2 * resolve docs * update drone * clean env * update * update * update * update * merge * Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881) * Fix RPC related tests, clean out old API, update for new accelerator API * Move tests out of legacy folder, update paths and names * Update test_remove_1-4.py * Expose properties for tpu cores/gpus/num_gpus * Add root GPU property * Move properties to properties.py * move tests that were previously in drone * Fix root GPU property (#5908) * Move root GPU to property, 
remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator * Add missing tests back * fix best model path transfer when no checkpoint callback available * Fix setup hook order [wip] (#5858) * Call trainer setup hook before accelerator setup * Add test case * add new test * typo * fix callback order in test Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * rename ddp sequential -> rpc sequential for special test * revert * fix stupid merge problem * Use property in connector for sampler (#5913) * merge the import conflicts * fix spawning of processes in slurm * [wip] Fix some bugs for TPU [skip ci] (#5878) * fixed for single tpu * fixed spawn * fixed spawn * update * update * wip * resolve bugs * resolve bug * update on comment * removed decorator * resolve comments * set to 4 * update * update * need cleaning * update * update * update * resolve flake8 * resolve bugs * exclude broadcast * resolve bugs * change test * update * update * skip if meet fails * properly raise trace * update * add catch * wrap test * resolve typo * update * typo Co-authored-by: Lezwon Castelino <lezwon@gmail.com> Co-authored-by: Your Name <you@example.com> * resolve some tests * update * fix imports * update * resolve flake8 * update azure pipeline * skip a sharded test on cpu that requires a gpu * resolve tpus * resolve bug * resolve flake8 * update * updat utils * revert permission change on files * suggestions from carlos Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * remove unrelated formatting changes * remove incomplete comment * Update pytorch_lightning/accelerators/__init__.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * remove unrelated formatting change * add types * warn 1.7 ddp manual backward only if ddp kwarg unset * yapf + isort * pep8 unused imports * fix cyclic import in docs * Apply suggestions from code review * typer in accelerator.py * typo * 
resolve flake8 * update code * update * Update pytorch_lightning/trainer/predict_loop.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Update pytorch_lightning/trainer/predict_loop.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * fix merge * fix merge * reset legacy accelerator * add missing rename dispatch * rename post traning * update code * resolved comments * typo * typo * add flow description * resolve comments * update on comments * update flow * add backticks * resolve tpu Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: justusschock <justus.schock@posteo.de> Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Lezwon Castelino <lezwon@gmail.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2021-02-16 17:11:56 -05:00
Jirka Borovec	a0f7831278	fix miss-leading imports in tests (#5873) * fix imorts * .	2021-02-09 05:10:52 -05:00
Jirka Borovec	82943515dc	formatting tests1/n (#5843) * utils * tuner * base	2021-02-06 08:22:10 -05:00
Adrian Wälchli	692f77b8a7	Refactor LightningDataParallel (#5670 ) * module * fix model access * scalar conversion * refactor * kwargs * auto unsqueeze * refactor code duplication * clean up * docs * update dp docs * changelog * generalize test * test * rename * warning cache * isort * unsqueezing test * device * device * scalar test * device * device * include coverage of overrides * clear * add deprecation test * docs * improve coverage * increase coverage * fix merge * extend test * rename base class * mention the predict method in docs * combine iteration over collection * remove override * move * line * Apply suggestions from code review * fix running stage * f401 * fix cyclic import Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-01-31 06:08:16 -05:00
chaton	3da28fd634	[feat] 1/2 Add trainer.predict (#5579 ) * start adding predict * add predict * resolve test * add predict * remove limit_predict * update * add test for predict * typo * update on comments * remove predict_step * update ddp_shareded * check ddp_sharded * resolve on comments * resolve isort * update dp * add test dp 1 gpu * made default forward * resolve path * resolve bug * update on comments * resolve doc * resolve bug * update * resolve bug * update on comments * resolve pep8 * update test doc * update on comments * solve special tests * resolve bug * resolve flake8 * Update pytorch_lightning/callbacks/progress.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Update pytorch_lightning/trainer/trainer.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * add predict to LightningModule * missing predict * typo * rename is_prediction to _predicting * add * update * update * update doc Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2021-01-27 11:38:14 -05:00
Adrian Wälchli	e806bb77fa	Refactor LightningDistributedDataParallel (#5185) * add wrapper * add squeeze * replace LightningDistributedDP * update import * module access * inputs * refactor warning * update * resolve flake8 * remove old class * set find unused params to False * update docstrings * update docs * update docs * add changelog * deprecation * rename wrapper -> module * rename pl_module * add unit tests * Revert "add changelog" This reverts commit 02ec0a6864f4ba2ace3bb6fc6ebc364e1a80ffd7. * Revert "set find unused params to False" This reverts commit 8e451515e6ba3227d00f4a5cb63f332cfedb7b30. Co-authored-by: Ubuntu <thomas@grid.ai>	2021-01-13 14:35:42 -05:00

25 Commits