lightning

Commit Graph

Author	SHA1	Message	Date
edward-io	87bd54aedf	fix typos (#11937 )	2022-02-16 17:27:51 -08:00
ananthsub	8d23f6287a	Update module path for `LightningDeprecationWarning` in setup.cfg (#11793 )	2022-02-10 08:59:32 +05:30
ananthsub	a64438c897	Centralize rank_zero_only utilities into their own module (#11747 ) * Centralize rank_zero_only utilities into their own module Fixes #11746 * PossibleUserWarning * Update test_warnings.py * update imports * more imports * Update CHANGELOG.md * Update mlflow.py * Update cli.py * Update api_references.rst * Update meta.py * add deprecation tests * debug standalone * fix standalone tests * Update CHANGELOG.md	2022-02-07 08:09:55 +00:00
ananthsub	2eca957b29	Minor refactors to `init_dist_connection` (#11733 )	2022-02-04 13:33:49 +01:00
Rohit Gupta	96a53382ac	Update utilities API references (#11450 )	2022-01-13 13:22:58 +00:00
Adrian Wälchli	2b0075a47e	Teardown sync-batchnorm after training (#11078 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-12-16 18:58:44 +00:00
thomas chaton	3d6262b7a9	Fault Tolerant Manual: Add support for DDP (#10638 )	2021-11-25 18:31:53 +01:00
thomas chaton	7cf6374bd0	Fault Tolerant Manual: Add support for collecting states across processes (#10639 )	2021-11-23 14:27:33 +00:00
Kaushik B	d577f461a4	Remove deprecated `utilities.distributed.rank_zero_{warn,deprecation}` (#10451 )	2021-11-10 07:35:48 -08:00
four4fish	0ed5e3dc8a	Raise exceptions when torch distributed is not available (#10418 ) * Raise exceptions when torch distributed is not avalible * add changelog	2021-11-09 09:11:05 +00:00
Adrian Wälchli	a270a79ed9	Rename "master" methods to "main" in ClusterEnvironment plugins (#10103 ) * rename occurrences of master port, master address, maser node, master process * rename properties * add property decorators * occurrences in docs * update changelog * update changelog * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add lost method * create deprecation * add changelog * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo (but it was already there!!!) * Apply suggestions from code review Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> * add todo * update more occurences * add types * add missing import Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>	2021-11-08 12:32:58 +00:00
four4fish	d56e041635	Update init_ddp_connection's name and log (#10295 )	2021-11-01 19:11:15 +00:00
Jirka Borovec	6e124e7207	CI: precommit - docformatter (#8584 ) * CI: precommit - docformatter * fix deprecated Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-09-06 12:49:09 +00:00
thomas chaton	045c879e08	Fix `self.log(sync_dist=True, reduce_fx={mean,max})` (#9142 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-08-27 15:40:51 +00:00
Yi Wang	366fb39d2e	Support post-localSGD in Lightning DDP plugin (#8967 ) Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-08-26 08:24:49 +01:00
Kaushik B	f3c5889aa3	FIx mypy for init_ddp_connection (#9051 )	2021-08-23 15:46:15 +00:00
Kaushik B	0461107972	Move `init_ddp_connection` to distributed utilities (#9044 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-08-23 14:01:01 +05:30
Carlos Mocholí	4928dc5579	Improve SWA docs (#8717 )	2021-08-05 16:07:50 +00:00
Daniel Stancl	aacd131414	Fix mypy in `utilities.distributed` (#8201 )	2021-08-05 09:51:09 +00:00
Carlos Mocholí	a64cc37394	Replace `yapf` with `black` (#7783 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-07-26 13:37:35 +02:00
Carlos Mocholí	368ac1c622	[CLI] Drop `ArgumentParser` when pickling and save before spawning (#8017 ) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-07 17:56:13 +00:00
deepsource-autofix[bot]	03154eb30a	Refactor unnecessary `else` / `elif` when `if` block has a `return` statement (#8156 ) Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2021-06-28 15:27:41 +05:30
thomas chaton	24db914093	Support state restoration of logged results 2/2(#7966 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-06-25 19:16:11 +00:00
Carlos Mocholí	6dd7797c97	Deprecate moved warning functions (#8085 )	2021-06-23 00:09:42 +02:00
Carlos Mocholí	dd340a6598	Actually show deprecation warnings and their line level [2/2] (#8002 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-06-21 18:51:53 +02:00
Carlos Mocholí	5593b6f772	Merge pull request #7872 from PyTorchLightning/refactor/logger-poc-changes Random fixes for logger connector PoC	2021-06-08 09:04:16 -04:00
Kaushik B	1b3e4f9fb9	Fix sync_dist for tpus (#6950 )	2021-04-13 14:17:15 +05:30
shuyingsunshine21	313e81638d	Supporting Adding DDP Communication Hooks (#6736 ) * Fix some test errors Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * checkpoint consolidation * Update ddp_spawn.py * Update test_metric_result_integration.py * Update test_results.py * Update utils.py * Update utils.py * Update test_all_gather_grad.py * Update test_all_gather_grad.py * Update test_results.py * Revert "Update test_results.py" This reverts commit `9d4a2b891d`. * Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate" This reverts commit `c5053da789`, reversing changes made to `0d23d75bc9`. * Revert "Update test_all_gather_grad.py" This reverts commit `0d23d75bc9`. * Revert "Update utils.py" This reverts commit `70fe5da9c6`. * Revert "Update utils.py" This reverts commit `a9aae99f6e`. * Revert "Update test_results.py" This reverts commit `ea74906878`. * Revert "Update test_metric_result_integration.py" This reverts commit `bf70e431b3`. * Revert "Update ddp_spawn.py" This reverts commit `f17210183b`. * Revert "checkpoint consolidation" This reverts commit `536c1323b0`. * Revert "Revert "checkpoint consolidation"" This reverts commit `3a9fde915a`. * Revert "Revert "Revert "checkpoint consolidation""" This reverts commit `7a369f47e1`. * Revert "Revert "Update ddp_spawn.py"" This reverts commit `8222dc98ea`. * Revert "Revert "Update test_metric_result_integration.py"" This reverts commit `6c095b2370`. * Revert "Revert "Update test_results.py"" This reverts commit `250d0aaaa2`. * Revert "Revert "Update utils.py"" This reverts commit `8651d54d79`. * Revert "Revert "Update test_all_gather_grad.py"" This reverts commit `dcdcd29731`. * modify distributed environment to make test pass * add DDP communication hook * remove test related setting * remove more test related setting * fix ddp comm hook util import issue * comments * one more fix for test_custom_plugin * fix ddp spwan * fix sgd * address comments and add tests * 1. add is gpu checking 2. modify test a bit 3. formatting * formatting nit * fix conda 3.7 1.7 issue for no torch.distributed.algorithms module * need at least 1.8.0 * minor fix * modify changelog * changelog should link to PR number instead of issue number * refine a bit on doc for register_ddp_comm_hook function, like ddp_comm_wrapper explanation and add hyperparameter for power sgd states in example usge * move single device checking before call register_ddp_comm_hook * formatting * comments * typo * pre-commit formatting	2021-04-07 12:35:57 +01:00
ananthsub	86e1d9f759	[fix] Better support for rank_zero_only setting for SLURM and torchelastic (#6802 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-04-07 12:25:13 +01:00
Jirka Borovec	217c12a4e7	Simplify deprecations (#6620 ) * use external deprecate * simplify * simplify * simplify * flake8 * . * others * .	2021-03-25 15:26:38 +01:00
Shengyao Zhuang	b8ef52baa1	Match the number of outputs of backward with forward for AllGatherGrad (#6625 )	2021-03-25 15:07:58 +05:30
thomas chaton	0544efd453	[bug] Update broadcast + reduce decision ModelCheckpoint] (#6410 ) * resolve bug * update * update changelog * update PR * Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * add todo * resolve issues * resolve flake8 * update * add coverage for reduce * wip * restore back to brodbact * remove test.py * resolve flake8 * update * check world size * resolve test * update * use pytorch version when defined * update on comments * update on comments * flake8 * resolve bugs * Update CHANGELOG.md Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * update * update * update * update * remove test * update * resolve flake8 * update * update * update * proxy * update * update * resolve typo * prune * update parallel * update Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-03-14 17:14:27 +00:00
Adrian Wälchli	ec8d46e02b	introduce default cluster environment for lightning-specific ddp (#5915 ) * handle distributed_sampler_kwargs * move emptying cache to accelertor * fix a few tests * restoring the result from subprocess * fix queue.get() order for results * add missing "block_backward_sync" context manager * add missing "block_backward_sync" context manager * fix sync_batchnorm * fix supported gpu-ids for tuple * fix clip gradients and inf recursion * accelerator selection: added cluster_environment plugin * fix torchelastic test * fix reduce early stopping decision for DDP * fix tests: callbacks, conversion to lightning optimizer * fix lightning optimizer does not pickle * fix setting benchmark and deterministic option * fix slurm amp test * fix prepare_data test and determine node_rank * fix retrieving last path when testing * remove obsolete plugin argument * fix test: test_trainer_config * fix torchscript tests * fix trainer.model access * move properties * fix test_transfer_batch_hook * fix auto_select_gpus * fix omegaconf test * fix test that needs to simulate slurm ddp * add horovod plugin * fix test with named arguments * clean up whitespace * fix datamodules test * remove old accelerators * fix naming * move old plugins * move to plugins * create precision subpackage * create training_type subpackage * fix all new import errors * fix wrong arguments order passed to test * fix LR finder * Added sharded training type and amp plugin * Move clip grad to precision plugin * Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically * Fix import issue, attempting to fix tests * Fix initial test * Reflect hook logic from master, should wrap model after move to device * Optional state consolidation, since master has optimizers not wrapped * change attribute for instance test * reset optimizers optimizers are not used in main process, so state would be wrong. 
* legacy * imports in accel * legacy2 * trainer imports * fix import errors after rebase * move hook to new setup location * provide unwrapping logic * fix trainer callback system * added ddp2 implementation * fix imports .legacy * move plugins * restore legacy * drop test.py from root * add tpu accelerator and plugins * fixes * fix lightning optimizer merge * reset bugreportmodel * unwrapping * step routing forward * model access * unwrap * opt * integrate distrib_type * sync changes * sync * fixes * add forgotten generators * add missing logic * update * import * missed imports * import fixes * isort * mv f * changelog * format * move helper to parallel plugin * d * add world size * clean up * duplicate * activate ddp_sharded and tpu * set nvidia flags * remove unused colab var * use_tpu <-> on_tpu attrs * make some ddp_cpu and clusterplugin tests pass * Ref/accelerator connector (#5742) * final cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * connector cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * trainer cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * accelerator cleanup + missing logic in accelerator connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add missing changes to callbacks Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * reflect accelerator changes to lightning module Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * clean cluster envs Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * cleanup plugins Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add broadcasting Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * yapf * remove plugin connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * plugins * manual optimization * update optimizer routing * add rank to torchelastic * fix memory mixed precision * setstate on trainer for pickling in ddp spawn * add predict method * add back commented accelerator code * adapt 
test for sync_batch_norm to new plugin * fix deprecated tests * fix ddp cpu choice when no num_processes are given * yapf format * skip a memory test that cannot pass anymore * fix pickle error in spawn plugin * x * avoid * x * fix cyclic import in docs build * add support for sharded * update typing * add sharded and sharded_spawn to distributed types * make unwrap model default * refactor LightningShardedDataParallel similar to LightningDistributedDataParallel * update sharded spawn to reflect changes * update sharded to reflect changes * Merge 1.1.5 changes * fix merge * fix merge * yapf isort * fix merge * yapf isort * fix indentation in test * copy over reinit scheduler implementation from dev1.2 * fix apex tracking calls with dev_debugger * reduce diff to dev1.2, clean up * fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu * sort plugin tests legacy/new * fix error handling for amp on cpu * fix merge fix merge fix merge * [Feat] Resolve manual_backward (#5837) * resolve manual_backward * resolve flake8 * update * resolve for ddp_spawn * resolve flake8 * resolve flake8 * resolve flake8 Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * fix tests/accelerator tests on cpu * [BugFix] Resolve manual optimization (#5852) * resolve manual_optimization * update * update Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856) * resovle a bug * Accelerator refactor sharded rpc (#5854) * rpc branch * merge * update handling of rpc * make devices etc. Optional in RPC * set devices etc. 
later if necessary * remove devices from sequential * make devices optional in rpc * fix import * uncomment everything * fix cluster selection Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * resolve bug * fix assert in rpc test * resolve a test * fix docs compilation * accelerator refactor - fix for sharded parity test (#5866) * fix memory issue with ddp_spawn * x x x x x x x x x * x * Remove DDP2 as this does not apply * Add missing pre optimizer hook to ensure lambda closure is called * fix apex docstring * [accelerator][BugFix] Resolve some test for 1 gpu (#5863) * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * update * resolve flake8 * update * update * update * update * update * all_gather * update * make plugins work, add misconfig for RPC * update * update * remove breaking test * resolve some tests * resolve flake8 * revert to ddp_spawn Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de> * yapf isort * resolve flake8 * fix apex doctests * fix apex doctests 2 * resolve docs * update drone * clean env * update * update * update * update * merge * Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881) * Fix RPC related tests, clean out old API, update for new accelerator API * Move tests out of legacy folder, update paths and names * Update test_remove_1-4.py * Expose properties for tpu cores/gpus/num_gpus * Add root GPU property * Move properties to properties.py * move tests that were previously in drone * Fix root GPU property (#5908) * Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly 
to set GPU accelerator * Add missing tests back * fix best model path transfer when no checkpoint callback available * Fix setup hook order [wip] (#5858) * Call trainer setup hook before accelerator setup * Add test case * add new test * typo * fix callback order in test Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * rename ddp sequential -> rpc sequential for special test * revert * fix stupid merge problem * abstract the cluster plugins * default plugin * integrate default environment * fix property * adapt tests * adjust test * fix world size access * base cluster env * revert rebase errors * revert rebase errors * missing import * revert unrelated change * remove unused cluster local rank * remove unrelated changes * fix unrelated changes * fix pep8 * remove unused var * reset permissions * ypaf * test default environment * test torchelastic environment * world size as int * tests for slurm environment * changelog * test comments * remove unintended change * keep master port fixed after it is generated * test random master port * yapf * add missing default environment * move helper function * rename default environment * rename * rename * yapf * Update pytorch_lightning/plugins/environments/lightning_environment.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Update CHANGELOG.md Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> * spawn -> create Co-authored-by: justusschock <justus.schock@posteo.de> Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de> Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Carlos 
Mocholí <carlossmocholi@gmail.com>	2021-03-05 01:47:29 +00:00
Adrian Wälchli	bc577ca792	fix duplicate console logging bug v2 (#6275 ) Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-03-02 15:17:55 +05:30
Jirka Borovec	eee38d59e7	formatting to PL utils (#5713 ) * yapf pl base * over * dist * utils * Apply suggestions from code review * flake8 * neew way	2021-01-30 15:28:59 +01:00
Jirka Borovec	2846322f60	fix docs render (#5610 )	2021-01-25 20:21:00 -05:00
chaton	be255de306	Bugfix/all gather (#5221 ) * resolve bug * add tests * add tests * resolve flake8 * update * update * remove globals * typo * Update pytorch_lightning/utilities/distributed.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update * update * add suport int, float * update * resolve pep8 * Update pytorch_lightning/core/lightning.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update tests/utilities/test_all_gather_grad.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * update doc * add bool and np.ndarray * resolve conflicts * resolve conflicts * resolve pep8 * add changelog * Update pytorch_lightning/core/lightning.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-01-09 07:37:44 -05:00
Gregor	0858beaf6b	[bugfix] Group defaults to WORLD if None (#5125 ) * [bugfix] Group defaults to WORLD if None * fix no_grad * Update pytorch_lightning/utilities/distributed.py * Update pytorch_lightning/utilities/distributed.py Co-authored-by: Gregor Koporec <gregork@unicorn.gorenje.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> (cherry picked from commit `176735097a`)	2021-01-06 11:37:43 +01:00
Gregor	d0b23f784a	[bugfix] Correct call to torch.no_grad (#5124 ) Co-authored-by: Gregor Koporec <gregork@unicorn.gorenje.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>	2021-01-05 09:58:37 +01:00
chaton	58a2993766	support number for logging with sync_dist=True (#5080 ) * support number * add two tests * wip * add ddp in special test * remove a test * move device to bottom * simplify test * update test * Update pytorch_lightning/core/step_result.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * resolve sync_ddp Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-01-05 09:58:37 +01:00
Ananya Harsh Jha	127454ade2	All gatherwith grads (#5012 ) * all_gather * ddp * horovod * grad tests * fixed ddp * ddp fixed, removed tpu, horovod for now * changelog * windows fix * windows fix * removed batch from ctx * all_gather * ddp * horovod * grad tests * fixed ddp * ddp fixed, removed tpu, horovod for now * changelog * windows fix * windows fix * removed batch from ctx * removed code duplication * merge Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-12-08 23:20:01 +00:00
Nicki Skafte	1b40a4053d	Auto convert to contiguous format for all_gather (#4907 ) * convert memory format * changelog * formatting * suggestions * retrigger tests Co-authored-by: Nicki Skafte <nugginea@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: chaton <thomas@grid.ai>	2020-12-05 15:49:45 +01:00
Travis Addair	51cc7a89ee	Horovod: fixed early stopping and added metrics aggregation (#3775 ) * Fixed early stopping for Horovod * Refactored to sync_dist_if_available * Bump min Horovod version to support hvd.is_initialized * Changelog * Added back change for Horovod * Removed redundant checks for initialization * Implement metrics gathering for Horovod * Added test for EvalResult * Renamed ddp_sync_on_step -> dist_sync_on_step * Added metric test for Horovod * Added option pass callable allgather function to metric base class * Added dist_sync_fn * Fixed calls to private _sync_dist * Fixed Horovod test * Added sync_tensor to the distributed backend * Skip Windows * Insert test path * Removed redundant import * Updated drone * Unset HOROVOD_GPU_ALLREDUCE * Unset * No cache dir * No uninstall * Unset variables * Uninstall Horovod during initialization * Replaced more references to ddp_sync_on_step * Fixed imports * Fixed attribute * Added back default * Lint * Added back docstring * Made gather_all_tensors default * Added whitespace * Update tests/models/test_horovod.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/metrics/metric.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update CHANGELOG.md Co-authored-by: Teddy Koker <teddy.koker@gmail.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-11-05 12:52:02 -05:00
Ananya Harsh Jha	f76bc5254e	revamp entire metrics (#3868 ) * removed metric Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * added new metrics Co-authored-by: Teddy Koker teddy.koker@gmail.com * pep8 Co-authored-by: Teddy Koker teddy.koker@gmail.com * pep8 Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * docs Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * docs Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * win ddp tests skip Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * win ddp tests skip Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * win ddp tests skip Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * win ddp tests skip Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * reset in compute, cache compute Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * reduce_ops handling Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * sync -> sync_dist, type annotations Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * wip docs Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * mean squared error * docstring * added mean ___ error metrics * added mean ___ error metrics * seperated files * accuracy doctest * gpu fix * remove unnecessary mixin * metric and accuracy docstring Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * metric docs Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * pep8, changelog Co-authored-by: Teddy Koker <teddy.koker@gmail.com> * refactor dist utils, pep8 * refactor dist utils, pep8 Co-authored-by: Teddy Koker <teddy.koker@gmail.com>	2020-10-06 17:03:24 -04:00
Carlos Mocholí	a9c0ed920a	Fix log debug call (#3528 )	2020-09-18 02:06:44 +05:30
William Falcon	f43028f3ae	added copyright notices (#3062 )	2020-08-19 22:03:22 -04:00
Adrian Wälchli	188e06c261	ddp fix for trainer.test() + add basic ddp tests (#2997 ) * add ddp script variations * add ddp test * rename * shell * test * test * try call * try without subprocess * test * display the error * list all variations * try string * try copy env * debug * pythonpath * path * update test * change * simple ddp test * replace * remove random port * random port * str * clean up * check run spawn * clean up * docs * docs * update test * docs * changelog * changelog	2020-08-16 11:19:57 -04:00
Jirka Borovec	b7d72706c3	clean imports (#2867 ) * clean imports * miss	2020-08-08 00:33:51 +02:00
William Falcon	62ce00f96c	EvalResult support for val loop (PR 3/5) (#2651 ) * add EvalResult to support to val/test loops	2020-07-22 13:53:10 -04:00
ananthsub	ed581eb64f	Fix local rank zero casting (#2640 ) * Fix local rank zero casting The environment variable 'LOCAL_RANK' can be a string, causing the `if rank_zero_only.rank == 0` check to fail * Update distributed.py address comment	2020-07-18 20:12:06 -04:00

54 Commits