lightning/pytorch_lightning/utilities
Travis Addair 51cc7a89ee
Horovod: fixed early stopping and added metrics aggregation (#3775)
* Fixed early stopping for Horovod

* Refactored to sync_dist_if_available

* Bump min Horovod version to support hvd.is_initialized

* Changelog

* Added back change for Horovod

* Removed redundant checks for initialization

* Implement metrics gathering for Horovod

* Added test for EvalResult

* Renamed ddp_sync_on_step -> dist_sync_on_step

* Added metric test for Horovod

* Added option to pass a callable allgather function to the metric base class

* Added dist_sync_fn

* Fixed calls to private _sync_dist

* Fixed Horovod test

* Added sync_tensor to the distributed backend

* Skip Windows

* Insert test path

* Removed redundant import

* Updated drone

* Unset HOROVOD_GPU_ALLREDUCE

* Unset

* No cache dir

* No uninstall

* Unset variables

* Uninstall Horovod during initialization

* Replaced more references to ddp_sync_on_step

* Fixed imports

* Fixed attribute

* Added back default

* Lint

* Added back docstring

* Made gather_all_tensors default

* Added whitespace

* Update tests/models/test_horovod.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/metrics/metric.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update CHANGELOG.md

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-05 12:52:02 -05:00
__init__.py notices (#4118) 2020-10-13 07:18:07 -04:00
apply_func.py implement fix and test (#3459) 2020-09-11 10:55:58 -04:00
argparse_utils.py feature: Allow str arguments in Trainer.profiler (#3656) 2020-10-27 16:27:16 +05:30
cloud_io.py Load checkpoint from Bytes (#4314) 2020-10-23 11:29:13 -04:00
data.py Revert "Remove limitation of batch scaler (#4006)" (#4040) 2020-10-09 21:03:23 -04:00
debugging.py [test] Accumulated gradient optimization tests (#4477) 2020-11-02 23:44:11 +00:00
device_dtype_mixin.py Fix for PyTorch 1.7 CI (#3768) 2020-10-01 16:37:00 +02:00
device_parser.py Set correct device ids in DDP [wip] (#4297) 2020-10-24 17:33:47 -04:00
distributed.py Horovod: fixed early stopping and added metrics aggregation (#3775) 2020-11-05 12:52:02 -05:00
exceptions.py added copyright notices (#3062) 2020-08-19 22:03:22 -04:00
memory.py added copyright notices (#3062) 2020-08-19 22:03:22 -04:00
model_utils.py add test for model hooks (#4010) 2020-10-20 13:33:46 +01:00
parsing.py is_picklable: catch AttributeError (addresses #3771) (#4508) 2020-11-04 23:40:57 +05:30
seed.py Do not set PYTHONHASHSEED #2156 (#3745) 2020-09-30 08:38:24 -04:00
upgrade_checkpoint.py notices (#4118) 2020-10-13 07:18:07 -04:00
warning_utils.py notices (#4118) 2020-10-13 07:18:07 -04:00
xla_device_utils.py timeout for tpu check (#4340) 2020-11-01 01:04:25 +01:00