Jirka Borovec
53b0ae49b9
fix imports / isort / flake8
2021-01-26 14:57:34 +01:00
SeanNaren
127e04124d
Fix merge issue
2021-01-26 14:29:47 +01:00
chaton
0435e23a64
deprecate enable_pl_optimizer as it is not restored properly ( #5244 )
...
* update
* clean test
* still in progress
* update test
* update
* update
* resolve flake
* add test for zero_grad
* update
* works without accumulated_grad
* update
* update
* resolve amp
* revert back to True
* update
* clean tests
* cleaned out
* typo
* update test
* repair git bug
* remove print
* update
* Fix formatting/optimizer imports
* Refactor the test for cleanliness
* Add vanilla model to the test, better var names
* Fixed var names, let's clean up these mock tests
* repair test
* update test
* resolve flake8
* add manual_optimization
* update tests
* resolve flake8
* add random accumulate_grad_batches
* improve test
* Update tests/trainer/optimization/test_parity_automatic_optimization.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/optimization/test_parity_automatic_optimization.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update
* clean tests
* correct bug
* Apply suggestions from code review
* format
* address comments
* update on comments
* wip
* typo
* deprecate enable_pl_optimizer
* resolve latest bugs
* update
* resolve merge
* add comment
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/deprecated_api/test_remove_1-3.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/connectors/optimizer_connector.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/optimization/test_parity_automatic_optimization.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update on comments
* update restore
* add a property
* remove setstate as not needed anymore
* update test
* provide optimizer to on_before_zero_grad
* update on comments
* update on comments
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update tests/trainer/optimization/test_parity_automatic_optimization.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update tests/trainer/optimization/test_parity_automatic_optimization.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update tests/trainer/optimization/test_parity_automatic_optimization.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* modify import
* update changelog
* resolve flake8
* update
* update
* clean doc
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
(cherry picked from commit f2e99d617f)
2021-01-26 14:29:46 +01:00
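For context on what this deprecation touches: in the 1.1-era API, Trainer(enable_pl_optimizer=...) toggled whether optimizers returned from configure_optimizers were handed back wrapped in LightningOptimizer; #5244 makes the wrapping consistent and turns the flag into a deprecated no-op (removal targeted for 1.3, per the tests/deprecated_api/test_remove_1-3.py file touched above). A minimal hedged sketch; the model is illustrative:

    import torch
    from pytorch_lightning import LightningModule, Trainer

    class BoringModel(LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(2, 2)

        def configure_optimizers(self):
            # wrapped in LightningOptimizer by the trainer regardless of the flag
            return torch.optim.SGD(self.parameters(), lr=0.1)

    # Deprecated by #5244: no longer changes behaviour, only emits a deprecation warning.
    trainer = Trainer(enable_pl_optimizer=True)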
chaton
f2f4a49271
[bug-fix] Call transfer_batch_to_device in DDPPlugin ( #5195 )
...
* hacking out
* update
* remove useless on_before_forward
* update
* remove overridden
* remove os
* use on_before_forward
* resolve flake8
* add test
* update
* add single_process_per_device
* resolve flake8
* update
* resolve
* update
* update
* update
* add comment
* resolve bug with sharded
* update
* remove property
* update
* resolve test
* resolve bug
* update on comments
* update doc
* Update pytorch_lightning/core/hooks.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* update on comments
* Update pytorch_lightning/plugins/ddp_plugin.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update pytorch_lightning/plugins/ddp_plugin.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* resolve pep8
* add device_ids to pipe
* update on comments
* update
* resolve
* update
* update
* update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
(cherry picked from commit d510707bc9)
2021-01-26 14:28:45 +01:00
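A minimal sketch of the hook this fix makes the DDP plugin honour, assuming the ~1.1 signature from pytorch_lightning/core/hooks.py; the custom batch type is illustrative:

    import torch
    from pytorch_lightning import LightningModule

    class CustomBatch:
        def __init__(self, inputs: torch.Tensor):
            self.inputs = inputs

    class MyModel(LightningModule):
        def transfer_batch_to_device(self, batch, device):
            # after #5195, DDP routes batches through this hook instead of a bare .to(device)
            if isinstance(batch, CustomBatch):
                batch.inputs = batch.inputs.to(device)
                return batch
            return super().transfer_batch_to_device(batch, device)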
Jirka Borovec
2846322f60
fix docs render ( #5610 )
2021-01-25 20:21:00 -05:00
Arnaud Gelas
1ff6b18e8a
Fix pre-commit isort failure on pytorch_lightning/accelerators ( #5503 )
...
Remove from skipped module in pyproject.toml and fix failures on:
- pytorch_lightning/accelerators/*.py
2021-01-16 14:10:56 -05:00
Adrian Wälchli
e806bb77fa
Refactor LightningDistributedDataParallel ( #5185 )
...
* add wrapper
* add squeeze
* replace LightningDistributedDP
* update import
* module access
* inputs
* refactor warning
* update
* resolve flake8
* remove old class
* set find unused params to False
* update docstrings
* update docs
* update docs
* add changelog
* deprecation
* rename wrapper -> module
* rename pl_module
* add unit tests
* Revert "add changelog"
This reverts commit 02ec0a6864f4ba2ace3bb6fc6ebc364e1a80ffd7.
* Revert "set find unused params to False"
This reverts commit 8e451515e6ba3227d00f4a5cb63f332cfedb7b30.
Co-authored-by: Ubuntu <thomas@grid.ai>
2021-01-13 14:35:42 -05:00
Jirka Borovec
54d20dc596
Refactor: clean trainer device & distrib getters ( #5300 )
...
* warnings
* flake8
* use_tpu
* use_dp
* use_ddp
* use_horovod
2021-01-12 05:22:37 -05:00
Jirka Borovec
5ae6926a52
fix some minor typos in docs ( #5369 )
...
* fix docs typos
* Apply suggestions from code review
Co-authored-by: Wansoo Kim <rladhkstn8@gmail.com>
* flake8
Co-authored-by: Wansoo Kim <rladhkstn8@gmail.com>
2021-01-07 08:01:52 -05:00
ananthsub
a7fe24e9a1
Fix hang in DDP HPC accelerators ( #5157 )
...
* Fix hang in DDP HPC accelerators
init_device was never called
* Update CHANGELOG.md
2021-01-05 09:58:36 +01:00
Jirka Borovec
b72ed71d4e
Refactor: clean trainer device & distrib setters ( #5297 )
...
* naive replace
* simplify
* clean
* fix
2021-01-04 17:10:13 +00:00
Jirka Borovec
957583544a
mark todo exceptions ( #5320 )
...
* mark todo exceptions
* try
2021-01-04 09:07:56 +01:00
Jirka Borovec
0f36525e8f
fix/enable - check F401 ( #5201 )
...
* refactor - check F401
* missed
* fix
2020-12-21 10:15:04 +01:00
Jirka Borovec
2d54116baa
annotat unused vars ( #5017 )
...
* annotate all unused vars
* rank_zero_warn
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* f1 fixed
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-12-19 13:53:06 +01:00
Jirka Borovec
059eaecbb4
set xxx_AVAILABLE as protected ( #5082 )
...
* set xxx_AVAILABLE as protected
* docs
2020-12-14 20:19:05 +05:30
chaton
2c3d43dcb5
Initialize trainer with None in DDPAccelerator ( #4915 )
...
* Initialize trainer with None
* add typing to all accelerators
* resolve imports
* update
* add typing
* removed typo
* update
* Fix formatting and imports in accelerator
Co-authored-by: maxjeblick <maxjeblick@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-10 15:24:44 +01:00
Jirka Borovec
d5fa02e798
simplify accelerator steps ( #5015 )
...
* simplify accelerator steps
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-10 18:36:13 +05:30
Jirka Borovec
cdbddbe99f
release 1.1.0 ( #5048 )
...
* release 1.1.0
* pep8
2020-12-10 00:52:39 +00:00
Jirka Borovec
ce9179591d
ref: clean config [1/n] add intermediate setters ( #4990 )
...
* add intermediate setters
* show inputs
* fix options
* move
* fix
* less talk
* fix
* talk less
* str
* cases
* rename
Co-authored-by: chaton <thomas@grid.ai>
2020-12-09 14:13:57 -05:00
Rohit Gupta
bcbba3b702
Simplify GPU and TPU accelerator ( #5024 )
2020-12-09 14:12:44 -05:00
Jirka Borovec
53d7c9555c
drop usage of deprecated distributed_backend ( #5009 )
...
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-09 09:18:23 +01:00
Ananya Harsh Jha
127454ade2
All gather with grads ( #5012 )
...
* all_gather
* ddp
* horovod
* grad tests
* fixed ddp
* ddp fixed, removed tpu, horovod for now
* changelog
* windows fix
* windows fix
* removed batch from ctx
* removed code duplication
* merge
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-08 23:20:01 +00:00
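The gradient-preserving all_gather added here follows the standard autograd-function pattern; a hedged sketch, assuming an initialized torch.distributed process group (the class name is illustrative, not the shipped one):

    import torch
    import torch.distributed as dist

    class AllGatherWithGrad(torch.autograd.Function):
        @staticmethod
        def forward(ctx, tensor):
            gathered = [torch.zeros_like(tensor) for _ in range(dist.get_world_size())]
            dist.all_gather(gathered, tensor)  # the collective itself does not track gradients
            return torch.stack(gathered)

        @staticmethod
        def backward(ctx, grad_output):
            grad = grad_output.clone()
            dist.all_reduce(grad, op=dist.ReduceOp.SUM)  # sum every rank's contribution
            return grad[dist.get_rank()]  # hand back only this rank's slice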
Sean Naren
ee9b3fe574
[feat] pp 1/n ( #5016 )
...
* Added changes for RPC plugin
* Add missing kwargs
* Fix code format
* Loading refactors by introducing is_distributed var, fix optimizer step flow
* Add rpc guard
* Added docstrings and typing
* resolve comments
* Add additional rpc hook, refactor name of exit process hook for clarity
* remove annotation
* Modify behaviour to allow optional return, add test for rpc plugin
* resolve tests
* rename is_ddp_based
* update
* update for windows
* update
* resolve test
* code smell
* Revert back to init_ddp_connection for backwards compat
* Swap to explicit name for property
* Add missing speed parity increase for CI variability, fix call counts for child process
Co-authored-by: tchaton <thomas@grid.ai>
2020-12-08 22:02:10 +00:00
maxjeblick
79ae66d026
Initialize trainer with None ( #4847 )
...
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
2020-12-08 22:49:55 +05:30
chaton
2393474350
[hotfix] ddp + manual_optimisation ( #4976 )
...
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization
* debug
* Revert "debug"
This reverts commit ccca6b6b
* Expose manual reduce for automatic optimization
* Add input arguments
* Enable parity test
* clean imports
* Expose hook after to ensure we reset
* Fix naming
* add
* fix test
* resolve on comments
* typo
* Update tests/trainer/optimization/test_manual_optimization.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/optimization/test_manual_optimization.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update on comments
* resolve comments
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-07 19:31:54 +00:00
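For reference, a hedged sketch of the manual-optimization flow this hotfix exercises under DDP, assuming the 1.0/1.1-era API in which manual_backward takes the optimizer; the loss is illustrative:

    import torch
    from pytorch_lightning import LightningModule

    class ManualOptModel(LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(2, 2)

        @property
        def automatic_optimization(self) -> bool:
            return False  # opt out of the automatic loop

        def training_step(self, batch, batch_idx):
            opt = self.optimizers()
            loss = self.layer(batch).sum()  # illustrative
            self.manual_backward(loss, opt)  # routes backward through the accelerator/AMP
            opt.step()
            opt.zero_grad()

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)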
chaton
02152c1729
Simplify optimization Logic ( #4984 )
...
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization
* debug
* Revert "debug"
This reverts commit ccca6b6b
* Expose manual reduce for automatic optimization
* Add input arguments
* Enable parity test
* clean imports
* Expose hook after to ensure we reset
* Fix naming
* add
* fix test
* uniformize optimizer logic
* resolve test
* resolve flake8
* resolve amp bug
* update tests
* remove bug
* remove optimizer_step in accelerators
* typo
* update lightning optimizer
* set doesn't work with ddp_spawn
* resolve flake8
* update threshold
* ignore pyright
* correct codeFactor
* remove useless if
* remove zero_grad function
* simplify step
* remove typo
* resolve bug
* Apply suggestions from code review
* update on comments
* resolve bugs
* remove tests
* Update pytorch_lightning/trainer/configuration_validator.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* simplify testing
* add more tests
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-07 12:55:49 +00:00
Jirka Borovec
3976db597d
refactor imports of optional dependencies ( #4859 )
...
* refactor imports of optional dependencies
* fix
* fix
* fix
* fix
* fix
* flake8
* flake8
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-12-04 10:26:10 +01:00
Lezwon Castelino
12cb9942a1
Tpu save ( #4309 )
...
* convert xla tensor to cpu before save
* move_to_cpu
* updated CHANGELOG.md
* added on_save to accelerators
* if accelerator is not None
* refactors
* change filename to run test
* run test_tpu_backend
* added xla_device_utils to tests
* added xla_device_utils to test
* removed tests
* Revert "added xla_device_utils to test"
This reverts commit 0c9316bb
* fixed pep
* increase timeout and print traceback
* lazy check tpu exists
* increased timeout
removed barrier for tpu during test
reduced epochs
* fixed torch_xla imports
* fix tests
* define xla utils
* fix test
* aval
* chlog
* docs
* aval
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-02 13:05:11 +00:00
chaton
c2e6e68c7e
optimizer clean up ( #4658 )
...
* add LightningOptimizer
* typo
* add mock closure
* typo
* remove logic in optimizer_step
* update
* update
* update
* deactivate LightningOptimizer for horovod
* resolve flake
* typo
* check optimizer name
* change name
* added backward to LightningOptimizer
* remove use_lightning_optimizer
* move update
* simplify init
* resolve comments
* resolve bug
* update
* update
* resolve bugs
* resolve flake8
* set state
* make manual_optimizer_step work
* add doc
* add enable_pl_optimizer
* make optimizer_step
* add make_optimizer_step
* add examples
* resolve test
* add test_optimizer_return_options_enable_pl_optimizer
* add enable_pl_optimizer=True
* update
* update tests
* resolve bugs
* update
* set Trainer to False
* update
* resolve bugs
* update
* remove from doc
* resolve bug
* typo
* update
* set to True
* simplification
* typo
* resolve horovod
* unwrap horovod
* remove Optimizer
* resolve horovod
* move logic to amp_backend
* doesn't seem to be picklable
* update
* add again
* resolve some bugs
* cleanup
* resolve bug with AMP
* change __repr__
* round at -12
* update
* update
* update
* remove from horovod
* typo
* add convert_to_lightning_optimizers in each accelerators
* typo
* forgot
* forgot a convert_to_lightning_optimizers
* update
* update
* update
* increase coverage
* update
* resolve flake8
* update
* remove useless code
* resolve comments + add support for LightningOptimizer base class
* resolve flake
* check optimizer get wrapped back
* resolve DDPSharded
* reduce code
* lightningoptimizer
* Update pytorch_lightning/core/optimizer.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/core/lightning.py
* remove reference to step function
* Apply suggestions from code review
* update on comments
* resolve
* Update CHANGELOG.md
* add back training_step in apex and native_amp
* rename optimizer_step
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-01 00:09:46 +00:00
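The LightningOptimizer this PR introduces is, at heart, a transparent proxy around the user's optimizer so the trainer can intercept step(); a much-simplified sketch of the idea, not the shipped class:

    import torch

    class LightningOptimizerSketch:
        def __init__(self, optimizer: torch.optim.Optimizer):
            self._optimizer = optimizer

        def __getattr__(self, name):
            # delegate param_groups, state_dict(), zero_grad(), ... to the wrapped optimizer
            return getattr(self._optimizer, name)

        def step(self, closure=None):
            # the real class defers to trainer/accelerator logic here (AMP scaling, hooks, TPU)
            if closure is not None:
                self._optimizer.step(closure)
            else:
                self._optimizer.step()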
Jirka Borovec
217650320e
simplify imports Omegaconf ( #4873 )
...
* hydra
* omegaconf
2020-11-27 01:00:56 +01:00
Jirka Borovec
442d57f1e9
simplify imports xla / TPU ( #4872 )
...
* xla
* tpu
* fix
* fix
* flake8
2020-11-27 00:37:48 +01:00
Sean Naren
404af43cde
5/n: Extract reference model call to plugins/accelerators ( #4773 )
...
* Encapsulate extracting reference model within the plugin to allow custom wrapper logic to live within the plugin/accelerators
* Add missing new lines
* Fix call to accelerator
* Removed double blank
* Use accelerator backend
* Handle case where wrapper has not been initialized within the plugin
* Added basic get model tests, add better typing
* Change model name
* Split GPU/DDP test
* Add stronger typing, skip ddp test on windows
* Fix import
* Fix import in dp
* Fixed PEP8 definition
* Add ddp launcher for ddp testing
* Modify accelerator reference model to property, change name to reflect func
* Revert property as this is incorrect.
* Revert across accelerators
* Modified name to get_model_from_plugin
* Code review changes, fix issue with dp
* Add verb to function getter
Co-authored-by: chaton <thomas@grid.ai>
2020-11-23 17:21:47 +00:00
ananthsub
45c57600af
Move init_ddp_connection to DDP Plugin ( #4407 )
...
* Move init_ddp_connection to DDP Plugin
* cluster-env
* trainer?
* imports
* Update ddp_plugin.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-18 15:49:22 -05:00
Sean Naren
e7134a9135
Sharded Plugin 2/n: Allow ddp plugin to modify optimizer state saving ( #4675 )
...
* Allow ddp plugin to modify optimizer state saving
* Rely on the accelerator for optimizer states
* Ensure we init the accelerator for the saving function
* Better comment for optim state dump
* Revert "Ensure we init the accelerator for the saving function"
This reverts commit af65effa
* Added accelerator check to initialize tuner before saving model checkpoint
* Simplify comment
* Revert "Added accelerator check to initialize tuner before saving model checkpoint"
This reverts commit f9929c0c
* Return single optimizer state to reduce duplication
* Fixed docstring
* Fixed typing
* Fixed comment
* Added CHANGELOG.md
Co-authored-by: chaton <thomas@grid.ai>
2020-11-18 16:38:35 +00:00
Sean Naren
8283680aa0
Sharded Plugin 3/n: Expose step input to DDP plugin ( #4686 )
...
* Allow ddp plugin to move the input to a different device if needed
* Swapped name to on_before_forward to align with hooks in the future
* Update pytorch_lightning/plugins/ddp_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Pass variable arg type to hook, add example
* Remove blank space (pep check)
* Added blank line
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-18 15:45:30 +00:00
chaton
4018237c30
[FEAT] Add lambda closure to manual_optimizer_step ( #4618 )
...
* added lambda_closure
* move to types
* add 2 new tests
* make example more complex
* add complex example to doc
* added more tests
* resolve doc
* typo
* update
* update tpu optimizer_step
* Apply suggestions from code review
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-12 19:22:06 +00:00
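The reason a closure argument matters: optimizers such as LBFGS re-evaluate the loss several times per step, so the step call must receive a callable that redoes forward and backward. A plain-torch illustration (model and batch are placeholders):

    import torch

    model = torch.nn.Linear(4, 1)
    optimizer = torch.optim.LBFGS(model.parameters())
    batch = torch.randn(8, 4)

    def closure():
        optimizer.zero_grad()
        loss = model(batch).pow(2).mean()
        loss.backward()
        return loss

    optimizer.step(closure)  # LBFGS may invoke closure() multiple times within one step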
Sean Naren
bacabaebaf
Sharded Accelerator 1/n: Expose clip gradients to plugins via abstract class ( #4639 )
...
* Added abstract precision plugin to expose clip_gradients function, use within accelerator to clip gradients
* Exclude model from override, keep optimizer (needed for sharded clip gradients), add override for O2 support apex
* Fix doc
* Applied codereview changes
* Refactored clip function to encapsulate tpu changes with tpu accelerator. Default to standard clip function for vanilla torch
* Pass correct grad clip val
* Moved var to property
* Apply code review suggestions
2020-11-12 17:18:09 +00:00
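A minimal sketch of the vanilla-torch fallback described in the bullets above; the function name and signature are assumptions, only the clipping call is the real API:

    import torch

    def clip_gradients(optimizer: torch.optim.Optimizer, grad_clip_val: float) -> None:
        # default (non-TPU, non-sharded) path: clip the global grad norm of all optimized params
        parameters = [p for group in optimizer.param_groups for p in group["params"]]
        torch.nn.utils.clip_grad_norm_(parameters, grad_clip_val)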
Sean Naren
33470ba605
Prevent crash if sync_dist=True on CPU ( #4626 )
...
* Added test/fix for sync_dist raising NotImplementedError
* Fixed comments/formatting
* Revert base class change, enforce sync tensors across accelerators, added GPU test
2020-11-11 22:04:05 +00:00
chaton
7e08b0d710
[bug-fix] DDP and automatic_optimization=False ( #4485 )
...
* resolve bug
* add self._running_manual_optim
* update
* update tests
* update lightning module
* resolve bug
* update tests
* update
* resolve pep8
* update
* replace by `ddp_spawn`
* temporary fix
* update
* update
* move update to training_loop
* make both ddp_spawn
* introduce `manual_optimizer_step`
* update changelog
* added changelog wrong place
* add force_optimizer_step
* update docstring for tests
* update optimizer_step
* update zero_grad
* resolve flake8
* move update into manual_optimizer_step
* add zero_grad
* remove zero_grad tests
* remove manual_backward in AMP, it doesn't help
* update
* loosen tests
* update
* update doc
* add TODO
* Removed unnecessary get model from native amp
* Remove try except with pytest raise
* Add seed, clean up imports, remove try catch to reproduce error
* update code
* update test
* revert back
* formatting
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-10 19:44:51 +00:00
William Falcon
ee35907170
Accelerator docs ( #4583 )
...
* accelerator docs
* accelerator docs
2020-11-08 17:24:41 -05:00
William Falcon
3ba48d3bc4
ref: unify slurm and TE under backendPlugin 5/n" ( #4582 )
...
* ref: unify slurm and TE under backendPlugin 4/n
* ref: unify slurm and TE under backendPlugin 5/n
2020-11-08 16:20:19 -05:00
William Falcon
624f5b5938
ref: unify slurm and TE under backendPlugin 3/n ( #4581 )
2020-11-08 15:32:37 -05:00
William Falcon
bfaf014096
ref: unify slurm and TE under backendPlugin 2/n ( #4580 )
2020-11-08 15:07:16 -05:00
William Falcon
0f64f15f52
ref: unify slurm and TE under backendPlugin 1/n ( #4578 )
...
* ref: unify slurm and TE under backendPlugin
* ref: unify slurm and TE under backendPlugin
2020-11-08 14:28:55 -05:00
cool425589
5e09fd31e9
show progressbar only on progress_rank 0 on ddp_slurm ( #4437 )
...
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-11-06 01:36:22 +01:00
Travis Addair
51cc7a89ee
Horovod: fixed early stopping and added metrics aggregation ( #3775 )
...
* Fixed early stopping for Horovod
* Refactored to sync_dist_if_available
* Bump min Horovod version to support hvd.is_initialized
* Changelog
* Added back change for Horovod
* Removed redundant checks for initialization
* Implement metrics gathering for Horovod
* Added test for EvalResult
* Renamed ddp_sync_on_step -> dist_sync_on_step
* Added metric test for Horovod
* Added option pass callable allgather function to metric base class
* Added dist_sync_fn
* Fixed calls to private _sync_dist
* Fixed Horovod test
* Added sync_tensor to the distributed backend
* Skip Windows
* Insert test path
* Removed redundant import
* Updated drone
* Unset HOROVOD_GPU_ALLREDUCE
* Unset
* No cache dir
* No uninstall
* Unset variables
* Uninstall Horovod during initialization
* Replaced more references to ddp_sync_on_step
* Fixed imports
* Fixed attribute
* Added back default
* Lint
* Added back docstring
* Made gather_all_tensors default
* Added whitespace
* Update tests/models/test_horovod.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/metrics/metric.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update CHANGELOG.md
Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-05 12:52:02 -05:00
Ananya Harsh Jha
01ab2a933d
[bug] [docs] Clearer optimizer_step override instructions ( #4455 )
...
* fix
* flags
* remove defaults
2020-11-02 22:13:34 +00:00
chaton
102fa9ee7d
[BUGFIX] AMP + Precision unscale grad ( #4441 )
...
* move unscale within Native plugin
* remove gradient tracking from lightning backward
* forgot trainer.fit
* typo
* update
* cleanup
* set to 1.6
* typo
* skip if below 1.6 strict
* update changelog
* remove useless code
* Update tests/plugins/test_amp_plugin.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* Update tests/plugins/test_amp_plugin.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* update changelog
* Update CHANGELOG.md
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-02 16:36:48 +00:00
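The underlying torch >= 1.6 pattern this fix aligns with: unscale gradients inside the native AMP plugin before any clipping, so clipping sees true magnitudes. Illustrative snippet (requires a CUDA device; model and loss are placeholders):

    import torch

    model = torch.nn.Linear(4, 1).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler()

    with torch.cuda.amp.autocast():
        loss = model(torch.randn(8, 4, device="cuda")).pow(2).mean()

    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)  # unscale first, so clipping operates on real gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(optimizer)      # skips the update if inf/nan gradients were found
    scaler.update()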
Adrian Wälchli
28d45a26a3
Set correct device ids in DDP [wip] ( #4297 )
...
* repro
debug
rank
set
drop PL_DDP_PID
clean up
keep set gpus
revert
Revert "drop PL_DDP_PID"
This reverts commit 7d88cae469541ef19128f9c20919fd3a6f863039.
pid
gpus
clean up
misconfig?
misconfig
clean
* fix pep
* changelog
* remove script
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-24 17:33:47 -04:00
Sean Naren
5641b266d5
Bug/4319 ddp checkpoint ( #4323 )
...
* Broadcast best model path to ensure we sync with main process + wait for main process to save
* Add barrier call to ensure all processes are in sync
* Added changelog commit
* Move sync of best model path/score to model checkpoint, keep barrier to ensure all processes complete
* Ensure we broadcast as tuple
* Add init check
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Removed model checkpoint code, added barrier to trainer to enforce we synchronize and wait for all processes to finish before completing training
* Add barrier within teardown call, removed horovod teardown to inherit from base accelerator
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2020-10-24 16:55:49 -04:00
William Falcon
753362d0a4
enable ddp as a plugin ( #4285 )
...
* enable custom ddp plugin
Co-authored-by: chaton <thomas@grid.ai>
2020-10-22 05:15:51 -04:00
Justus Schock
0ec4107697
Optimizer closure ( #4190 )
...
* closure for all optimizers
* rename hook and take care of alternating backwards
* add comment
* training_loop_fix
* closure whenever possible
* training_loop
* simple tests that count backward calls
* fix test to work with closure
* remove debugging statement
* better place
* check grads after backward
* start fixing manual optimization
* skip step when result returned by closure was None
* fix gradient clipping test to work with closure
* attribute dict result only for automatic optimization
* adjust backward calls in accelerator
* adjust where to call gradient clipping
* adjust backward calls in tests
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* pass kwargs to xla optimizer
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-21 19:34:29 +01:00
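What "closure for all optimizers" means in practice: the training loop wraps training_step plus backward in a callable and always passes it to optimizer.step(), the path LBFGS-style optimizers require anyway. A plain-torch sketch of the equivalent shape (model and batch are placeholders):

    import torch

    model = torch.nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    batch = torch.randn(8, 4)

    def optimizer_closure():
        optimizer.zero_grad()
        loss = model(batch).pow(2).mean()  # stands in for training_step
        loss.backward()
        return loss

    optimizer.step(optimizer_closure)  # plain torch: step() accepts an optional closure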
Akihiro Nitta
d27ee8b5bf
docs: Add empty lines in docstring [ci skip] ( #4232 )
...
* Add empty lines in docstring for proper docs
* Remove Returns:
* Remove unnecessary Returns:
* Update pytorch_lightning/accelerators/ddp2_accelerator.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* fix returns
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-21 09:00:39 -04:00
Jirka Borovec
f37444fa3e
CI: add flake8 ( #4239 )
2020-10-19 21:20:17 +01:00
Akihiro Nitta
b45b57cc58
Use `Optional` for arguments set to `None` by default ( #4164 )
...
* Use `Optional` for variables set to `None` by default
* Use `Optional` instead of `Union[None, ...]` for consistency
2020-10-15 23:02:50 +02:00
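The convention in one line: a parameter that defaults to None is annotated Optional[X] rather than bare X, and Optional[X] is preferred over the equivalent Union[None, X]:

    from typing import Optional, Union

    def fit(max_steps: Optional[int] = None) -> None:  # preferred
        ...

    def fit_verbose(max_steps: Union[None, int] = None) -> None:  # equivalent, discouraged
        ...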
Sean Naren
98eb736496
Added getstate/setstate method for torch.save serialization ( #4127 )
...
* Added getstate/setstate method for torch.save serialization, added additional Optional Typing to results object
* Added tests to ensure torch.save does not fail
* Added flags to ensure compatible ddp cpu environment
* Removed torch version check due to minimum already being 1.3, reduced epochs for speed
* Moved tests to separate file
* Update to accelerator, move to ddp_spawn to prevent hanging ddp
2020-10-13 16:47:23 -04:00
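The hooks named in the title follow the standard pickle protocol; a minimal sketch of the pattern (the non-serializable attribute is illustrative):

    class Results:
        def __init__(self):
            self.values = {}
            self._handle = open(__file__)  # illustrative state that pickle/torch.save rejects

        def __getstate__(self):
            state = self.__dict__.copy()
            state.pop("_handle", None)  # drop unpicklable state before serialization
            return state

        def __setstate__(self, state):
            self.__dict__.update(state)
            self._handle = None  # restore a safe default on load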
William Falcon
09c2020a93
notices ( #4118 )
2020-10-13 07:18:07 -04:00
William Falcon
4c4b090c66
depre ( #4088 )
2020-10-12 05:58:31 -04:00
William Falcon
b9f2682b7d
clean docs, enable grad clip in manual mode ( #4078 )
...
* docs
* docs
2020-10-11 13:12:35 -04:00
William Falcon
7ffe05a3d1
ref: accelerator names ( #4066 )
...
* ref: accelerator names
* docs
2020-10-11 01:05:14 -04:00
William Falcon
a4b9221fc5
ref: decouple apex second attemp part n/n ( #4065 )
...
* ref: decouple apex second attemp part n/n
* ref: decouple apex second attemp part n/n
2020-10-10 22:04:50 -04:00
William Falcon
0281b077d8
ref: decouple apex second attemp part 10/n ( #4064 )
...
* ref: decouple apex second attemp part 9/n
* ref: decouple apex second attemp part 9/n
* ref: decouple apex second attemp part 9/n
2020-10-10 20:05:05 -04:00
William Falcon
dca86c310e
ref: decouple apex second attemp part 6/n ( #4060 )
...
* ref: decouple apex second attemp part 6/n
* ref: decouple apex second attemp part 6/n
2020-10-10 15:28:25 -04:00
William Falcon
ce2edf1192
ref: decouple apex second attemp part 4/n ( #4056 )
...
* ref: decouple apex second attemp part 4/n
* ref: decouple apex second attemp part 4/n
* Update lightning.py
* ref: decouple apex second attemp part 4/n
2020-10-10 12:19:22 -04:00
William Falcon
3a6717ca34
ref: decouple apex second attemp part 3/n ( #4055 )
2020-10-10 11:05:57 -04:00
William Falcon
7285613974
ref: decouple apex second attemp part 2/n ( #4054 )
...
* ref: decouple apex second attemp part 2/n
* ref: decouple apex second attemp part 2/n
2020-10-10 10:24:20 -04:00
William Falcon
e854d3744c
ref: decouple apex second attemp part 1/n ( #4052 )
2020-10-10 09:53:02 -04:00
William Falcon
5b261a230e
enable passing in custom accelerators ( #4050 )
...
* enable custom accelerators
* ref: finish decoupling apex, LM and backward
* ref: finish decoupling apex, LM and backward
* ref: finish decoupling apex, LM and backward
2020-10-10 09:21:08 -04:00
William Falcon
2b255a3df4
ref: enable custom clusters (1/n) ( #4048 )
...
* enable cluster plugins
* enable cluster plugins + test backend choices
* enable cluster plugins + test backend choices
* enable cluster plugins + test backend choices
* enable cluster plugins + test backend choices
* enable cluster plugins + test backend choices
* enable cluster plugins + test backend choices
2020-10-10 08:09:29 -04:00
William Falcon
0c42aa03fd
enables plugins ( #4041 )
...
* plugin hardware
* plugin hardware
* plugin hardware
2020-10-09 22:03:46 -04:00
William Falcon
048a816be3
added tests for the training epoch end ( #3967 )
2020-10-07 22:27:36 -04:00
William Falcon
b922409624
clean and organize fit ( #3938 )
...
* clean and organize fit
2020-10-07 11:04:10 -04:00
William Falcon
9c415d2c71
moves configure ddp to each backend ( #3924 )
...
* moves configure ddp to each backend
* moves configure ddp to each backend
* moves configure ddp to each backend
* added torch manual seed in test_mean_error
* test for complicated batch structure
* test for complicated batch structure
* test for complicated batch structure
Co-authored-by: ananyahjha93 <ananya@pytorchlightning.ai>
2020-10-07 00:50:16 -04:00
William Falcon
e3007ffe0c
moves sync bn to each backend ( #3925 )
2020-10-06 22:42:33 -04:00
William Falcon
af5887c0aa
fixed ddp flag crash ( #3927 )
2020-10-06 22:41:08 -04:00
Lezwon Castelino
69833dad5b
Added check to verify xla device is TPU ( #3274 )
...
* tpu device check
* replaced with xmp spawn
* Revert "replaced with xmp spawn"
This reverts commit 6835380f
* replaced all instances of XLA_AVAILABLE
* moved inner_f to global scope
* made refactors
* added changelog
* added TPU_AVAILABLE variable
* fix codefactor issues
* removed form trainer and early stopping
* add TORCHXLA_AVAILABLE check
* added tests
* refactoring
* Update pytorch_lightning/utilities/xla_device_utils.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* updated function names
* fixed bug
* updated CHANGELOG.md
* added todo
* added type hints
* isort and black
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-06 19:54:37 +02:00
Sean Naren
e4a56fa5cf
Ensure global seed exists before passing into env subprocess.Popen call ( #3904 )
2020-10-06 12:31:49 -04:00
William Falcon
70e792344a
test selecting the correct backend. temp backends while slurm and TE are decoupled ( #3848 )
...
* test selecting the correct backend. temp backends while slurm and TE are decoupled
2020-10-04 15:44:50 -04:00
William Falcon
2c21f7d7e2
ref: adding compute environments (2/n) ( #3842 )
...
* ref: adding compute environments (2/n)
* ref: adding compute environments (2/n)
* ref: adding compute environments (2/n)
* ref: adding compute environments (2/n)
2020-10-04 08:48:46 -04:00
Lezwon Castelino
4da240ea1b
added broadcast option to tpu ( #3814 )
...
* added broadcast option to tpu
* add device
* moved tpu broadcast to tpu_backend
* removed Lightning dist
* decode bytes
* pep8 fix
* fix bug
* test for broadcast
* updated changelog
2020-10-04 07:47:33 -04:00
William Falcon
1f8ff7c48c
ref: callback system and init ddp (1/n) ( #3836 )
...
* refactored callback system and init ddp
* refactored callback system and init ddp
* refactored callback system and init ddp
* refactored callback system and init ddp
2020-10-03 23:39:17 -04:00
William Falcon
35d1111994
[WIP] ref: decoupled ddp, ddp spawn (finish 3733) ( #3819 )
...
* ref: finish #3733
* remove deprecated test
* Update pytorch_lightning/accelerators/ddp_backend.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* remove deprecated test
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2020-10-03 14:05:31 -04:00
William Falcon
ed1450a293
ref: clean up ddp before final fix ( #3817 )
...
* ref: clean up ddp before final fix
2020-10-03 12:01:02 -04:00
William Falcon
0838c6bfce
ref: decoupled ddp2 ( #3816 )
2020-10-03 09:02:35 -04:00
William Falcon
a677833f84
ref: separate slurm from ddp ( #3809 )
...
* ref: separate slurm from ddp
* ref: separate te from ddp
* ref: merge
* ref: merge
* ref: merge
2020-10-02 23:08:34 -04:00
William Falcon
74484edecd
ref: separate te from ddp ( #3810 )
...
* ref: separate te from ddp
* ref: separate te from ddp
* ref: separate te from ddp
2020-10-02 21:00:51 -04:00
William Falcon
a28528cc8b
ref: remove weight loading hack for ddp_cpu ( #3808 )
2020-10-02 19:28:50 -04:00
William Falcon
afa43837a4
ref: part 8 of #3733 ( #3806 )
2020-10-02 18:46:18 -04:00
ananthsub
3ab730e316
Swap torch.load for fsspec load in ddp spawn backend ( #3787 )
...
* Update ddp_spawn_backend.py
* Update ddp_cpu_spawn_backend.py
* log
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-10-02 21:00:01 +02:00
William Falcon
7c6ed1fa28
ref: part 7 of #3733 ( #3802 )
...
* ref: part 7 of #3733
* ref: part 7 of #3733
2020-10-02 14:23:27 -04:00
Jirka Borovec
62eabdd535
revert backend types ( #3788 )
...
* revert backend types
* todo
* todo
2020-10-02 06:18:44 -04:00
Akihiro Nitta
ebc1b23fa3
Use `raise .. from ..` to explicitly chain exceptions ( #3750 )
...
* Fix exception chaining
* names
* Change exception names for consistency
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
* Change exception names for consistency
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-01 21:45:44 +02:00
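For reference, the chaining form this PR standardizes on, with an illustrative optional-import guard:

    from pytorch_lightning.utilities.exceptions import MisconfigurationException

    try:
        import torch_xla  # noqa: F401
    except ImportError as err:
        # `from err` keeps the original traceback attached as the explicit cause
        raise MisconfigurationException("TPU support requires torch_xla") from err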
William Falcon
622c5c3982
ref: part 4 of #3733 ( #3773 )
...
* ref: part 4 of #3733
* ref: part 4 of #3733
* ref: part 4 of #3733
* ref: part 4 of #3733
2020-10-01 11:26:58 -04:00
William Falcon
440f837f6d
ref: part a of #3733 ( #3766 )
...
* ref: part a of #3733
* ref: part a of #3733
2020-10-01 08:15:23 -04:00
Lezwon Castelino
8be002ccc7
skip best_model_path if checkpoint_callback is None ( #2962 )
...
* skip best_model_path if checkpoint_callback is None
* removed test
2020-10-01 06:57:26 -04:00
William Falcon
a38d108a68
add dist lib to enable syncing anything across devices ( #3762 )
...
* add dist lib to enable syncing anything across devices
2020-10-01 01:21:38 -04:00
Jirka Borovec
31a36f04df
define distributed as a type ( #3740 )
...
* define type
* miss
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* miss
* warn
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-30 08:33:01 -04:00
William Falcon
c41ea86b35
ref: move backends back to individual files (1/5) (ddp_cpu) ( #3712 )
...
* ref: make each backend independent for easier debugging and independent debugging
* ref: test val epoch end
* ref: test val epoch end
2020-09-29 01:59:18 -04:00
Rohit Gupta
783750547d
disable optimizers setup during testing ( #3059 )
...
* disable configure_optimizers during testing
* minor changes
* hvd and ddp
* fix precision during testing
* fix ddp
* fix amp
* fix cpu
* update dp
* simplify optimizers
* add test
* codefactor
* ref optimizer setup
* chlog
* suggestions
* isort
* rebased with master
2020-09-29 01:09:04 +02:00
William Falcon
931995b55b
remove flake 8 ( #3687 )
2020-09-27 20:40:02 -04:00
William Falcon
031274c25d
fix dp issues + update examples and test examples ( #3618 )
...
* fix dp
* fix examples
2020-09-23 00:19:46 -04:00
Adrian Wälchli
a71d62d840
Fix deterministic behavior in ddp_spawn ( #3573 )
...
* docs
* set env variable
* fix
* changelog
2020-09-20 19:42:58 -04:00
William Falcon
890588a9ee
ref: precision plugins 1/n ( #3504 )
...
* ref: precision plugins 1/n
* ref: precision plugins 1/n
2020-09-15 09:56:12 -04:00
William Falcon
810b445097
ref: apex plugin ( #3502 )
...
* ref: apex plugin
* ref: apex plugin
* ref: apex plugin
2020-09-15 06:02:42 -04:00
William Falcon
6bcfa8b068
ref: merge backends x/n ( #3482 )
2020-09-12 16:28:29 -04:00
William Falcon
518a0c0e92
ref: merge backends x/n ( #3480 )
2020-09-12 15:27:11 -04:00
William Falcon
0045119b3f
ref: merge backends x/n ( #3478 )
...
* ref: merge backends x/n
* ref: merge backends x/n
* ref: merge backends x/n
* ref: merge backends x/n
2020-09-12 13:55:55 -04:00
William Falcon
00d155ae01
ref: merge backends x/n ( #3477 )
2020-09-12 12:36:55 -04:00
William Falcon
59d8472548
ref: slurm connector 1/n ( #3476 )
...
* ref: slurm connector 1/n
* ref: slurm connector 1/n
* ref: slurm connector 1/n
* ref: slurm connector 1/n
2020-09-12 11:07:15 -04:00
William Falcon
ff0064f956
ref: group connectors ( #3472 )
...
* ref: accelerator connector methods 3/n
* ref: accelerator connector methods 3/n
2020-09-11 23:33:09 -04:00
William Falcon
dd324e4086
ref: accelerator connector methods x/n ( #3470 )
2020-09-11 22:25:48 -04:00
William Falcon
de99222834
ref: accelerator connector methods x/n ( #3469 )
...
* ref: accelerator connector methods x/n
* ref: accelerator connector methods x/n
2020-09-11 21:52:22 -04:00
William Falcon
ef20310873
ref: move specific accelerator code x/n ( #3457 )
...
* ref: organize args x/n
* ref: move specific accelerator code x/n
* ref: move specific accelerator code x/n
* ref: move specific accelerator code x/n
2020-09-11 10:56:21 -04:00
William Falcon
70af47db84
ref: organize args 4/n ( #3456 )
2020-09-10 21:58:47 -04:00
William Falcon
3281586ab4
ref: organize args 3/n ( #3449 )
...
* ref: organize args 3/n
2020-09-10 13:21:04 -04:00
William Falcon
a208d6da46
ref: organize args 2/n ( #3448 )
...
* ref: organize args 2/n
* ref: organize args 2/n
* ref: organize args 2/n
2020-09-10 10:51:35 -04:00
William Falcon
541c4ab01d
ref: organize args 3/n ( #3447 )
...
* ref: organize args 2/n
* ref: organize args 2/n
* ref: organize args 2/n
* ref: organize args 2/n
2020-09-10 08:55:30 -04:00
William Falcon
deb82d9c08
ref: organize args 2/n ( #3442 )
...
* ref: organize args 2/n
* ref: organize args 2/n
2020-09-10 08:07:55 -04:00
William Falcon
49290a569b
ref: organize args 1/n ( #3435 )
...
* ref: organize args 1/n
* ref: organize args 1/n
2020-09-10 07:24:42 -04:00
William Falcon
8f6b115511
ref: added model connector ( #3407 )
...
* ref: added model connector
* ref: added model connector
* ref: added model connector
2020-09-09 00:24:20 -04:00
Travis Addair
091d37f968
Added check for apex AMP and unit tests for Horovod + AMP ( #3404 )
...
* Added check for apex AMP and unit tests for Horovod + AMP
* Changelog
* Fixed order of Horovod and Apex optimizer wrapping
2020-09-08 20:30:57 -04:00
William Falcon
9939f53b7c
ref: inner train loop (intermediate step) 12/n ( #3372 )
...
* ref: inner train loop (intermediate step) 12/n
2020-09-06 17:50:47 -04:00
William Falcon
38b9677638
ref: inner train loop (intermediate step) 5/n ( #3365 )
2020-09-05 18:27:28 -04:00
William Falcon
c7ef5ee874
ref: inner train loop (intermediate step) 3/n ( #3363 )
2020-09-05 17:01:46 -04:00
William Falcon
f55efb7616
ref: inner train loop (intermediate step) 1/n ( #3361 )
2020-09-05 10:10:49 -04:00
William Falcon
5a474c452c
ref: inner train loop (intermediate step) 1/n ( #3359 )
2020-09-05 08:55:22 -04:00
William Falcon
0a119403d6
ref: moved accelerator router ( #3309 )
...
* ref: moved accelerator
* ref: moved accelerator
* ref: moved accelerator
* ref: moved accelerator
2020-09-01 15:48:28 -04:00
William Falcon
b0298cead8
ref: move train outside of setup training ( #3297 )
...
* ref: move train outside of setup training
* ref: move train outside of setup training
* ref: move train outside of setup training
* ref: move train outside of setup training
2020-08-31 20:36:52 -04:00
William Falcon
bcd13f70b8
ref: run_pretrain_routine -> setup_training ( #3294 )
...
* ref: .tune()
* ref: run_pretrain_routine -> setup_training
2020-08-31 18:06:11 -04:00
Philipp Singer
0aee137ba7
DP device fix ( #3196 )
2020-08-27 09:01:29 -04:00
William Falcon
4272360076
ddp backend refactor ( #3210 )
2020-08-26 21:02:15 -04:00
William Falcon
3a26b4ff5c
ddp backend refactor ( #3209 )
2020-08-26 20:31:09 -04:00
William Falcon
6bae404bed
ref: ddp backend refactor (3) ( #3208 )
...
* ddp backend refactor
* ddp backend refactor
2020-08-26 20:03:09 -04:00
William Falcon
a8daf914f8
ddp backend refactor ( #3207 )
2020-08-26 19:10:24 -04:00
William Falcon
ff3c2f4cff
ddp backend refactor ( #3204 )
2020-08-26 18:43:28 -04:00
William Falcon
f3384d0cbb
ref: ddps train hooks ( #3203 )
...
* ddps train
* ddps train
2020-08-26 15:37:40 -04:00
William Falcon
ef07b0c4b3
accelerator fit 1 ( #3200 )
2020-08-26 14:20:38 -04:00
William Falcon
f064d74be8
refactored dataloader process hook ( #3139 )
2020-08-24 21:53:56 -04:00
William Falcon
82d1128966
eval step scaling factor ( #3136 )
2020-08-24 20:26:39 -04:00
William Falcon
6c3cec3a3c
training amp scaling refactor ( #3135 )
2020-08-24 19:59:46 -04:00
William Falcon
0b3cb3c955
ref: moved ___step_end hooks ( #3130 )
...
* moved eval hooks
2020-08-24 17:50:47 -04:00
William Falcon
6068b29d29
ref: remove obscure forward call in eval + CPU backend ___step ( #3123 )
...
* remove obscure forward call in eval
2020-08-24 12:31:40 -04:00
William Falcon
18160b81b5
refactored horovod backend ( #3122 )
2020-08-24 11:13:49 -04:00
William Falcon
8ebf4fe173
ref: refactored horovod backend ( #3121 )
...
* refactored horovod backend
* refactored horovod backend
2020-08-24 10:35:32 -04:00
William Falcon
8d7ca5cd2c
ref: refactored gpu backend __step ( #3120 )
...
* refactored gpu backend __step
* refactored gpu backend __step
* refactored gpu backend __step
* refactored gpu backend __step
2020-08-24 09:22:05 -04:00
William Falcon
527b9dca36
refactored ddp backend forward ( #3119 )
2020-08-24 07:33:14 -04:00
William Falcon
3c88b0dd83
Refactor 1: moved tpu xxx_step to backend ( #3118 )
...
* moved tpu training_step
* refactored eval step
* refactored eval step
* refactored eval step
2020-08-24 07:02:06 -04:00
Ananya Harsh Jha
9445c800b0
set device to root gpu ( #3042 )
2020-08-18 19:28:35 -04:00
Adrian Wälchli
188e06c261
ddp fix for trainer.test() + add basic ddp tests ( #2997 )
...
* add ddp script variations
* add ddp test
* rename
* shell
* test
* test
* try call
* try without subprocess
* test
* display the error
* list all variations
* try string
* try copy env
* debug
* pythonpath
* path
* update test
* change
* simple ddp test
* replace
* remove random port
* random port
* str
* clean up
* check run spawn
* clean up
* docs
* docs
* update test
* docs
* changelog
* changelog
2020-08-16 11:19:57 -04:00
William Falcon
e7794eb79a
Fixes #2407 ( #2981 )
...
* fix gpus index error
2020-08-14 16:22:48 -04:00
Jirka Borovec
5bce06c050
nb. devices ( #2973 )
2020-08-14 11:37:21 +02:00
William Falcon
0c264689cb
Fixes #2942 ( #2969 )
...
* Fixes #2942
* doc fix
2020-08-13 21:54:57 -04:00
Jirka Borovec
4354690e55
add apex test ( #2921 )
...
* add apex test
* rename
* level
* events
* wrap
* evt
* miss
* apex
* Update tests/models/test_amp.py
Co-authored-by: William Falcon <waf2107@columbia.edu>
* notes
* notes
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-13 10:03:13 -04:00
Phil
e3528afae3
Move optimizer creation after device placement for ddp backends. ( #2904 )
2020-08-12 06:34:59 -04:00
Jirka Borovec
a6e7aa7796
allow using apex with any PT version ( #2865 )
...
* wip
* setup
* type
* name
* wip
* docs
* imports
* fix if
* fix if
* use_amp
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* fix tests
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* fix tests
* todos
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-08 11:07:32 +02:00
Jirka Borovec
b7d72706c3
clean imports ( #2867 )
...
* clean imports
* miss
2020-08-08 00:33:51 +02:00
Jirka Borovec
f8c058215f
simplify tests & cleaning ( #2588 )
...
* simplify
* tmpdir
* revert
* clean
* accel
* types
* test
* edit test acc
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update test acc
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-07 23:22:05 +02:00
William Falcon
4dbd761a1c
refactor 3/n ( #2709 )
...
* refactor into gpu accelerator
2020-07-25 20:56:50 -04:00
William Falcon
b34217e410
Refactor 2/n ( #2708 )
...
* refactor into gpu accelerator
2020-07-25 17:31:34 -04:00
William Falcon
071e09fe38
refactor 1/n for v1.0.0 ( #2704 )
...
* refactor into gpu accelerator
2020-07-25 14:38:51 -04:00