Commit Graph

1007 Commits

Author SHA1 Message Date
Sean Naren 0211f7f9b2 Disable pl optimizer temporarily to fix AMP issues (#5163)
* Disable pl optimizer temporarily to fix AMP issues

* Add todo and enable pl optimizer in the test
2021-01-05 09:58:37 +01:00
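
Illustrative sketch (not part of the diff) of the flag this commit toggles, assuming the 1.1-era `Trainer` argument `enable_pl_optimizer` named elsewhere in this log:

```python
from pytorch_lightning import Trainer

# Assumed 1.1-era flag: wrap user optimizers in LightningOptimizer.
# This commit turns it off by default so AMP keeps working; the test
# suite opts back in explicitly.
trainer = Trainer(enable_pl_optimizer=True)
```
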
chaton 13bbf4b3f2 Properly support un-balanced logging (#5119)
* resolve bug

* clean code

* resolve comments

* Update tests/trainer/optimization/test_multiple_optimizers.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* resolve another bug

* add comments

* use abs to find diff

* update

* resolve flake8

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-01-05 09:58:37 +01:00
Loi Ly 1d13943605 Fix reset TensorRunningAccum (#5106)
* Fix reset TensorRunningAccum

* add test for TensorRunningAccum's reset method

* fix CI failure due to PEP8

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-01-05 09:58:36 +01:00
Jirka Borovec c72880f109
hotfix: dataloaders - add unimplemented methods (#5352)
* add unimplemented methods

* test

* test

* flake8
2021-01-05 03:41:20 -05:00
Justus Schock d88cf4a652
Add Support for multiple train loaders (#1959)
* add support for wrong dtype in apply_func

* apply loader resetting to possible collection of loaders

* add combined loader iter class

* integrate combined loader iter to training loop

* fix imports

* fix imports

* finish supporters

* add tests for supporters

* add test for model with multiple loaders

* fix trainer integration

* fix instance check

* Train loaders (#4032)

* patch for issues discussed in #1959, encapsulating underlying data structures returned from train_dataloader

* update data_loading.py so it uses the patch discussed in #1959

* rename class

* Separate CombinedLoaderIterator into two classes, and update related tests. (#4606)

* Fix the bugs after rebasing.

* Add custom get_len for apply_to_collection

* Refactor MultiIterator to be as CombinedLoaderIterator

* To get the right num_training_batches, call the wrapper for the multi trainloader in data_loading.py instead of training_loop.py

* Reload _loader_iters when calling __iter__

* Don't transform DataLoader to CombinedLoaderIterator when it's alone

* Updates test_fit_multiple_train_loaders for testing num_training_batches

* Separate CombinedLoaderIterator into CombinedLoaderIterator and CombinedDataLoader. Add CombinedDataset for unified DataLoader format.

* Initialize CombinedDataLoader before calculating num_training_batches. Also updating self._worker_check for multiple loaders

* Update tests for supporters

* Update tests for multiple trainloaders. Add tests about few_workers for multiple loaders.

* Fix pep8 issues

* Add tests for train_loader_patch.py

* Add descriptions to multiple_trainloader_mode

* Remove unused variables

* Add docstrings and typing

* Add more tests for better coverage

* Remove unused commented code

* Add sampler property

* Remove extract_dataset

* Update typing

* pep8

* Update train_loader_patch.py

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/supporters.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* reviewer comments

* fix stupid import

* add docs

* add back line separator

* fix line sep

* pep8

* Apply suggestions from code review

* fix

* fix

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* flake8

Co-authored-by: Justus Schock <justusschock@justuss-mbp.fritz.box>
Co-authored-by: Christofer Fransson <christofer_fransson@yahoo.com>
Co-authored-by: YI-LIN SUNG <r06942076@ntu.edu.tw>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-01-04 19:57:53 +00:00
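
A minimal sketch of the feature this PR adds: `train_dataloader` may return a collection of loaders, and batches arrive in `training_step` with the same structure. The `multiple_trainloader_mode` switch mentioned in the commits above controls cycling behaviour ("max_size_cycle" / "min_size" are the expected values; check the PR for exact spellings):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class MultiLoaderModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def train_dataloader(self):
        # Return a dict (or list) of loaders; each batch is then a dict
        # holding one sub-batch per loader.
        loader_a = DataLoader(TensorDataset(torch.rand(64, 32)), batch_size=8)
        loader_b = DataLoader(TensorDataset(torch.rand(32, 32)), batch_size=4)
        return {"a": loader_a, "b": loader_b}

    def training_step(self, batch, batch_idx):
        (x_a,), (x_b,) = batch["a"], batch["b"]
        return self.layer(x_a).sum() + self.layer(x_b).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```
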
Jirka Borovec b72ed71d4e
Refactor: clean trainer device & distrib setters (#5297)
* naive replace

* simplify

* clean

* .

* fix

* .

* fix

* fix
2021-01-04 17:10:13 +00:00
Jirka Borovec 957583544a
mark todo exceptions (#5320)
* mark todo exceptions

* .

* .

* .

* .

* .

* .

* .

* .

* try

* .
2021-01-04 09:07:56 +01:00
Jirka Borovec 73e06fd7c8
fix trainer distributed attributes (#5303)
* fix trainer distributed attributes

* .

* fix
2020-12-31 11:10:44 +01:00
Jirka Borovec 7a615b5651
add tests for Trainer attributes (#5261)
* add tests for Trainer attributes

* drop empty
2020-12-29 18:56:13 +01:00
Jirka Borovec a884866ff0
Unify names in Utils (#5199)
* warnings

* argparse

* mutils

* xla device

* deprecated

* tests

* simple

* flake8

* fix

* flake8

* 1.4
2020-12-22 00:23:33 +01:00
Jirka Borovec 0f36525e8f
fix/enable - check F401 (#5201)
* refactor - check F401

* missed

* fix
2020-12-21 10:15:04 +01:00
Jirka Borovec 35fd6e93c7
refactor - check E501 (#5200) 2020-12-21 14:23:09 +05:30
Jirka Borovec 2d54116baa
annotate unused vars (#5017)
* annotate all unused vars

* rank_zero_warn

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* f1 fixed

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-12-19 13:53:06 +01:00
chaton f3748ba808
[feat] Enable self.log in callbacks (#5094)
* enable to use self.log in callbacks

* update

* revert back to assert
2020-12-16 16:08:39 -05:00
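
A sketch of what #5094 enables, assuming the hook signatures of this era: a callback can now route values through the LightningModule's `log`, and they are recorded like any metric logged inside the model:

```python
import pytorch_lightning as pl

class ParamNormLogger(pl.Callback):
    # With #5094, calling `log` on the module from a callback hook works.
    def on_train_epoch_end(self, trainer, pl_module, *args):
        total_norm = sum(p.detach().norm() for p in pl_module.parameters())
        pl_module.log("param_norm", total_norm)
```
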
Jirka Borovec 059eaecbb4
set xxx_AVAILABLE as protected (#5082)
* set xxx_AVAILABLE as protected

* docs
2020-12-14 20:19:05 +05:30
Carlos Mocholí 0327f6b4c2
Do not warn when the name key is used in the lr_scheduler dict (#5057)
* Do not warn when the name key is used

* Missing line

* Consistency

* Update pytorch_lightning/callbacks/lr_monitor.py

* Update docs

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update CHANGELOG

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-14 08:38:10 +01:00
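
For context, a sketch of the `name` key the fix refers to, in the `configure_optimizers` scheduler-dict format; the key labels the learning rate for `LearningRateMonitor` and previously tripped an unexpected-key warning:

```python
import torch

def configure_optimizers(self):
    # Sketch of a LightningModule hook; `self` is the module.
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
    return {
        "optimizer": optimizer,
        # "name" controls the displayed LR metric name and no longer warns.
        "lr_scheduler": {"scheduler": scheduler, "name": "my_lr"},
    }
```
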
tarepan 16feb5137b
Refactor load in checkpoint connector (#4593)
* Refactor load step commentaries

* Refactor hpc ckpt suffix acquisition

* Refactor restore/hpc_load match

* Refactor hpc load trial

* Refactor checkpoint dir check

* Refactor unneeded function nest

* Refactor nested If

* Refactor duplicated cache clear

* Refactor attempt flow with if/elif

* Fix pep8

* Refactor hook commentary

Co-authored-by: chaton <thomas@grid.ai>

* Fix pep8

* Refactor hpc load checkpoint path acquisition

* Fix pep8

* Fix doc

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Refactor None Union type with Optional

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-14 00:13:50 +08:00
chaton 1a970b2d8d
[hotfix] Extend Optimizer + update doc (#5095)
* resolve urgent bug

* update pr

* update doc

* update

* remove typo

* add defaults

* Update pytorch_lightning/__init__.py

* Update setup.py

* update doc

* Update docs/source/optimizers.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update

* resolve doc

* debug test

* update test

* Update docs/source/optimizers.rst

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update docs/source/optimizers.rst

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update docs/source/optimizers.rst

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* remove useless import

* Update docs/source/optimizers.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-12-11 14:24:59 -05:00
Jirka Borovec d5fa02e798
simplify accelerator steps (#5015)
* simplify accelerator steps

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-10 18:36:13 +05:30
Jirka Borovec 4ebce38478
update usage of deprecated automatic_optimization (#5011)
* drop deprecated usage automatic_optimization

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-10 15:31:33 +05:30
Jirka Borovec 77fb425dd4
update usage of deprecated profiler (#5010)
* drop deprecated profiler

* lut

Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-10 08:38:14 +01:00
Jirka Borovec ce9179591d
ref: clean config [1/n] add intermediate setters (#4990)
* add intermediate setters

* show inputs

* fix options

* move

* fix

* less talk

* fix

* talk less

* str

* cases

* rename

Co-authored-by: chaton <thomas@grid.ai>
2020-12-09 14:13:57 -05:00
Jirka Borovec 53d7c9555c
drop usage of deprecated distributed_backend (#5009)
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-09 09:18:23 +01:00
Sean Naren ee9b3fe574
[feat] pp 1/n (#5016)
* Added changes for RPC plugin

* Add missing kwargs

* Fix code format

* Loading refactors by introducing is_distributed var, fix optimizer step flow

* Add rpc guard

* Added docstrings and typing

* resolve comments

* Add additional rpc hook, refactor name of exit process hook for clarity

* remove annotation

* Modify behaviour to allow optional return, add test for rpc plugin

* resolve tests

* rename is_ddp_based

* update

* update for windows

* update

* resolve test

* code smell

* Revert back to init_ddp_connection for backwards compat

* Swap to explicit name for property

* Add missing speed parity increase for CI variability, fix call counts for child process

Co-authored-by: tchaton <thomas@grid.ai>
2020-12-08 22:02:10 +00:00
Rohit Gupta 6d2aeff26a
fast_dev_run can be int (#4629)
* fast_dev_run can be int

* pep

* chlog

* add check and update docs

* logging with fdr

* update docs

* suggestions

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fdr flush logs

* update trainer.fast_dev_run

* codefactor and pre-commit isort

* tmp

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
2020-12-09 01:37:53 +05:30
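
Usage sketch of the extended argument: `True` still means a single batch, while an int runs that many batches of each loop as a smoke test (loggers and checkpoint callbacks stay disabled in this mode):

```python
from pytorch_lightning import Trainer

trainer = Trainer(fast_dev_run=7)  # run 7 train/val/test batches, then stop
```
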
chaton 2393474350
[hotfix] ddp + manual_optimisation (#4976)
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization

* debug

* Revert "debug"

This reverts commit ccca6b6b

* Expose manual reduce for automatic optimization

* Add input arguments

* Enable parity test

* clean imports

* Expose hook after to ensure we reset

* Fix naming

* add

* fix test

* resolve on comments

* typo

* Update tests/trainer/optimization/test_manual_optimization.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/optimization/test_manual_optimization.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* resolve comments

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-07 19:31:54 +00:00
chaton 02152c1729
Simplify optimization Logic (#4984)
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization

* debug

* Revert "debug"

This reverts commit ccca6b6b

* Expose manual reduce for automatic optimization

* Add input arguments

* Enable parity test

* clean imports

* Expose hook after to ensure we reset

* Fix naming

* add

* fix test

* uniformize optimizer logic

* resolve test

* resolve flake8

* resolve amp bug

* update tests

* remove bug

* remove optimizer_step in accelerators

* typo

* update lightning optimizer

* set doesn't work with ddp_spawn

* resolve flake8

* update threshold

* ignore pyright

* correct codeFactor

* remove useless if

* remove zero_grad function

* simplify step

* remove typo

* resolve bug

* Apply suggestions from code review

* update on comments

* resolve bugs

* remove tests

* Update pytorch_lightning/trainer/configuration_validator.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* simplify testing

* add more tests

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-07 12:55:49 +00:00
chaton 2e838e6dd8
Enable `self.log` in most functions (#4969)
* refactor

* solve pyright

* remove logging in batch_start functions

* update docs

* update doc

* resolve bug

* update

* correct script

* resolve on comments
2020-12-06 13:01:43 +00:00
Carlos Mocholí 72349706c1
Improve epoch_result_store code quality (#4875)
* Improve code quality

* black -l 120 -S

* Fix pyright error

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-12-05 11:49:28 +00:00
Justus Schock f23f5e5648
Fix DP Logging Aggregation (#4138)
* add option to step result to do aggregation on a specific device

* in dp: do aggregation on root gpu

* Update CHANGELOG.md

* pep8

* trailing whitespace

* move to root

move result
stupid result object
revert to master
undo import
add "to" method to result
generalize to
try a test
try a test

Revert "try a test"
This reverts commit 22e3c1001e6c5774ea18ad925830304c245bf145.

Revert "try a test"
This reverts commit 4d2d8fb2a52d552894809a0cbe51af126d78f070.

new test
max epochs
super epoch end
log in test
hanging test
undo test
initial test that fails on master
step end
pass
step end
step end
epoch end
print
step
check dev
clean up test
sanity check
wtf is going on
frustration
debugging test
test
test
test
test
test
test
test
test
unused import

* move chlog entry

* clean

* remove outdated changes

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2020-12-04 19:10:07 +01:00
Rohit Gupta 342a2b6f25
Deprecate auto mode from ModelCheckpoint and EarlyStopping (#4695)
* remove auto mode from callbacks

* chlog

* remove auto mode from callbacks

* mode

* mode

* move back

* update docs

* update docstrings

* docstring warning

* fix syntax

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* isort

* default to 'auto'

* syntax

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-04 16:11:58 +01:00
NeuralLink 88792982b5
🔨 minor refactor in trainer. (#4801)
* 🔨 minor refactor in trainer.

* 🔨 Use finally instead of else

* 🔨 revert format

* 🔨 check should skip inside try

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-04 13:42:13 +01:00
Jirka Borovec 3976db597d
refactor imports of optional dependencies (#4859)
* refactor imports of optional dependencies

* fix

* fix

* fix

* fix

* fix

* flake8

* flake8

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-12-04 10:26:10 +01:00
Jethro Kuan c7e349e73d
docs: default_root_path -> default_root_dir (#4942)
* docs: default_root_path -> default_root_dir

* Apply suggestions from code review

* fix

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update notebook

Co-authored-by: Jethro Kuan <jethro.kuan@bytedance.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-02 19:17:34 -05:00
Lezwon Castelino 12cb9942a1
Tpu save (#4309)
* convert xla tensor to cpu before save

* move_to_cpu

* updated CHANGELOG.md

* added on_save to accelerators

* if accelerator is not None

* refactors

* change filename to run test

* run test_tpu_backend

* added xla_device_utils to tests

* added xla_device_utils to test

* removed tests

* Revert "added xla_device_utils to test"

This reverts commit 0c9316bb

* fixed pep

* increase timeout and print traceback

* lazy check tpu exists

* increased timeout
removed barrier for tpu during test
reduced epochs

* fixed torch_xla imports

* fix tests

* define xla utils

* fix test

* aval

* chlog

* docs

* aval

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-02 13:05:11 +00:00
Sean Naren e952dee292
Allow string plugins (#4888)
* Allow plugin to be chosen via string

* Fix implementation, add tests

* Fix codefactor issues

* Added missing env patch

* Skip test for windows

* Reword reason

* Add skip to invalid test

* Create required_plugins function, move sharded amp requirement to plugin

* Pass AMPType, fix setter for apex

* Better doc strings

* Add exception when using apex

* Add trainer available_plugins function, warn user when plugins have been added automatically with option to override behaviour

* Fixed pep8 indent

* Fix codefactor issues

* Add env variables

* Update pytorch_lightning/cluster_environments/cluster_environment.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Addressed code review

* Update pytorch_lightning/plugins/plugin_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/plugin_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/plugin_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Addressed more code review feedback

* Fixed docstrings

* Swapped to verbose runtime error

* Apply suggestions from code review

* Apply suggestions from code review

* Update pytorch_lightning/plugins/sharded_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Change name

* Pass trainer to plugins that may require it

* Fix sharded plugin

* Added test to ensure string sharded works

* Removed trainer typing as this breaks pep8

* Fixed doc issues

* Fixed tests

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-01 20:30:49 +00:00
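
A usage sketch, assuming `"ddp_sharded"` is the registered name for the sharded-DDP plugin added around this series:

```python
from pytorch_lightning import Trainer

# Select the plugin by name instead of importing and instantiating it;
# unknown names now raise a verbose runtime error per this PR.
trainer = Trainer(accelerator="ddp", gpus=2, plugins="ddp_sharded")
```
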
Justus Schock ebbf256bf5
Create memory dynamically (#4938)
* create window size dynamically.

* pep8

Co-authored-by: chaton <thomas@grid.ai>
2020-12-02 01:05:12 +05:30
chaton 1d3724a878
[HotFix] Logging - One epoch delay on training epoch metrics. (#4913)
* add test

* resolve logging bug

* update

* resolve pep8

* resolve tests

Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-01 09:26:52 +00:00
chaton c2e6e68c7e
optimizer clean up (#4658)
* add LightningOptimizer

* typo

* add mock closure

* typo

* remove logic in optimizer_step

* update

* update

* update

* deactivate LightningOptimizer for horovod

* resolve flake

* typo

* check optimizer name

* change name

* added backward to LightningOptimizer

* remove use_lightning_optimizer

* move update

* simplify init

* resolve comments

* resolve bug

* update

* update

* resolve bugs

* resolve flake8

* set state

* work manual_optimizer_step

* add doc

* add enable_pl_optimizer

* make optimizer_step

* add make_optimizer_step

* add examples

* resolve test

* add test_optimizer_return_options_enable_pl_optimizer

* add enable_pl_optimizer=True

* update

* update tests

* resolve bugs

* update

* set Trainer to False

* update

* resolve bugs

* update

* remove from doc

* resolve bug

* typo

* update

* set to True

* simplification

* typo

* resolve horovod

* unwrap horovod

* remove Optimizer

* resolve horovod

* move logic to amp_backend

* doesn't seem to be picklable

* update

* add again

* resolve some bugs

* cleanup

* resolve bug with AMP

* change __repr__

* round at -12

* update

* update

* update

* remove from horovod

* typo

* add convert_to_lightning_optimizers in each accelerators

* typo

* forgot

* forgot a convert_to_lightning_optimizers

* update

* update

* update

* increase coverage

* update

* resolve flake8

* update

* remove useless code

* resolve comments + add support for LightningOptimizer base class

* resolve flake

* check optimizer get wrapped back

* resolve DDPSharded

* reduce code

* lightningoptimizer

* Update pytorch_lightning/core/optimizer.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/core/lightning.py

* remove reference to step function

* Apply suggestions from code review

* update on comments

* resolve

* Update CHANGELOG.md

* add back training_step in apex and native_amp

* rename optimizer_step

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-01 00:09:46 +00:00
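
A rough sketch of how the `LightningOptimizer` wrapper surfaces in a manual-optimization `training_step` of this era; treat the exact signatures (`manual_backward` taking the optimizer explicitly) as assumptions:

```python
import torch
import pytorch_lightning as pl

class ManualModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    @property
    def automatic_optimization(self) -> bool:
        return False  # manual optimization: we call backward/step ourselves

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()  # LightningOptimizer wrapping the raw optimizer
        loss = self.layer(batch[0]).sum()
        self.manual_backward(loss, opt)
        opt.step()        # precision/accelerator handling lives in the wrapper
        opt.zero_grad()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```
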
William Falcon f677efe61e
Merge pull request #4880 from PyTorchLightning/better_simple_profiler
Logging
2020-11-27 15:33:58 -05:00
Sean Naren 06a856e055
Merge branch 'master' into feature/plug 2020-11-27 18:48:58 +00:00
tchaton ba41733802 Merge branch 'better_simple_profiler' of https://github.com/PyTorchLightning/pytorch-lightning into better_simple_profiler 2020-11-27 18:47:05 +00:00
tchaton 316ebadbdc remove capture on on_train_batch_end 2020-11-27 18:46:49 +00:00
chaton 6ba77c2611
Merge branch 'master' into better_simple_profiler 2020-11-27 18:43:01 +00:00
tchaton cef83dbbf8 optimize logging 2020-11-27 18:21:23 +00:00
Jirka Borovec 042152cd61
ref: fix & simplify test callback (#4009)
* simplify test callback

* update

* use mock

* flake8
2020-11-27 19:12:56 +01:00
tchaton e17300f97d add more profiler 2020-11-27 18:00:48 +00:00
tchaton 3a8fa6bf11 update 2020-11-27 17:48:51 +00:00
tchaton 290d74b40e resolve test 2020-11-27 16:47:13 +00:00
SeanNaren 1704773712 Address code review 2020-11-27 14:50:12 +00:00
Sean Naren 4f693762ea
Update pytorch_lightning/trainer/connectors/precision_connector.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-27 14:45:15 +00:00
SeanNaren cdd2e122fc Add none check for func 2020-11-27 14:30:57 +00:00
SeanNaren 5598dce1a9 Remove unneeded check 2020-11-27 14:22:17 +00:00
Sean Naren 00bd0d2e72
Merge branch 'master' into feature/plug 2020-11-27 13:18:50 +00:00
chaton dee968f20b
[bug] Replace_sampler attach previous multiprocessing_context (#4742)
* resolve bug

* add test docstring

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update test

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-27 12:57:25 +00:00
SeanNaren 04bb0abe36 Merge branch 'master' into feature/plug
# Conflicts:
#	pytorch_lightning/utilities/__init__.py
#	requirements/extra.txt
2020-11-27 10:00:05 +00:00
Jirka Borovec 217650320e
simplify imports Omegaconf (#4873)
* hydra

* omegaconf
2020-11-27 01:00:56 +01:00
Jirka Borovec 442d57f1e9
simplify imports xla / TPU (#4872)
* xla

* tpu

* fix

* fix

* flake8
2020-11-27 00:37:48 +01:00
SeanNaren 737447fc6e Merge branch 'master' into feature/plug
# Conflicts:
#	pytorch_lightning/trainer/connectors/precision_connector.py
#	pytorch_lightning/utilities/__init__.py
2020-11-26 23:02:36 +00:00
Jirka Borovec 11e73ceaa6
fix import and typo in AMP (#4871)
* fix import and typo

* docs

* apex

* fix

* typo
2020-11-26 23:45:52 +01:00
SeanNaren fc9b2bf015 Fix logic and add test for apex check, rename file, add DDP launcher tests 2020-11-26 22:45:21 +00:00
SeanNaren 8dc857c38d Ensure we add the condition to the case statement 2020-11-26 22:11:05 +00:00
SeanNaren a9c316b669 Add additional check to ensure apex is not used with sharded 2020-11-26 19:00:55 +00:00
SeanNaren 47c121ef1a Addressed code review points 2020-11-26 16:44:45 +00:00
Sean Naren 22b4d5ee1a
Merge branch 'master' into feature/plug 2020-11-25 20:16:37 +00:00
chaton 204a0a2d03
[bugfix] Accumulated_gradient and TensorBoard (#4738)
* resolve bug

* update

* update

* modify one test

* remove parameters

* update on comments

* update changelog

* update docstring

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-25 19:44:05 +00:00
SeanNaren b39f290c4d Merge branch 'master' into feature/plug 2020-11-25 12:55:42 +00:00
SeanNaren 6b129216d0 Add catches around fairscale installation 2020-11-24 19:23:55 +00:00
Samyak S Sarnayak ccf38ced2e
Use high progress_bar_refresh_rate on Google Colab (#4654)
* Use high refresh rate on Google Colab (#3786)

Automatically override progress_bar_refresh_rate when on Google
Colab. Also added a constant IS_COLAB in utilities to check
whether it is being run in colab or not.
(#3786)

* Show a warning instead of overriding when rate is low on colab

* Change warning to suggestion and move it

Moved warning to configure_progress_bar instead of on_trainer_init

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* add a mock test

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-11-24 02:13:33 +05:30
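
Sketch of the knob involved: on Colab a low refresh rate floods the notebook output, so per this change Lightning now suggests a higher value instead of silently overriding the one you pass:

```python
from pytorch_lightning import Trainer

trainer = Trainer(progress_bar_refresh_rate=20)  # redraw every 20 batches
```
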
SeanNaren d953f2be5b Merge branch 'master' into feature/fairscale-817-6n
# Conflicts:
#	pytorch_lightning/accelerators/accelerator.py
#	pytorch_lightning/accelerators/ddp2_accelerator.py
#	pytorch_lightning/accelerators/ddp_accelerator.py
#	pytorch_lightning/accelerators/ddp_cpu_spawn_accelerator.py
#	pytorch_lightning/accelerators/ddp_hpc_accelerator.py
#	pytorch_lightning/accelerators/ddp_spawn_accelerator.py
#	pytorch_lightning/accelerators/dp_accelerator.py
#	pytorch_lightning/plugins/ddp_plugin.py
#	pytorch_lightning/trainer/connectors/model_connector.py
2020-11-23 20:19:46 +00:00
Sean Naren 404af43cde
5/n: Extract reference model call to plugins/accelerators (#4773)
* Encapsulate extracting reference model within the plugin to allow custom wrapper logic to live within the plugin/accelerators

* Add missing new lines

* Fix call to accelerator

* Removed double blank

* Use accelerator backend

* Handle case where wrapper has not been initialized within the plugin

* Added basic get model tests, add better typing

* Change model name

* Split GPU/DDP test

* Add stronger typing, skip ddp test on windows

* Fix import

* Fix import in dp

* Fixed PEP8 definition

* Add ddp launcher for ddp testing

* Modify accelerator reference model to property, change name to reflect func

* Revert property as this is incorrect.

* Revert across accelerators

* Modified name to get_model_from_plugin

* Code review changes, fix issue with dp

* Add verb to function getter

Co-authored-by: chaton <thomas@grid.ai>
2020-11-23 17:21:47 +00:00
SeanNaren c590e3a166 Ensure we check if we should use sharded amp plugin 2020-11-22 15:18:50 +00:00
SeanNaren b506a7e46a Revert across accelerators 2020-11-22 15:00:23 +00:00
SeanNaren 977625c289 Revert property as this is incorrect. 2020-11-22 14:54:00 +00:00
Sean Naren 4b16b47843
Merge branch 'master' into feature/817-fairscale-5n 2020-11-22 11:39:15 +00:00
SeanNaren 358f503848 Modify accelerator reference model to property, change name to reflect func 2020-11-22 11:39:00 +00:00
Teddy Koker 299de5dc62
don't override PYTHONWARNINGS (#4700)
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-11-22 11:25:24 +01:00
edenlightning a716ea60e1
Clarify checkpoint deprecation message (#4640)
* Clarify checkpoint deprecation message

* Update pytorch_lightning/trainer/connectors/callback_connector.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-22 07:35:54 +01:00
YI-LIN SUNG 69b9949192
[docs] Remove the redundant indents in trainer.py (#4720)
* Remove the redundant indents in trainer.py

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-11-21 08:15:09 +06:30
Sean Naren e3869c3950
Merge branch 'master' into feature/817-fairscale-5n 2020-11-20 17:13:17 +00:00
Roger Shieh 42e59c6add
Cast hparams to dict when not using omegaconf (#4770)
* init fix

* init test

* more specific dict assert

* update changelog

* Update tests/checkpointing/test_model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-20 19:53:05 +08:00
SeanNaren 95a1f19851 Use accelerator backend 2020-11-19 10:59:17 +00:00
SeanNaren 078a829834 Fix call to accelerator 2020-11-19 10:48:27 +00:00
SeanNaren be4c24c484 Encapsulate extracting reference model within the plugin to allow custom wrapper logic to live within the plugin/accelerators 2020-11-19 10:43:16 +00:00
Sean Naren f0ab74dc2f
Expose scaler in amp plugin (#4737) 2020-11-18 22:30:47 +00:00
Sean Naren e7134a9135
Sharded Plugin 2/n: Allow ddp plugin to modify optimizer state saving (#4675)
* Allow ddp plugin to modify optimizer state saving

* Rely on the accelerator for optimizer states

* Ensure we init the accelerator for the saving function

* Better comment for optim state dump

* Revert "Ensure we init the accelerator for the saving function"

This reverts commit af65effa

* Added accelerator check to initialize tuner before saving model checkpoint

* Simplify comment

* Revert "Added accelerator check to initialize tuner before saving model checkpoint"

This reverts commit f9929c0c

* Return single optimizer state to reduce duplication

* Fixed docstring

* Fixed typing

* Fixed comment

* Added CHANGELOG.md

Co-authored-by: chaton <thomas@grid.ai>
2020-11-18 16:38:35 +00:00
chaton 96769a7184
quick fix (#4697) 2020-11-16 16:20:35 +00:00
chaton 867eef0e4c
[HOTFIX] Logging for evaluation (#4684)
* resolve bugs

* add should_flush_logs

* remove should_flush

* should work

* update test

* use something else

* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

* log mock_log_metrics.mock_calls

* typo

* don't use keys

* convert to list

* typo

* check kwargs

* resolve bug

* resolve flake8

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-15 10:41:33 -05:00
Justus Schock e04e7c9ecc
Makes automatic optimization a model attribute (#4602)
* Makes automatic optimization a model attribute

* Update trainer.py

* remove setting property in model

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update trainer.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-14 11:13:42 +06:30
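
Migration sketch for #4602, under the assumption that the old switch was the `Trainer(automatic_optimization=...)` argument deprecated elsewhere in this log:

```python
import pytorch_lightning as pl

# Before #4602 the switch lived on the Trainer:
#   trainer = Trainer(automatic_optimization=False)
# Now the model itself declares it:
class ManualOptModel(pl.LightningModule):
    @property
    def automatic_optimization(self) -> bool:
        return False
```
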
ananthsub d096a2ea6d
Fix setup callback hook to pass LightningModule through (#4608)
* Fix setup callback hook

* Update CHANGELOG.md

* Update test_trainer.py

* Update test_trainer.py

* Update test_trainer.py

* fix chlog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-13 19:34:46 -05:00
Jeff Yang baa8558cc0
logger docs and api docs (#3950)
* logger and api docs

* remove gpu_usage_logger, lr_logger

* update docstring

* fix wandb example

* remove step result

* charts

* add some charts info

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-13 20:35:54 +05:30
chaton 4018237c30
[FEAT] Add lambda closure to manual_optimizer_step (#4618)
* added lambda_closure

* move to types

* add 2 new tests

* make example more complex

* add complex example to doc

* added more tests

* resolve doc

* typo

* update

* update tpu optimizer_step

* Apply suggestions from code review

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-12 19:22:06 +00:00
chaton 3d202f9ecc
[FEAT] Refactor logging 3/3 [v1] (#4552)
* wip

* wip check how many tests break

* wip

* resolve some bugs

* resolve more bugs

* resolve 2 bugs

* resolve

* temp fix

* update

* remove useless code

* remove result

* try to resolve bug

* update changelog

* formatting

* remove pl

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-11 17:05:24 +00:00
chaton 514cb22bd7
[Fix] Move log value to cpu. (#4592)
* move value to cpu to save memory

* update

* move to cpu

* try something

* update

* update

* add back out_dict.update({k: v})

* add move_metrics_to_cpu

* update

* Update pytorch_lightning/utilities/memory.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* resolve comments

* Update pytorch_lightning/core/step_result.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-10 21:13:41 +00:00
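
The `move_metrics_to_cpu` flag added here, sketched: logged tensors are moved off the device so an epoch's worth of metrics does not pin GPU memory:

```python
from pytorch_lightning import Trainer

trainer = Trainer(move_metrics_to_cpu=True)
```
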
chaton 7e08b0d710
[bug-fix] DDP and automatic_optimization=False (#4485)
* resolve bug

* add self._running_manual_optim

* update

* update tests

* update lightning module

* resolve bug

* update tests

* update

* resolve pep8

* update

* replace by `ddp_spawn`

* temporary fix

* update

* update

* move update to training_loop

* make both ddp_spawn

* introduce `manual_optimizer_step`

* update changelog

* added changelog wrong place

* add force_optimizer_step

* update docstring for tests

* update optimizer_step

* update zero_grad

* resolve flake8

* move update into manual_optimizer_step

* add zero_grad

* remove zero_grad tests

* remove manual_backward in AMP, it doesn't help

* update

* loosen tests

* update

* update doc

* add TODO

* Removed unnecessary get model from native amp

* Remove try except with pytest raise

* Add seed, clean up imports, remove try catch to reproduce error

* update code

* update test

* revert back

* formatting

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-10 19:44:51 +00:00
tarepan 41c9bee4f0
Fix load disparity between normal and hpc (#4526)
* Add missing load functionality in hpc

* Add general file load for hpc

* Add mark in CHANGELOG

* Fix Typo Li**hg**tning

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Refactor line separation

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Fix entangled fixation commit

* Fix naming of restore_model_states

* Fix amp restore place

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-11-09 17:26:38 +00:00
William Falcon 09a51697ed
Adds shortcut for path to log (#4573)
* added log_dir shortcut to trainer properties for writing logs

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut
2020-11-08 12:16:22 -05:00
William Falcon bb356a73cb
added trainer api docs (#4569) 2020-11-07 14:18:45 -05:00
chaton 9c8701f2e2
[feat] Logging refactor 2/n - train (#4495)
* update logging

* solve more bugs

* replace Mapping by Dict

* update on comments

* resolve pep8

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* typo

* update for coverage

* update test

* update

* Update tests/models/test_hooks.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Update tests/models/test_hooks.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* update on comments

* remove deepcopy

* remove useless look for

* another small optim

* extra optim

* remove latest optim, can be source of bug

* resolve bug

* add docstring

* optimize coverage

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/logging_tests/test_distributed_logging.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/evaluation_loop.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/logging/test_logger_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/logging_tests/test_train_loop_logging_1_0.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* update

* update on comments

* update parity speed

* get it down to 0.65

* update

* 0.8 max_dif

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-11-05 22:27:04 +00:00
chaton 11dc5264cd
Bugfix/4449 dict attribute error (#4480)
* resolve a bug

* resolve a bug

* remove todo

* resolve more bugs

* update tests

* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* resolve pyright

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-04 19:35:07 +00:00
tarepan 7b375ed1d3
Add CheckpointConnector internal commentaries (#4421)
* Add CheckpointConnector commentaries

* Fix comment format

* Fix save/load schema as function comments

Co-authored-by: chaton <thomas@grid.ai>
2020-11-03 22:09:29 +05:30
Adrian Wälchli 9b7f01654a
Update old "module_arguments" and "hparams" references in docs (#4417)
* replace module_arguments refernces

* update hparams docs

* add missing save_hyperparameters in example

* deprecate instead of remove

* Update docs/source/hyperparameters.rst

Co-authored-by: chaton <thomas@grid.ai>

* Update docs/source/hyperparameters.rst

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-03 12:13:10 +01:00
Rohit Gupta 360b3d8844
Disable training when limit_train_batches=0 (#4371)
* Disable training when limit_train_batches=0

* chlog

* pep

* limit_train_batches

* BoringModel

Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-11-03 12:10:35 +05:30
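
Usage sketch of the fixed edge case: zero now cleanly disables the training loop (e.g. to run validation only) instead of misbehaving:

```python
from pytorch_lightning import Trainer

trainer = Trainer(limit_train_batches=0)
```
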
Rohit Gupta ad2556b669
Disable saving checkpoints if not trained (#4372)
* Disable saving checkpoints if not trained

* chlog

* update test

* fix

Co-authored-by: chaton <thomas@grid.ai>
2020-11-03 11:38:32 +05:30
chaton 958aa1aee7
[test] Accumulated gradient optimization tests (#4477)
* adding tests

* wip

* update

* Update tests/trainer/test_trainer.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-02 23:44:11 +00:00
chaton ac3f7393fd
[FEAT] logging refactors 1/n (#4439)
* introducing new logging object

* typo

* typo

* Update pytorch_lightning/trainer/logging.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Update pytorch_lightning/trainer/logging.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* update on comments

* update on comments

* add more doctstring

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* resolve on comments

* solve pyright

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* update on comments

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* update on comments

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-02 20:51:43 +00:00
chaton 102fa9ee7d
[BUGFIX] AMP + Precision unscale grad (#4441)
* move unscale within Native plugin

* remove gradient tracking from lightning backward

* forgot trainer.fit

* typo

* update

* cleanup

* set to 1.6

* typo

* skip if below 1.6 strict

* update changelog

* remove useless code

* Update tests/plugins/test_amp_plugin.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Update tests/plugins/test_amp_plugin.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* update changelog

* Update CHANGELOG.md

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-02 16:36:48 +00:00
Jirka Borovec ef03c39ab7
Add step index in checkpoint name (#3807)
* true final value of global step

* ch check

* tests

* save each validation interval

* wip

* add test

* add test

* wip

* fix tests, revert old edits, fix merge conflicts, update doctests

* test + bugfix

* sort files

* format test

* suggestion by ananth

* added changelog

* naming

* docs

* example

* suggestion

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix test

* pep

* pep

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-11-02 15:05:58 +01:00
Adrian Wälchli 6ae4c6ec85
update docs on checkpoint_callback Trainer argument (#4461)
* docs update

* update callbacks docs

* docs

* notebook examples

* warning

* line lenght

* update deprecation

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Roger Shieh <55400948+s-rog@users.noreply.github.com>
2020-11-02 06:18:20 +01:00
Sean Naren 6211fd4b0c
Fix type checker issue with explicit cast of ref_model object (#4457)
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-31 16:43:19 -04:00
Adrian Wälchli d1234c592d
deprecate passing ModelCheckpoint instance to Trainer(checkpoint_callback=...) (#4336)
* first attempt

* update tests

* support multiple

* test bugfix

* changelog

* pep

* pep

* import order

* import

* improve test for resuming

* test

* update test

* add references test

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* docstring suggestion deprecation

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>

* paramref

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-30 04:47:37 +01:00
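
Migration sketch for the deprecation: checkpoint callbacks move to the `callbacks` list, while `checkpoint_callback` remains a bool toggling the default checkpointing:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Deprecated: Trainer(checkpoint_callback=ModelCheckpoint(...))
trainer = Trainer(callbacks=[ModelCheckpoint(monitor="val_loss")])
```
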
Jeff Yang ebe3a31ddd
[docs] distributed_backend -> accelerator (#4429)
* distributed_backend -> accelerator

* distributed_backend -> accelerator

* use_amp -> precision

* format

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-10-30 00:45:24 +06:30
Justus Schock bbd81dfd55
Skips DDP parameter sync (#4301)
* ddp no-sync

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Update training_loop.py

* factor __enter__ and __exit__ out to separate context manager

* delete _updated_model_last_step

Co-authored-by: justusschock <justusschock@pc125.lfb.rwth-aachen.de>
Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-29 23:01:37 +05:30
Martin Hwang b459fd26ac
fix: `nb` is set to the total number of devices when nb is -1. (#4209)
* fix: `nb` is set to the total number of devices when nb is -1.

 Refs: #4207

* feat: add test code
     1. test the combination of `auto_select_gpus` and `gpus` options using Trainer
     2. test `pick_multiple_gpus` function directly

Refs: #4207

* docs: modify contents in `Select GPU devices`

 Refs: #4207

* refactor: reflect the result of review

 Refs: #4207

* refactor: reflect the result of review

 Refs: #4207

* Update CHANGELOG.md

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <55400948+s-rog@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-29 10:50:37 +01:00
Rohit Gupta b26c71eadf
Add optimizer hooks in callbacks (#4379)
* Add optimizer hooks in callbacks

* optimizer param

* update test

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-28 13:15:22 +01:00
Dusan Drevicky c50c225f05
feature: Allow str arguments in Trainer.profiler (#3656)
* allow trainer's profiler param to have a str value

* add tests

* update docs

* update exception message

* Update CHANGELOG

* fix pep8 issues

* cleanup test code

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Add deprecation warning if using bool for profiler

* Add deprecation tests and move deprecated tests

* Remove bool option to profiler from docs

* Deprecate bool args to profiler in CHANGELOG

* fixup! Add deprecation warning if using bool for profiler

* fixup! Add deprecation tests and move deprecated tests

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Implement suggestions, remove whitespace

* fixup! Implement suggestions, remove whitespace

* Allow bool, str (case insensitive), BaseProfiler

* Add info about bool deprecation to trainer

* fixup! Add info about bool deprecation to trainer

* Move deprecate todo to test_deprecated

* Test wrong profiler type, improve error message

* fixup! Test wrong profiler type, improve error message

* Update pytorch_lightning/trainer/connectors/profiler_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Apply suggestions from code review

* Readd bool to profiler types, test cli profiler arg

* Remove extra whitespace in doc

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update deprecation versions

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-27 16:27:16 +05:30
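
Usage sketch: a case-insensitive string now selects a built-in profiler, with bool arguments deprecated per this PR:

```python
from pytorch_lightning import Trainer

trainer = Trainer(profiler="simple")  # or "advanced", or a BaseProfiler
```
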
Adrian Wälchli 48b6de0c40
update (#4343)
Co-authored-by: chaton <thomas@grid.ai>
2020-10-27 06:07:29 -04:00
William Falcon 98205fb438
Enable custom apex and amp plugins (#4355)
* enable custom apex, amp plugin

* enable custom apex, amp plugin

* enable custom apex, amp plugin

* enable custom apex, amp plugin
2020-10-25 17:11:07 -04:00
ananthsub f6efb712ed
Skip replacing dataloader sampler if it's already a distributed sampler (#4273)
* Update data_loading.py

* Update data_loading.py

* add test + update flag description

* add to changelog

* Update test_dataloaders.py

* fix-pickle

* Update test_dataloaders.py

* Added missing reference calls

* Update tests/trainer/test_dataloaders.py

* Apply suggestions from code review

* Update data_loading.py

* Update test_dataloaders.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-23 17:34:07 +01:00
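
Sketch of the case this fixes, with a hypothetical `self.dataset`: a loader that already carries a `DistributedSampler` is now left untouched by `replace_sampler_ddp` instead of being re-wrapped:

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def train_dataloader(self):
    # `self.dataset` stands in for your Dataset instance.
    sampler = DistributedSampler(self.dataset, shuffle=True)
    # Lightning detects the existing distributed sampler and skips its
    # own sampler replacement.
    return DataLoader(self.dataset, batch_size=32, sampler=sampler)
```
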
chaton 3abfec8962
[HOTFIX] ModelCheckpoint - Don't increase current_epoch and global_step if not trained (#4291)
* add two tests w/wo tempdir

* resolve flake8

* this test is failing

* update bug report

* resolve bug and add test

* remove bug_report

* resolve flake8

* resolve bug

* resolve pep8

* resolve pep8

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
2020-10-23 11:17:50 +01:00
Rohit Gupta 4c7ebdc32b
Add dirpath and filename parameter in ModelCheckpoint (#4213)
* Add dirpath and filename parameter in ModelCheckpoint

* remove old function

* chlog

* codefactor

* update tests

* docs

* fix doctest and added tests

* pathlib dirpath

* dep version and docs

* try fix doctest

* pep

* suggestions
Co-authored-by: carmocca <carlossmocholi@gmail.com>

* suggestions

* fix test

* pep

* trigger tests

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* suggestions

* try fix windows test

* add and update some tests

* trigger tests

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-23 09:59:12 +05:30
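
Usage sketch of the new parameters, which replace the old combined `filepath` argument:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints/",
    filename="{epoch}-{val_loss:.2f}",
    monitor="val_loss",
)
```
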
Sean Naren 9823f97a84
Protect functions not to be accessed by user (#4305) 2020-10-22 15:15:04 +01:00
Sean Naren 065cc94112
Fix bug comparing max_steps to global step which inits at 0 (#4278)
* Fix bug comparing max_steps to global step which inits at 0

* Added test to ensure accumulate grad batch works with max steps

* check fix with TODO test

* correct call counts

* Add check to ensure we've finished accumulation of this global step before exiting loop in conjunction with max steps

* Remove + 1 check in test as this was incorrect

* Update incorrect expected outputs in lr finder test

* Added brackets for clarity

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-22 13:58:59 +01:00
Mauricio Villegas 546476c704
Allow changing the logged step value in validation_step (#4130)
* Fix bug identified in https://github.com/PyTorchLightning/pytorch-lightning/issues/4102

* update tests

* chlog

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-10-22 03:03:07 +05:30
Carlos Mocholí 2549ca40e6
Clean up optimizer code (#3587)
* Update optimizer code

* Update CHANGELOG

* Fix tuple of one list case

* Update docs

* Fix pep issue

* Minor typo [skip-ci]

* Use minimal match

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-21 21:12:48 +02:00
Justus Schock 0ec4107697
Optimizer closure (#4190)
* closure for all optimizers

* rename hook and take care of alternating backwards

* add comment

* training_loop_fix

* closure whenever possible

* training_loop

* simple tests that count backward calls

* fix test to work with closure

* remove debugging statement

* better place

* check grads after backward

* start fixing manual optimization

* skip step when result returned by closure was None

* fix gradient clipping test to work with closure

* attribute dict result only for automatic optimization

* adjust backward calls in accelerator

* adjust where to call gradient clipping

* adjust backward calls in tests

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* pass kwargs to xla optimizer

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-21 19:34:29 +01:00
William Falcon 8a20d6af51
make save fx part of model checkpoint cb (#4284) 2020-10-21 10:06:42 -04:00
Carlos Mocholí e0f9799dbf
Add strict option to lr_scheduler dict (#3586)
* Add strict option to lr_scheduler dict

* Update docs

* Unnecessary "else" after "raise"

* Update CHANGELOG

* Fix rebase
2020-10-21 14:14:37 +02:00
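
Sketch of the new key in the scheduler dict: `strict=False` downgrades a missing `monitor` metric from an exception to a warning:

```python
import torch

def configure_optimizers(self):
    # Sketch of a LightningModule hook; `self` is the module.
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
    return {
        "optimizer": optimizer,
        "lr_scheduler": {
            "scheduler": scheduler,
            "monitor": "val_loss",
            "strict": False,  # warn instead of raising if val_loss is absent
        },
    }
```
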
Sean Naren c336881959
Added fix to ensure that custom logged metrics within test_epoch_end are appended to the result object even without step reduced metrics (#4251) 2020-10-20 18:33:18 +02:00
Jirka Borovec f37444fa3e
CI: add flake8 (#4239) 2020-10-19 21:20:17 +01:00
Espen Haugsdal 66e58f5afb
Use checkpoint_connector.hpc_save in SLURM (#4217) 2020-10-18 10:13:56 -04:00
Elia Cereda cf9fe4905e
Annotate return type of TrainerProperties.from_argparse_args(...) (#4192)
* Annotate return type of TrainerProperties.from_argparse_args(...)

* Added second empty line between class and typevar

* Renamed all uses of the typevar to _T
2020-10-17 20:00:50 +08:00
Akihiro Nitta b45b57cc58
Use `Optional` for arguments set to `None` by default (#4164)
* Use `Optional` for variables set to `None` by default

* Use `Optional` instead of `Union[None, ...]` for consistency
2020-10-15 23:02:50 +02:00
William Falcon 72f19768c8
remove duplicate metric vs step log for train loop (#4173)
* remove duplicate metric vs step log

* remove duplicate metric vs step log

* remove duplicate metric vs step log

* fix ddp index issue
2020-10-15 10:47:00 -04:00
William Falcon 45d05ff68d
Fixes #4141 (#4169)
* fix val epoch agg

* fix val agg metrics

* fix val agg metrics

* fix val agg metrics
2020-10-15 09:12:05 -04:00
Jirka Borovec f064682786
save initial arguments (#4163)
* save initial arguments

* typing

* chlog

* .
2020-10-15 08:30:49 -04:00
Rohit Gupta dec31b3e76
Call on_load_checkpoint before loading state_dict (#4057) 2020-10-14 23:26:04 +02:00
William Falcon 09c2020a93
notices (#4118) 2020-10-13 07:18:07 -04:00
William Falcon bf2067a609
enabled manual returns (#4089) 2020-10-12 10:06:17 -04:00
William Falcon 1dbc6ffbc1
added templates (#4077)
* docs

* docs
2020-10-11 09:35:51 -04:00
William Falcon 7ffe05a3d1
ref: accelerator names (#4066)
* ref: accelerator names

* docs
2020-10-11 01:05:14 -04:00
William Falcon 0281b077d8
ref: decouple apex second attempt part 10/n (#4064)
* ref: decouple apex second attempt part 9/n

* ref: decouple apex second attempt part 9/n

* ref: decouple apex second attempt part 9/n
2020-10-10 20:05:05 -04:00
William Falcon dbfe2b6129
ref: decouple apex second attempt part 9/n (#4063)
* ref: decouple apex second attempt part 9/n

* ref: decouple apex second attempt part 9/n
2020-10-10 18:44:24 -04:00
William Falcon 5ce9fc6bb3
ref: decouple apex second attempt part 7/n (#4061)
* ref: decouple apex second attempt part 7/n

* ref: decouple apex second attempt part 7/n

* ref: decouple apex second attempt part 7/n
2020-10-10 16:44:15 -04:00
William Falcon d1bbb449a3
ref: decouple apex second attempt part 5/n (#4058) 2020-10-10 14:35:25 -04:00
Rohit Gupta bdbf846029
Fix to print scaler value in progress bar (#4053)
* Fix to print scaler value in progress bar

* chlog

* Fix to print scaler value in progress bar

* Fix to print scaler value in progress bar
2020-10-10 12:20:11 -04:00
William Falcon ce2edf1192
ref: decouple apex second attempt part 4/n (#4056)
* ref: decouple apex second attempt part 4/n

* ref: decouple apex second attempt part 4/n

* Update lightning.py

* ref: decouple apex second attempt part 4/n
2020-10-10 12:19:22 -04:00
William Falcon 7285613974
ref: decouple apex second attempt part 2/n (#4054)
* ref: decouple apex second attempt part 2/n

* ref: decouple apex second attempt part 2/n
2020-10-10 10:24:20 -04:00
William Falcon 5b261a230e
enable passing in custom accelerators (#4050)
* enable custom accelerators

* ref: finish decoupling apex, LM and backward

* ref: finish decoupling apex, LM and backward

* ref: finish decoupling apex, LM and backward
2020-10-10 09:21:08 -04:00
William Falcon 2b255a3df4
ref: enable custom clusters (1/n) (#4048)
* enable cluster plugins

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices
2020-10-10 08:09:29 -04:00