Commit Graph

1007 Commits

Author SHA1 Message Date
Sean Naren 0211f7f9b2 Disable pl optimizer temporarily to fix AMP issues (#5163)
* Disable pl optimizer temporarily to fix AMP issues

* Add todo and enable pl optimizer in the test
2021-01-05 09:58:37 +01:00
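
Illustrative sketch (not part of the diff) of the flag this commit toggles, assuming the 1.1-era `Trainer` argument `enable_pl_optimizer` named elsewhere in this log:

```python
from pytorch_lightning import Trainer

# Assumed 1.1-era flag: wrap user optimizers in LightningOptimizer.
# This commit turns it off by default so AMP keeps working; the test
# suite opts back in explicitly.
trainer = Trainer(enable_pl_optimizer=True)
```
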
chaton 13bbf4b3f2 Properly support un-balanced logging (#5119)
* resolve bug

* clean code

* resolve comments

* Update tests/trainer/optimization/test_multiple_optimizers.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* resolve another bug

* add comments

* use abs to find diff

* update

* resolve flake8

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-01-05 09:58:37 +01:00
Loi Ly 1d13943605 Fix reset TensorRunningAccum (#5106)
* Fix reset TensorRunningAccum

* add test for TensorRunningAccum's reset method

* fix CI failure due to PEP8

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-01-05 09:58:36 +01:00
Jirka Borovec c72880f109
hotfix: dataloaders - add unimplemented methods (#5352)
* add unimplemented methods

* test

* test

* flake8
2021-01-05 03:41:20 -05:00
Justus Schock d88cf4a652
Add Support for multiple train loaders (#1959)
* add support for wrong dtype in apply_func

* apply loader resetting to possible collection of loaders

* add combined loader iter class

* integrate combined loader iter to training loop

* fix imports

* fix imports

* finish supporters

* add tests for supporters

* add test for model with multiple loaders

* fix trainer integration

* fix instance check

* Train loaders (#4032)

* patch for issues discussed in #1959, encapsulating underlying data structures returned from train_dataloader

* update data_loading.py so it uses the patch discussed in #1959

* rename class

* Separate CombinedLoaderIterator into two classes, and update related tests. (#4606)

* Fix the bugs after rebasing.

* Add custom get_len for apply_to_collection

* Refactor MultiIterator to be as CombinedLoaderIterator

* To get the right num_training_batches, call the wrapper for the multi trainloader in data_loading.py instead of training_loop.py

* Reload _loader_iters when calling __iter__

* Don't transform DataLoader to CombinedLoaderIterator when it's alone

* Updates test_fit_multiple_train_loaders for testing num_training_batches

* Separate CombinedLoaderIterator into CombinedLoaderIterator and CombinedDataLoader. Add CombinedDataset for unified DataLoader format.

* Initialize CombinedDataLoader before calculating num_training_batches. Also updating self._worker_check for multiple loaders

* Update tests for supporters

* Update tests for multiple trainloaders. Add tests about few_workers for multiple loaders.

* Fix pep8 issues

* Add tests for train_loader_patch.py

* Add descriptions to multiple_trainloader_mode

* Remove unused variables

* Add docstrings and typing

* Add more tests for better coverage

* Remove unused commented code

* Add sampler property

* Remove extract_dataset

* Update typing

* pep8

* Update train_loader_patch.py

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/supporters.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* reviewer comments

* fix stupid import

* add docs

* add back line separator

* fix line sep

* pep8

* Apply suggestions from code review

* fix

* fix

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* flake8

Co-authored-by: Justus Schock <justusschock@justuss-mbp.fritz.box>
Co-authored-by: Christofer Fransson <christofer_fransson@yahoo.com>
Co-authored-by: YI-LIN SUNG <r06942076@ntu.edu.tw>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-01-04 19:57:53 +00:00
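
A minimal sketch of the feature this PR adds: `train_dataloader` may return a collection of loaders, and batches arrive in `training_step` with the same structure. The `multiple_trainloader_mode` switch mentioned in the commits above controls cycling behaviour ("max_size_cycle" / "min_size" are the expected values; check the PR for exact spellings):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class MultiLoaderModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def train_dataloader(self):
        # Return a dict (or list) of loaders; each batch is then a dict
        # holding one sub-batch per loader.
        loader_a = DataLoader(TensorDataset(torch.rand(64, 32)), batch_size=8)
        loader_b = DataLoader(TensorDataset(torch.rand(32, 32)), batch_size=4)
        return {"a": loader_a, "b": loader_b}

    def training_step(self, batch, batch_idx):
        (x_a,), (x_b,) = batch["a"], batch["b"]
        return self.layer(x_a).sum() + self.layer(x_b).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```
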
Jirka Borovec b72ed71d4e
Refactor: clean trainer device & distrib setters (#5297)
* naive replace

* simplify

* clean

* .

* fix

* .

* fix

* fix
2021-01-04 17:10:13 +00:00
Jirka Borovec 957583544a
mark todo exceptions (#5320)
* mark todo exceptions

* .

* .

* .

* .

* .

* .

* .

* .

* try

* .
2021-01-04 09:07:56 +01:00
Jirka Borovec 73e06fd7c8
fix trainer distributed attributes (#5303)
* fix trainer distributed attributes

* .

* fix
2020-12-31 11:10:44 +01:00
Jirka Borovec 7a615b5651
add tests for Trainer attributes (#5261)
* add tests for Trainer attributes

* drop empty
2020-12-29 18:56:13 +01:00
Jirka Borovec a884866ff0
Unify names in Utils (#5199)
* warnings

* argparse

* mutils

* xla device

* deprecated

* tests

* simple

* flake8

* fix

* flake8

* 1.4
2020-12-22 00:23:33 +01:00
Jirka Borovec 0f36525e8f
fix/enable - check F401 (#5201)
* refactor - check F401

* missed

* fix
2020-12-21 10:15:04 +01:00
Jirka Borovec 35fd6e93c7
refactor - check E501 (#5200) 2020-12-21 14:23:09 +05:30
Jirka Borovec 2d54116baa
annotate unused vars (#5017)
* annotate all unused vars

* rank_zero_warn

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* f1 fixed

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-12-19 13:53:06 +01:00
chaton f3748ba808
[feat] Enable self.log in callbacks (#5094)
* enable to use self.log in callbacks

* update

* revert back to assert
2020-12-16 16:08:39 -05:00
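
A sketch of what #5094 enables, assuming the hook signatures of this era: a callback can now route values through the LightningModule's `log`, and they are recorded like any metric logged inside the model:

```python
import pytorch_lightning as pl

class ParamNormLogger(pl.Callback):
    # With #5094, calling `log` on the module from a callback hook works.
    def on_train_epoch_end(self, trainer, pl_module, *args):
        total_norm = sum(p.detach().norm() for p in pl_module.parameters())
        pl_module.log("param_norm", total_norm)
```
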
Jirka Borovec 059eaecbb4
set xxx_AVAILABLE as protected (#5082)
* set xxx_AVAILABLE as protected

* docs
2020-12-14 20:19:05 +05:30
Carlos Mocholí 0327f6b4c2
Do not warn when the name key is used in the lr_scheduler dict (#5057)
* Do not warn when the name key is used

* Missing line

* Consistency

* Update pytorch_lightning/callbacks/lr_monitor.py

* Update docs

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update CHANGELOG

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-14 08:38:10 +01:00
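
For context, a sketch of the `name` key the fix refers to, in the `configure_optimizers` scheduler-dict format; the key labels the learning rate for `LearningRateMonitor` and previously tripped an unexpected-key warning:

```python
import torch

def configure_optimizers(self):
    # Sketch of a LightningModule hook; `self` is the module.
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
    return {
        "optimizer": optimizer,
        # "name" controls the displayed LR metric name and no longer warns.
        "lr_scheduler": {"scheduler": scheduler, "name": "my_lr"},
    }
```
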
tarepan 16feb5137b
Refactor load in checkpoint connector (#4593)
* Refactor load step commentaries

* Refactor hpc ckpt suffix acquisition

* Refactor restore/hpc_load match

* Refactor hpc load trial

* Refactor checkpoint dir check

* Refactor unneeded function nest

* Refactor nested If

* Refactor duplicated cache clear

* Refactor attempt flow with if/elif

* Fix pep8

* Refactor hook commentary

Co-authored-by: chaton <thomas@grid.ai>

* Fix pep8

* Refactor hpc load checkpoint path acquisition

* Fix pep8

* Fix doc

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Refactor None Union type with Optional

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-14 00:13:50 +08:00
chaton 1a970b2d8d
[hotfix] Extend Optimizer + update doc (#5095)
* resolve urgent bug

* update pr

* update doc

* update

* remove typo

* add defaults

* Update pytorch_lightning/__init__.py

* Update setup.py

* update doc

* Update docs/source/optimizers.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update

* resolve doc

* debug test

* update test

* Update docs/source/optimizers.rst

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update docs/source/optimizers.rst

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update docs/source/optimizers.rst

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* remove useless import

* Update docs/source/optimizers.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-12-11 14:24:59 -05:00
Jirka Borovec d5fa02e798
simplify accelerator steps (#5015)
* simplify accelerator steps

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-10 18:36:13 +05:30
Jirka Borovec 4ebce38478
update usage of deprecated automatic_optimization (#5011)
* drop deprecated usage automatic_optimization

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-10 15:31:33 +05:30
Jirka Borovec 77fb425dd4
update usage of deprecated profiler (#5010)
* drop deprecated profiler

* lut

Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-10 08:38:14 +01:00
Jirka Borovec ce9179591d
ref: clean config [1/n] add intermediate setters (#4990)
* add intermediate setters

* show inputs

* fix options

* move

* fix

* less talk

* fix

* talk less

* str

* cases

* rename

Co-authored-by: chaton <thomas@grid.ai>
2020-12-09 14:13:57 -05:00
Jirka Borovec 53d7c9555c
drop usage of deprecated distributed_backend (#5009)
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-09 09:18:23 +01:00
Sean Naren ee9b3fe574
[feat] pp 1/n (#5016)
* Added changes for RPC plugin

* Add missing kwargs

* Fix code format

* Loading refactors by introducing is_distributed var, fix optimizer step flow

* Add rpc guard

* Added docstrings and typing

* resolve comments

* Add additional rpc hook, refactor name of exit process hook for clarity

* remove annotation

* Modify behaviour to allow optional return, add test for rpc plugin

* resolve tests

* rename is_ddp_based

* update

* update for windows

* update

* resolve test

* code smell

* Revert back to init_ddp_connection for backwards compat

* Swap to explicit name for property

* Add missing speed parity increase for CI variability, fix call counts for child process

Co-authored-by: tchaton <thomas@grid.ai>
2020-12-08 22:02:10 +00:00
Rohit Gupta 6d2aeff26a
fast_dev_run can be int (#4629)
* fast_dev_run can be int

* pep

* chlog

* add check and update docs

* logging with fdr

* update docs

* suggestions

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fdr flush logs

* update trainer.fast_dev_run

* codefactor and pre-commit isort

* tmp

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
2020-12-09 01:37:53 +05:30
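
Usage sketch of the extended argument: `True` still means a single batch, while an int runs that many batches of each loop as a smoke test (loggers and checkpoint callbacks stay disabled in this mode):

```python
from pytorch_lightning import Trainer

trainer = Trainer(fast_dev_run=7)  # run 7 train/val/test batches, then stop
```
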
chaton 2393474350
[hotfix] ddp + manual_optimisation (#4976)
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization

* debug

* Revert "debug"

This reverts commit ccca6b6b

* Expose manual reduce for automatic optimization

* Add input arguments

* Enable parity test

* clean imports

* Expose hook after to ensure we reset

* Fix naming

* add

* fix test

* resolve on comments

* typo

* Update tests/trainer/optimization/test_manual_optimization.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/optimization/test_manual_optimization.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* resolve comments

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-07 19:31:54 +00:00
chaton 02152c1729
Simplify optimization Logic (#4984)
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization

* debug

* Revert "debug"

This reverts commit ccca6b6b

* Expose manual reduce for automatic optimization

* Add input arguments

* Enable parity test

* clean imports

* Expose hook after to ensure we reset

* Fix naming

* add

* fix test

* uniformize optimizer logic

* resolve test

* resolve flake8

* resolve amp bug

* update tests

* remove bug

* remove optimizer_step in accelerators

* typo

* update lightning optimizer

* set doesn't work with ddp_spawn

* resolve flake8

* update threshold

* ignore pyright

* correct codeFactor

* remove useless if

* remove zero_grad function

* simplify step

* remove typo

* resolve bug

* Apply suggestions from code review

* update on comments

* resolve bugs

* remove tests

* Update pytorch_lightning/trainer/configuration_validator.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* simplify testing

* add more tests

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-07 12:55:49 +00:00
chaton 2e838e6dd8
Enable `self.log` in most functions (#4969)
* refactor

* solve pyright

* remove logging in batch_start functions

* update docs

* update doc

* resolve bug

* update

* correct script

* resolve on comments
2020-12-06 13:01:43 +00:00
Carlos Mocholí 72349706c1
Improve epoch_result_store code quality (#4875)
* Improve code quality

* black -l 120 -S

* Fix pyright error

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-12-05 11:49:28 +00:00
Justus Schock f23f5e5648
Fix DP Logging Aggregation (#4138)
* add option to step result to do aggregation on a specific device

* in dp: do aggregation on root gpu

* Update CHANGELOG.md

* pep8

* trailing whitespace

* move to root

move result
stupid result object
revert to master
undo import
add "to" method to result
generalize to
try a test
try a test

Revert "try a test"
This reverts commit 22e3c1001e6c5774ea18ad925830304c245bf145.

Revert "try a test"
This reverts commit 4d2d8fb2a52d552894809a0cbe51af126d78f070.

new test
max epochs
super epoch end
log in test
hanging test
undo test
initial test that fails on master
step end
pass
step end
step end
epoch end
print
step
check dev
clean up test
sanity check
wtf is going on
frustration
debugging test
test
test
test
test
test
test
test
test
unused import

* move chlog entry

* clean

* remove outdated changes

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2020-12-04 19:10:07 +01:00
Rohit Gupta 342a2b6f25
Deprecate auto mode from ModelCheckpoint and EarlyStopping (#4695)
* remove auto mode from callbacks

* chlog

* remove auto mode from callbacks

* mode

* mode

* move back

* update docs

* update docstrings

* docstring warning

* fix syntax

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* isort

* default to 'auto'

* syntax

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-04 16:11:58 +01:00
NeuralLink 88792982b5
🔨 minor refactor in trainer. (#4801)
* 🔨 minor refactor in trainer.

* 🔨 Use finally instead of else

* 🔨 revert format

* 🔨 check should skip inside try

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-04 13:42:13 +01:00
Jirka Borovec 3976db597d
refactor imports of optional dependencies (#4859)
* refactor imports of optional dependencies

* fix

* fix

* fix

* fix

* fix

* flake8

* flake8

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-12-04 10:26:10 +01:00
Jethro Kuan c7e349e73d
docs: default_root_path -> default_root_dir (#4942)
* docs: default_root_path -> default_root_dir

* Apply suggestions from code review

* fix

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update notebook

Co-authored-by: Jethro Kuan <jethro.kuan@bytedance.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-02 19:17:34 -05:00
Lezwon Castelino 12cb9942a1
Tpu save (#4309)
* convert xla tensor to cpu before save

* move_to_cpu

* updated CHANGELOG.md

* added on_save to accelerators

* if accelerator is not None

* refactors

* change filename to run test

* run test_tpu_backend

* added xla_device_utils to tests

* added xla_device_utils to test

* removed tests

* Revert "added xla_device_utils to test"

This reverts commit 0c9316bb

* fixed pep

* increase timeout and print traceback

* lazy check tpu exists

* increased timeout
removed barrier for tpu during test
reduced epochs

* fixed torch_xla imports

* fix tests

* define xla utils

* fix test

* aval

* chlog

* docs

* aval

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-02 13:05:11 +00:00
Sean Naren e952dee292
Allow string plugins (#4888)
* Allow plugin to be chosen via string

* Fix implementation, add tests

* Fix codefactor issues

* Added missing env patch

* Skip test for windows

* Reword reason

* Add skip to invalid test

* Create required_plugins function, move sharded amp requirement to plugin

* Pass AMPType, fix setter for apex

* Better doc strings

* Add exception when using apex

* Add trainer available_plugins function, warn user when plugins have been added automatically with option to override behaviour

* Fixed pep8 indent

* Fix codefactor issues

* Add env variables

* Update pytorch_lightning/cluster_environments/cluster_environment.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Addressed code review

* Update pytorch_lightning/plugins/plugin_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/plugin_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/plugin_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Addressed more code review feedback

* Fixed docstrings

* Swapped to verbose runtime error

* Apply suggestions from code review

* Apply suggestions from code review

* Update pytorch_lightning/plugins/sharded_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Change name

* Pass trainer to plugins that may require it

* Fix sharded plugin

* Added test to ensure string sharded works

* Removed trainer typing as this breaks pep8

* Fixed doc issues

* Fixed tests

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-01 20:30:49 +00:00
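
A usage sketch, assuming `"ddp_sharded"` is the registered name for the sharded-DDP plugin added around this series:

```python
from pytorch_lightning import Trainer

# Select the plugin by name instead of importing and instantiating it;
# unknown names now raise a verbose runtime error per this PR.
trainer = Trainer(accelerator="ddp", gpus=2, plugins="ddp_sharded")
```
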
Justus Schock ebbf256bf5
Create memory dynamically (#4938)
* create window size dynamically.

* pep8

Co-authored-by: chaton <thomas@grid.ai>
2020-12-02 01:05:12 +05:30
chaton 1d3724a878
[HotFix] Logging - One epoch delay on training epoch metrics. (#4913)
* add test

* resolve logging bug

* update

* resolve pep8

* resolve tests

Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-01 09:26:52 +00:00
chaton c2e6e68c7e
optimizer clean up (#4658)
* add LightningOptimizer

* typo

* add mock closure

* typo

* remove logic in optimizer_step

* update

* update

* update

* deactivate LightningOptimizer for horovod

* resolve flake

* typo

* check optimizer name

* change name

* added backward to LightningOptimizer

* remove use_lightning_optimizer

* move update

* simplify init

* resolve comments

* resolve bug

* update

* update

* resolve bugs

* resolve flake8

* set state

* work manual_optimizer_step

* add doc

* add enable_pl_optimizer

* make optimizer_step

* add make_optimizer_step

* add examples

* resolve test

* add test_optimizer_return_options_enable_pl_optimizer

* add enable_pl_optimizer=True

* update

* update tests

* resolve bugs

* update

* set Trainer to False

* update

* resolve bugs

* update

* remove from doc

* resolve bug

* typo

* update

* set to True

* simplification

* typo

* resolve horovod

* unwrap horovod

* remove Optimizer

* resolve horovod

* move logic to amp_backend

* doesn't seem to be picklable

* update

* add again

* resolve some bugs

* cleanup

* resolve bug with AMP

* change __repr__

* round at -12

* update

* update

* update

* remove from horovod

* typo

* add convert_to_lightning_optimizers in each accelerators

* typo

* forgot

* forgot a convert_to_lightning_optimizers

* update

* update

* update

* increase coverage

* update

* resolve flake8

* update

* remove useless code

* resolve comments + add support for LightningOptimizer base class

* resolve flake

* check optimizer get wrapped back

* resolve DDPSharded

* reduce code

* lightningoptimizer

* Update pytorch_lightning/core/optimizer.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/core/lightning.py

* remove reference to step function

* Apply suggestions from code review

* update on comments

* resolve

* Update CHANGELOG.md

* add back training_step in apex and native_amp

* rename optimizer_step

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-01 00:09:46 +00:00
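
A rough sketch of how the `LightningOptimizer` wrapper surfaces in a manual-optimization `training_step` of this era; treat the exact signatures (`manual_backward` taking the optimizer explicitly) as assumptions:

```python
import torch
import pytorch_lightning as pl

class ManualModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    @property
    def automatic_optimization(self) -> bool:
        return False  # manual optimization: we call backward/step ourselves

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()  # LightningOptimizer wrapping the raw optimizer
        loss = self.layer(batch[0]).sum()
        self.manual_backward(loss, opt)
        opt.step()        # precision/accelerator handling lives in the wrapper
        opt.zero_grad()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```
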
William Falcon f677efe61e
Merge pull request #4880 from PyTorchLightning/better_simple_profiler
Logging
2020-11-27 15:33:58 -05:00
Sean Naren 06a856e055
Merge branch 'master' into feature/plug 2020-11-27 18:48:58 +00:00
tchaton ba41733802 Merge branch 'better_simple_profiler' of https://github.com/PyTorchLightning/pytorch-lightning into better_simple_profiler 2020-11-27 18:47:05 +00:00
tchaton 316ebadbdc remove capture on on_train_batch_end 2020-11-27 18:46:49 +00:00
chaton 6ba77c2611
Merge branch 'master' into better_simple_profiler 2020-11-27 18:43:01 +00:00
tchaton cef83dbbf8 optimize logging 2020-11-27 18:21:23 +00:00
Jirka Borovec 042152cd61
ref: fix & simplify test callback (#4009)
* simplify test callback

* update

* use mock

* flake8
2020-11-27 19:12:56 +01:00
tchaton e17300f97d add more profiler 2020-11-27 18:00:48 +00:00
tchaton 3a8fa6bf11 update 2020-11-27 17:48:51 +00:00
tchaton 290d74b40e resolve test 2020-11-27 16:47:13 +00:00
SeanNaren 1704773712 Address code review 2020-11-27 14:50:12 +00:00
Sean Naren 4f693762ea
Update pytorch_lightning/trainer/connectors/precision_connector.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-27 14:45:15 +00:00
SeanNaren cdd2e122fc Add none check for func 2020-11-27 14:30:57 +00:00
SeanNaren 5598dce1a9 Remove unneeded check 2020-11-27 14:22:17 +00:00
Sean Naren 00bd0d2e72
Merge branch 'master' into feature/plug 2020-11-27 13:18:50 +00:00
chaton dee968f20b
[bug] Replace_sampler attach previous multiprocessing_context (#4742)
* resolve bug

* add test docstring

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update test

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-27 12:57:25 +00:00
SeanNaren 04bb0abe36 Merge branch 'master' into feature/plug
# Conflicts:
#	pytorch_lightning/utilities/__init__.py
#	requirements/extra.txt
2020-11-27 10:00:05 +00:00
Jirka Borovec 217650320e
simplify imports Omegaconf (#4873)
* hydra

* omegaconf
2020-11-27 01:00:56 +01:00
Jirka Borovec 442d57f1e9
simplify imports xla / TPU (#4872)
* xla

* tpu

* fix

* fix

* flake8
2020-11-27 00:37:48 +01:00
SeanNaren 737447fc6e Merge branch 'master' into feature/plug
# Conflicts:
#	pytorch_lightning/trainer/connectors/precision_connector.py
#	pytorch_lightning/utilities/__init__.py
2020-11-26 23:02:36 +00:00
Jirka Borovec 11e73ceaa6
fix import and typo in AMP (#4871)
* fix import and typo

* docs

* apex

* fix

* typo
2020-11-26 23:45:52 +01:00
SeanNaren fc9b2bf015 Fix logic and add test for apex check, rename file, add DDP launcher tests 2020-11-26 22:45:21 +00:00
SeanNaren 8dc857c38d Ensure we add the condition to the case statement 2020-11-26 22:11:05 +00:00
SeanNaren a9c316b669 Add additional check to ensure apex is not used with sharded 2020-11-26 19:00:55 +00:00
SeanNaren 47c121ef1a Addressed code review points 2020-11-26 16:44:45 +00:00
Sean Naren 22b4d5ee1a
Merge branch 'master' into feature/plug 2020-11-25 20:16:37 +00:00
chaton 204a0a2d03
[bugfix] Accumulated_gradient and TensorBoard (#4738)
* resolve bug

* update

* update

* modify one test

* remove parameters

* update on comments

* update changelog

* update docstring

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-25 19:44:05 +00:00
SeanNaren b39f290c4d Merge branch 'master' into feature/plug 2020-11-25 12:55:42 +00:00
SeanNaren 6b129216d0 Add catches around fairscale installation 2020-11-24 19:23:55 +00:00
Samyak S Sarnayak ccf38ced2e
Use high progress_bar_refresh_rate on Google Colab (#4654)
* Use high refresh rate on Google Colab (#3786)

Automatically override progress_bar_refresh_rate when on Google
Colab. Also added a constant IS_COLAB in utilities to check
whether it is being run in colab or not.
(#3786)

* Show a warning instead of overriding when rate is low on colab

* Change warning to suggestion and move it

Moved warning to configure_progress_bar instead of on_trainer_init

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* add a mock test

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-11-24 02:13:33 +05:30
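
Sketch of the knob involved: on Colab a low refresh rate floods the notebook output, so per this change Lightning now suggests a higher value instead of silently overriding the one you pass:

```python
from pytorch_lightning import Trainer

trainer = Trainer(progress_bar_refresh_rate=20)  # redraw every 20 batches
```
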
SeanNaren d953f2be5b Merge branch 'master' into feature/fairscale-817-6n
# Conflicts:
#	pytorch_lightning/accelerators/accelerator.py
#	pytorch_lightning/accelerators/ddp2_accelerator.py
#	pytorch_lightning/accelerators/ddp_accelerator.py
#	pytorch_lightning/accelerators/ddp_cpu_spawn_accelerator.py
#	pytorch_lightning/accelerators/ddp_hpc_accelerator.py
#	pytorch_lightning/accelerators/ddp_spawn_accelerator.py
#	pytorch_lightning/accelerators/dp_accelerator.py
#	pytorch_lightning/plugins/ddp_plugin.py
#	pytorch_lightning/trainer/connectors/model_connector.py
2020-11-23 20:19:46 +00:00
Sean Naren 404af43cde
5/n: Extract reference model call to plugins/accelerators (#4773)
* Encapsulate extracting reference model within the plugin to allow custom wrapper logic to live within the plugin/accelerators

* Add missing new lines

* Fix call to accelerator

* Removed double blank

* Use accelerator backend

* Handle case where wrapper has not been initialized within the plugin

* Added basic get model tests, add better typing

* Change model name

* Split GPU/DDP test

* Add stronger typing, skip ddp test on windows

* Fix import

* Fix import in dp

* Fixed PEP8 definition

* Add ddp launcher for ddp testing

* Modify accelerator reference model to property, change name to reflect func

* Revert property as this is incorrect.

* Revert across accelerators

* Modified name to get_model_from_plugin

* Code review changes, fix issue with dp

* Add verb to function getter

Co-authored-by: chaton <thomas@grid.ai>
2020-11-23 17:21:47 +00:00
SeanNaren c590e3a166 Ensure we check if we should use sharded amp plugin 2020-11-22 15:18:50 +00:00
SeanNaren b506a7e46a Revert across accelerators 2020-11-22 15:00:23 +00:00
SeanNaren 977625c289 Revert property as this is incorrect. 2020-11-22 14:54:00 +00:00
Sean Naren 4b16b47843
Merge branch 'master' into feature/817-fairscale-5n 2020-11-22 11:39:15 +00:00
SeanNaren 358f503848 Modify accelerator reference model to property, change name to reflect func 2020-11-22 11:39:00 +00:00
Teddy Koker 299de5dc62
don't override PYTHONWARNINGS (#4700)
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-11-22 11:25:24 +01:00
edenlightning a716ea60e1
Clarify checkpoint deprecation message (#4640)
* Clarify checkpoint deprecation message

* Update pytorch_lightning/trainer/connectors/callback_connector.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-22 07:35:54 +01:00
YI-LIN SUNG 69b9949192
[docs] Remove the redundant indents in trainer.py (#4720)
* Remove the redundant indents in trainer.py

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-11-21 08:15:09 +06:30
Sean Naren e3869c3950
Merge branch 'master' into feature/817-fairscale-5n 2020-11-20 17:13:17 +00:00
Roger Shieh 42e59c6add
Cast hparams to dict when not using omegaconf (#4770)
* init fix

* init test

* more specific dict assert

* update changelog

* Update tests/checkpointing/test_model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-20 19:53:05 +08:00
SeanNaren 95a1f19851 Use accelerator backend 2020-11-19 10:59:17 +00:00
SeanNaren 078a829834 Fix call to accelerator 2020-11-19 10:48:27 +00:00
SeanNaren be4c24c484 Encapsulate extracting reference model within the plugin to allow custom wrapper logic to live within the plugin/accelerators 2020-11-19 10:43:16 +00:00
Sean Naren f0ab74dc2f
Expose scaler in amp plugin (#4737) 2020-11-18 22:30:47 +00:00
Sean Naren e7134a9135
Sharded Plugin 2/n: Allow ddp plugin to modify optimizer state saving (#4675)
* Allow ddp plugin to modify optimizer state saving

* Rely on the accelerator for optimizer states

* Ensure we init the accelerator for the saving function

* Better comment for optim state dump

* Revert "Ensure we init the accelerator for the saving function"

This reverts commit af65effa

* Added accelerator check to initialize tuner before saving model checkpoint

* Simplify comment

* Revert "Added accelerator check to initialize tuner before saving model checkpoint"

This reverts commit f9929c0c

* Return single optimizer state to reduce duplication

* Fixed docstring

* Fixed typing

* Fixed comment

* Added CHANGELOG.md

Co-authored-by: chaton <thomas@grid.ai>
2020-11-18 16:38:35 +00:00
chaton 96769a7184
quick fix (#4697) 2020-11-16 16:20:35 +00:00
chaton 867eef0e4c
[HOTFIX] Logging for evaluation (#4684)
* resolve bugs

* add should_flush_logs

* remove should_flush

* should work

* update test

* use something else

* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

* log mock_log_metrics.mock_calls

* typo

* don't use keys

* convert to list

* typo

* check kwargs

* resolve bug

* resolve flake8

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-15 10:41:33 -05:00
Justus Schock e04e7c9ecc
Makes automatic optimization a model attribute (#4602)
* Makes automatic optimization a model attribute

* Update trainer.py

* remove setting property in model

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update trainer.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-14 11:13:42 +06:30
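
Migration sketch for #4602, under the assumption that the old switch was the `Trainer(automatic_optimization=...)` argument deprecated elsewhere in this log:

```python
import pytorch_lightning as pl

# Before #4602 the switch lived on the Trainer:
#   trainer = Trainer(automatic_optimization=False)
# Now the model itself declares it:
class ManualOptModel(pl.LightningModule):
    @property
    def automatic_optimization(self) -> bool:
        return False
```
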
ananthsub d096a2ea6d
Fix setup callback hook to pass LightningModule through (#4608)
* Fix setup callback hook

* Update CHANGELOG.md

* Update test_trainer.py

* Update test_trainer.py

* Update test_trainer.py

* fix chlog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-13 19:34:46 -05:00
Jeff Yang baa8558cc0
logger docs and api docs (#3950)
* logger and api docs

* remove gpu_usage_logger, lr_logger

* update docstring

* fix wandb example

* remove step result

* charts

* add some charts info

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-13 20:35:54 +05:30
chaton 4018237c30
[FEAT] Add lambda closure to manual_optimizer_step (#4618)
* added lambda_closure

* move to types

* add 2 new tests

* make example more complex

* add complex example to doc

* added more tests

* resolve doc

* typo

* update

* update tpu optimizer_step

* Apply suggestions from code review

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-12 19:22:06 +00:00
chaton 3d202f9ecc
[FEAT] Refactor logging 3/3 [v1] (#4552)
* wip

* wip check how many tests break

* wip

* resolve some bugs

* resolve more bugs

* resolve 2 bugs

* resolve

* temp fix

* update

* remove useless code

* remove result

* try to resolve bug

* update changelog

* formatting

* remove pl

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-11 17:05:24 +00:00
chaton 514cb22bd7
[Fix] Move log value to cpu. (#4592)
* move value to cpu to save memory

* update

* move to cpu

* try something

* update

* update

* add back out_dict.update({k: v})

* add move_metrics_to_cpu

* update

* Update pytorch_lightning/utilities/memory.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* resolve comments

* Update pytorch_lightning/core/step_result.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-10 21:13:41 +00:00
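
The `move_metrics_to_cpu` flag added here, sketched: logged tensors are moved off the device so an epoch's worth of metrics does not pin GPU memory:

```python
from pytorch_lightning import Trainer

trainer = Trainer(move_metrics_to_cpu=True)
```
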
chaton 7e08b0d710
[bug-fix] DDP and automatic_optimization=False (#4485)
* resolve bug

* add self._running_manual_optim

* update

* update tests

* update lightning module

* resolve bug

* update tests

* update

* resolve pep8

* update

* replace by `ddp_spawn`

* temporary fix

* update

* update

* move update to training_loop

* make both ddp_spawn

* introduce `manual_optimizer_step`

* update changelog

* added changelog wrong place

* add force_optimizer_step

* update docstring for tests

* update optimizer_step

* update zero_grad

* resolve flake8

* move update into manual_optimizer_step

* add zero_grad

* remove zero_grad tests

* remove manual_backward in AMP, it doesn't help

* update

* loosen tests

* update

* update doc

* add TODO

* Removed unnecessary get model from native amp

* Remove try except with pytest raise

* Add seed, clean up imports, remove try catch to reproduce error

* update code

* update test

* revert back

* formatting

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-10 19:44:51 +00:00
tarepan 41c9bee4f0
Fix load disparity between normal and hpc (#4526)
* Add missing load functionality in hpc

* Add general file load for hpc

* Add mark in CHANGELOG

* Fix Typo Li**hg**tning

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Refactor line separation

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Fix entangled fixation commit

* Fix naming of restore_model_states

* Fix amp restore place

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-11-09 17:26:38 +00:00
William Falcon 09a51697ed
Adds shortcut for path to log (#4573)
* added log_dir shortcut to trainer properties for writing logs

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut
2020-11-08 12:16:22 -05:00
William Falcon bb356a73cb
added trainer api docs (#4569) 2020-11-07 14:18:45 -05:00
chaton 9c8701f2e2
[feat] Logging refactor 2/n - train (#4495)
* update logging

* solve more bugs

* replace Mapping by Dict

* update on comments

* resolve pep8

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* typo

* update for coverage

* update test

* update

* Update tests/models/test_hooks.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Update tests/models/test_hooks.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* update on comments

* remove deepcopy

* remove useless look for

* another small optim

* extra optim

* remove latest optim, can be source of bug

* resolve bug

* add docstring

* optimize coverage

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/logging_tests/test_distributed_logging.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/evaluation_loop.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/logging/test_logger_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/logging_tests/test_train_loop_logging_1_0.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* update

* update on comments

* update parity speed

* get it down to 0.65

* update

* 0.8 max_dif

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-11-05 22:27:04 +00:00
chaton 11dc5264cd
Bugfix/4449 dict attribute error (#4480)
* resolve a bug

* resolve a bug

* remove todo

* resolve more bugs

* update tests

* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* resolve pyright

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-04 19:35:07 +00:00
tarepan 7b375ed1d3
Add CheckpointConnector internal commentaries (#4421)
* Add CheckpointConnector commentaries

* Fix comment format

* Fix save/load schema as function comments

Co-authored-by: chaton <thomas@grid.ai>
2020-11-03 22:09:29 +05:30
Adrian Wälchli 9b7f01654a
Update old "module_arguments" and "hparams" references in docs (#4417)
* replace module_arguments refernces

* update hparams docs

* add missing save_hyperparameters in example

* deprecate instead of remove

* Update docs/source/hyperparameters.rst

Co-authored-by: chaton <thomas@grid.ai>

* Update docs/source/hyperparameters.rst

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-03 12:13:10 +01:00
Rohit Gupta 360b3d8844
Disable training when limit_train_batches=0 (#4371)
* Disable training when limit_train_batches=0

* chlog

* pep

* limit_train_batches

* BoringModel

Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-11-03 12:10:35 +05:30
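
Usage sketch of the fixed edge case: zero now cleanly disables the training loop (e.g. to run validation only) instead of misbehaving:

```python
from pytorch_lightning import Trainer

trainer = Trainer(limit_train_batches=0)
```
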
Rohit Gupta ad2556b669
Disable saving checkpoints if not trained (#4372)
* Disable saving checkpoints if not trained

* chlog

* update test

* fix

Co-authored-by: chaton <thomas@grid.ai>
2020-11-03 11:38:32 +05:30
chaton 958aa1aee7
[test] Accumulated gradient optimization tests (#4477)
* adding tests

* wip

* update

* Update tests/trainer/test_trainer.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-02 23:44:11 +00:00
chaton ac3f7393fd
[FEAT] logging refactors 1/n (#4439)
* introducing new logging object

* typo

* typo

* Update pytorch_lightning/trainer/logging.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Update pytorch_lightning/trainer/logging.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* update on comments

* update on comments

* add more doctstring

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* resolve on comments

* solve pyright

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* update on comments

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* update on comments

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-02 20:51:43 +00:00
chaton 102fa9ee7d
[BUGFIX] AMP + Precision unscale grad (#4441)
* move unscale within Native plugin

* remove gradient tracking from lightning backward

* forgot trainer.fit

* typo

* update

* cleanup

* set to 1.6

* typo

* skip if below 1.6 strict

* update changelog

* remove useless code

* Update tests/plugins/test_amp_plugin.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Update tests/plugins/test_amp_plugin.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* update changelog

* Update CHANGELOG.md

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-02 16:36:48 +00:00
Jirka Borovec ef03c39ab7
Add step index in checkpoint name (#3807)
* true final value of global step

* ch check

* tests

* save each validation interval

* wip

* add test

* add test

* wip

* fix tests, revert old edits, fix merge conflicts, update doctests

* test + bugfix

* sort files

* format test

* suggestion by ananth

* added changelog

* naming

* docs

* example

* suggestion

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix test

* pep

* pep

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-11-02 15:05:58 +01:00
Adrian Wälchli 6ae4c6ec85
update docs on checkpoint_callback Trainer argument (#4461)
* docs update

* update callbacks docs

* docs

* notebook examples

* warning

* line lenght

* update deprecation

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Roger Shieh <55400948+s-rog@users.noreply.github.com>
2020-11-02 06:18:20 +01:00
Sean Naren 6211fd4b0c
Fix type checker issue with explicit cast of ref_model object (#4457)
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-31 16:43:19 -04:00
Adrian Wälchli d1234c592d
deprecate passing ModelCheckpoint instance to Trainer(checkpoint_callback=...) (#4336)
* first attempt

* update tests

* support multiple

* test bugfix

* changelog

* pep

* pep

* import order

* import

* improve test for resuming

* test

* update test

* add references test

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* docstring suggestion deprecation

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>

* paramref

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-30 04:47:37 +01:00
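
Migration sketch for the deprecation: checkpoint callbacks move to the `callbacks` list, while `checkpoint_callback` remains a bool toggling the default checkpointing:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Deprecated: Trainer(checkpoint_callback=ModelCheckpoint(...))
trainer = Trainer(callbacks=[ModelCheckpoint(monitor="val_loss")])
```
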
Jeff Yang ebe3a31ddd
[docs] distributed_backend -> accelerator (#4429)
* distributed_backend -> accelerator

* distributed_backend -> accelerator

* use_amp -> precision

* format

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-10-30 00:45:24 +06:30
Justus Schock bbd81dfd55
Skips DDP parameter sync (#4301)
* ddp no-sync

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Update training_loop.py

* factor __enter__ and __exit__ out to separate context manager

* delete _updated_model_last_step

Co-authored-by: justusschock <justusschock@pc125.lfb.rwth-aachen.de>
Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-29 23:01:37 +05:30
Martin Hwang b459fd26ac
fix: `nb` is set to the total number of devices when nb is -1. (#4209)
* fix: `nb` is set to the total number of devices when nb is -1.

 Refs: #4207

* feat: add test code
     1. test the combination of `auto_select_gpus` and `gpus` options using Trainer
     2. test `pick_multiple_gpus` function directly

Refs: #4207

* docs: modify contents in `Select GPU devices`

 Refs: #4207

* refactor: reflect the result of review

 Refs: #4207

* refactor: reflect the result of review

 Refs: #4207

* Update CHANGELOG.md

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <55400948+s-rog@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-29 10:50:37 +01:00
Rohit Gupta b26c71eadf
Add optimizer hooks in callbacks (#4379)
* Add optimizer hooks in callbacks

* optimizer param

* update test

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-28 13:15:22 +01:00
Dusan Drevicky c50c225f05
feature: Allow str arguments in Trainer.profiler (#3656)
* allow trainer's profiler param to have a str value

* add tests

* update docs

* update exception message

* Update CHANGELOG

* fix pep8 issues

* cleanup test code

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Add deprecation warning if using bool for profiler

* Add deprecation tests and move deprecated tests

* Remove bool option to profiler from docs

* Deprecate bool args to profiler in CHANGELOG

* fixup! Add deprecation warning if using bool for profiler

* fixup! Add deprecation tests and move deprecated tests

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Implement suggestions, remove whitespace

* fixup! Implement suggestions, remove whitespace

* Allow bool, str (case insensitive), BaseProfiler

* Add info about bool deprecation to trainer

* fixup! Add info about bool deprecation to trainer

* Move deprecate todo to test_deprecated

* Test wrong profiler type, improve error message

* fixup! Test wrong profiler type, improve error message

* Update pytorch_lightning/trainer/connectors/profiler_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Apply suggestions from code review

* Readd bool to profiler types, test cli profiler arg

* Remove extra whitespace in doc

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update deprecation versions

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-27 16:27:16 +05:30
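
Usage sketch: a case-insensitive string now selects a built-in profiler, with bool arguments deprecated per this PR:

```python
from pytorch_lightning import Trainer

trainer = Trainer(profiler="simple")  # or "advanced", or a BaseProfiler
```
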
Adrian Wälchli 48b6de0c40
update (#4343)
Co-authored-by: chaton <thomas@grid.ai>
2020-10-27 06:07:29 -04:00
William Falcon 98205fb438
Enable custom apex and amp plugins (#4355)
* enable custom apex, amp plugin

* enable custom apex, amp plugin

* enable custom apex, amp plugin

* enable custom apex, amp plugin
2020-10-25 17:11:07 -04:00
ananthsub f6efb712ed
Skip replacing dataloader sampler if it's already a distributed sampler (#4273)
* Update data_loading.py

* Update data_loading.py

* add test + update flag description

* add to changelog

* Update test_dataloaders.py

* fix-pickle

* Update test_dataloaders.py

* Added missing reference calls

* Update tests/trainer/test_dataloaders.py

* Apply suggestions from code review

* Update data_loading.py

* Update test_dataloaders.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-23 17:34:07 +01:00
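
Sketch of the case this fixes, with a hypothetical `self.dataset`: a loader that already carries a `DistributedSampler` is now left untouched by `replace_sampler_ddp` instead of being re-wrapped:

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def train_dataloader(self):
    # `self.dataset` stands in for your Dataset instance.
    sampler = DistributedSampler(self.dataset, shuffle=True)
    # Lightning detects the existing distributed sampler and skips its
    # own sampler replacement.
    return DataLoader(self.dataset, batch_size=32, sampler=sampler)
```
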
chaton 3abfec8962
[HOTFIX] ModelCheckpoint - Don't increase current_epoch and global_step if not trained (#4291)
* add two tests w/wo tempdir

* resolve flake8

* this test is failing

* update bug report

* resolve bug and add test

* remove bug_report

* resolve flake8

* resolve bug

* resolve pep8

* resolve pep8

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
2020-10-23 11:17:50 +01:00
Rohit Gupta 4c7ebdc32b
Add dirpath and filename parameter in ModelCheckpoint (#4213)
* Add dirpath and filename parameter in ModelCheckpoint

* remove old function

* chlog

* codefactor

* update tests

* docs

* fix doctest and added tests

* pathlib dirpath

* dep version and docs

* try fix doctest

* pep

* suggestions
Co-authored-by: carmocca <carlossmocholi@gmail.com>

* suggestions

* fix test

* pep

* trigger tests

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* suggestions

* try fix windows test

* add and update some tests

* trigger tests

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-23 09:59:12 +05:30
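
Usage sketch of the new parameters, which replace the old combined `filepath` argument:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints/",
    filename="{epoch}-{val_loss:.2f}",
    monitor="val_loss",
)
```
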
Sean Naren 9823f97a84
Protect functions not to be accessed by user (#4305) 2020-10-22 15:15:04 +01:00
Sean Naren 065cc94112
Fix bug comparing max_steps to global step which inits at 0 (#4278)
* Fix bug comparing max_steps to global step which inits at 0

* Added test to ensure accumulate grad batch works with max steps

* check fix with TODO test

* correct call counts

* Add check to ensure we've finished accumulation of this global step before exiting loop in conjunction with max steps

* Remove + 1 check in test as this was incorrect

* Update incorrect expected outputs in lr finder test

* Added brackets for clarity

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-22 13:58:59 +01:00
Mauricio Villegas 546476c704
Allow changing the logged step value in validation_step (#4130)
* Fix bug identified in https://github.com/PyTorchLightning/pytorch-lightning/issues/4102

* update tests

* chlog

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-10-22 03:03:07 +05:30
Carlos Mocholí 2549ca40e6
Clean up optimizer code (#3587)
* Update optimizer code

* Update CHANGELOG

* Fix tuple of one list case

* Update docs

* Fix pep issue

* Minor typo [skip-ci]

* Use minimal match

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-21 21:12:48 +02:00
Justus Schock 0ec4107697
Optimizer closure (#4190)
* closure for all optimizers

* rename hook and take care of alternating backwards

* add comment

* training_loop_fix

* closure whenever possible

* training_loop

* simple tests that count backward calls

* fix test to work with closure

* remove debugging statement

* better place

* check grads after backward

* start fixing manual optimization

* skip step when result returned by closure was None

* fix gradient clipping test to work with closure

* attribute dict result only for automatic optimization

* adjust backward calls in accelerator

* adjust where to call gradient clipping

* adjust backward calls in tests

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* pass kwargs to xla optimizer

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-21 19:34:29 +01:00
William Falcon 8a20d6af51
make save fx part of model checkpoint cb (#4284) 2020-10-21 10:06:42 -04:00
Carlos Mocholí e0f9799dbf
Add strict option to lr_scheduler dict (#3586)
* Add strict option to lr_scheduler dict

* Update docs

* Unnecessary "else" after "raise"

* Update CHANGELOG

* Fix rebase
2020-10-21 14:14:37 +02:00
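
Sketch of the new key in the scheduler dict: `strict=False` downgrades a missing `monitor` metric from an exception to a warning:

```python
import torch

def configure_optimizers(self):
    # Sketch of a LightningModule hook; `self` is the module.
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
    return {
        "optimizer": optimizer,
        "lr_scheduler": {
            "scheduler": scheduler,
            "monitor": "val_loss",
            "strict": False,  # warn instead of raising if val_loss is absent
        },
    }
```
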
Sean Naren c336881959
Added fix to ensure that custom logged metrics within test_epoch_end are appended to the result object even without step reduced metrics (#4251) 2020-10-20 18:33:18 +02:00
Jirka Borovec f37444fa3e
CI: add flake8 (#4239) 2020-10-19 21:20:17 +01:00
Espen Haugsdal 66e58f5afb
Use checkpoint_connector.hpc_save in SLURM (#4217) 2020-10-18 10:13:56 -04:00
Elia Cereda cf9fe4905e
Annotate return type of TrainerProperties.from_argparse_args(...) (#4192)
* Annotate return type of TrainerProperties.from_argparse_args(...)

* Added second empty line between class and typevar

* Renamed all uses of the typevar to _T
2020-10-17 20:00:50 +08:00
Akihiro Nitta b45b57cc58
Use `Optional` for arguments set to `None` by default (#4164)
* Use `Optional` for variables set to `None` by default

* Use `Optional` instead of `Union[None, ...]` for consistency
2020-10-15 23:02:50 +02:00
William Falcon 72f19768c8
remove duplicate metric vs step log for train loop (#4173)
* remove duplicate metric vs step log

* remove duplicate metric vs step log

* remove duplicate metric vs step log

* fix ddp index issue
2020-10-15 10:47:00 -04:00
William Falcon 45d05ff68d
Fixes #4141 (#4169)
* fix val epoch agg

* fix val agg metrics

* fix val agg metrics

* fix val agg metrics
2020-10-15 09:12:05 -04:00
Jirka Borovec f064682786
save initial arguments (#4163)
* save initial arguments

* typing

* chlog

* .
2020-10-15 08:30:49 -04:00
Rohit Gupta dec31b3e76
Call on_load_checkpoint before loading state_dict (#4057) 2020-10-14 23:26:04 +02:00
William Falcon 09c2020a93
notices (#4118) 2020-10-13 07:18:07 -04:00
William Falcon bf2067a609
enabled manual returns (#4089) 2020-10-12 10:06:17 -04:00
William Falcon 1dbc6ffbc1
added templates (#4077)
* docs

* docs
2020-10-11 09:35:51 -04:00
William Falcon 7ffe05a3d1
ref: accelerator names (#4066)
* ref: accelerator names

* docs
2020-10-11 01:05:14 -04:00
William Falcon 0281b077d8
ref: decouple apex second attempt part 10/n (#4064)
* ref: decouple apex second attempt part 9/n

* ref: decouple apex second attempt part 9/n

* ref: decouple apex second attempt part 9/n
2020-10-10 20:05:05 -04:00
William Falcon dbfe2b6129
ref: decouple apex second attempt part 9/n (#4063)
* ref: decouple apex second attempt part 9/n

* ref: decouple apex second attempt part 9/n
2020-10-10 18:44:24 -04:00
William Falcon 5ce9fc6bb3
ref: decouple apex second attempt part 7/n (#4061)
* ref: decouple apex second attempt part 7/n

* ref: decouple apex second attempt part 7/n

* ref: decouple apex second attempt part 7/n
2020-10-10 16:44:15 -04:00
William Falcon d1bbb449a3
ref: decouple apex second attempt part 5/n (#4058) 2020-10-10 14:35:25 -04:00
Rohit Gupta bdbf846029
Fix to print scaler value in progress bar (#4053)
* Fix to print scaler value in progress bar

* chlog

* Fix to print scaler value in progress bar

* Fix to print scaler value in progress bar
2020-10-10 12:20:11 -04:00
William Falcon ce2edf1192
ref: decouple apex second attempt part 4/n (#4056)
* ref: decouple apex second attempt part 4/n

* ref: decouple apex second attempt part 4/n

* Update lightning.py

* ref: decouple apex second attempt part 4/n
2020-10-10 12:19:22 -04:00
William Falcon 7285613974
ref: decouple apex second attempt part 2/n (#4054)
* ref: decouple apex second attempt part 2/n

* ref: decouple apex second attempt part 2/n
2020-10-10 10:24:20 -04:00
William Falcon 5b261a230e
enable passing in custom accelerators (#4050)
* enable custom accelerators

* ref: finish decoupling apex, LM and backward

* ref: finish decoupling apex, LM and backward

* ref: finish decoupling apex, LM and backward
2020-10-10 09:21:08 -04:00
William Falcon 2b255a3df4
ref: enable custom clusters (1/n) (#4048)
* enable cluster plugins

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices
2020-10-10 08:09:29 -04:00