Sean Naren
0211f7f9b2
Disable pl optimizer temporarily to fix AMP issues ( #5163 )
...
* Disable pl optimizer temporarily to fix AMP issues
* Add todo and enable pl optimizer in the test
2021-01-05 09:58:37 +01:00
chaton
13bbf4b3f2
Un-balanced logging properly supported ( #5119 )
...
* resolve bug
* clean code
* resolve comments
* Update tests/trainer/optimization/test_multiple_optimizers.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* resolve another bug
* add comments
* use abs to find diff
* update
* resolve flake8
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-01-05 09:58:37 +01:00
Loi Ly
1d13943605
Fix reset TensorRunningAccum ( #5106 )
...
* Fix reset TensorRunningAccum
* add test for TensorRunningAccum's reset method
* fix CI failure due to PEP8
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-01-05 09:58:36 +01:00
Jirka Borovec
c72880f109
hotfix: dataloaders - add unimplemented methods ( #5352 )
...
* add unimplemented methods
* test
* test
* flake8
2021-01-05 03:41:20 -05:00
Justus Schock
d88cf4a652
Add Support for multiple train loaders ( #1959 )
...
* add support for wrong dtype in apply_func
* apply loader resetting to possible collection of loaders
* add combined loader iter class
* integrate combined loader iter to training loop
* fix imports
* fix imports
* finish supporters
* add tests for supporters
* add test for model with multiple loaders
* fix trainer integration
* fix instance check
* Train loaders (#4032 )
* patch for issues discussed in #1959 , encapsulating underlying data structures returned from train_dataloader
* update data_loading.py so it uses the patch discussed in #1959
* rename class
* Separate CombinedLoaderIterator into two classes, and update related tests. (#4606 )
* Fix the bugs after rebasing.
* Add custom get_len for apply_to_collection
* Refactor MultiIterator to be as CombinedLoaderIterator
* To get the right num_training_batches, call the wrapper for the multi trainloader in data_loading.py instead of training_loop.py
* Reload _loader_iters when calling __iter__
* Don't transform DataLoader to CombinedLoaderIterator when it's alone
* Updates test_fit_multiple_train_loaders for testing num_training_batches
* Separate CombinedLoaderIterator into CombinedLoaderIterator and CombinedDataLoader. Add CombinedDataset for unified DataLoader format.
* Initialize CombinedDataLoader before calculating num_training_batches. Also updating self._worker_check for multiple loaders
* Update tests for supporters
* Update tests for multiple trainloaders. Add tests about few_workers for multiple loaders.
* Fix pep8 issues
* Add tests for train_loader_patch.py
* Add descriptions to multiple_trainloader_mode
* Remove unused variables
* Add docstrings and typing
* Add more tests for better coverage
* Remove unused commented code
* Add sampler property
* Remove extract_dataset
* Update typing
* pep8
* Update train_loader_patch.py
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/supporters.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* reviewer comments
* fix stupid import
* add docs
* add back line separator
* fix line sep
* pep8
* Apply suggestions from code review
* fix
* fix
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
* flake8
Co-authored-by: Justus Schock <justusschock@justuss-mbp.fritz.box>
Co-authored-by: Christofer Fransson <christofer_fransson@yahoo.com>
Co-authored-by: YI-LIN SUNG <r06942076@ntu.edu.tw>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-01-04 19:57:53 +00:00
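For context, a minimal sketch of what the multiple-train-loader support above enables (illustrative only: the toy datasets and losses are placeholders, and how loaders of different lengths are cycled is governed by the multiple_trainloader_mode mentioned in the commit):

    import torch
    import pytorch_lightning as pl
    from torch.utils.data import DataLoader, TensorDataset

    class LitModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(4, 1)

        def train_dataloader(self):
            # return a collection of loaders; batches arrive with the same structure
            ds_a = TensorDataset(torch.randn(64, 4), torch.randn(64, 1))
            ds_b = TensorDataset(torch.randn(96, 4), torch.randn(96, 1))
            return {"a": DataLoader(ds_a, batch_size=8), "b": DataLoader(ds_b, batch_size=8)}

        def training_step(self, batch, batch_idx):
            xa, ya = batch["a"]  # each key holds one batch from its loader
            xb, yb = batch["b"]
            loss_a = torch.nn.functional.mse_loss(self.layer(xa), ya)
            loss_b = torch.nn.functional.mse_loss(self.layer(xb), yb)
            return loss_a + loss_b

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters())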
Jirka Borovec
b72ed71d4e
Refactor: clean trainer device & distrib setters ( #5297 )
...
* naive replace
* simplify
* clean
* .
* fix
* .
* fix
* fix
2021-01-04 17:10:13 +00:00
Jirka Borovec
957583544a
mark todo exceptions ( #5320 )
...
* mark todo exceptions
* .
* .
* .
* .
* .
* .
* .
* .
* try
* .
2021-01-04 09:07:56 +01:00
Jirka Borovec
73e06fd7c8
fix trainer distributed attributes ( #5303 )
...
* fix trainer distributed attributes
* .
* fix
2020-12-31 11:10:44 +01:00
Jirka Borovec
7a615b5651
add tests for Trainer attributes ( #5261 )
...
* add tests for Trainer attributes
* drop empty
2020-12-29 18:56:13 +01:00
Jirka Borovec
a884866ff0
Unify names in Utils ( #5199 )
...
* warnings
* argparse
* mutils
* xla device
* deprecated
* tests
* simple
* flake8
* fix
* flake8
* 1.4
2020-12-22 00:23:33 +01:00
Jirka Borovec
0f36525e8f
fix/enable - check F401 ( #5201 )
...
* refactor - check F401
* missed
* fix
2020-12-21 10:15:04 +01:00
Jirka Borovec
35fd6e93c7
refactor - check E501 ( #5200 )
2020-12-21 14:23:09 +05:30
Jirka Borovec
2d54116baa
annotate unused vars ( #5017 )
...
* annotate all unused vars
* rank_zero_warn
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* f1 fixed
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-12-19 13:53:06 +01:00
chaton
f3748ba808
[feat] Enable self.log in callbacks ( #5094 )
...
* enable to use self.log in callbacks
* update
* revert back to assert
2020-12-16 16:08:39 -05:00
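A hedged sketch of what #5094 enables: logging from a callback hook through the LightningModule (the hook and the metric name here are just examples):

    from pytorch_lightning.callbacks import Callback

    class EpochMarker(Callback):
        def on_train_epoch_start(self, trainer, pl_module):
            # before this change, logging from a callback hook was not supported
            pl_module.log("epoch_marker", float(trainer.current_epoch))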
Jirka Borovec
059eaecbb4
set xxx_AVAILABLE as protected ( #5082 )
...
* set xxx_AVAILABLE as protected
* docs
2020-12-14 20:19:05 +05:30
Carlos Mocholí
0327f6b4c2
Do not warn when the name key is used in the lr_scheduler dict ( #5057 )
...
* Do not warn when the name key is used
* Missing line
* Consistency
* Update pytorch_lightning/callbacks/lr_monitor.py
* Update docs
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update CHANGELOG
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-14 08:38:10 +01:00
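For reference, the `name` key that no longer triggers a warning, sketched inside a configure_optimizers (the scheduler choice is arbitrary; the name is what LearningRateMonitor displays):

    import torch

    # method of a LightningModule
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
        return [optimizer], [{"scheduler": scheduler, "name": "my_lr"}]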
tarepan
16feb5137b
Refactor load in checkpoint connector ( #4593 )
...
* Refactor load step commentaries
* Refactor hpc ckpt suffix acquisition
* Refactor restore/hpc_load match
* Refactor hpc load trial
* Refactor checkpoint dir check
* Refactor unneeded function nest
* Refactor nested If
* Refactor duplicated cache clear
* Refactor attempt flow with if/elif
* Fix pep8
* Refactor hook commentary
Co-authored-by: chaton <thomas@grid.ai>
* Fix pep8
* Refactor hpc load checkpoint path acquisition
* Fix pep8
* Fix doc
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Refactor None Union type with Optional
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-14 00:13:50 +08:00
chaton
1a970b2d8d
[hotfix] Extend Optimizer + update doc ( #5095 )
...
* resolve urgent bug
* update pr
* update doc
* update
* remove typo
* add defaults
* Update pytorch_lightning/__init__.py
* Update setup.py
* update doc
* Update docs/source/optimizers.rst
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update
* resolve doc
* debug test
* update test
* Update docs/source/optimizers.rst
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update docs/source/optimizers.rst
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update docs/source/optimizers.rst
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* remove useless import
* Update docs/source/optimizers.rst
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-12-11 14:24:59 -05:00
Jirka Borovec
d5fa02e798
simplify accelerator steps ( #5015 )
...
* simplify accelerator steps
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-10 18:36:13 +05:30
Jirka Borovec
4ebce38478
update usage of deprecated automatic_optimization ( #5011 )
...
* drop deprecated usage automatic_optimization
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-10 15:31:33 +05:30
Jirka Borovec
77fb425dd4
update usage of deprecated profiler ( #5010 )
...
* drop deprecated profiler
* lut
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-10 08:38:14 +01:00
Jirka Borovec
ce9179591d
ref: clean config [1/n] add intermediate setters ( #4990 )
...
* add intermediate setters
* show inputs
* fix options
* move
* fix
* less talk
* fix
* talk less
* str
* cases
* rename
Co-authored-by: chaton <thomas@grid.ai>
2020-12-09 14:13:57 -05:00
Jirka Borovec
53d7c9555c
drop usage of deprecated distributed_backend ( #5009 )
...
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-09 09:18:23 +01:00
Sean Naren
ee9b3fe574
[feat] pp 1/n ( #5016 )
...
* Added changes for RPC plugin
* Add missing kwargs
* Fix code format
* Loading refactors by introducing is_distributed var, fix optimizer step flow
* Add rpc guard
* Added docstrings and typing
* resolve comments
* Add additional rpc hook, refactor name of exit process hook for clarity
* remove annotation
* Modify behaviour to allow optional return, add test for rpc plugin
* resolve tests
* rename is_ddp_based
* update
* update for windows
* update
* resolve test
* code smell
* Revert back to init_ddp_connection for backwards compat
* Swap to explicit name for property
* Add missing speed parity increase for CI variability, fix call counts for child process
Co-authored-by: tchaton <thomas@grid.ai>
2020-12-08 22:02:10 +00:00
Rohit Gupta
6d2aeff26a
fast_dev_run can be int ( #4629 )
...
* fast_dev_run can be int
* pep
* chlog
* add check and update docs
* logging with fdr
* update docs
* suggestions
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* fdr flush logs
* update trainer.fast_dev_run
* codefactor and pre-commit isort
* tmp
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
2020-12-09 01:37:53 +05:30
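In short, after #4629 (values illustrative):

    from pytorch_lightning import Trainer

    trainer = Trainer(fast_dev_run=7)     # run 7 batches of train/val/test, then stop
    trainer = Trainer(fast_dev_run=True)  # the bool form still runs a single batch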
chaton
2393474350
[hotfix] ddp + manual_optimisation ( #4976 )
...
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization
* debug
* Revert "debug"
This reverts commit ccca6b6b
* Expose manual reduce for automatic optimization
* Add input arguments
* Enable parity test
* clean imports
* Expose hook after to ensure we reset
* Fix naming
* add
* fix test
* resolve on comments
* typo
* Update tests/trainer/optimization/test_manual_optimization.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/optimization/test_manual_optimization.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update on comments
* resolve comments
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-07 19:31:54 +00:00
chaton
02152c1729
Simplify optimization Logic ( #4984 )
...
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization
* debug
* Revert "debug"
This reverts commit ccca6b6b
* Expose manual reduce for automatic optimization
* Add input arguments
* Enable parity test
* clean imports
* Expose hook after to ensure we reset
* Fix naming
* add
* fix test
* uniformize optimizer logic
* resolve test
* resolve flake8
* resolve amp bug
* update tests
* remove bug
* remove optimizer_step in accelerators
* typo
* update lightning optimizer
* set doesn't work with ddp_spawn
* resolve flake8
* update threshold
* ignore pyright
* correct codeFactor
* remove useless if
* remove zero_grad function
* simplify step
* remove typo
* resolve bug
* Apply suggestions from code review
* update on comments
* resolve bugs
* remove tests
* Update pytorch_lightning/trainer/configuration_validator.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* simplify testing
* add more tests
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-07 12:55:49 +00:00
chaton
2e838e6dd8
Enable `self.log` in most functions. ( #4969 )
...
* refactor
* solve pyright
* remove logging in batch_start functions
* update docs
* update doc
* resolve bug
* update
* correct script
* resolve on comments
2020-12-06 13:01:43 +00:00
Carlos Mocholí
72349706c1
Improve epoch_result_store code quality ( #4875 )
...
* Improve code quality
* black -l 120 -S
* Fix pyright error
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-12-05 11:49:28 +00:00
Justus Schock
f23f5e5648
Fix DP Logging Aggregation ( #4138 )
...
* add option to step result to do aggregation on a specific device
* in dp: do aggregation on root gpu
* Update CHANGELOG.md
* pep8
* trailing whitespace
* move to root
move result
stupid result object
revert to master
undo import
add "to" method to result
generalize to
try a test
try a test
Revert "try a test"
This reverts commit 22e3c1001e6c5774ea18ad925830304c245bf145.
Revert "try a test"
This reverts commit 4d2d8fb2a52d552894809a0cbe51af126d78f070.
new test
max epochs
super epoch end
log in test
hanging test
undo test
initial test that fails on master
step end
pass
step end
step end
epoch end
print
step
check dev
clean up test
sanity check
wtf is going on
frustration
debugging test
test
test
test
test
test
test
test
test
unused import
* move chlog entry
* clean
* remove outdated changes
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2020-12-04 19:10:07 +01:00
Rohit Gupta
342a2b6f25
Deprecate auto mode from ModelCheckpoint and EarlyStopping ( #4695 )
...
* remove auto mode from callbacks
* chlog
* remove auto mode from callbacks
* mode
* mode
* move back
* update docs
* update docstrings
* docstring warning
* fix syntax
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* isort
* default to 'auto'
* syntax
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-04 16:11:58 +01:00
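After #4695 the mode should be spelled out instead of relying on 'auto' (a sketch; the monitored keys are examples):

    from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

    early_stop = EarlyStopping(monitor="val_loss", mode="min")   # not mode="auto"
    checkpoint = ModelCheckpoint(monitor="val_acc", mode="max")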
NeuralLink
88792982b5
🔨 minor refactor in trainer. ( #4801 )
...
* 🔨 minor refactor in trainer.
* 🔨 Use finally instead of else
* 🔨 revert format
* 🔨 check should skip inside try
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-04 13:42:13 +01:00
Jirka Borovec
3976db597d
refactor imports of optional dependencies ( #4859 )
...
* refactor imports of optional dependencies
* fix
* fix
* fix
* fix
* fix
* flake8
* flake8
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-12-04 10:26:10 +01:00
Jethro Kuan
c7e349e73d
docs: default_root_path -> default_root_dir ( #4942 )
...
* docs: default_root_path -> default_root_dir
* Apply suggestions from code review
* fix
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* update notebook
Co-authored-by: Jethro Kuan <jethro.kuan@bytedance.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-02 19:17:34 -05:00
Lezwon Castelino
12cb9942a1
Tpu save ( #4309 )
...
* convert xla tensor to cpu before save
* move_to_cpu
* updated CHANGELOG.md
* added on_save to accelerators
* if accelerator is not None
* refactors
* change filename to run test
* run test_tpu_backend
* added xla_device_utils to tests
* added xla_device_utils to test
* removed tests
* Revert "added xla_device_utils to test"
This reverts commit 0c9316bb
* fixed pep
* increase timeout and print traceback
* lazy check tpu exists
* increased timeout
removed barrier for tpu during test
reduced epochs
* fixed torch_xla imports
* fix tests
* define xla utils
* fix test
* aval
* chlog
* docs
* aval
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-02 13:05:11 +00:00
Sean Naren
e952dee292
Allow string plugins ( #4888 )
...
* Allow plugin to be chosen via string
* Fix implementation, add tests
* Fix codefactor issues
* Added missing env patch
* Skip test for windows
* Reword reason
* Add skip to invalid test
* Create required_plugins function, move sharded amp requirement to plugin
* Pass AMPType, fix setter for apex
* Better doc strings
* Add exception when using apex
* Add trainer available_plugins function, warn user when plugins have been added automatically with option to override behaviour
* Fixed pep8 indent
* Fix codefactor issues
* Add env variables
* Update pytorch_lightning/cluster_environments/cluster_environment.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Addressed code review
* Update pytorch_lightning/plugins/plugin_connector.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/plugin_connector.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/plugin_connector.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Addressed more code review feedback
* Fixed docstrings
* Swapped to verbose runtime error
* Apply suggestions from code review
* Apply suggestions from code review
* Update pytorch_lightning/plugins/sharded_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Change name
* Pass trainer to plugins that may require it
* Fix sharded plugin
* Added test to ensure string sharded works
* Removed trainer typing as this breaks pep8
* Fixed doc issues
* Fixed tests
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-01 20:30:49 +00:00
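A plausible use of the string form added here (hedged: "ddp_sharded" is the sharded-training plugin name of this era, and the GPU count is arbitrary):

    from pytorch_lightning import Trainer

    trainer = Trainer(gpus=2, accelerator="ddp", plugins="ddp_sharded")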
Justus Schock
ebbf256bf5
Create memory dynamically ( #4938 )
...
* create window size dynamically.
* pep8
Co-authored-by: chaton <thomas@grid.ai>
2020-12-02 01:05:12 +05:30
chaton
1d3724a878
[HotFix] Logging - One epoch delay on training epoch metrics. ( #4913 )
...
* add test
* resolve logging bug
* update
* resolve pep8
* resolve tests
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-01 09:26:52 +00:00
chaton
c2e6e68c7e
optimizer clean up ( #4658 )
...
* add LightningOptimizer
* typo
* add mock closure
* typo
* remove logic in optimizer_step
* update
* update
* update
* deactivate LightningOptimizer for horovod
* resolve flake
* typo
* check optimizer name
* change name
* added backward to LightningOptimizer
* remove use_lightning_optimizer
* move update
* simplify init
* resolve comments
* resolve bug
* update
* update
* resolve bugs
* resolve flake8
* set state
* work manual_optimizer_step
* add doc
* add enable_pl_optimizer
* make optimizer_step
* add make_optimizer_step
* add examples
* resolve test
* add test_optimizer_return_options_enable_pl_optimizer
* add enable_pl_optimizer=True
* update
* update tests
* resolve bugs
* update
* set Trainer to False
* update
* resolve bugs
* update
* remove from doc
* resolve bug
* typo
* update
* set to True
* simplification
* typo
* resolve horovod
* unwrap horovod
* remove Optimizer
* resolve horovod
* move logic to amp_backend
* doesn't seem to be picklable
* update
* add again
* resolve some bugs
* cleanup
* resolve bug with AMP
* change __repr__
* round at -12
* update
* update
* update
* remove from horovod
* typo
* add convert_to_lightning_optimizers in each accelerators
* typo
* forgot
* forgot a convert_to_lightning_optimizers
* update
* update
* update
* increase coverage
* update
* resolve flake8
* update
* remove useless code
* resolve comments + add support for LightningOptimizer base class
* resolve flake
* check optimizer get wrapped back
* resolve DDPSharded
* reduce code
* lightningoptimizer
* Update pytorch_lightning/core/optimizer.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/core/lightning.py
* remove reference to step function
* Apply suggestions from code review
* update on comments
* resolve
* Update CHANGELOG.md
* add back training_step in apex and native_amp
* rename optimizer_step
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-01 00:09:46 +00:00
William Falcon
f677efe61e
Merge pull request #4880 from PyTorchLightning/better_simple_profiler
...
Logging
2020-11-27 15:33:58 -05:00
Sean Naren
06a856e055
Merge branch 'master' into feature/plug
2020-11-27 18:48:58 +00:00
tchaton
ba41733802
Merge branch 'better_simple_profiler' of https://github.com/PyTorchLightning/pytorch-lightning into better_simple_profiler
2020-11-27 18:47:05 +00:00
tchaton
316ebadbdc
remove capture on on_train_batch_end
2020-11-27 18:46:49 +00:00
chaton
6ba77c2611
Merge branch 'master' into better_simple_profiler
2020-11-27 18:43:01 +00:00
tchaton
cef83dbbf8
optimize logging
2020-11-27 18:21:23 +00:00
Jirka Borovec
042152cd61
ref: fix & simplify test callback ( #4009 )
...
* simplify test callback
* update
* use mock
* flake8
2020-11-27 19:12:56 +01:00
tchaton
e17300f97d
add more profiler
2020-11-27 18:00:48 +00:00
tchaton
3a8fa6bf11
update
2020-11-27 17:48:51 +00:00
tchaton
290d74b40e
resolve test
2020-11-27 16:47:13 +00:00
SeanNaren
1704773712
Address code review
2020-11-27 14:50:12 +00:00
Sean Naren
4f693762ea
Update pytorch_lightning/trainer/connectors/precision_connector.py
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-27 14:45:15 +00:00
SeanNaren
cdd2e122fc
Add none check for func
2020-11-27 14:30:57 +00:00
SeanNaren
5598dce1a9
Remove unneeded check
2020-11-27 14:22:17 +00:00
Sean Naren
00bd0d2e72
Merge branch 'master' into feature/plug
2020-11-27 13:18:50 +00:00
chaton
dee968f20b
[bug] Replace_sampler attaches previous multiprocessing_context ( #4742 )
...
* resolve bug
* add test docstring
* Update tests/trainer/test_dataloaders.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update test
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-27 12:57:25 +00:00
SeanNaren
04bb0abe36
Merge branch 'master' into feature/plug
...
# Conflicts:
# pytorch_lightning/utilities/__init__.py
# requirements/extra.txt
2020-11-27 10:00:05 +00:00
Jirka Borovec
217650320e
simplify imports Omegaconf ( #4873 )
...
* hydra
* omegaconf
2020-11-27 01:00:56 +01:00
Jirka Borovec
442d57f1e9
simplify imports xla / TPU ( #4872 )
...
* xla
* tpu
* fix
* fix
* flake8
2020-11-27 00:37:48 +01:00
SeanNaren
737447fc6e
Merge branch 'master' into feature/plug
...
# Conflicts:
# pytorch_lightning/trainer/connectors/precision_connector.py
# pytorch_lightning/utilities/__init__.py
2020-11-26 23:02:36 +00:00
Jirka Borovec
11e73ceaa6
fix import and typo in AMP ( #4871 )
...
* fix import and typo
* docs
* apex
* fix
* typo
2020-11-26 23:45:52 +01:00
SeanNaren
fc9b2bf015
Fix logic and add test for apex check, rename file, add DDP launcher tests
2020-11-26 22:45:21 +00:00
SeanNaren
8dc857c38d
Ensure we add the condition to the case statement
2020-11-26 22:11:05 +00:00
SeanNaren
a9c316b669
Add additional check to ensure apex is not used with sharded
2020-11-26 19:00:55 +00:00
SeanNaren
47c121ef1a
Addressed code review points
2020-11-26 16:44:45 +00:00
Sean Naren
22b4d5ee1a
Merge branch 'master' into feature/plug
2020-11-25 20:16:37 +00:00
chaton
204a0a2d03
[bugfix] Accumulated_gradient and TensorBoard ( #4738 )
...
* resolve bug
* update
* update
* modify one test
* remove parameters
* update on comments
* update changelog
* update docstring
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-25 19:44:05 +00:00
SeanNaren
b39f290c4d
Merge branch 'master' into feature/plug
2020-11-25 12:55:42 +00:00
SeanNaren
6b129216d0
Add catches around fairscale installation
2020-11-24 19:23:55 +00:00
Samyak S Sarnayak
ccf38ced2e
Use high progress_bar_refresh_rate on Google Colab ( #4654 )
...
* Use high refresh rate on Google Colab (#3786 )
Automatically override progress_bar_refresh_rate when on Google
Colab. Also added a constant IS_COLAB in utilities to check
whether it is being run in colab or not.
(#3786 )
* Show a warning instead of overriding when rate is low on colab
* Change warning to suggestion and move it
Moved warning to configure_progress_bar instead of on_trainer_init
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* add a mock test
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-11-24 02:13:33 +05:30
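The suggestion this commit emits amounts to something like (value illustrative):

    from pytorch_lightning import Trainer

    # very low refresh rates flood Colab's notebook output, so a higher one is advised
    trainer = Trainer(progress_bar_refresh_rate=20)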
SeanNaren
d953f2be5b
Merge branch 'master' into feature/fairscale-817-6n
...
# Conflicts:
# pytorch_lightning/accelerators/accelerator.py
# pytorch_lightning/accelerators/ddp2_accelerator.py
# pytorch_lightning/accelerators/ddp_accelerator.py
# pytorch_lightning/accelerators/ddp_cpu_spawn_accelerator.py
# pytorch_lightning/accelerators/ddp_hpc_accelerator.py
# pytorch_lightning/accelerators/ddp_spawn_accelerator.py
# pytorch_lightning/accelerators/dp_accelerator.py
# pytorch_lightning/plugins/ddp_plugin.py
# pytorch_lightning/trainer/connectors/model_connector.py
2020-11-23 20:19:46 +00:00
Sean Naren
404af43cde
5/n: Extract reference model call to plugins/accelerators ( #4773 )
...
* Encapsulate extracting reference model within the plugin to allow custom wrapper logic to live within the plugin/accelerators
* Add missing new lines
* Fix call to accelerator
* Removed double blank
* Use accelerator backend
* Handle case where wrapper has not been initialized within the plugin
* Added basic get model tests, add better typing
* Change model name
* Split GPU/DDP test
* Add stronger typing, skip ddp test on windows
* Fix import
* Fix import in dp
* Fixed PEP8 definition
* Add ddp launcher for ddp testing
* Modify accelerator reference model to property, change name to reflect func
* Revert property as this is incorrect.
* Revert across accelerators
* Modified name to get_model_from_plugin
* Code review changes, fix issue with dp
* Add verb to function getter
Co-authored-by: chaton <thomas@grid.ai>
2020-11-23 17:21:47 +00:00
SeanNaren
c590e3a166
Ensure we check if we should use sharded amp plugin
2020-11-22 15:18:50 +00:00
SeanNaren
b506a7e46a
Revert across accelerators
2020-11-22 15:00:23 +00:00
SeanNaren
977625c289
Revert property as this is incorrect.
2020-11-22 14:54:00 +00:00
Sean Naren
4b16b47843
Merge branch 'master' into feature/817-fairscale-5n
2020-11-22 11:39:15 +00:00
SeanNaren
358f503848
Modify accelerator reference model to property, change name to reflect func
2020-11-22 11:39:00 +00:00
Teddy Koker
299de5dc62
don't override PYTHONWARNINGS ( #4700 )
...
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-11-22 11:25:24 +01:00
edenlightning
a716ea60e1
Clarify checkpoint deprecation message ( #4640 )
...
* Clarify checkpoint deprecation message
* Update pytorch_lightning/trainer/connectors/callback_connector.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-22 07:35:54 +01:00
YI-LIN SUNG
69b9949192
[docs] Remove the redundant indents in trainer.py ( #4720 )
...
* Remove the redundant indents in trainer.py
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-11-21 08:15:09 +06:30
Sean Naren
e3869c3950
Merge branch 'master' into feature/817-fairscale-5n
2020-11-20 17:13:17 +00:00
Roger Shieh
42e59c6add
Cast hparams to dict when not using omegaconf ( #4770 )
...
* init fix
* init test
* more specific dict assert
* update changelog
* Update tests/checkpointing/test_model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-20 19:53:05 +08:00
SeanNaren
95a1f19851
Use accelerator backend
2020-11-19 10:59:17 +00:00
SeanNaren
078a829834
Fix call to accelerator
2020-11-19 10:48:27 +00:00
SeanNaren
be4c24c484
Encapsulate extracting reference model within the plugin to allow custom wrapper logic to live within the plugin/accelerators
2020-11-19 10:43:16 +00:00
Sean Naren
f0ab74dc2f
Expose scaler in amp plugin ( #4737 )
2020-11-18 22:30:47 +00:00
Sean Naren
e7134a9135
Sharded Plugin 2/n: Allow ddp plugin to modify optimizer state saving ( #4675 )
...
* Allow ddp plugin to modify optimizer state saving
* Rely on the accelerator for optimizer states
* Ensure we init the accelerator for the saving function
* Better comment for optim state dump
* Revert "Ensure we init the accelerator for the saving function"
This reverts commit af65effa
* Added accelerator check to initialize tuner before saving model checkpoint
* Simplify comment
* Revert "Added accelerator check to initialize tuner before saving model checkpoint"
This reverts commit f9929c0c
* Return single optimizer state to reduce duplication
* Fixed docstring
* Fixed typing
* Fixed comment
* Added CHANGELOG.md
Co-authored-by: chaton <thomas@grid.ai>
2020-11-18 16:38:35 +00:00
chaton
96769a7184
quick fix ( #4697 )
2020-11-16 16:20:35 +00:00
chaton
867eef0e4c
[HOTFIX] Logging for evaluation ( #4684 )
...
* resolve bugs
* add should_flush_logs
* remove should_flush
* should work
* update test
* use something else
* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py
* log mock_log_metrics.mock_calls
* typo
* don't use keys
* convert to list
* typo
* check kwargs
* resolve bug
* resolve flake8
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-15 10:41:33 -05:00
Justus Schock
e04e7c9ecc
Makes automatic optimization a model attribute ( #4602 )
...
* Makes automatic optimization a model attribute
* Update trainer.py
* remove setting property in model
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update trainer.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-14 11:13:42 +06:30
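After #4602 the switch lives on the module rather than the Trainer (a sketch):

    import pytorch_lightning as pl

    class ManualModel(pl.LightningModule):
        @property
        def automatic_optimization(self) -> bool:
            return False  # opt out of automatic optimization per model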
ananthsub
d096a2ea6d
Fix setup callback hook to pass LightningModule through ( #4608 )
...
* Fix setup callback hook
* Update CHANGELOG.md
* Update test_trainer.py
* Update test_trainer.py
* Update test_trainer.py
* fix chlog
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-13 19:34:46 -05:00
Jeff Yang
baa8558cc0
logger docs and api docs ( #3950 )
...
* logger and api docs
* remove gpu_usage_logger, lr_logger
* update docstring
* fix wandb example
* remove step result
* charts
* add some charts info
Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-13 20:35:54 +05:30
chaton
4018237c30
[FEAT] Add lambda closure to manual_optimizer_step ( #4618 )
...
* added lambda_closure
* move to types
* add 2 new tests
* make example more complex
* add complex example to doc
* added more tests
* resolve doc
* typo
* update
* update tpu optimizer_step
* Apply suggestions from code review
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-12 19:22:06 +00:00
chaton
3d202f9ecc
[FEAT] Refactor logging 3/3 [v1] ( #4552 )
...
* wip
* wip check how many tests break
* wip
* resolve some bugs
* resolve more bugs
* resolve 2 bugs
* resolve
* temp fix
* update
* remove useless code
* remove result
* try to resolve bug
* update changelog
* formatting
* remove pl
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-11 17:05:24 +00:00
chaton
514cb22bd7
[Fix] Move log value to cpu. ( #4592 )
...
* move value to cpu to save memory
* update
* move to cpu
* try something
* update
* update
* add back out_dict.update({k: v})
* add move_metrics_to_cpu
* update
* Update pytorch_lightning/utilities/memory.py
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* resolve comments
* Update pytorch_lightning/core/step_result.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-10 21:13:41 +00:00
chaton
7e08b0d710
[bug-fix] DDP and automatic_optimization=False ( #4485 )
...
* resolve bug
* add self._running_manual_optim
* update
* update tests
* update lightning module
* resolve bug
* update tests
* update
* resolve pep8
* update
* replace by `ddp_spawn`
* temporary fix
* update
* update
* move update to training_loop
* make both ddp_spawn
* introduce `manual_optimizer_step`
* update changelog
* added changelog wrong place
* add force_optimizer_step
* update docstring for tests
* update optimizer_step
* update zero_grad
* resolve flake8
* move update into manual_optimizer_step
* add zero_grad
* remove zero_grad tests
* remove manual_backward in AMP, it doesn't help
* update
* loosen tests
* update
* update doc
* add TODO
* Removed unnecessary get model from native amp
* Remove try except with pytest raise
* Add seed, clean up imports, remove try catch to reproduce error
* update code
* update test
* revert back
* formatting
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-10 19:44:51 +00:00
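A rough sketch of the manual-optimization flow this fix targets (heavily hedged: these signatures churned across the surrounding releases, and compute_loss is a hypothetical helper, so treat this as indicative of the `manual_optimizer_step` introduced here rather than a stable API):

    # method of a LightningModule running with automatic optimization disabled
    def training_step(self, batch, batch_idx):
        opt = self.optimizers()            # optimizer accessor as used in examples of this era
        loss = self.compute_loss(batch)    # hypothetical helper
        self.manual_backward(loss, opt)
        self.manual_optimizer_step(opt)    # introduced by this PR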
tarepan
41c9bee4f0
Fix load disparity between normal and hpc ( #4526 )
...
* Add missing load functionality in hpc
* Add general file load for hpc
* Add mark in CHANGELOG
* Fix Typo Li**hg**tning
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Refactor line separation
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Fix entangled fixation commit
* Fix naming of restore_model_states
* Fix amp restore place
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-11-09 17:26:38 +00:00
William Falcon
09a51697ed
Adds shortcut for path to log ( #4573 )
...
* added log_dir shortcut to trainer properties for writing logs
* added log_dir shortcut
* added log_dir shortcut
* added log_dir shortcut
* added log_dir shortcut
* added log_dir shortcut
* added log_dir shortcut
* added log_dir shortcut
* added log_dir shortcut
2020-11-08 12:16:22 -05:00
William Falcon
bb356a73cb
added trainer api docs ( #4569 )
2020-11-07 14:18:45 -05:00
chaton
9c8701f2e2
[feat] Logging refactor 2/n - train ( #4495 )
...
* update logging
* solve more bugs
* replace Mapping by Dict
* update on comments
* resolve pep8
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update on comments
* typo
* update for coverage
* update test
* update
* Update tests/models/test_hooks.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* Update tests/models/test_hooks.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* update on comments
* remove deepcopy
* remove useless look for
* another small optim
* extra optim
* remove latest optim, can be a source of bugs
* resolve bug
* add docstring
* optimize coverage
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/logging_tests/test_distributed_logging.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/evaluation_loop.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/logging/test_logger_connector.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/logging_tests/test_train_loop_logging_1_0.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update on comments
* update
* update on comments
* update parity speed
* get it down to 0.65
* update
* 0.8 max_dif
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-11-05 22:27:04 +00:00
chaton
11dc5264cd
Bugfix/4449 dict attribute error ( #4480 )
...
* resolve a bug
* resolve a bug
* remove todo
* resolve more bugs
* update tests
* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* resolve pyright
Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-04 19:35:07 +00:00
tarepan
7b375ed1d3
Add CheckpointConnector internal commentaries ( #4421 )
...
* Add CheckpointConnector commentaries
* Fix comment format
* Fix save/load schema as function comments
Co-authored-by: chaton <thomas@grid.ai>
2020-11-03 22:09:29 +05:30
Adrian Wälchli
9b7f01654a
Update old "module_arguments" and "hparams" references in docs ( #4417 )
...
* replace module_arguments refernces
* update hparams docs
* add missing save_hyperparameters in example
* deprecate instead of remove
* Update docs/source/hyperparameters.rst
Co-authored-by: chaton <thomas@grid.ai>
* Update docs/source/hyperparameters.rst
Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-03 12:13:10 +01:00
Rohit Gupta
360b3d8844
Disable training when limit_train_batches=0 ( #4371 )
...
* Disable training when limit_train_batches=0
* chlog
* pep
* limit_train_batches
* BoringModel
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-11-03 12:10:35 +05:30
Rohit Gupta
ad2556b669
Disable saving checkpoints if not trained ( #4372 )
...
* Disable saving checkpoints if not trained
* chlog
* update test
* fix
Co-authored-by: chaton <thomas@grid.ai>
2020-11-03 11:38:32 +05:30
chaton
958aa1aee7
[test] Accumulated gradient optimization tests ( #4477 )
...
* adding tests
* wip
* update
* Update tests/trainer/test_trainer.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-02 23:44:11 +00:00
chaton
ac3f7393fd
[FEAT] logging refactors 1/n ( #4439 )
...
* introducing new logging object
* typo
* typo
* Update pytorch_lightning/trainer/logging.py
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* Update pytorch_lightning/trainer/logging.py
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* update on comments
* update on comments
* add more doctstring
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* resolve on comments
* solve pyright
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* update on comments
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* update on comments
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-02 20:51:43 +00:00
chaton
102fa9ee7d
[BUGFIX] AMP + Precision unscale grad ( #4441 )
...
* move unscale within Native plugin
* remove gradient tracking from lightning backward
* forgot trainer.fit
* typo
* update
* cleanup
* set to 1.6
* typo
* skip if below 1.6 strict
* update changelog
* remove useless code
* Update tests/plugins/test_amp_plugin.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* Update tests/plugins/test_amp_plugin.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* update changelog
* Update CHANGELOG.md
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-02 16:36:48 +00:00
Jirka Borovec
ef03c39ab7
Add step index in checkpoint name ( #3807 )
...
* true final value of global step
* ch check
* tests
* save each validation interval
* wip
* add test
* add test
* wip
* fix tests, revert old edits, fix merge conflicts, update doctests
* test + bugfix
* sort files
* format test
* suggestion by ananth
* added changelog
* naming
* docs
* example
* suggestion
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* fix test
* pep
* pep
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-11-02 15:05:58 +01:00
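Presumably usable as below (hedged: assumes `step` becomes a valid filename format key with this change):

    from pytorch_lightning.callbacks import ModelCheckpoint

    checkpoint = ModelCheckpoint(filename="{epoch}-{step}")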
Adrian Wälchli
6ae4c6ec85
update docs on checkpoint_callback Trainer argument ( #4461 )
...
* docs update
* update callbacks docs
* docs
* notebook examples
* warning
* line lenght
* update deprecation
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Roger Shieh <55400948+s-rog@users.noreply.github.com>
2020-11-02 06:18:20 +01:00
Sean Naren
6211fd4b0c
Fix type checker issue with explicit cast of ref_model object ( #4457 )
...
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-31 16:43:19 -04:00
Adrian Wälchli
d1234c592d
deprecate passing ModelCheckpoint instance to Trainer(checkpoint_callback=...) ( #4336 )
...
* first attempt
* update tests
* support multiple
* test bugfix
* changelog
* pep
* pep
* import order
* import
* improve test for resuming
* test
* update test
* add references test
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* docstring suggestion deprecation
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
* paramref
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-30 04:47:37 +01:00
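After #4336, instances move to `callbacks` while the bool toggle stays (a sketch):

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import ModelCheckpoint

    # deprecated: Trainer(checkpoint_callback=ModelCheckpoint(...))
    trainer = Trainer(
        checkpoint_callback=True,                         # bool on/off is still fine
        callbacks=[ModelCheckpoint(monitor="val_loss")],  # instances go here now
    )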
Jeff Yang
ebe3a31ddd
[docs] distributed_backend -> accelerator ( #4429 )
...
* distributed_backend -> accelerator
* distributed_backend -> accelerator
* use_amp -> precision
* format
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-10-30 00:45:24 +06:30
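The rename in one line (GPU count illustrative):

    from pytorch_lightning import Trainer

    # before: Trainer(distributed_backend="ddp")
    trainer = Trainer(accelerator="ddp", gpus=2)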
Justus Schock
bbd81dfd55
Skips DDP parameter sync ( #4301 )
...
* ddp no-sync
* Update pytorch_lightning/trainer/training_loop.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Update training_loop.py
* factor __enter__ and __exit__ out to separate context manager
* delete _updated_model_last_step
Co-authored-by: justusschock <justusschock@pc125.lfb.rwth-aachen.de>
Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-29 23:01:37 +05:30
Martin Hwang
b459fd26ac
fix: `nb` is set to the total number of devices when nb is -1. ( #4209 )
...
* fix: `nb` is set to the total number of devices when nb is -1.
Refs: #4207
* feat: add test code
1. test combination `auto_select_gpus`, `gpus` options using
Trainer
2. test `pick_multiple_gpus` function directly
Refs: #4207
* docs: modify contents in `Select GPU devices`
Refs: #4207
* refactor: reflect the result of review
Refs: #4207
* refactor: reflect the result of review
Refs: #4207
* Update CHANGELOG.md
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <55400948+s-rog@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-29 10:50:37 +01:00
Rohit Gupta
b26c71eadf
Add optimizer hooks in callbacks ( #4379 )
...
* Add optimizer hooks in callbacks
* optimizer param
* update test
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-28 13:15:22 +01:00
Dusan Drevicky
c50c225f05
feature: Allow str arguments in Trainer.profiler ( #3656 )
...
* allow trainer's profiler param to have a str value
* add tests
* update docs
* update exception message
* Update CHANGELOG
* fix pep8 issues
* cleanup test code
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Add deprecation warning if using bool for profiler
* Add deprecation tests and move deprecated tests
* Remove bool option to profiler from docs
* Deprecate bool args to profiler in CHANGELOG
* fixup! Add deprecation warning if using bool for profiler
* fixup! Add deprecation tests and move deprecated tests
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Implement suggestions, remove whitespace
* fixup! Implement suggestions, remove whitespace
* Allow bool, str (case insensitive), BaseProfiler
* Add info about bool deprecation to trainer
* fixup! Add info about bool deprecation to trainer
* Move deprecate todo to test_deprecated
* Test wrong profiler type, improve error message
* fixup! Test wrong profiler type, improve error message
* Update pytorch_lightning/trainer/connectors/profiler_connector.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Apply suggestions from code review
* Readd bool to profiler types, test cli profiler arg
* Remove extra whitespace in doc
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update deprecation versions
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-27 16:27:16 +05:30
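The string form added here, next to the bool form it deprecates (a sketch):

    from pytorch_lightning import Trainer

    trainer = Trainer(profiler="simple")  # or "advanced"; case-insensitive per the PR
    # Trainer(profiler=True) now emits a deprecation warning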
Adrian Wälchli
48b6de0c40
update ( #4343 )
...
Co-authored-by: chaton <thomas@grid.ai>
2020-10-27 06:07:29 -04:00
William Falcon
98205fb438
Enable custom apex and amp plugins ( #4355 )
...
* enable custom apex, amp plugin
* enable custom apex, amp plugin
* enable custom apex, amp plugin
* enable custom apex, amp plugin
2020-10-25 17:11:07 -04:00
ananthsub
f6efb712ed
Skip replacing dataloader sampler if it's already a distributed sampler ( #4273 )
...
* Update data_loading.py
* Update data_loading.py
* add test + update flag description
* add to changelog
* Update test_dataloaders.py
* fix-pickle
* Update test_dataloaders.py
* Added missing reference calls
* Update tests/trainer/test_dataloaders.py
* Apply suggestions from code review
* Update data_loading.py
* Update test_dataloaders.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-23 17:34:07 +01:00
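What this change now leaves alone (a sketch; the toy dataset and the explicit num_replicas/rank, which keep the snippet runnable outside a process group, are placeholders):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    dataset = TensorDataset(torch.randn(100, 4))
    sampler = DistributedSampler(dataset, num_replicas=2, rank=0)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)
    # previously the trainer re-wrapped this under replace_sampler_ddp=True;
    # after #4273 a user-provided DistributedSampler is kept as-is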
chaton
3abfec8962
[HOTFIX] ModelCheckpoint - Don't increase current_epoch and global_step if not trained ( #4291 )
...
* add two tests w/wo tempdir
* resolve flake8
* this test is failing
* update bug report
* resolve bug and add test
* remove bug_report
* resolve flake8
* resolve bug
* resolve pep8
* resolve pep8
Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
2020-10-23 11:17:50 +01:00
Rohit Gupta
4c7ebdc32b
Add dirpath and filename parameter in ModelCheckpoint ( #4213 )
...
* Add dirpath and filename parameter in ModelCheckpoint
* remove old function
* chlog
* codefactor
* update tests
* docs
* fix doctest and added tests
* pathlib dirpath
* dep version and docs
* try fix doctest
* pep
* suggestions
Co-authored-by: carmocca <carlossmocholi@gmail.com>
* suggestions
* fix test
* pep
* trigger tests
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* suggestions
* try fix windows test
* add and update some tests
* trigger tests
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-23 09:59:12 +05:30
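The new split arguments, replacing the single `filepath` (a sketch; directory, format, and monitor key are illustrative):

    from pytorch_lightning.callbacks import ModelCheckpoint

    checkpoint = ModelCheckpoint(
        dirpath="checkpoints/",
        filename="{epoch}-{val_loss:.2f}",
        monitor="val_loss",
    )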
Sean Naren
9823f97a84
Protect functions not meant to be accessed by the user ( #4305 )
2020-10-22 15:15:04 +01:00
Sean Naren
065cc94112
Fix bug comparing max_steps to global step which inits at 0 ( #4278 )
...
* Fix bug comparing max_steps to global step which inits at 0
* Added test to ensure accumulate grad batch works with max steps
* check fix with TODO test
* correct call counts
* Add check to ensure we've finished accumulation of this global step before exiting loop in conjunction with max steps
* Remove + 1 check in test as this was incorrect
* Update incorrect expected outputs in lr finder test
* Added brackets for clarity
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-22 13:58:59 +01:00
Mauricio Villegas
546476c704
Allow changing the logged step value in validation_step ( #4130 )
...
* Fix to bug identified in https://github.com/PyTorchLightning/pytorch-lightning/issues/4102
* update tests
* chlog
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-10-22 03:03:07 +05:30
Carlos Mocholí
2549ca40e6
Clean up optimizer code ( #3587 )
...
* Update optimizer code
* Update CHANGELOG
* Fix tuple of one list case
* Update docs
* Fix pep issue
* Minor typo [skip-ci]
* Use minimal match
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-21 21:12:48 +02:00
Justus Schock
0ec4107697
Optimizer closure ( #4190 )
...
* closure for all optimizers
* rename hook and take care of alternating backwards
* add comment
* training_loop_fix
* closure whenever possible
* training_loop
* simple tests that count backward calls
* fix test to work with closure
* remove debugging statement
* better place
* check grads after backward
* start fixing manual optimization
* skip step when result returned by closure was None
* fix gradient clipping test to work with closure
* attribute dict result only for automatic optimization
* adjust backward calls in accelerator
* adjust where to call gradient clipping
* adjust backward calls in tests
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* pass kwargs to xla optimizer
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-21 19:34:29 +01:00
William Falcon
8a20d6af51
make save fx part of model checkpoint cb ( #4284 )
2020-10-21 10:06:42 -04:00
Carlos Mocholí
e0f9799dbf
Add strict option to lr_scheduler dict ( #3586 )
...
* Add strict option to lr_scheduler dict
* Update docs
* Unnecessary "else" after "raise"
* Update CHANGELOG
* Fix rebase
2020-10-21 14:14:37 +02:00
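The `strict` key added here, sketched in a configure_optimizers return (the monitored key is an example):

    import torch

    # method of a LightningModule
    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
        return [optimizer], [{
            "scheduler": scheduler,
            "monitor": "val_loss",
            "strict": True,  # error, rather than skip, if "val_loss" was never logged
        }]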
Sean Naren
c336881959
Added fix to ensure that custom logged metrics within test_epoch_end are appended to the result object even without step reduced metrics ( #4251 )
2020-10-20 18:33:18 +02:00
Jirka Borovec
f37444fa3e
CI: add flake8 ( #4239 )
2020-10-19 21:20:17 +01:00
Espen Haugsdal
66e58f5afb
Use checkpoint_connector.hpc_save in SLURM ( #4217 )
2020-10-18 10:13:56 -04:00
Elia Cereda
cf9fe4905e
Annotate return type of TrainerProperties.from_argparse_args(...) ( #4192 )
...
* Annotate return type of TrainerProperties.from_argparse_args(...)
* Added second empty line between class and typevar
* Renamed all uses of the typevar to _T
2020-10-17 20:00:50 +08:00
Akihiro Nitta
b45b57cc58
Use `Optional` for arguments set to `None` by default ( #4164 )
...
* Use `Optional` for variables set to `None` by default
* Use `Optional` instead of `Union[None, ...]` for consistency
2020-10-15 23:02:50 +02:00
William Falcon
72f19768c8
remove duplicate metric vs step log for train loop ( #4173 )
...
* remove duplicate metric vs step log
* remove duplicate metric vs step log
* remove duplicate metric vs step log
* fix ddp index issue
2020-10-15 10:47:00 -04:00
William Falcon
45d05ff68d
Fixes #4141 ( #4169 )
...
* fix val epoch agg
* fix val agg metrics
* fix val agg metrics
* fix val agg metrics
2020-10-15 09:12:05 -04:00
Jirka Borovec
f064682786
save initial arguments ( #4163 )
...
* save initial arguments
* typing
* chlog
* .
2020-10-15 08:30:49 -04:00
Rohit Gupta
dec31b3e76
Call on_load_checkpoint before loading state_dict ( #4057 )
2020-10-14 23:26:04 +02:00
William Falcon
09c2020a93
notices ( #4118 )
2020-10-13 07:18:07 -04:00
William Falcon
bf2067a609
enabled manual returns ( #4089 )
2020-10-12 10:06:17 -04:00
William Falcon
1dbc6ffbc1
added templates ( #4077 )
...
* docs
* docs
2020-10-11 09:35:51 -04:00
William Falcon
7ffe05a3d1
ref: accelerator names ( #4066 )
...
* ref: accelerator names
* docs
2020-10-11 01:05:14 -04:00
William Falcon
0281b077d8
ref: decouple apex second attemp part 10/n ( #4064 )
...
* ref: decouple apex second attemp part 9/n
* ref: decouple apex second attemp part 9/n
* ref: decouple apex second attemp part 9/n
2020-10-10 20:05:05 -04:00
William Falcon
dbfe2b6129
ref: decouple apex second attemp part 9/n ( #4063 )
...
* ref: decouple apex second attemp part 9/n
* ref: decouple apex second attemp part 9/n
2020-10-10 18:44:24 -04:00
William Falcon
5ce9fc6bb3
ref: decouple apex second attemp part 7/n ( #4061 )
...
* ref: decouple apex second attemp part 7/n
* ref: decouple apex second attemp part 7/n
* ref: decouple apex second attemp part 7/n
2020-10-10 16:44:15 -04:00
William Falcon
d1bbb449a3
ref: decouple apex second attemp part 5/n ( #4058 )
2020-10-10 14:35:25 -04:00
Rohit Gupta
bdbf846029
Fix to print scaler value in progress bar ( #4053 )
...
* Fix to print scaler value in progress bar
* chlog
* Fix to print scaler value in progress bar
* Fix to print scaler value in progress bar
2020-10-10 12:20:11 -04:00
William Falcon
ce2edf1192
ref: decouple apex second attemp part 4/n ( #4056 )
...
* ref: decouple apex second attemp part 4/n
* ref: decouple apex second attemp part 4/n
* Update lightning.py
* ref: decouple apex second attemp part 4/n
2020-10-10 12:19:22 -04:00
William Falcon
7285613974
ref: decouple apex second attemp part 2/n ( #4054 )
...
* ref: decouple apex second attemp part 2/n
* ref: decouple apex second attemp part 2/n
2020-10-10 10:24:20 -04:00
William Falcon
5b261a230e
enable passing in custom accelerators ( #4050 )
...
* enable custom accelerators
* ref: finish decoupling apex, LM and backward
* ref: finish decoupling apex, LM and backward
* ref: finish decoupling apex, LM and backward
2020-10-10 09:21:08 -04:00
William Falcon
2b255a3df4
ref: enable custom clusters (1/n) ( #4048 )
...
* enable cluster plugins
* enable cluster plugins + test backend choices
* enable cluster plugins + test backend choices
* enable cluster plugins + test backend choices
* enable cluster plugins + test backend choices
* enable cluster plugins + test backend choices
* enable cluster plugins + test backend choices
2020-10-10 08:09:29 -04:00