Arnaud Gelas
a9d9f33a86
Fix isort failures in trainer ( #5529 )
Remove trainer modules from the isort skip list in pyproject.toml and fix failures in:
- pytorch_lightning/trainer/*.py
2021-01-18 13:42:50 -05:00
Adrian Wälchli
e806bb77fa
Refactor LightningDistributedDataParallel ( #5185 )
* add wrapper
* add squeeze
* replace LightningDistributedDP
* update import
* module access
* inputs
* refactor warning
* update
* resolve flake8
* remove old class
* set find unused params to False
* update docstrings
* update docs
* update docs
* add changelog
* deprecation
* rename wrapper -> module
* rename pl_module
* add unit tests
* Revert "add changelog"
This reverts commit 02ec0a6864f4ba2ace3bb6fc6ebc364e1a80ffd7.
* Revert "set find unused params to False"
This reverts commit 8e451515e6ba3227d00f4a5cb63f332cfedb7b30.
Co-authored-by: Ubuntu <thomas@grid.ai>
2021-01-13 14:35:42 -05:00
Jirka Borovec
54d20dc596
Refactor: clean trainer device & distrib getters ( #5300 )
* warnings
* .
* .
* flake8
* .
* .
* .
* use_tpu
* use_dp
* .
* use_ddp
* .
* use_horovod
* .
* .
* .
2021-01-12 05:22:37 -05:00
chaton
56437e98a6
[bug-fix] Trainer.test points to latest best_model_path ( #5161 )
* resolve bug
* update code
* add set -e
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update test
* Update tests/checkpointing/test_trainer_checkpoint.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* Update tests/checkpointing/test_trainer_checkpoint.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* update on comments
* resolve test
* convert to set
* update
* add error triggering
* update
* update on comments
* update
* resolve import
* update
* update
* Update pytorch_lightning/plugins/rpc_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
(cherry picked from commit d5b367871f)
2021-01-06 15:14:10 +01:00
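For context, this fix makes a bare trainer.test() resolve the checkpoint from the best_model_path currently tracked by ModelCheckpoint. A minimal sketch of the intended behaviour (MyModel is a hypothetical stand-in for any LightningModule that logs "val_loss"; flag names follow the 1.1-era API):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(monitor="val_loss", save_top_k=1)
trainer = Trainer(max_epochs=3, callbacks=[checkpoint_cb])
trainer.fit(MyModel())  # MyModel: hypothetical LightningModule logging "val_loss"

# With the fix, a bare .test() loads checkpoint_cb.best_model_path,
# i.e. the latest best checkpoint, rather than a stale earlier path.
trainer.test()
```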
Rohit Gupta
704e00ee7f
Fix invalid value for weights_summary ( #5296 )
* Fix weights_summary
* use mode
* fix
* optional
* what was I thinking
(cherry picked from commit 062800aa99)
2021-01-06 12:59:32 +01:00
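For reference, weights_summary accepts the modes "top" and "full", or None to disable the summary; this fix makes an invalid value fail fast instead of being silently accepted. A hedged sketch (exception type as in the 1.1-era utilities):

```python
from pytorch_lightning import Trainer

Trainer(weights_summary="top")   # summarize top-level modules only (default)
Trainer(weights_summary="full")  # summarize every module recursively
Trainer(weights_summary=None)    # disable the summary

# After the fix, an unknown mode raises a MisconfigurationException
# instead of being ignored:
Trainer(weights_summary="everything")
```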
Rohit Gupta
9cfbf8d609
Disable checkpointing, earlystopping and logging with fast_dev_run ( #5277 )
* Disable checkpointing, earlystopping and logger with fast_dev_run
* docs
* chlog
* disable callbacks and enable DummyLogger
* add log
* use dummy logger method
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
(cherry picked from commit f740245521)
2021-01-06 12:57:24 +01:00
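In effect, fast_dev_run now turns a run into a pure smoke test. A short sketch of the resulting behaviour, assuming the 1.1-era flag semantics:

```python
from pytorch_lightning import Trainer

# Runs a single batch of train and val, then exits. With this change,
# checkpointing and early stopping are disabled and the configured logger
# is swapped for a no-op DummyLogger, so the smoke-test run leaves no
# checkpoints or log directories behind.
trainer = Trainer(fast_dev_run=True)
```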
Justus Schock
d88cf4a652
Add Support for multiple train loaders ( #1959 )
* add support for wrong dtype in apply_func
* apply loader resetting to possible collection of loaders
* add combined loader iter class
* integrate combined loader iter to training loop
* fix imports
* fix imports
* finish supporters
* add tests for supporters
* add test for model with multiple loaders
* fix trainer integration
* fix instance check
* Train loaders (#4032 )
* patch for issues discussed in #1959 , encapsulating underlying data structures returned from train_dataloader
* update data_loading.py so it uses the patch discussed in #1959
* rename class
* Separate CombinedLoaderIterator into two classes, and update related tests. (#4606 )
* Fix the bugs after rebasing.
* Add custom get_len for apply_to_collection
* Refactor MultiIterator into CombinedLoaderIterator
* To get the right num_training_batches. Call the wrapper for multi trainloader in data_loading.py, instead of training_loop.py
* Reload _loader_iters when calling __iter__
* Don't transform DataLoader to CombinedLoaderIterator when it's alone
* Updates test_fit_multiple_train_loaders for testing num_training_batches
* Separate CombinedLoaderIterator into CombinedLoaderIterator and CombinedDataLoader. Add CombinedDataset for unified DataLoader format.
* Initialize CombinedDataLoader before calculating num_training_batches. Also updating self._worker_check for multiple loaders
* Update tests for supporters
* Update tests for multiple trainloaders. Add tests about few_workers for multiple loaders.
* Fix pep8 issues
* Add tests for train_loader_patch.py
* Add descriptions to multiple_trainloader_mode
* Remove unused variables
* Add docstrings and typing
* Add more tests for better coverage
* Remove unused commented code
* Add sampler property
* Remove extract_dataset
* Update typing
* pep8
* Update train_loader_patch.py
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/supporters.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* reviewer comments
* fix stupid import
* add docs
* add back line separator
* fix line sep
* pep8
* Apply suggestions from code review
* fix
* fix
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
* flake8
Co-authored-by: Justus Schock <justusschock@justuss-mbp.fritz.box>
Co-authored-by: Christofer Fransson <christofer_fransson@yahoo.com>
Co-authored-by: YI-LIN SUNG <r06942076@ntu.edu.tw>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-01-04 19:57:53 +00:00
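With this feature, train_dataloader may return a collection of loaders, and the new multiple_trainloader_mode Trainer argument controls how they are iterated. A minimal sketch (dataset shapes are arbitrary; training_step and configure_optimizers are omitted):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class MultiLoaderModel(pl.LightningModule):
    def train_dataloader(self):
        loader_a = DataLoader(TensorDataset(torch.randn(64, 32)), batch_size=8)
        loader_b = DataLoader(TensorDataset(torch.randn(100, 32)), batch_size=8)
        # batches then arrive in training_step as {"a": batch_a, "b": batch_b}
        return {"a": loader_a, "b": loader_b}

# "max_size_cycle" cycles shorter loaders until the longest is exhausted;
# "min_size" stops after the shortest loader.
trainer = pl.Trainer(multiple_trainloader_mode="max_size_cycle")
```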
Jirka Borovec
957583544a
mark todo exceptions ( #5320 )
* mark todo exceptions
* .
* .
* .
* .
* .
* .
* .
* .
* try
* .
2021-01-04 09:07:56 +01:00
Jirka Borovec
a884866ff0
Unify names in Utils ( #5199 )
* warnings
* argparse
* mutils
* xla device
* deprecated
* tests
* simple
* flake8
* fix
* flake8
* 1.4
2020-12-22 00:23:33 +01:00
Jirka Borovec
0f36525e8f
fix/enable - check F401 ( #5201 )
* refactor - check F401
* missed
* fix
2020-12-21 10:15:04 +01:00
Jirka Borovec
2d54116baa
annotate unused vars ( #5017 )
* annotate all unused vars
* rank_zero_warn
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* f1 fixed
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-12-19 13:53:06 +01:00
chaton
f3748ba808
[feat] Enable self.log in callbacks ( #5094 )
* enable the use of self.log in callbacks
* update
* revert back to assert
2020-12-16 16:08:39 -05:00
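The enabled pattern here is logging from inside Callback hooks through the attached module's self.log machinery. A hedged sketch (hook signature as in the 1.1-era Callback API):

```python
from pytorch_lightning import Callback

class LogFromCallback(Callback):
    def on_train_epoch_end(self, trainer, pl_module, outputs):
        # routes through the LightningModule's logging machinery
        pl_module.log("from_callback", 1.0)
```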
Jirka Borovec
059eaecbb4
set xxx_AVAILABLE as protected ( #5082 )
* set xxx_AVAILABLE as protected
* docs
2020-12-14 20:19:05 +05:30
chaton
1a970b2d8d
[hotfix] Extend Optimizer + update doc ( #5095 )
* resolve urgent bug
* update pr
* update doc
* update
* remove typo
* add defaults
* Update pytorch_lightning/__init__.py
* Update setup.py
* update doc
* Update docs/source/optimizers.rst
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update
* resolve doc
* debug test
* update test
* Update docs/source/optimizers.rst
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update docs/source/optimizers.rst
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update docs/source/optimizers.rst
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* remove useless import
* Update docs/source/optimizers.rst
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-12-11 14:24:59 -05:00
Rohit Gupta
6d2aeff26a
fast_dev_run can be int ( #4629 )
* fast_dev_run can be int
* pep
* chlog
* add check and update docs
* logging with fdr
* update docs
* suggestions
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* fdr flush logs
* update trainer.fast_dev_run
* codefactor and pre-commit isort
* tmp
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
2020-12-09 01:37:53 +05:30
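Concretely, the flag keeps its boolean meaning and additionally accepts a batch count. A sketch:

```python
from pytorch_lightning import Trainer

Trainer(fast_dev_run=True)  # unchanged: one batch of train/val/test
Trainer(fast_dev_run=7)     # new: run 7 batches of each instead of one
```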
chaton
2393474350
[hotfix] ddp + manual_optimisation ( #4976 )
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization
* debug
* Revert "debug"
This reverts commit ccca6b6b
* Expose manual reduce for automatic optimization
* Add input arguments
* Enable parity test
* clean imports
* Expose hook after to ensure we reset
* Fix naming
* add
* fix test
* resolve on comments
* typo
* Update tests/trainer/optimization/test_manual_optimization.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/optimization/test_manual_optimization.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update on comments
* resolve comments
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-07 19:31:54 +00:00
chaton
02152c1729
Simplify optimization Logic ( #4984 )
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization
* debug
* Revert "debug"
This reverts commit ccca6b6b
* Expose manual reduce for automatic optimization
* Add input arguments
* Enable parity test
* clean imports
* Expose hook after to ensure we reset
* Fix naming
* add
* fix test
* uniformize optimizer logic
* resolve test
* resolve flake8
* resolve amp bug
* update tests
* remove bug
* remove optimizer_step in accelerators
* typo
* update lightning optimizer
* set doesn't work with ddp_spawn
* resolve flake8
* update threshold
* ignore pyright
* correct codeFactor
* remove useless if
* remove zero_grad function
* simplify step
* remove typo
* resolve bug
* Apply suggestions from code review
* update on comments
* resolve bugs
* remove tests
* Update pytorch_lightning/trainer/configuration_validator.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* simplify testing
* add more tests
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-07 12:55:49 +00:00
chaton
2e838e6dd8
Enable `self.log` in most functions ( #4969 )
* refactor
* solve pyright
* remove logging in batch_start functions
* update docs
* update doc
* resolve bug
* update
* correct script
* resolve on comments
2020-12-06 13:01:43 +00:00
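For context, self.log is the single logging entry point whose reach this PR extends; typical usage inside a step hook looks like the sketch below (compute_loss is a hypothetical helper):

```python
def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # hypothetical helper
    # on_step/on_epoch choose when the value is written;
    # prog_bar mirrors it to the progress bar
    self.log("train_loss", loss, on_step=True, on_epoch=True, prog_bar=True)
    return loss
```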
chaton
1d3724a878
[HotFix] Logging - One epoch delay on training epoch metrics. ( #4913 )
* add test
* resolve logging bug
* update
* resolve pep8
* resolve tests
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-01 09:26:52 +00:00
chaton
c2e6e68c7e
optimizer clean up ( #4658 )
* add LightningOptimizer
* typo
* add mock closure
* typo
* remove logic in optimizer_step
* update
* update
* update
* deactivate LightningOptimizer for horovod
* resolve flake
* typo
* check optimizer name
* change name
* added backward to LightningOptimizer
* remove use_lightning_optimizer
* move update
* simplify init
* resolve comments
* resolve bug
* update
* update
* resolve bugs
* resolve flake8
* set state
* work manual_optimizer_step
* add doc
* add enable_pl_optimizer
* make optimizer_step
* add make_optimizer_step
* add examples
* resolve test
* add test_optimizer_return_options_enable_pl_optimizer
* add enable_pl_optimizer=True
* update
* update tests
* resolve bugs
* update
* set Trainer to False
* update
* resolve bugs
* update
* remove from doc
* resolve bug
* typo
* update
* set to True
* simplification
* typo
* resolve horovod
* unwrap horovod
* remove Optimizer
* resolve horovod
* move logic to amp_backend
* doesn't seem to be picklable
* update
* add again
* resolve some bugs
* cleanup
* resolve bug with AMP
* change __repr__
* round at -12
* update
* update
* update
* remove from horovod
* typo
* add convert_to_lightning_optimizers in each accelerators
* typo
* forgot
* forgot a convert_to_lightning_optimizers
* update
* update
* update
* increase coverage
* update
* resolve flake8
* update
* remove useless code
* resolve comments + add support for LightningOptimizer base class
* resolve flake
* check optimizer gets wrapped back
* resolve DDPSharded
* reduce code
* lightningoptimizer
* Update pytorch_lightning/core/optimizer.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/core/lightning.py
* remove reference to step function
* Apply suggestions from code review
* update on comments
* resolve
* Update CHANGELOG.md
* add back training_step in apex and native_amp
* rename optimizer_step
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-01 00:09:46 +00:00
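After this refactor, optimizers returned by configure_optimizers can be wrapped in LightningOptimizer, whose .step() routes through the trainer's accelerator and precision plugins. A hedged sketch of the era's opt-in flag:

```python
from pytorch_lightning import Trainer

# enable_pl_optimizer was the 1.1-era opt-in flag; with it on,
# self.optimizers() inside a LightningModule returns LightningOptimizer
# wrappers whose step() delegates to the training loop (closures,
# AMP scaling, etc.).
trainer = Trainer(enable_pl_optimizer=True)
```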
tchaton
316ebadbdc
remove capture on on_train_batch_end
2020-11-27 18:46:49 +00:00
tchaton
cef83dbbf8
optimize logging
2020-11-27 18:21:23 +00:00
tchaton
e17300f97d
add more profiler
2020-11-27 18:00:48 +00:00
tchaton
3a8fa6bf11
update
2020-11-27 17:48:51 +00:00
Sean Naren
f0ab74dc2f
Expose scaler in amp plugin ( #4737 )
2020-11-18 22:30:47 +00:00
chaton
4018237c30
[FEAT] Add lambda closure to manual_optimizer_step ( #4618 )
* added lambda_closure
* move to types
* add 2 new tests
* make example more complex
* add complex example to doc
* added more tests
* resolve doc
* typo
* update
* update tpu optimizer_step
* Apply suggestions from code review
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-12 19:22:06 +00:00
chaton
514cb22bd7
[Fix] Move log value to cpu. ( #4592 )
* move value to cpu to save memory
* update
* move to cpu
* try something
* update
* update
* add back out_dict.update({k: v})
* add move_metrics_to_cpu
* update
* Update pytorch_lightning/utilities/memory.py
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* resolve comments
* Update pytorch_lightning/core/step_result.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-10 21:13:41 +00:00
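The move_metrics_to_cpu Trainer flag added here trades extra device-to-host copies for GPU memory. A sketch:

```python
from pytorch_lightning import Trainer

# Logged values that would otherwise keep references to GPU tensors are
# moved to CPU before being stored.
trainer = Trainer(move_metrics_to_cpu=True)
```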
chaton
7e08b0d710
[bug-fix] DDP and automatic_optimization=False ( #4485 )
* resolve bug
* add self._running_manual_optim
* update
* update tests
* update lightning module
* resolve bug
* update tests
* update
* resolve pep8
* update
* replace by `ddp_spawn`
* temporary fix
* update
* update
* move update to training_loop
* make both ddp_spawn
* introduce `manual_optimizer_step`
* update changelog
* added changelog in the wrong place
* add force_optimizer_step
* update docstring for tests
* update optimizer_step
* update zero_grad
* resolve flake8
* move update into manual_optimizer_step
* add zero_grad
* remove zero_grad tests
* remove manual_backward in AMP, it doesn't help
* update
* loosen tests
* update
* update doc
* add TODO
* Removed unnecessary get model from native amp
* Remove try except with pytest raise
* Add seed, clean up imports, remove try catch to reproduce error
* update code
* update test
* revert back
* formatting
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-10 19:44:51 +00:00
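For reference, manual optimization in this era was enabled on the Trainer and driven by manual_backward; a hedged sketch (names follow the 1.0/1.1-era API, where automatic_optimization was still a Trainer argument and manual_backward took the optimizer; loaders and configure_optimizers are omitted):

```python
import pytorch_lightning as pl

class ManualOptModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        loss = self(batch).sum()
        # manual_backward routes through the accelerator so AMP/DDP hooks run
        self.manual_backward(loss, opt)
        opt.step()
        opt.zero_grad()

trainer = pl.Trainer(automatic_optimization=False, accelerator="ddp")
```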
chaton
9c8701f2e2
[feat] Logging refactor 2/n - train ( #4495 )
* update logging
* solve more bugs
* replace Mapping by Dict
* update on comments
* resolve pep8
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update on comments
* typo
* update for coverage
* update test
* update
* Update tests/models/test_hooks.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* Update tests/models/test_hooks.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* update on comments
* remove deepcopy
* remove useless look for
* another small optim
* extra optim
* remove latest optim, can be source of bug
* resolve bug
* add docstring
* optimize coverage
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/logging_tests/test_distributed_logging.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/evaluation_loop.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/logging/test_logger_connector.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/logging_tests/test_train_loop_logging_1_0.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update on comments
* update
* update on comments
* update parity speed
* get it down to 0.65
* update
* 0.8 max_dif
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-11-05 22:27:04 +00:00
Rohit Gupta
360b3d8844
Disable training when limit_train_batches=0 ( #4371 )
* Disable training when limit_train_batches=0
* chlog
* pep
* limit_train_batches
* BoringModel
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-11-03 12:10:35 +05:30
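A sketch of the fixed semantics:

```python
from pytorch_lightning import Trainer

# With the fix, limit_train_batches=0 skips the training loop entirely
# (e.g. to exercise only the validation path) instead of misbehaving.
trainer = Trainer(limit_train_batches=0)
```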
Rohit Gupta
ad2556b669
Disable saving checkpoints if not trained ( #4372 )
* Disable saving checkpoints if not trained
* chlog
* update test
* fix
Co-authored-by: chaton <thomas@grid.ai>
2020-11-03 11:38:32 +05:30
chaton
958aa1aee7
[test] Accumulated gradient optimization tests ( #4477 )
* adding tests
* wip
* update
* Update tests/trainer/test_trainer.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-02 23:44:11 +00:00
chaton
ac3f7393fd
[FEAT] logging refactors 1/n ( #4439 )
* introducing new logging object
* typo
* typo
* Update pytorch_lightning/trainer/logging.py
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* Update pytorch_lightning/trainer/logging.py
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* update on comments
* update on comments
* add more docstrings
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* resolve on comments
* solve pyright
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* update on comments
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* update on comments
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-02 20:51:43 +00:00
chaton
102fa9ee7d
[BUGFIX] AMP + Precision unscale grad ( #4441 )
* move unscale within Native plugin
* remove gradient tracking from lightning backward
* forgot trainer.fit
* typo
* update
* cleanup
* set to 1.6
* typo
* skip if below 1.6 strict
* update changelog
* remove useless code
* Update tests/plugins/test_amp_plugin.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* Update tests/plugins/test_amp_plugin.py
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
* update changelog
* Update CHANGELOG.md
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-02 16:36:48 +00:00
Justus Schock
bbd81dfd55
Skips DDP parameter sync ( #4301 )
* ddp no-sync
* Update pytorch_lightning/trainer/training_loop.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Update training_loop.py
* factor __enter__ and __exit__ out to separate context manager
* delete _updated_model_last_step
Co-authored-by: justusschock <justusschock@pc125.lfb.rwth-aachen.de>
Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-29 23:01:37 +05:30
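The context manager factored out above mirrors torch.nn.parallel.DistributedDataParallel.no_sync: gradient all-reduce is skipped on intermediate accumulation steps and fires only on the step that ends the accumulation window. A plain-PyTorch illustration (not Lightning's internal code; ddp_model, loader, optimizer, and accumulate_grad_batches are assumed to exist):

```python
for i, batch in enumerate(loader):
    accumulating = (i + 1) % accumulate_grad_batches != 0
    if accumulating:
        with ddp_model.no_sync():           # skip gradient all-reduce
            ddp_model(batch).sum().backward()
    else:
        ddp_model(batch).sum().backward()   # all-reduce fires here
        optimizer.step()
        optimizer.zero_grad()
```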
Rohit Gupta
b26c71eadf
Add optimizer hooks in callbacks ( #4379 )
* Add optimizer hooks in callbacks
* optimizer param
* update test
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-28 13:15:22 +01:00
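These hooks let a Callback observe gradients around backward and zero_grad. A sketch assuming the hook names that landed around this release:

```python
from pytorch_lightning import Callback

class GradInspector(Callback):
    def on_after_backward(self, trainer, pl_module):
        # gradients are populated here, before any clipping or step
        pass

    def on_before_zero_grad(self, trainer, pl_module, optimizer):
        # last chance to inspect gradients before they are cleared
        pass
```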
chaton
3abfec8962
[HOTFIX] ModelCheckpoint - Don't increase current_epoch and global_step if not trained ( #4291 )
* add two tests w/wo tempdir
* resolve flake8
* this test is failing
* update bug report
* resolve bug and add test
* remove bug_report
* resolve flake8
* resolve bug
* resolve pep8
* resolve pep8
Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
2020-10-23 11:17:50 +01:00
Sean Naren
9823f97a84
Protect functions not to be accessed by user ( #4305 )
2020-10-22 15:15:04 +01:00
Sean Naren
065cc94112
Fix bug comparing max_steps to global step which inits at 0 ( #4278 )
* Fix bug comparing max_steps to global step which inits at 0
* Added test to ensure accumulate grad batch works with max steps
* check fix with TODO test
* correct call counts
* Add check to ensure we've finished accumulation of this global step before exiting loop in conjunction with max steps
* Remove + 1 check in test as this was incorrect
* Update incorrect expected outputs in lr finder test
* Added brackets for clarity
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-22 13:58:59 +01:00
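The off-by-one in question: global_step starts at 0 and counts completed optimizer steps, so the loop must compare after incrementing (or use >=) or it runs one step too many. Schematically (not Lightning's actual loop code; run_training_batch is a hypothetical helper):

```python
global_step = 0
while True:
    run_training_batch()          # hypothetical helper
    global_step += 1              # counts completed optimizer steps
    if global_step >= max_steps:  # compare after incrementing
        break
```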
Justus Schock
0ec4107697
Optimizer closure ( #4190 )
* closure for all optimizers
* rename hook and take care of alternating backwards
* add comment
* training_loop_fix
* closure whenever possible
* training_loop
* simple tests that count backward calls
* fix test to work with closure
* remove debugging statement
* better place
* check grads after backward
* start fixing manual optimization
* skip step when result returned by closure was None
* fix gradient clipping test to work with closure
* attribute dict result only for automatic optimization
* adjust backward calls in accelerator
* adjust where to call gradient clipping
* adjust backward calls in tests
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* pass kwargs to xla optimizer
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-21 19:34:29 +01:00
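For context, the closure pattern adopted here is plain torch.optim: step() may be given a callable that re-evaluates the loss and calls backward, which optimizers such as LBFGS invoke multiple times per step. A minimal runnable sketch:

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

def closure():
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

optimizer.step(closure)  # LBFGS-style optimizers call the closure repeatedly
```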
William Falcon
72f19768c8
remove duplicate metric vs step log for train loop ( #4173 )
* remove duplicate metric vs step log
* remove duplicate metric vs step log
* remove duplicate metric vs step log
* fix ddp index issue
2020-10-15 10:47:00 -04:00
Jirka Borovec
f064682786
save initial arguments ( #4163 )
* save initial arguments
* typing
* chlog
* .
2020-10-15 08:30:49 -04:00
William Falcon
bf2067a609
enabled manual returns ( #4089 )
2020-10-12 10:06:17 -04:00
William Falcon
0281b077d8
ref: decouple apex second attempt part 10/n ( #4064 )
* ref: decouple apex second attempt part 9/n
* ref: decouple apex second attempt part 9/n
* ref: decouple apex second attempt part 9/n
2020-10-10 20:05:05 -04:00
William Falcon
5ce9fc6bb3
ref: decouple apex second attempt part 7/n ( #4061 )
* ref: decouple apex second attempt part 7/n
* ref: decouple apex second attempt part 7/n
* ref: decouple apex second attempt part 7/n
2020-10-10 16:44:15 -04:00
William Falcon
d1bbb449a3
ref: decouple apex second attempt part 5/n ( #4058 )
2020-10-10 14:35:25 -04:00
William Falcon
ce2edf1192
ref: decouple apex second attempt part 4/n ( #4056 )
* ref: decouple apex second attempt part 4/n
* ref: decouple apex second attempt part 4/n
* Update lightning.py
* ref: decouple apex second attempt part 4/n
2020-10-10 12:19:22 -04:00
William Falcon
7285613974
ref: decouple apex second attempt part 2/n ( #4054 )
* ref: decouple apex second attempt part 2/n
* ref: decouple apex second attempt part 2/n
2020-10-10 10:24:20 -04:00
Nrupatunga
fcfa587492
Bugfix/update trainer properties ( #3975 )
* make current_epoch and global_step match the trainer's values after model restore.
* remove assignment here
* test
* minor modification
* merge with parent's master
* [bug-fix]: update trainer properties
* minor comment fix
* minor comment fix
* reset train loader in `on_train_epoch_start` hook
* makes sure the changes work
* minor change
* update changelog
* adding unit test for reload_dataloaders_every_epoch arg
* modified changelog, to add PR number
* revert imports
* changes to unit test
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-08 10:20:55 -04:00
William Falcon
048a816be3
added tests for the training epoch end ( #3967 )
2020-10-07 22:27:36 -04:00