Rohit Gupta
581bf7f2f2
Deprecate `on_epoch_start/on_epoch_end` hook ( #11578 )
2022-02-07 14:15:27 +00:00
Rohit Gupta
0cb64fb8ba
Fix mid-epoch warning call while resuming ( #11556 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-02-03 05:42:31 +00:00
Krishna Kalyan
6586dd23b7
Mark `CheckpointConnector` as protected ( #11550 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-03 02:26:08 +00:00
Carlos Mocholí
a44881cd90
Changes in preparation to #8578 ( #11562 )
2022-02-02 19:57:08 +00:00
Carlos Mocholí
62818dbace
Use a dataclass as the scheduler config ( #11443 )
2022-01-18 20:23:32 +01:00
jjenniferdai
4b5761539e
Remove `hpc_save` ( #11101 )
2022-01-03 12:23:13 +00:00
ORippler
86a3c5e2a3
Add required states for resumed ModelCheckpoint GC ( #10995 )
...
* Add required states for resumed ModelCheckpoint GC
* Add backwards compatibility with legacy ckpts
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* Add test to check if attrs are written to ckpt
Note that we do not yet check for proper loading/reinstantiation of
ModelCheckpoint based on the ckpt written to disk
* Test if attributes are restored properly from ckpt
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix broken `test_callbacks_state_fit_ckpt_path`
`ModelCheckpoint` is configured to save after every epoch,
but `trainer.fit` is called with `max_steps = 1`
Note there may be a better way of doing this, where `ModelCheckpoint`
is called after `training_step`
* Update test_restore.py
* Update test_restore.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Check that all attributes are restored properly
* revert changes, use fix on master
* Convert to proper unit test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Refactor `test_mode_checkpoint_saveload_ckpt`
* First save, then load ckpt.
* Instantiate ModelCheckpoint twice.
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-20 17:05:15 +01:00
Rohit Gupta
61eb6230c2
Prune EvalModelTemplate ( #11153 )
2021-12-19 13:08:43 +00:00
Adrian Wälchli
e19d93f69e
Initialize ModelCheckpoint state as early as possible ( #11108 )
2021-12-17 00:18:29 +01:00
Carlos Mocholí
1b43e43e9f
Minor changes in preparation for saving the loops state ( #10783 )
2021-11-30 19:37:04 +05:30
Adrian Wälchli
1ff35ed0f5
Improve code quality in `AcceleratorConnector._configure_slurm_ddp` ( #10102 )
2021-11-17 23:10:47 +00:00
Rohit Gupta
34d5980df6
Raise `MisconfigurationException` if `trainer.eval` is missing required methods ( #10016 )
2021-10-25 23:12:08 -07:00
jjenniferdai
6d79184ec5
Unify checkpoint load paths [redo #9693 ] ( #10061 )
2021-10-25 19:05:31 +00:00
Adrian Wälchli
76081fb846
Mark SLURM detection methods in `AcceleratorConnector` as protected ( #10101 )
...
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-10-25 17:52:15 +00:00
Adrian Wälchli
7eb2edf421
rename set_random_master_port ( #10104 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:09:05 +00:00
Kaushik B
5e8829b97d
(1/n) tests: Use strategy flag instead of accelerator for training strategies ( #9931 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-16 20:40:25 +05:30
Carlos Mocholí
e973bcb76a
Use non-deprecated options in tests ( #9949 )
2021-10-15 16:58:07 -07:00
Adrian Wälchli
b530b7afd2
update tests to not rely on patched dataloaders ( #9905 )
2021-10-12 12:45:28 +02:00
Rohit Gupta
b303b4f895
Fix restoring training state during `trainer.fit` only ( #9413 )
...
* reload state on fit
* trainer.state
* add test
* chlog
* revert
* review
* review
* rev and amend
* fix test and logic
* update
* code review
* Apply suggestions from code review
* better assertions
* better assertions
* Apply suggestions from code review
* add loop test
* Apply suggestions from code review
* Split for typing
* review comments
* review comments
* use if_else
* code review
* code review
* code review
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Remove unnecessary pieces from the test
* move test
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-06 14:57:40 +00:00
Danielle Pintz
b3a5c7f442
Add `enable_progress_bar` to Trainer constructor ( #9664 )
2021-09-24 22:53:31 -07:00
Jirka Borovec
6e124e7207
CI: precommit - docformatter ( #8584 )
...
* CI: precommit - docformatter
* fix deprecated
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
Sean Naren
aadd2a9d9c
Load ckpt path when model provided in validate/test/predict ( #8352 )
...
* Change trainer loading behaviour for validate/test/predict
* Fix
* Fix/add tests
* remove
* Cleanups
* Space
* cleanups
* Add CHANGELOG.md
* Move after setup
* Cleanups on logic
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove
* fix test
* feedback
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update pytorch_lightning/trainer/properties.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Feedback
* Same fix
* Same fix
* Add test for behaviour, modify based on feedback
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Wording
* Apply suggestions from code review
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Cleanup docs
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* feedback
* Fixes to test API
* Add carlos description
* Move logic further
* Move checkpoint connector logic
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-28 10:12:46 +00:00
Carlos Mocholí
a64cc37394
Replace `yapf` with `black` ( #7783 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
thomas chaton
1f025789fc
[bugfix] Clean Validation Sanity Checking metrics ( #8171 )
...
* resolve logging issue
* update changelog
* remove breakpoint
* resolve bugs
* remove pass
2021-06-28 13:49:56 -04:00
Adrian Wälchli
971908a1aa
Loop Refactor 1/N - Training Loop ( #7871 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-15 12:55:06 +00:00
Carlos Mocholí
8c0ea92af2
`TrainerState` refactor [5/5] ( #7173 )
...
* `TrainerState` refactor
* flake8
* Update finished check
* Test cleanup
* Fix tests
* Fixes
* Reorder
* flake8
* Update CHANGELOG
* Better docs
* Better docs
* Remove default
* Update tests
* Bad merge
2021-05-04 12:50:56 +02:00
Adrian Wälchli
b780af51be
update test for resume_from_checkpoint on missing file ( #7255 )
2021-05-04 09:16:34 +00:00
Vaibhav Balloli
ccd87cadfc
Changes resume_from_checkpoint warning to error ( #7075 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-28 15:03:29 +02:00
Jirka Borovec
aa7d3dc6cc
Fix `torchmetrics` compatibility ( #7131 )
...
* get_num_classes
* tmp
* fix one test
* fix deprecated tests
* fix deprecate
* pep8
* deprecate 0.3
* wip
* wip
* HaCK
* brnch
* brnch
* format
* Apply suggestions from code review
* prune
* rev
* multilabel
* Apply suggestions from code review
* master
* rev
* .
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-04-22 20:45:46 +00:00
Elia Cereda
d0596fac94
Refactor RunningStage usage in advance of implementing Trainer.validate() ( #4945 )
...
* Update code
Co-authored-by: EliaCereda
* More property updates
* Move properties. Introduce trainer._fitting
* Use trainer.fitting
* Fix reset dataloaders
* Unused code
* RunningStage.SANITY_CHECKING
* Use setters
* Fix bugs
* Fix bugs
* TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}
* Fix bugs
* Fix bugs
* Fix tests
* Update CHANGELOG. Add deprecation warning. Fix tests
* Unused imports
* Optional trainer
* More deprecation. More refactoring
* Correct version
* Use properties
* Address comments
* flake8
* Missed renamings
* Typo
* is -> ==
It is recommended to use `is` for Enums since they are singletons; however, since LightningEnum subclasses str, that is not a good idea in case a user sets the state/stage with a str
* Also for tests
* Typo
* Address @tchaton's comments
* PEP8
* Correct property
* Update CHANGELOG
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Remove called sanity check
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-06 12:40:19 +00:00
Jirka Borovec
b9cf1223b9
missing tests default_root_dir=tmpdir ( #6314 )
...
* default_root_dir=tmpdir
* miss
2021-03-04 19:23:12 +00:00
Jirka Borovec
0f9134e043
Refactor: skipif for Windows 2/n ( #6268 )
...
* win
* isort
* flake8
2021-03-02 09:36:01 +00:00
Jirka Borovec
eb815000f6
Refactor: skipif for multi - gpus 1/n ( #6266 )
...
* ngpus
* gpu
* isort
* pt
* flake8
2021-03-02 09:03:32 +01:00
Jirka Borovec
1c851b89e1
fixing misleading tested acc values ( #5876 )
...
* fixing tested values
* .
* tests
* yapf
* softmax
* hvd
* rename
* lr
* duplicate
* drop
* classif
* rm EvalModel
* Revert "rm EvalModel"
This reverts commit 6c3fb39ebe.
* update tests
* fix
* azure
* azure
* self
* cpu
* Apply suggestions from code review
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-02-23 22:08:46 +00:00
Adrian Wälchli
0456b4598f
mini refactor for _running_stage access ( #5724 )
...
* running stage
* circular import
* running stage cleanup
* fix unused import
* fix running stage access
* add return type
* Revert "add return type"
This reverts commit 65b0fe269c.
* try fix typing
2021-02-22 12:01:54 +01:00
Adrian Wälchli
02ac4b0b6a
Replace .get_model() with explicit .lightning_module ( #6035 )
...
* rename get_model -> lightning_module
* update references to get_model
* pep8
* add proper deprecation
* remove outdated _get_reference_model
* fix cyclic import
2021-02-18 15:59:54 +01:00
Adrian Wälchli
4bdf2fe55f
remove executable bit on source files ( #5929 )
...
* 644
2021-02-12 00:06:40 +01:00
Kaushik B
4857546c25
Fix: Failing test in data_modules(dp) ( #5924 )
...
* Update test_datamodules.py
* fix code format issue
* fix test restore
* fix code format issue
2021-02-11 17:32:46 +00:00
Rohit Gupta
8e9a026bc3
[tests/models] refactor with BoringModel ( #5507 )
...
* update with BoringModel
* update with BoringModel
* step
* try TPU
* TPU
* update tests
* update tpu tests
* self
* fix
* dp
* update tests
* ref
* update tests
* fix tpu tests
* fix dp and run_prediction
* dp
* only dp
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-02-11 14:32:07 +00:00
Jirka Borovec
a0f7831278
fix misleading imports in tests ( #5873 )
...
* fix imports
* .
2021-02-09 05:10:52 -05:00
Jirka Borovec
bd920b4102
Refactor simplify tests ( #5861 )
...
* add new
* restructure
* yapf
* move
* fix
2021-02-08 11:52:02 +01:00
Jirka Borovec
4faaef7758
formatting tests: 4/n ( #5846 )
...
* models
* ckpt
* core
* log
2021-02-06 12:07:26 +01:00
Adrian Wälchli
9555043a29
Force ModelCheckpoint callback to run last ( #5731 )
2021-02-03 16:40:57 -05:00
Adrian Wälchli
692f77b8a7
Refactor LightningDataParallel ( #5670 )
...
* module
* fix model access
* scalar conversion
* refactor
* kwargs
* auto unsqueeze
* refactor code duplication
* clean up
* docs
* update dp docs
* changelog
* generalize test
* test
* rename
* warning cache
* isort
* unsqueezing test
* device
* device
* scalar test
* device
* device
* include coverage of overrides
* clear
* add deprecation test
* docs
* improve coverage
* increase coverage
* fix merge
* extend test
* rename base class
* mention the predict method in docs
* combine iteration over collection
* remove override
* move
* line
* Apply suggestions from code review
* fix running stage
* f401
* fix cyclic import
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-01-31 06:08:16 -05:00
chaton
3da28fd634
[feat] 1/2 Add trainer.predict ( #5579 )
...
* start adding predict
* add predict
* resolve test
* add predict
* remove limit_predict
* update
* add test for predict
* typo
* update on comments
* remove predict_step
* update ddp_sharded
* check ddp_sharded
* resolve on comments
* resolve isort
* update dp
* add test dp 1 gpu
* made default forward
* resolve path
* resolve bug
* update on comments
* resolve doc
* resolve bug
* update
* resolve bug
* update on comments
* resolve pep8
* update test doc
* update on comments
* solve special tests
* resolve bug
* resolve flake8
* Update pytorch_lightning/callbacks/progress.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* add predict to LightningModule
* missing predict
* typo
* rename is_prediction to _predicting
* add
* update
* update
* update doc
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-01-27 11:38:14 -05:00
Jirka Borovec
53b0ae49b9
fix imports / isort / flake8
2021-01-26 14:57:34 +01:00
chaton
0435e23a64
deprecate enable_pl_optimizer as it is not restored properly ( #5244 )
...
* update
* clean test
* still in progress
* update test
* update
* update
* resolve flake
* add test for zero_grad
* update
* works without accumulated_grad
* update
* update
* resolve amp
* revert back to True
* update
* clean tests
* cleaned out
* typo
* update test
* git repair bug
* remove print
* update
* Fix formatting/optimizer imports
* Refactor the test for cleanliness
* Add vanilla model to the test, better var names
* Fixed var names, let's clean up these mock tests
* repair test
* update test
* resolve flake8
* add manual_optimization
* update tests
* resolve flake8
* add random accumulate_grad_batches
* improve test
* Update tests/trainer/optimization/test_parity_automatic_optimization.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/optimization/test_parity_automatic_optimization.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update
* clean tests
* correct bug
* Apply suggestions from code review
* format
* address comments
* update on comments
* wip
* typo
* deprecate enable_pl_optimizer
* resolve latest bugs
* update
* resolve merge
* add comment
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/deprecated_api/test_remove_1-3.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/connectors/optimizer_connector.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/trainer/optimization/test_parity_automatic_optimization.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update on comments
* update restore
* add a property
* remove setstate as not needed anymore
* update test
* provide optimizer to on_before_zero_grad
* update on comments
* update on comments
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update tests/trainer/optimization/test_parity_automatic_optimization.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update tests/trainer/optimization/test_parity_automatic_optimization.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update tests/trainer/optimization/test_parity_automatic_optimization.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* modify import
* update changelog
* resolve flake8
* update
* update
* clean doc
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
(cherry picked from commit f2e99d617f)
2021-01-26 14:29:46 +01:00
Jirka Borovec
059f4630c8
prune check on Trainer fit result ( #5453 )
...
* prune check on Trainer fit result
* flake8
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* .
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-01-11 19:36:48 -05:00
Gianluca Scarpellini
7464aca44e
test_cpu and test_gpu EvalModelTemplate deprecation ( #4820 )
...
* test_cpu refactoring - BoringModel and checkpoints; test_gpu refactoring - BoringModel; boring_model refactoring - validation, testing; Fix - run_prediction as dispatcher for testing BoringModel
* Removed EvalModelTemplate import from test_cpu and test_gpu
* Reverting unintended changes
* Issues with checkpointing
* Fixed tests for logging and checkpointing
* Fix for dispatcher
* test_cpu refactoring - BoringModel and checkpoints; test_gpu refactoring - BoringModel; boring_model refactoring - validation, testing; Fix - run_prediction as dispatcher for testing BoringModel
* Removed EvalModelTemplate import from test_cpu and test_gpu
* Reverting unintended changes
* Issues with checkpointing
* Fixed tests for logging and checkpointing
* Fix for dispatcher
* Fixed acc check for stochasticity of seeds
* Fixed according to @borda suggestions
* Hparams for boring_model
* Deprecated RuntimeParamChangeModelAssign (functionality is tested in RuntimeParamChangeModelSaving)
* Reduced boring_model parameters to just in and out features, test_cpu models inherit BoringModel to specify additional parameters (e.g., optimizer)
* Fix PEP8
* Update tests/base/develop_pipelines.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/base/boring_model.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/base/develop_pipelines.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Merged test_early_stopping with all_features; added TODO for self.log
* Fixed test_all_features trainer options
* Ready for review!
* Update tests/models/test_cpu.py
Thank you! :)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* added optimizer_name, lr, and batch_size as hparams for save_hparameters()
* Fixes for reducing PR size
* Reverse test_hparams (removed DEPRECATED test for hparams direct assignment)
* Changes for in_features
* Fixed hparams
* Fixed parameters for boring_model
* Update tests/models/test_cpu.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* fix for pep8
* Fixed run_prediction and TODO
* fix min acc for darwin/windows without pl_opt
* eval as DEFAULT run_prediction strategy
* Updated val_dataloader for running_test_no_val
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-01-07 05:50:08 -05:00
tarepan
bb366232e7
Add non-existing resume_from_checkpoint acceptance for auto-resubmit ( #4402 )
...
* Add empty resume_from_checkpoint acceptance #4366
* Fix general error catch with focused file check
* Add fsspec HTTP extras
Add fsspec's HTTPFileSystem support through http extras.
pl has supported remote http file (e.g. #2925 ),
so this commit does not add new functionality.
* Fix potential too much logging in DDP
* Add PR changelog
* Add well-written argument explanation
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Fix DDP-compatible restore logging
Notify from where the states are restored.
This feature was temporarily deleted as a result of PR review.
With succeeding review, added with DDP compatibility.
* Fix utility import paths
* Refactor load step commentaries
* Refactor hpc ckpt suffix acquisition
* Refactor restore/hpc_load match
* Refactor hpc load trial
* Refactor checkpoint dir check
* Refactor unneeded function nest
* Refactor nested If
* Refactor duplicated cache clear
* Refactor attempt flow with if/elif
* Fix pep8
* Refactor hook commentary
Co-authored-by: chaton <thomas@grid.ai>
* Fix pep8
* Refactor hpc load checkpoint path acquisition
* Fix pep8
* Fix typo
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Fix typo
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Fix doc
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Refactor None Union type with Optional
* Fix build-doc CI failure debugged in #5329
* Fix fsspec import during build-doc #5329
* Fix test epoch
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Fix test with latest test models
* .
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
(cherry picked from commit b0051e8c03)
2021-01-06 12:55:38 +01:00