Kaushik B
7b0d1183db
Update `gpus` flag with `accelerator` and `devices` flag ( #12156 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-23 19:52:12 +00:00
DuYicong515
523200971d
Remove `AcceleratorConnector.root_gpu` and deprecate `Trainer.root_gpu` ( #12262 )
2022-03-19 23:53:50 +00:00
Danielle Pintz
0fe3379fa4
Deprecate `weights_save_path` from the Trainer constructor ( #12084 )
...
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-02-28 22:45:26 +00:00
Krishna Kalyan
6586dd23b7
Mark `CheckpointConnector` as protected ( #11550 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-03 02:26:08 +00:00
jjenniferdai
4b5761539e
Remove `hpc_save` ( #11101 )
2022-01-03 12:23:13 +00:00
four4fish
cf5ef32f7b
Deprecate Trainer.training_type_plugin in favor of trainer.strategy ( #11141 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-22 02:11:43 +00:00
Adrian Wälchli
7eb2edf421
rename set_random_master_port ( #10104 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:09:05 +00:00
Kaushik B
5e8829b97d
(1/n) tests: Use strategy flag instead of accelerator for training strategies ( #9931 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-16 20:40:25 +05:30
Danielle Pintz
b3a5c7f442
Add `enable_progress_bar` to Trainer constructor ( #9664 )
2021-09-24 22:53:31 -07:00
Jirka Borovec
6e124e7207
CI: precommit - docformatter ( #8584 )
...
* CI: precommit - docformatter
* fix deprecated
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
Jirka Borovec
f67892ea96
CI: yesqa ( #8564 )
...
* add yesqa
* fix flake8
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-08-02 16:05:56 +00:00
Carlos Mocholí
a64cc37394
Replace `yapf` with `black` ( #7783 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
marsggbo
d0038b521c
Bugfix: horovod optimizer missing 2 required positional arguments ( #7840 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-21 08:11:26 +00:00
Adrian Wälchli
6b7b40473b
deprecate hpc_load() and integrate it with restore() ( #7955 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-14 12:20:01 +00:00
Carlos Mocholí
8c0ea92af2
`TrainerState` refactor [5/5] ( #7173 )
...
* `TrainerState` refactor
* flake8
* Update finished check
* Test cleanup
* Fix tests
* Fixes
* Reorder
* flake8
* Update CHANGELOG
* Better docs
* Better docs
* Remove default
* Update tests
* Bad merge
2021-05-04 12:50:56 +02:00
thomas chaton
0544efd453
[bug] Update broadcast + reduce decision [ModelCheckpoint] ( #6410 )
...
* resolve bug
* update
* update changelog
* update PR
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* add todo
* resolve issues
* resolve flake8
* update
* add coverage for reduce
* wip
* restore back to broadcast
* remove test.py
* resolve flake8
* update
* check world size
* resolve test
* update
* use pytorch version when defined
* update on comments
* update on comments
* flake8
* resolve bugs
* Update CHANGELOG.md
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* update
* update
* update
* update
* remove test
* update
* resolve flake8
* update
* update
* update
* proxy
* update
* update
* resolve typo
* prune
* update parallel
* update
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-14 17:14:27 +00:00
Jirka Borovec
1c851b89e1
fixing misleading tested acc values ( #5876 )
...
* fixing tested values
* .
* tests
* yapf
* softmax
* hvd
* rename
* lr
* duplicate
* drop
* classif
* rm EvalModel
* Revert "rm EvalModel"
This reverts commit 6c3fb39ebe.
* update tests
* fix
* azure
* azure
* self
* cpu
* Apply suggestions from code review
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-02-23 22:08:46 +00:00
Rohit Gupta
8e9a026bc3
[tests/models] refactor with BoringModel ( #5507 )
...
* update with BoringModel
* update with BoringModel
* step
* try TPU
* TPU
* update tests
* update tpu tests
* self
* fix
* dp
* update tests
* ref
* update tests
* fix tpu tests
* fix dp and run_prediction
* dp
* only dp
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-02-11 14:32:07 +00:00
Jirka Borovec
bd920b4102
Refactor simplify tests ( #5861 )
...
* add new
* restructure
* yapf
* move
* fix
2021-02-08 11:52:02 +01:00
Kaushik B
5dfd62c09e
Disable training with zero num_training_batches when insufficient limit_train_batches ( #5703 )
...
* disable training when zero num_train_batches with limit_train_batches
* refactor train skip condition
* fix formatting issues
* fix formatting issues
* ref: test error msg
* fix tests for data loader calls
* fix train dataloader condition
* update limit_train_batches upper range in test comment
* remove model state check test
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-05 21:40:42 +01:00
Sumanth Ratna
1c44f35cf3
Fix mypy 0.800 plus when prepending $PYTHONPATH to sys.path ( #5698 )
...
* Fix mypy when prepending $PYTHONPATH to sys.path
* attempt mypy fix
* Revert "attempt mypy fix"
This reverts commit fb7ed827d9.
* fix mypy
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-05 21:40:40 +01:00
Arnaud Gelas
ac531ec945
Fix pre-commit isort failure on tests/models/*.py ( #5423 )
...
* Remove tests.models from skipped module in pyproject.toml
* Fix pre-commit isort failure on tests/models/*.py
2021-01-14 09:42:01 -05:00
Jirka Borovec
059f4630c8
prune check on Trainer fit result ( #5453 )
...
* prune check on Trainer fit result
* flake8
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* .
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-01-11 19:36:48 -05:00
Gianluca Scarpellini
7464aca44e
test_cpu and test_gpu EvalModelTemplate deprecation ( #4820 )
...
* test_cpu refactoring - BoringModel and checkpoints; test_gpu refactoring - BoringModel; boring_model refactoring - validation, testing; Fix - run_prediction as dispatcher for testing BoringModel
* Removed EvalModelTemplate import from test_cpu and test_gpu
* Reverting unintended changes
* Issues with checkpointing
* Fixed tests for logging and checkpointing
* Fix for dispatcher
* test_cpu refactoring - BoringModel and checkpoints; test_gpu refactoring - BoringModel; boring_model refactoring - validation, testing; Fix - run_prediction as dispatcher for testing BoringModel
* Removed EvalModelTemplate import from test_cpu and test_gpu
* Reverting unintended changes
* Issues with checkpointing
* Fixed tests for logging and checkpointing
* Fix for dispatcher
* Fixed acc check for stochasticity of seeds
* Fixed according to @borda suggestions
* Hparams for boring_model
* Deprecated RuntimeParamChangeModelAssign (functionality is tested in RuntimeParamChangeModelSaving)
* Reduced boring_model parameters to just in and out features, test_cpu models inherit BoringModel to specify additional parameters (e.g., optimizer)
* Fix PEP8
* Update tests/base/develop_pipelines.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/base/boring_model.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/base/develop_pipelines.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Merged test_early_stopping with all_features; added TODO for self.log
* Fixed test_all_features trainer options
* Ready for review!
* Update tests/models/test_cpu.py
Thank you! :)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* added optimizer_name, lr, and batch_size as hparams for save_hparameters()
* Fixes for reducing PR size
* Reverse test_hparams (removed DEPRECATED test for hparams direct assignment)
* Changes for in_features
* Fixed hparams
* Fixed parameters for boring_model
* Update tests/models/test_cpu.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update tests/models/test_cpu.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* fix for pep8
* Fixed run_prediction and TODO
* fix min acc for darwin/windows without pl_opt
* eval as DEFAULT run_prediction strategy
* Updated val_dataloader for running_test_no_val
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-01-07 05:50:08 -05:00
Jirka Borovec
059eaecbb4
set xxx_AVAILABLE as protected ( #5082 )
...
* set xxx_AVAILABLE as protected
* docs
2020-12-14 20:19:05 +05:30
tarepan
16feb5137b
Refactor load in checkpoint connector ( #4593 )
...
* Refactor load step commentaries
* Refactor hpc ckpt suffix acquisition
* Refactor restore/hpc_load match
* Refactor hpc load trial
* Refactor checkpoint dir check
* Refactor unneeded function nest
* Refactor nested If
* Refactor duplicated cache clear
* Refactor attempt flow with if/elif
* Fix pep8
* Refactor hook commentary
Co-authored-by: chaton <thomas@grid.ai>
* Fix pep8
* Refactor hpc load checkpoint path acquisition
* Fix pep8
* Fix doc
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Refactor None Union type with Optional
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-14 00:13:50 +08:00
Jirka Borovec
05f25f3a54
update usage of deprecated checkpoint_callback ( #5006 )
...
* drop usage of deprecated checkpoint_callback
* fix
* fix
2020-12-09 14:14:34 -05:00
Jirka Borovec
53d7c9555c
drop usage of deprecated distributed_backend ( #5009 )
...
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-09 09:18:23 +01:00
Jirka Borovec
ab7c947961
simplify CI horovod ( #4951 )
...
* simplify CI horovod
* reorder
2020-12-07 10:31:33 +01:00
Jirka Borovec
3976db597d
refactor imports of optional dependencies ( #4859 )
...
* refactor imports of optional dependencies
* fix
* fix
* fix
* fix
* fix
* flake8
* flake8
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-12-04 10:26:10 +01:00
Rohit Gupta
4c7ebdc32b
Add dirpath and filename parameter in ModelCheckpoint ( #4213 )
...
* Add dirpath and filename parameter in ModelCheckpoint
* remove old function
* chlog
* codefactor
* update tests
* docs
* fix doctest and added tests
* pathlib dirpath
* dep version and docs
* try fix doctest
* pep
* suggestions
Co-authored-by: carmocca <carlossmocholi@gmail.com>
* suggestions
* fix test
* pep
* trigger tests
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* suggestions
* try fix windows test
* add and update some tests
* trigger tests
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-23 09:59:12 +05:30
William Falcon
e17712e5c3
part 5 of #3733 ( #3774 )
...
* ref: part 4 of #3733
* ref: part 4 of #3733
* ref: part 4 of #3733
2020-10-01 12:34:12 -04:00
William Falcon
cf182e80fc
Finish Allow on_save_checkpoint... ( #3688 )
...
* Finish #3562
* Apply suggestions from code review
* Apply suggestions from code review
* fix tests
* Finish #3562
* Apply suggestions from code review
* Apply suggestions from code review
* fix tests
* fix structure
* fix structure
* make save_last test pass
* unnecessary global rank check
* fix test
* update test
* update test
* test
* test
* run save on all
* remove assert
* tracking saves
* check if fails
* test
* clean up
* adjust horovod test
* clean up
* remove unnecessary makedirs
* change
* undo
* debug
* debug
* debug
* debug
* mock
* undo debug code
* add extra assertions
* test
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-30 16:15:29 -04:00
Jirka Borovec
7b64472ced
fix lib paths after Wandb 0.10 ( #3520 )
...
* try
* try
* drop 0.20
* drop 0.19.5
* -U
* Fixed Horovod in CI due to wandb==0.10.0 sys.path modifications (#3525 )
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* format
* wb freeze
* types
Co-authored-by: Travis Addair <taddair@uber.com>
2020-09-17 08:37:49 -04:00
William Falcon
cd16aa9854
ref: checkpoint connector methods 4/n ( #3474 )
...
* ref: checkpoint connector methods 4/n
2020-09-12 08:42:27 -04:00
Adrian Wälchli
188e06c261
ddp fix for trainer.test() + add basic ddp tests ( #2997 )
...
* add ddp script variations
* add ddp test
* rename
* shell
* test
* test
* try call
* try without subprocess
* test
* display the error
* list all variations
* try string
* try copy env
* debug
* pythonpath
* path
* update test
* change
* simple ddp test
* replace
* remove random port
* random port
* str
* clean up
* check run spawn
* clean up
* docs
* docs
* update test
* docs
* changelog
* changelog
2020-08-16 11:19:57 -04:00
Adrian Wälchli
d03953260d
Fix weights_save_path when logger is used + simplify path handling + better docs ( #2681 )
...
* fix weights_save path and drop ckpt_path
* add tests
* unused import
* update docs
* changelog
* pep8
* fix horovod test
* make backward compatible
* perform same test for all loggers
* fix for when logger=False and weights_save_path is set
* update changelog
* update docs
* update tests
* do not set save dir dynamically
* remove duplicate test
* remove duplicated tests
* update tests
* update tests
* remove remaining ckpt_path references
* move defaults to init as suggested by @Borda
* test deprecation
2020-07-27 12:53:11 -04:00
William Falcon
f35337adba
Fixes .test() for ddp ( #2570 )
...
* enable none checkpoint
* enable none checkpoint
2020-07-09 18:36:36 -04:00
Adrian Wälchli
78db847e42
Fixed skipped horovod tests ( #2514 )
...
* skip ckpt test on rank > 0
* fix test
* add extra assert
* code factor
* add back removed
* add old loading code
* add back old
* unused import
* add same skip to run_model_without_loggers
* test if horovod now works with python 3.8
* test remove all 3.8 skips
* remove spawn
* fix
* fix test
* move load check up
* fix test multigpu
* rename
* fix gpu mode
* on gpu fix when on cpu
* move
2020-07-07 14:54:07 -04:00
Jirka Borovec
f1c96930b1
repair CI for Win ( #2358 )
...
* no cov
* no cov
* ReduceOp
* group
* reduce_op.sum
* Update sklearns.py
* formatting
* horovod
* Apply suggestions from code review
* horovod
* horovod
* horovod
* horovod
* ci
* print
* ci
* timeout
* timeout
* time
* fix
* distributed cpu
* pipes
* time
* cpu
* spawn
* spawn
* spawn
* tp
* separate
* os
* os
* npm
* Fix load_from_checkpoint() not working with URL on Windows
* Update CHANGELOG
* Update CHANGELOG.md
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix meta tags creating empty lines
* pyright
* node
* fix httpserver address
* drop tutils.default_trainer_options
* imports
* Better fix for load_from_checkpoint() not working with absolute path on Windows (#2294 )
* Fix load_from_checkpoint() not working with URL on Windows
* Update CHANGELOG
* Update CHANGELOG.md
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* drop duplicate
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: airium <airium@outlook.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: AIRIUM <38249940+airium@users.noreply.github.com>
2020-06-26 21:38:25 -04:00
Jirka Borovec
9d2df24d6b
RC & Docs/changelog ( #1776 )
...
* missing
* RC
* tol
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* test
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-05-11 21:57:53 -04:00
Jirka Borovec
134eb61e1a
Tests: refactor cleanup ( #1744 )
...
* wip
* cleaning
* optim imports
* -
* default hparams
* fix restore
* fix imports
2020-05-10 13:15:28 -04:00
Jirka Borovec
f380027951
refactor default model ( #1652 )
...
* refactor default model
* drop redundant seeds
* formatting
* path
* formatting
* rename
2020-05-02 08:38:22 -04:00
Travis Addair
2950f66983
Fix Horovod distributed backend to set the root_gpu property ( #1669 )
...
* params
* drop acc
* Fix Horovod distributed backend to set the root_gpu
* Fixed test
* Fixed tests
* Fixed lint
* Set root_gpu during initialization
* chlog
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2020-05-01 14:13:35 -04:00
Travis Addair
7024177f7d
Added Horovod distributed backend ( #1529 )
...
* Initial commit of Horovod distributed backend implementation
* Update distrib_data_parallel.py
* Update distrib_data_parallel.py
* Update tests/models/test_horovod.py
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/models/test_horovod.py
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Fixed tests
* Added six
* tests
* Install tox for GitHub CI
* Retry tests
* Catch all exceptions
* Skip cache
* Remove tox
* Restore pip cache
* Remove the cache
* Restore pip cache
* Remove AMP
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
2020-04-22 17:39:08 -04:00