Jirka Borovec
0c0b24c031
Prune deprecated metrics ( #8586 )
...
* drop metrics
* drop tests
* fix imports
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-28 16:57:31 +00:00
Carlos Mocholí
f7027a8701
Remove `torch >= 1.6` checks ( #8523 )
2021-07-23 04:03:20 +00:00
Carlos Mocholí
6ce77a102b
Set minimum PyTorch version to 1.6 ( #8288 )
...
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-07-13 17:12:49 +00:00
Carlos Mocholí
4d9b72b8a9
Nuke RPC ( #8101 )
2021-06-23 18:31:13 +00:00
Carlos Mocholí
dd340a6598
Actually show deprecation warnings and their line level [2/2] ( #8002 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-21 18:51:53 +02:00
Sean Naren
96433d03ea
IPU Integration 5/5 ( #7867 )
...
* Initial changes
* Add broken example for now
* Fix reference
* Fix format
* Code runs
* Fixes
* Clear up files
* Add tests, helpers, fixes
* Small cleanups
* Refactors based on review
* Swap to special tests
* Add special tests
* Add source
* Cleanups
* Add logic to attach/detach model from devices
* Fixes for tests
* Fixes for tests
* Move earlier
* Cleanups
* Add check for nvcc
* Add tests, cleanups
* Fix errors
* fix
* Try condition
* Add missing annotation
* Clearer
* Clearer message
* Fix variable
* Cleanups
* Add comment
* CHANGELOG.md
* Add simple selection test
* Remove special=True to see what happens
* Fix test
* Update tests/accelerators/test_ipu.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* Convert ipu_cores -> ipus
* Add typing, fail earlier
* simplify precision
* Add test, add helper
* fix accum
* Update pytorch_lightning/plugins/training_type/ipu.py
Co-authored-by: thomas chaton <thomas@grid.ai>
* Use stages
* Make sure warning message returned
* throw error
* Add more tests, use fs
* add comment
* Clean
* Address feedback, add IPU tests
* Fixes
* Fix signature
* Add types
* Remove autoround
* Add docstring
* ipu_cores -> ipus
* Add test, remove unnecessary precision set
* Add optimizer test
* Add precision back with test
* Address code review
* Change to probs
* Move some of the asserts earlier
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-06-11 15:07:04 +00:00
shuyingsunshine21
299f2c481b
FSDP with full state dict ( #7487 )
...
* Fix some test errors
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789, reversing changes made to 0d23d75bc9.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6.
* Revert "Update utils.py"
This reverts commit a9aae99f6e.
* Revert "Update test_results.py"
This reverts commit ea74906878.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731.
* modify distributed environment to make test pass
* fix version for ddp plugin test
* fix
* fix
* changelog
* Update CHANGELOG.md
* fsdp with full state dict
* fix missing import
* modify unittest
* fix
* fix
* fix typo
* modify test and add changelog
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* limit max_epoch to 1 for testing
* test
* fix
* update
* testing remove special for multi gpu
* assert gpu
* add assertion for gpu
* fix
* Re-enable special test, use ModelCheckpoint
* Fix paths
* Fix path passing
* test
* test
* fix test
* fix
* pre-commit format
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-24 08:11:45 +01:00
Carlos Mocholí
8208c330eb
Use `torch.nn.utils.clip_grad_norm_` and add `clip_grad_by_value` support for TPU ( #7025 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-07 16:41:39 +00:00
ananthsub
44fd01734c
Move grad_norm to a dedicated utilities file ( #7292 )
...
* rm-grad-norm-mixin
* Update grads.py
* Update CHANGELOG.md
* Apply suggestions from code review
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update docstrings
* Update __init__.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-30 09:19:22 -07:00
shuyingsunshine21
52a5cee0a7
Set smarter default for DDP sharded for performance optimization ( #6937 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-27 04:01:34 +05:30
Kaushik B
1b3e4f9fb9
Fix sync_dist for tpus ( #6950 )
2021-04-13 14:17:15 +05:30
shuyingsunshine21
313e81638d
Supporting Adding DDP Communication Hooks ( #6736 )
...
* Fix some test errors
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789, reversing changes made to 0d23d75bc9.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6.
* Revert "Update utils.py"
This reverts commit a9aae99f6e.
* Revert "Update test_results.py"
This reverts commit ea74906878.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731.
* modify distributed environment to make test pass
* add DDP communication hook
* remove test related setting
* remove more test related setting
* fix ddp comm hook util import issue
* comments
* one more fix for test_custom_plugin
* fix ddp spawn
* fix sgd
* address comments and add tests
* 1. add is gpu checking 2. modify test a bit 3. formatting
* formatting nit
* fix conda 3.7 1.7 issue for no torch.distributed.algorithms module
* need at least 1.8.0
* minor fix
* modify changelog
* changelog should link to PR number instead of issue number
* refine a bit on doc for register_ddp_comm_hook function, like ddp_comm_wrapper explanation and add hyperparameter for power sgd states in example usage
* move single device checking before call register_ddp_comm_hook
* formatting
* comments
* typo
* pre-commit formatting
2021-04-07 12:35:57 +01:00
Anthony Kim
7f6154fcad
Add `Trainer(gradient_clip_algorithm='value'|'norm')` ( #6123 )
...
* add changelog
* add clip by value
* fix bug in training tricks.rst
* fix bug in trainer.rst
* Update trainer.rst
* Update trainer.rst
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/precision/deepspeed_precision.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/utilities/enums.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* yapf formatting
* update training tricks
* update based on comment
* update based on comment
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* update based on comment
* pep8
* mypy
* mypy
* Update docs/source/advanced/training_tricks.rst
Co-authored-by: thomas chaton <thomas@grid.ai>
* Update sharded_native_amp.py
* Update test_sharded_parity.py
* update test codes
* Update test_tpu.py
* Update pytorch_lightning/trainer/connectors/training_trick_connector.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update test_trainer.py
* Update enums.py
* Update enums.py
* add super-class initialization to precision plugins.
* add clip_grad horovod cpu test
* add clip_grad horovod cpu test
* use subprocess check_call
* change order of horovod tests
* set max_epochs 2 in horovod test
* remove clip_grad_val test from horovod-cpu
* remove "type: ignore"
* divide clip grad val test in horovod
* update based on comments
* add super-class initialization to precision plugins.
* bugfix
* bugfix
* revert some changes
* revert some changes
* Update tests/models/test_horovod.py
* merge master
* Delete signature test
No point in testing a signature
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-04-06 08:27:37 -05:00
thomas chaton
1302766f83
DeepSpeed ZeRO Update ( #6546 )
...
* Add context to call hook to handle all modules defined within the hook
* Expose some additional parameters
* Added docs, exposed parameters
* Make sure we only configure if necessary
* Setup activation checkpointing regardless, saves the user having to do it manually
* Add some tests that fail currently
* update
* update
* update
* add tests
* change docstring
* resolve accumulate_grad_batches
* resolve flake8
* Update DeepSpeed to use latest version, add some comments
* add metrics
* update
* Small formatting fixes, clean up some code
* Few cleanups
* No need for default state
* Fix tests, add some boilerplate that should move eventually
* Add hook removal
* Add a context manager to handle hook
* Small naming cleanup
* wip
* move save_checkpoint responsibility to accelerator
* resolve flake8
* add BC
* Change recommended scale to 16
* resolve flake8
* update test
* update install
* update
* update test
* update
* update
* update test
* resolve flake8
* update
* update
* update on comments
* Push
* pull
* Update pytorch_lightning/plugins/training_type/deepspeed.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/plugins/training_type/deepspeed.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update
* Apply suggestions from code review
* Swap to using world size defined by plugin
* update
* update todo
* Remove deepspeed from extra, keep it in the base cuda docker install
* Push
* pull
* update
* update
* update
* update
* Minor changes
* duplicate
* format
* format2
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-30 13:39:02 -04:00
thomas chaton
3a4c4246ee
[TPU] update is_tpu_exists utils internal logic to rely on xmp.spawn ( #6719 )
...
* update_logic
* update
* Update tests/utilities/test_xla_device_utils.py
* Update pytorch_lightning/utilities/xla_device.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* Update pytorch_lightning/utilities/xla_device.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* update test
* Update tests/utilities/test_xla_device_utils.py
* update
* Apply fix
* Docstring
* flake8
* update
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-03-29 18:59:20 +01:00
Jirka Borovec
217c12a4e7
Simplify deprecations ( #6620 )
...
* use external deprecate
* simplify
* simplify
* simplify
* flake8
* .
* others
* .
2021-03-25 15:26:38 +01:00
ifsheldon
ebabe56f4e
Ensure accelerator is valid if running interactively ( #5970 )
...
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-02-23 14:23:50 +01:00
Sean Naren
7189d673f6
DeepSpeed Integration ( #5954 )
...
* Add initial deepspeed changes
* Address code review
* Move static method outside of function
* Fixes
* Add missing annotation
* Remove seed setting
* Doc changes
* Doc changes, add address reviews
* Fix docs
* Try fixing issue by moving to torch adam
* Clean up check
* Changes, better APIs!
* Add wrapper, swap to git install revision
* Add special test
* Add warning
* Address review
* Add better disclaimer
* Turn off ZeRO for testing due to compilation
* Add description on modifying parameters via the plugin
* Doc strings clear
* Small doc fixes
* Fix hash, reduce test
* Added CI change
* Move to azure pipeline
* Fix test name
* Add missing flag
* Remove sudo...
* Try conda instead
* Swap to conda base
* Try suggested install
* Apply suggestions from code review
* Apply suggestions from code review
* Revert "Apply suggestions from code review"
This reverts commit 41cca05a
* Revert "Apply suggestions from code review"
This reverts commit e06ec29e
* Remove setter
* Address most review
* Move out function, remove DeepSpeed from requirements
* Install deepspeed/mpi4py within container
* Use special tests, move to master commit for deepspeed
* Export path
* Force compile to happen first
* Remove!
* Debugging ninja
* Fix error in optimizer step logic
* Attempt to fix symbolic link
* Reverse to aid debugging
* Export path again
* Clean up mess
* var
* Revert "var"
This reverts commit 3450eaca
* Address review, add todo
* Add note about unsupported functionality
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 15:23:42 -05:00
Justus Schock
da6dbc8d1d
PoC: Accelerator refactor ( #5743 )
...
* restoring the result from subprocess
* fix queue.get() order for results
* add missing "block_backward_sync" context manager
* add missing "block_backward_sync" context manager
* fix sync_batchnorm
* fix supported gpu-ids for tuple
* fix clip gradients and inf recursion
* accelerator selection: added cluster_environment plugin
* fix torchelastic test
* fix reduce early stopping decision for DDP
* fix tests: callbacks, conversion to lightning optimizer
* fix lightning optimizer does not pickle
* fix setting benchmark and deterministic option
* fix slurm amp test
* fix prepare_data test and determine node_rank
* fix retrieving last path when testing
* remove obsolete plugin argument
* fix test: test_trainer_config
* fix torchscript tests
* fix trainer.model access
* move properties
* fix test_transfer_batch_hook
* fix auto_select_gpus
* fix omegaconf test
* fix test that needs to simulate slurm ddp
* add horovod plugin
* fix test with named arguments
* clean up whitespace
* fix datamodules test
* remove old accelerators
* fix naming
* move old plugins
* move to plugins
* create precision subpackage
* create training_type subpackage
* fix all new import errors
* fix wrong arguments order passed to test
* fix LR finder
* Added sharded training type and amp plugin
* Move clip grad to precision plugin
* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically
* Fix import issue, attempting to fix tests
* Fix initial test
* Reflect hook logic from master, should wrap model after move to device
* Optional state consolidation, since master has optimizers not wrapped
* change attribute for instance test
* reset optimizers
optimizers are not used in main process, so state would be wrong.
* legacy
* imports in accel
* legacy2
* trainer imports
* fix import errors after rebase
* move hook to new setup location
* provide unwrapping logic
* fix trainer callback system
* added ddp2 implementation
* fix imports .legacy
* move plugins
* restore legacy
* drop test.py from root
* add tpu accelerator and plugins
* fixes
* fix lightning optimizer merge
* reset bugreportmodel
* unwrapping
* step routing forward
* model access
* unwrap
* opt
* integrate distrib_type
* sync changes
* sync
* fixes
* add forgotten generators
* add missing logic
* update
* import
* missed imports
* import fixes
* isort
* mv f
* changelog
* format
* move helper to parallel plugin
* d
* add world size
* clean up
* duplicate
* activate ddp_sharded and tpu
* set nvidia flags
* remove unused colab var
* use_tpu <-> on_tpu attrs
* make some ddp_cpu and clusterplugin tests pass
* Ref/accelerator connector (#5742 )
* final cleanup
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* connector cleanup
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* trainer cleanup
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* accelerator cleanup + missing logic in accelerator connector
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add missing changes to callbacks
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* reflect accelerator changes to lightning module
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* clean cluster envs
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* cleanup plugins
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add broadcasting
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* yapf
* remove plugin connector
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* plugins
* manual optimization
* update optimizer routing
* add rank to torchelastic
* fix memory mixed precision
* setstate on trainer for pickling in ddp spawn
* add predict method
* add back commented accelerator code
* adapt test for sync_batch_norm to new plugin
* fix deprecated tests
* fix ddp cpu choice when no num_processes are given
* yapf format
* skip a memory test that cannot pass anymore
* fix pickle error in spawn plugin
* x
* avoid
* x
* fix cyclic import in docs build
* add support for sharded
* update typing
* add sharded and sharded_spawn to distributed types
* make unwrap model default
* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel
* update sharded spawn to reflect changes
* update sharded to reflect changes
* Merge 1.1.5 changes
* fix merge
* fix merge
* yapf isort
* fix merge
* yapf isort
* fix indentation in test
* copy over reinit scheduler implementation from dev1.2
* fix apex tracking calls with dev_debugger
* reduce diff to dev1.2, clean up
* fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu
* sort plugin tests legacy/new
* fix error handling for amp on cpu
* fix merge
fix merge
fix merge
* [Feat] Resolve manual_backward (#5837 )
* resolve manual_backward
* resolve flake8
* update
* resolve for ddp_spawn
* resolve flake8
* resolve flake8
* resolve flake8
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
* fix tests/accelerator tests on cpu
* [BugFix] Resolve manual optimization (#5852 )
* resolve manual_optimization
* update
* update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856 )
* resolve a bug
* Accelerator refactor sharded rpc (#5854 )
* rpc branch
* merge
* update handling of rpc
* make devices etc. Optional in RPC
* set devices etc. later if necessary
* remove devices from sequential
* make devices optional in rpc
* fix import
* uncomment everything
* fix cluster selection
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
* resolve bug
* fix assert in rpc test
* resolve a test
* fix docs compilation
* accelerator refactor - fix for sharded parity test (#5866 )
* fix memory issue with ddp_spawn
* x
x
x
x
x
x
x
x
x
* x
* Remove DDP2 as this does not apply
* Add missing pre optimizer hook to ensure lambda closure is called
* fix apex docstring
* [accelerator][BugFix] Resolve some test for 1 gpu (#5863 )
* update
* revert init
* resolve a bug
* update
* resolve flake8
* update
* update
* update
* revert init
* resolve a bug
* update
* resolve flake8
* update
* update
* update
* update
* update
* revert init
* resolve a bug
* update
* resolve flake8
* update
* update
* update
* revert init
* update
* resolve flake8
* update
* update
* update
* update
* update
* all_gather
* update
* make plugins work, add misconfig for RPC
* update
* update
* remove breaking test
* resolve some tests
* resolve flake8
* revert to ddp_spawn
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
* yapf isort
* resolve flake8
* fix apex doctests
* fix apex doctests 2
* resolve docs
* update drone
* clean env
* update
* update
* update
* update
* merge
* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881 )
* Fix RPC related tests, clean out old API, update for new accelerator API
* Move tests out of legacy folder, update paths and names
* Update test_remove_1-4.py
* Expose properties for tpu cores/gpus/num_gpus
* Add root GPU property
* Move properties to properties.py
* move tests that were previously in drone
* Fix root GPU property (#5908 )
* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator
* Add missing tests back
* fix best model path transfer when no checkpoint callback available
* Fix setup hook order [wip] (#5858 )
* Call trainer setup hook before accelerator setup
* Add test case
* add new test
* typo
* fix callback order in test
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* rename ddp sequential -> rpc sequential for special test
* revert
* fix stupid merge problem
* Use property in connector for sampler (#5913 )
* merge the import conflicts
* fix spawning of processes in slurm
* [wip] Fix some bugs for TPU [skip ci] (#5878 )
* fixed for single tpu
* fixed spawn
* fixed spawn
* update
* update
* wip
* resolve bugs
* resolve bug
* update on comment
* removed decorator
* resolve comments
* set to 4
* update
* update
* need cleaning
* update
* update
* update
* resolve flake8
* resolve bugs
* exclude broadcast
* resolve bugs
* change test
* update
* update
* skip if meet fails
* properly raise trace
* update
* add catch
* wrap test
* resolve typo
* update
* typo
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
* resolve some tests
* update
* fix imports
* update
* resolve flake8
* update azure pipeline
* skip a sharded test on cpu that requires a gpu
* resolve tpus
* resolve bug
* resolve flake8
* update
* update utils
* revert permission change on files
* suggestions from carlos
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* remove unrelated formatting changes
* remove incomplete comment
* Update pytorch_lightning/accelerators/__init__.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* remove unrelated formatting change
* add types
* warn 1.7 ddp manual backward only if ddp kwarg unset
* yapf + isort
* pep8 unused imports
* fix cyclic import in docs
* Apply suggestions from code review
* typer in accelerator.py
* typo
* Apply suggestions from code review
* formatting
* update on comments
* update typo
* Update pytorch_lightning/trainer/properties.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update
* suggestion from code review
* suggestion from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-12 15:48:56 -05:00
Jirka Borovec
b434c479e7
Quantisation ( #5706 )
...
* empty
* sq
* obs
* int
* ts
* helpers
* chlog
* yapf
* avg
* dupl
* Apply suggestions from code review
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* fixes
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* fixes
* note
* warn
* 45
* link
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* yapf
* flake8
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-02-11 07:04:57 -05:00
Jirka Borovec
9475c845cb
Docs/fixes ( #5914 )
...
* wip
* ..
* ...
* Apply suggestions from code review
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-02-11 10:22:07 +00:00
chaton
7b00894130
[feat] Add StochasticWeightAveragingCallback ( #5640 )
...
* add swa callback
* switch back to 1.6.0
* remove optimizer_step
* move super
* update
* forgot update_parameters
* update on comments
* works for ddp
* resolve flake8
* remove set_model
* resolve flake8
* resolve cpu
* resolve flake8
* resolve flake8
* update
* update on comments
2021-02-11 00:05:59 +00:00
Carlos Mocholí
a028171f26
Fix Pruning callback and add a few features ( #5825 )
...
* Remove pruning check because it was added in 1.4.0 and that is our minimal torch version
* Fixing many bugs
* Fix misconfig test
* Fix tests
* Improve error message
* Reduce whitespace
* WIP
* TODOs
* _MODULE_CONTAINERS
* Add LTH test
* Allow resampling
* Iterative pruning
* Log pruning percentage
* Properly make pruning permanent
* Fix docstring
* Minor changes
* Test loading non-permanent model
* corrent bugs
* Revert "corrent bugs"
This reverts commit ffb8d47547.
* Add beta warning
* Fix docs
* 2 verbosity levels
* OCD
Co-authored-by: Your Name <you@example.com>
2021-02-10 15:03:23 +00:00
tchaton
77be6f6e24
resolve conflicts
...
resolve doc
boring commit
docs
torchvision
tpu
Update dockers/tpu-tests/tpu_test_cases.jsonnet
Update dockers/tpu-tests/tpu_test_cases.jsonnet
2021-02-05 21:43:10 +01:00
chaton
d0aaf983b9
[Feat] Adding PruningCallback ( #5618 )
...
* wip
* add pruning callback
* add condition for duplicated weights
* update on comments
* update on comments
* update on comments
* add more tests
* resolve flake8
* resolve on comments
* update changelog
* update on comments
* update on comments
* change order
* remove ddp_spawn skip
* update
* typo
* Update pytorch_lightning/callbacks/pruning.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/pruning.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update on comments
* forgot platform
* update on comments
* remove @rank_zero_only
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-01-27 01:00:42 -05:00
SeanNaren
a80e37b95b
Add hydra experimental to correct location
2021-01-26 14:29:47 +01:00
Arnaud Gelas
e4688ae754
Fix isort failures in utilities ( #5530 )
...
Remove from skipped module in pyproject.toml and fix failures on:
- pytorch_lightning/utilities/*.py
2021-01-15 13:57:40 -05:00
Jirka Borovec
ae9956f997
clean avail. imports & enums ( #5256 )
...
* clean avail. imports
* enums
* fix missing
2020-12-29 19:02:18 +01:00
Jirka Borovec
a884866ff0
Unify names in Utils ( #5199 )
...
* warnings
* argparse
* mutils
* xla device
* deprecated
* tests
* simple
* flake8
* fix
* flake8
* 1.4
2020-12-22 00:23:33 +01:00
Jirka Borovec
0f36525e8f
fix/enable - check F401 ( #5201 )
...
* refactor - check F401
* missed
* fix
2020-12-21 10:15:04 +01:00
Jirka Borovec
059eaecbb4
set xxx_AVAILABLE as protected ( #5082 )
...
* set xxx_AVAILABLE as protected
* docs
2020-12-14 20:19:05 +05:30
chaton
7755572b4f
Check if optimizer supports closure ( #4981 )
...
* check if optimizer support closure
* cleanup test
* resolve tests
* resolve flake
* update test due to patch limit
* update
* update dep
* Update tests/core/test_lightning_optimizer.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update tests/core/test_lightning_optimizer.py
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* resolve bug
* update test
* resolve tests
* Update requirements/extra.txt
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* remove bolts dep
* remove bolts
* add missing bolts dep for tests
* remove need for bolts
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-11 14:51:45 +01:00
Jirka Borovec
ce9179591d
ref: clean config [1/n] add intermediate setters ( #4990 )
...
* add intermediate setters
* show inputs
* fix options
* move
* fix
* less talk
* fix
* talk less
* str
* cases
* rename
Co-authored-by: chaton <thomas@grid.ai>
2020-12-09 14:13:57 -05:00
chaton
ef8ef12fd0
[feat] pp 2/n ( #5026 )
...
* Added changes for RPC plugin
* Add missing kwargs
* Fix code format
* Loading refactors by introducing is_distributed var, fix optimizer step flow
* Add rpc guard
* Added docstrings and typing
* resolve comments
* Add additional rpc hook, refactor name of exit process hook for clarity
* remove annotation
* Modify behaviour to allow optional return, add test for rpc plugin
* resolve tests
* rename is_ddp_based
* update
* update for windows
* update
* resolve test
* code smell
* Added sequential plugin
* resolve bug
* update
* cleanup
* add Exception
* resolve docs
* Remove ddp support
* Revert distributed -> ddp
* Update pl_examples/basic_examples/conv_sequential_example.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pl_examples/basic_examples/conv_sequential_example.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/ddp_sequential_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Address code review points
* Update pytorch_lightning/plugins/ddp_sequential_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/ddp_sequential_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Add missing return
* Fix formatting, add datamodule args
* add small comment
* resolve comments
* resolve comments
* update source for fairscale
* update extras
* remove staticmethod
* resolve flake8
* Skip tests that are failing due to bug upstream with multiple optimizers and shard
* update
* update on comments
* clean test
* latest comments
* remove old comments
* add todo
* Update version
* update
* resolve bugs
* resolve bugs
* update test
* remove hanging test
* Update pytorch_lightning/plugins/ddp_sequential_plugin.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* resolve on comments
* Update pytorch_lightning/plugins/ddp_sequential_plugin.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* resolve on comments
* Update pytorch_lightning/plugins/ddp_sequential_plugin.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/plugins/ddp_sequential_plugin.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/plugins/ddp_sequential_plugin.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/plugins/ddp_sequential_plugin.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* remove ImportError
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-12-09 12:56:51 +00:00
Ananya Harsh Jha
127454ade2
All gather with grads ( #5012 )
...
* all_gather
* ddp
* horovod
* grad tests
* fixed ddp
* ddp fixed, removed tpu, horovod for now
* changelog
* windows fix
* windows fix
* removed batch from ctx
* removed code duplication
* merge
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-08 23:20:01 +00:00
Sean Naren
ee9b3fe574
[feat] pp 1/n ( #5016 )
...
* Added changes for RPC plugin
* Add missing kwargs
* Fix code format
* Loading refactors by introducing is_distributed var, fix optimizer step flow
* Add rpc guard
* Added docstrings and typing
* resolve comments
* Add additional rpc hook, refactor name of exit process hook for clarity
* remove annotation
* Modify behaviour to allow optional return, add test for rpc plugin
* resolve tests
* rename is_ddp_based
* update
* update for windows
* update
* resolve test
* code smell
* Revert back to init_ddp_connection for backwards compat
* Swap to explicit name for property
* Add missing speed parity increase for CI variability, fix call counts for child process
Co-authored-by: tchaton <thomas@grid.ai>
2020-12-08 22:02:10 +00:00
Jirka Borovec
3976db597d
refactor imports of optional dependencies ( #4859 )
...
* refactor imports of optional dependencies
* fix
* fix
* fix
* fix
* fix
* flake8
* flake8
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-12-04 10:26:10 +01:00
SeanNaren
04bb0abe36
Merge branch 'master' into feature/plug
...
# Conflicts:
# pytorch_lightning/utilities/__init__.py
# requirements/extra.txt
2020-11-27 10:00:05 +00:00
Jirka Borovec
217650320e
simplify imports Omegaconf ( #4873 )
...
* hydra
* omegaconf
2020-11-27 01:00:56 +01:00
Jirka Borovec
442d57f1e9
simplify imports xla / TPU ( #4872 )
...
* xla
* tpu
* fix
* fix
* flake8
2020-11-27 00:37:48 +01:00
SeanNaren
737447fc6e
Merge branch 'master' into feature/plug
...
# Conflicts:
# pytorch_lightning/trainer/connectors/precision_connector.py
# pytorch_lightning/utilities/__init__.py
2020-11-26 23:02:36 +00:00
Jirka Borovec
11e73ceaa6
fix import and typo in AMP ( #4871 )
...
* fix import and typo
* docs
* apex
* fix
* typo
2020-11-26 23:45:52 +01:00
SeanNaren
47c121ef1a
Addressed code review points
2020-11-26 16:44:45 +00:00
Samyak S Sarnayak
ccf38ced2e
Use high progress_bar_refresh_rate on Google Colab ( #4654 )
...
* Use high refresh rate on Google Colab (#3786)
Automatically override progress_bar_refresh_rate when on Google Colab.
Also added a constant IS_COLAB in utilities to check whether it is
being run in Colab or not.
* Show a warning instead of overriding when rate is low on colab
* Change warning to suggestion and move it
Moved warning to configure_progress_bar instead of on_trainer_init
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* add a mock test
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-11-24 02:13:33 +05:30
Jirka Borovec
94a9d3d283
Update examples - use DataModule ( #4740 )
...
* rename
* add mnist_datamodule.py
* dm
* fix
* imports
* clean
* imports
* transforms
* skip
2020-11-20 23:40:40 +05:30
William Falcon
09c2020a93
notices ( #4118 )
2020-10-13 07:18:07 -04:00
monney
d5254ff9df
warn user when dropping unpicklable hparams ( #2874 )
...
* refactored clean_namespace
* Update try except to handle pickling error
* Consolidated clean_namespace. Added is_picklable
* PEP8
* Change warning to use rank_zero_warn. Added Test to ensure proper hparam filtering
* Updated imports
* Corrected Test Case
2020-08-28 09:07:43 +02:00
Jirka Borovec
a6e7aa7796
allow using apex with any PT version ( #2865 )
...
* wip
* setup
* type
* name
* wip
* docs
* imports
* fix if
* fix if
* use_amp
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* fix tests
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* fix tests
* todos
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-08 11:07:32 +02:00
Jirka Borovec
b7d72706c3
clean imports ( #2867 )
...
* clean imports
* miss
2020-08-08 00:33:51 +02:00
William Falcon
62ce00f96c
EvalResult support for val loop (PR 3/5) ( #2651 )
...
* add EvalResult to support to val/test loops
2020-07-22 13:53:10 -04:00