Adrian Wälchli
29be6a0fe9
unroll test
2021-06-29 14:21:19 +02:00
Adrian Wälchli
2849bbfa92
update test comments
2021-06-29 13:50:32 +02:00
Adrian Wälchli
4bc559235e
update test
...
update test
update test
update test
2021-06-29 13:48:24 +02:00
Adrian Wälchli
b0c52a2cbd
fix merge
2021-06-29 13:09:22 +02:00
Adrian Wälchli
198fef58ba
update wrong test
2021-06-29 13:09:22 +02:00
Adrian Wälchli
589bf44855
test
2021-06-29 13:09:22 +02:00
Adrian Wälchli
9a689ac236
skip
2021-06-29 13:09:22 +02:00
Adrian Wälchli
b0d49926bb
problematic test
2021-06-29 13:09:22 +02:00
Adrian Wälchli
db3d3b7068
debug
2021-06-29 13:09:22 +02:00
Adrian Wälchli
9da2cf87bf
destroy pg
2021-06-29 13:09:22 +02:00
Adrian Wälchli
5db42a413d
make a special test
2021-06-29 13:09:22 +02:00
Adrian Wälchli
02ea55bbf6
patch os environ
2021-06-29 13:09:22 +02:00
Adrian Wälchli
c914724035
yapf
2021-06-29 13:09:22 +02:00
Adrian Wälchli
f5e64a96fb
add test
2021-06-29 13:09:22 +02:00
nisheethlahoti
06f8349291
Support calling fit and test scripts using "python -m" module syntax with DDP ( #8073 )
...
Co-authored-by: Nisheeth Lahoti <nisheeth@rephrase.ai>
2021-06-23 02:42:04 +00:00
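A minimal sketch of the usage this enables, assuming a hypothetical package `my_project`: with #8073, a training script launched as a module is re-launched for its DDP workers with the same `python -m` invocation instead of a bare script path.

```python
# my_project/train.py -- launched as: python -m my_project.train
# (hypothetical module; before #8073 the DDP re-launch dropped the -m form)
import pytorch_lightning as pl

from my_project.model import MyModel  # hypothetical LightningModule

if __name__ == "__main__":
    trainer = pl.Trainer(gpus=2, accelerator="ddp")
    trainer.fit(MyModel())
```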
Andrew Tritt
e808f9fb28
Use DistributedSampler when running with custom accelerator ( #7814 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-18 14:34:05 +02:00
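A hedged sketch of the behavior touched above, using the 1.3-era Trainer flags: with `replace_sampler_ddp=True` (the default), Lightning injects a `DistributedSampler`, and after #7814 this also happens when a custom accelerator is in use.

```python
import pytorch_lightning as pl

# replace_sampler_ddp=True is already the default; spelled out here.
# After #7814 the automatic DistributedSampler injection also applies
# to runs driven by a custom Accelerator instance.
trainer = pl.Trainer(accelerator="ddp", gpus=2, replace_sampler_ddp=True)
```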
Sean Naren
024cf23c67
Remove convert_to_half, suggest using `model.half` ( #7974 )
2021-06-14 18:48:02 +01:00
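The suggested replacement is plain PyTorch rather than a Lightning utility; a minimal runnable example:

```python
import torch

model = torch.nn.Linear(4, 2)
model = model.half()           # cast parameters and buffers to float16
x = torch.randn(8, 4).half()   # inputs must be cast to the same dtype
y = model(x)
```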
Sean Naren
96433d03ea
IPU Integration 5/5 ( #7867 )
...
* Initial changes
* Add broken example for now
* Fix reference
* Fix format
* Code runs
* Fixes
* Clear up files
* Add tests, helpers, fixes
* Small cleanups
* Refactors based on review
* Swap to special tests
* Add special tests
* Add source
* Cleanups
* Add logic to attach/detach model from devices
* Fixes for tests
* Fixes for tests
* Move earlier
* Cleanups
* Add check for nvcc
* Add tests, cleanups
* Fix errors
* fix
* Try condition
* Add missing annotation
* Clearer
* Clearer message
* Fix variable
* Cleanups
* Add comment
* CHANGELOG.md
* Add simple selection test
* Remove special=True to see what happens
* Fix test
* Update tests/accelerators/test_ipu.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* Convert ipu_cores -> ipus
* Add typing, fail earlier
* simplify precision
* Add test, add helper
* fix accum
* Update pytorch_lightning/plugins/training_type/ipu.py
Co-authored-by: thomas chaton <thomas@grid.ai>
* Use stages
* Make sure warning message returned
* throw error
* Add more tests, use fs
* add comment
* Clean
* Address feedback, add IPU tests
* Fixes
* Fix signature
* Add types
* Remove autoround
* Add docstring
* ipu_cores -> ipus
* Add test, remove unnecessary precision set
* Add optimizer test
* Add precision back with test
* Address code review
* Change to probs
* Move some of the asserts earlier
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-06-11 15:07:04 +00:00
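A sketch of the user-facing API this series lands; per the log above, the argument was renamed from `ipu_cores` to `ipus` during review. `MyModel` is a placeholder for any LightningModule, and actually running this requires an IPU-enabled environment.

```python
import pytorch_lightning as pl

trainer = pl.Trainer(ipus=8)   # request 8 IPUs
trainer.fit(MyModel())         # MyModel: hypothetical LightningModule
```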
Sean Naren
6388c29e87
[IPU] Add reset dataloader hooks to training type plugin 3/n ( #7861 )
...
* Add hooks
* Add tests for hooks
* Add changelog
* Test changes, add typing
2021-06-07 10:37:09 +00:00
Ethan Harris
03bb389b21
Fix double precision + ddp_spawn ( #6924 )
...
* Initial fix
* Initial fix
* Initial fix
* Updates
* Updates
* Update typing and docs
* Undo accidental refactor
* Remove unused imports
* Add DDP double precision test
* Remove unused variable
* Update CHANGELOG.md
* Fix test
* Update tests
* Formatting
* Revert bad change
* Add back changes
* Correct wrapping order
* Improve unwrapping
* Correct wrapping order
* Fix... finally
* Respond to comments
* Drop ddp test
* Simplify ddp spawn test
* Simplify ddp spawn test
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-01 15:21:17 +00:00
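The combination fixed here, expressed with the 1.3-era Trainer arguments:

```python
import pytorch_lightning as pl

# 64-bit precision together with spawn-based DDP -- the pairing
# this PR makes work.
trainer = pl.Trainer(precision=64, accelerator="ddp_spawn", gpus=2)
```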
Carlos Mocholí
d47173bb72
Use typing forward references ( #7770 )
...
* Use typing forward references
* Update pytorch_lightning/core/lightning.py
2021-05-31 09:54:28 +02:00
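For reference, the pattern the title names — quoted annotations with imports guarded by `TYPE_CHECKING` — shown on a hypothetical connector class:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # imported only for type checkers, avoiding a circular import at runtime
    from pytorch_lightning import Trainer

class ExampleConnector:  # hypothetical class for illustration
    def __init__(self, trainer: "Trainer") -> None:
        # the quoted annotation is resolved lazily
        self.trainer = trainer
```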
Carlos Mocholí
a69beab499
Clean existing logging tests ( #7760 )
...
* Remove dev debugger metric tracking
* Fix tests
* Fix test
* Import
* Clean logging tests
* flake8
* Docstring
2021-05-30 16:36:52 +02:00
Carlos Mocholí
bc3238be8c
Remove metric tracking from dev debugger ( #7759 )
...
* Remove dev debugger metric tracking
* Fix tests
* Fix test
* Import
* Fix tests
* Fix test
* flake8
* Fix tests
2021-05-30 12:03:42 +02:00
Kaushik B
3f460b150a
Move parameter validation specific to TPU Training plugins ( #7415 )
...
* Move parameter validation specific to TPU Training plugins
* update docstring
2021-05-24 16:02:01 +00:00
Adrian Wälchli
502adbced3
refactor optimizer loop logic for manual and automatic optimization ( #7526 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-05-17 14:42:01 +02:00
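From the user's side, the manual-optimization path this loop refactor has to serve alongside the automatic one (API as in the 1.3-era releases):

```python
import torch
import pytorch_lightning as pl

class ManualOptModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False   # opt out of the automatic loop
        self.layer = torch.nn.Linear(4, 2)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()               # Lightning-wrapped optimizer
        loss = self.layer(batch).sum()
        opt.zero_grad()
        self.manual_backward(loss)            # replaces loss.backward()
        opt.step()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```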
Kaushik B
bf46730d92
Support TPU Pod Training (n/n) ( #7296 )
2021-05-17 11:33:44 +00:00
Nic Eggert
f4f51e0dcf
Add kubeflow cluster environment ( #7300 )
...
* Add kubeflow cluster environment
* Add KubeflowEnvironment to docs
* Add KubeflowEnvironment to the changelog
* break up a long line
* Add method to detect kubeflow environment
* Select Kubeflow environment when available
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Run pre-commit
* task_idx == 0
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-17 09:05:24 +01:00
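A rough sketch of the kind of plugin this PR adds — the method names assume the 1.3-era `ClusterEnvironment` interface, and the environment variables match what a Kubeflow `PyTorchJob` typically sets; the real `KubeflowEnvironment` ships with the PR itself.

```python
import os

from pytorch_lightning.plugins.environments import ClusterEnvironment

class ExampleKubeflowEnvironment(ClusterEnvironment):  # illustrative only
    def creates_children(self) -> bool:
        return True  # the job launcher starts one process per replica

    def master_address(self) -> str:
        return os.environ["MASTER_ADDR"]

    def master_port(self) -> int:
        return int(os.environ["MASTER_PORT"])

    def world_size(self) -> int:
        return int(os.environ["WORLD_SIZE"])

    def set_world_size(self, size: int) -> None:
        pass  # fixed by the job spec

    def global_rank(self) -> int:
        return int(os.environ["RANK"])

    def set_global_rank(self, rank: int) -> None:
        pass  # fixed by the job spec

    def local_rank(self) -> int:
        return 0  # one process per pod

    def node_rank(self) -> int:
        return self.global_rank()
```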
Alan Du
6ac16ff348
Fix DistribType for `ddp_cpu` (spawn) ( #7492 )
2021-05-14 20:53:26 +01:00
Jirka Borovec
d4ec75164c
Prune deprecated trainer attributes ( #7501 )
...
* use_single_gpu
* use_horovod
* use_ddp2
* use_ddp
* use_dp
* on_gpu
* use_tpu
* on_tpu
* on_cpu
* cleaning
* chlog
* Apply suggestions from code review
* Apply suggestions from code review
2021-05-12 20:10:15 +00:00
Ethan Harris
45143fd825
Improve val step logging ( #7351 )
...
* Fix val step logging
* Add a type
* Fix
* Update CHANGELOG.md
2021-05-07 22:58:03 +00:00
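The kind of call whose epoch-level accumulation this PR fixes, as a LightningModule fragment (`compute_loss` is a hypothetical helper):

```python
def validation_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # hypothetical helper
    # aggregated across the epoch rather than logged per step
    self.log("val_loss", loss, on_step=False, on_epoch=True)
```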
Martin Kristiansen
c3fc0313ef
Updating docs and error message: half precision not available on CPU ( #7384 )
...
* Updating docs and error message to specify that half precision is not available on CPU
* update messages
Co-authored-by: Martin Kristiansen <martinkristiansen@sixgill.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-05-06 09:05:50 +00:00
Carlos Mocholí
8c0ea92af2
`TrainerState` refactor [5/5] ( #7173 )
...
* `TrainerState` refactor
* flake8
* Update finished check
* Test cleanup
* Fix tests
* Fixes
* Reorder
* flake8
* Update CHANGELOG
* Better docs
* Better docs
* Remove default
* Update tests
* Bad merge
2021-05-04 12:50:56 +02:00
Carlos Mocholí
40f80230fe
Remove `trainer.fit` return value [2/n] ( #7237 )
...
* `_fit_impl` refactor and types
* Fix return
* Remove return docstring
* Fixes
* Fixes
* Remove `trainer.fit` return value
* Update CHANGELOG
* flake8
* Undo results change
* Fix test
* Revert changes for a separate PR
* flake8
2021-04-28 19:11:32 +01:00
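After this change, outcomes are read from the trainer rather than the return value; a minimal sketch (`model` is a placeholder for any LightningModule):

```python
import pytorch_lightning as pl

trainer = pl.Trainer(max_epochs=1)
result = trainer.fit(model)         # returns None as of #7237
assert result is None
metrics = trainer.callback_metrics  # read results from the trainer instead
```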
thomas chaton
5a113a2f05
[bug/feat] Support parameters_to_ignore in DDP ( #7239 )
...
* update
* update
* update
* update on comments
* update
2021-04-27 17:49:32 +00:00
Kaushik B
f168a535ca
Add MpModelWrapper in TPU Spawn ( #7045 )
...
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-20 13:05:27 +00:00
thomas chaton
3cc0b2c063
[test] Add checks for gpus=1 ( #7105 )
...
* update
* remove cluster env
2021-04-19 20:39:28 +02:00
Adrian Wälchli
e9fca760ac
Set `DistributedSampler` seed if `seed_everything` was called ( #7024 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-19 14:50:31 +01:00
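Usage sketch: the globally set seed is now forwarded to the `DistributedSampler` Lightning creates, making shuffling reproducible across runs.

```python
import pytorch_lightning as pl

pl.seed_everything(42)
# the auto-created DistributedSampler now receives seed=42
trainer = pl.Trainer(accelerator="ddp", gpus=2)
```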
Adrian Wälchli
33cc9fe138
Clean up environment access in plugins ( #6941 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 20:07:40 +02:00
Kaushik B
f79a13e495
[Model Parallel] Add configure sharded model hook ( #6679 )
...
* Add base hook for model parallel
* fix callback signature
* Simplify hook
* Add hook logic
* add tests
* add property setter
* add logic for being called once
* Update changelog
* Fix
* fix return type
* fix lambda callback test
* Fix tests
* Apply code suggestions
* add logic for setup_optimizers_predispatch
* add common dummy model
* Swap call order
* Remove test that isn't needed anymore
* Update tests
* Add a bit more doc
* Few code review fixes
* Update pytorch_lightning/accelerators/accelerator.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Change hook name
* Fix test
* Test setup hook, refactor names
* Swap call order of callbacks and model initialization
* Change name of context manager
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-29 14:50:51 -06:00
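The hook this PR introduces, as a user would override it — layers created here can be instantiated under the training plugin's sharding context:

```python
import torch
import pytorch_lightning as pl

class ShardedModel(pl.LightningModule):
    def configure_sharded_model(self):
        # called once by the training type plugin; large layers defined
        # here avoid full materialization on a single device
        self.block = torch.nn.Linear(32, 2)
```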
Carlos Mocholí
21fc5eb21e
Automatically find and run special tests ( #6669 )
2021-03-26 17:04:59 +00:00
thomas chaton
fd5cb7fcc3
Add PyTorch 1.8 Profiler 5/5 ( #6618 )
...
* Refactor profilers
* Update PassThrough
* WIP - This is broken and will change
* Update pytorch_lightning/profiler/pytorch.py
Co-authored-by: thomas chaton <thomas@grid.ai>
* resolve tests
* resolve tests
* find output
* try something
* update
* add support for test and predict
* update
* update
* use getattr
* test
* test
* update
* tests
* update
* update
* update
* update
* update
* remove file
* update
* update
* update
* update
* update
* test
* update
* update
* update tests
* update
* add support for 1.8
* rename records
* add support for 1.8
* update
* resolve flake8
* resolve test
* Refactor basic profilers
* Fixes
* Unused import
* Introduce setup
* Profile on all ranks. Print to stdout on 0
* Introduce dirpath + filename
* CHANGELOG
* Add tests. Address comments
* add `on_run_stage_setup`
* add on_run_stage_setup function
* update
* add test for RegisterRecordFunction
* update lightning flow direction
* move variable to private
* remove trace
* Undo code that should be in 3/4
* Multi-stage multi-rank
* 2/5 changes
* Pass stage in __del__
* Remove TODOs
* Describe on_evaluation_end. Add tests
* Typo
* Address comments
* deepcopy tests
* Advanced teardown
* Fix teardown test
* Fix tests
* Minor change
* Update CHANGELOG.md
* Fix test
* Quick fixes
* Fix 6522
* resolve ddp tests
* resolve tests
* resolve some tests
* update tests
* resolve tests
* update
* resolve tests
* resolve some tests
* Missed fixes from 3/5
* Fixes
* resolve some tests
* resolve test for 1.7.1
* Broken refactor
* Missed stage
* Minor changes
* resolve tests
* Update CHANGELOG
* resolve bug
* remove print
* Typo
* Cleanup
* resolve ddp test
* remove barrier
* update profiler
* update
* Smaller model
* update
* resolve tests
* update
* Minor changes. CHANGELOG
* Minimize diff
* update to 1.8.1
* RunIf. Extra code. Check segfault
* resolve tests
* Typo. Bad merge
* Fixing a bad merge
* replace for kineto
* Update pytorch_lightning/profiler/pytorch.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Update pytorch_lightning/profiler/pytorch.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Minor changes
* Bad merge
* Use lists for flexibility
* Use sets
* predict_step
* Ananth's suggestion
* update
* Docs
* Update pl_examples/basic_examples/profiler_example.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update example
* update example
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-23 20:43:21 +00:00
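User-facing usage after the refactor — either the string shorthand or an explicit instance; `dirpath`/`filename` are the arguments introduced in the commits above:

```python
import pytorch_lightning as pl
from pytorch_lightning.profiler import PyTorchProfiler

trainer = pl.Trainer(profiler="pytorch")  # string shorthand

profiler = PyTorchProfiler(dirpath=".", filename="perf_logs")
trainer = pl.Trainer(profiler=profiler)
```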
Jirka Borovec
efce2b7777
Prune metrics: regression 8/n ( #6636 )
...
* explained_variance
* tests
* mean_absolute_error
* mean_squared_error
* mean_relative_error
* mean_squared_log_error
* chlog
2021-03-23 09:35:51 +01:00
Sean Naren
58c9fa7edb
Allow training type plugin to delay optimizer creation (FSDP 2/n) ( #6331 )
...
* Allow training_type_plugin to delay optimizer configuration
* Add missing references to trainer, add a CPU accelerator based test
2021-03-22 11:43:53 +00:00
Sean Naren
4e9b453854
[Fix] Move init dist connection into the setup function ( #6506 )
...
* Move connection setup into the setup function. Call setup hook after we set up the accelerator
* Added CHANGELOG.md
* fix setup order in callback test
* fix input arguments in test
* Mock distributed function, remove protection to turn into training type hook
* Remove import
* Add missing mock, ensure custom plugin does not create children process
* Skip test on windows
* Update deepspeed to init connection in setup
* Do not initialize distributed module
* Move DeepSpeed tests to special tests since dist communication is being set up
* Special the test to see if this fixes CI
* Delete accelerator connector test to see if its causing build to fail
* Delete deepspeed test
* Revert "Delete accelerator connector test to see if its causing build to fail"
This reverts commit edde60b8
* Revert "Delete deepspeed test"
This reverts commit 9d317429
* Reverse hook
* Reverse setup hooks to debug again
* Add todo so i know where i left off
* For single device move in pre_dispatch after setup function
* Add additional model to device hook if any additional parameters have been set
* See if we can enable deepspeed tests
* Revert "See if we can enable deepspeed tests"
This reverts commit b5450def
* See if this hook approach works
* Introduce new granular hooks
* Remove import, fix tpu spawn by moving the function to setup
* Added missing special test
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-18 14:33:39 -07:00
Jirka Borovec
b341b53f70
deprecate metrics pkg ( #6505 )
...
* deprecate metrics
* examples
* req
* docs
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
* pep8
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-03-15 14:39:38 +00:00
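The migration the deprecation asks for, in brief:

```python
# deprecated by this PR:
# from pytorch_lightning.metrics import Accuracy
# use the standalone package instead:
from torchmetrics import Accuracy

accuracy = Accuracy()
```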
Rohit Gupta
c53edce1a1
Disable batch transfer in DP mode ( #6098 )
...
* add exceptions and test
* hook
* fix
* clean up
* clean up
* regex
* regex
* docs
* rev
* comment and docs
* chlog
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Apply suggestions from code review
Co-authored-by: chaton <thomas@grid.ai>
* Monkey-patch device count
* docs
* pep
* api_change
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2021-03-11 10:51:10 -05:00
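The hook affected, as a LightningModule fragment — overriding it is what this PR turns into an error under DP, since DP splits the batch across devices inside `forward`, where a per-batch transfer hook cannot apply:

```python
def transfer_batch_to_device(self, batch, device):
    # custom device transfer; with accelerator="dp" this override
    # now raises instead of being silently skipped
    return batch.to(device)
```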
Elia Cereda
f4cc7451a9
Add Trainer.validate(…) method to run one validation epoch ( #4948 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-11 03:46:37 +01:00
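Usage of the new entry point (`model` and `val_loader` are placeholders for any LightningModule and DataLoader):

```python
import pytorch_lightning as pl

trainer = pl.Trainer()
# runs exactly one validation epoch and returns the logged metrics
results = trainer.validate(model, val_dataloaders=val_loader)
```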
Jirka Borovec
55dd3a4c64
Typing for tests 1/n ( #6313 )
...
* typing
* yapf
* typing
2021-03-09 11:27:15 +00:00
Adrian Wälchli
e1f5eacab9
fix dp reduction test ( #6404 )
...
* fix
* update
* fix
* move the class outside
2021-03-08 18:11:20 +00:00
Adrian Wälchli
ec8d46e02b
introduce default cluster environment for lightning-specific ddp ( #5915 )
...
* handle distributed_sampler_kwargs
* move emptying cache to accelerator
* fix a few tests
* restoring the result from subprocess
* fix queue.get() order for results
* add missing "block_backward_sync" context manager
* add missing "block_backward_sync" context manager
* fix sync_batchnorm
* fix supported gpu-ids for tuple
* fix clip gradients and inf recursion
* accelerator selection: added cluster_environment plugin
* fix torchelastic test
* fix reduce early stopping decision for DDP
* fix tests: callbacks, conversion to lightning optimizer
* fix lightning optimizer does not pickle
* fix setting benchmark and deterministic option
* fix slurm amp test
* fix prepare_data test and determine node_rank
* fix retrieving last path when testing
* remove obsolete plugin argument
* fix test: test_trainer_config
* fix torchscript tests
* fix trainer.model access
* move properties
* fix test_transfer_batch_hook
* fix auto_select_gpus
* fix omegaconf test
* fix test that needs to simulate slurm ddp
* add horovod plugin
* fix test with named arguments
* clean up whitespace
* fix datamodules test
* remove old accelerators
* fix naming
* move old plugins
* move to plugins
* create precision subpackage
* create training_type subpackage
* fix all new import errors
* fix wrong arguments order passed to test
* fix LR finder
* Added sharded training type and amp plugin
* Move clip grad to precision plugin
* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically
* Fix import issue, attempting to fix tests
* Fix initial test
* Reflect hook logic from master, should wrap model after move to device
* Optional state consolidation, since master has optimizers not wrapped
* change attribute for instance test
* reset optimizers
optimizers are not used in the main process, so their state would be wrong.
* legacy
* imports in accel
* legacy2
* trainer imports
* fix import errors after rebase
* move hook to new setup location
* provide unwrapping logic
* fix trainer callback system
* added ddp2 implementation
* fix imports .legacy
* move plugins
* restore legacy
* drop test.py from root
* add tpu accelerator and plugins
* fixes
* fix lightning optimizer merge
* reset bugreportmodel
* unwrapping
* step routing forward
* model access
* unwrap
* opt
* integrate distrib_type
* sync changes
* sync
* fixes
* add forgotten generators
* add missing logic
* update
* import
* missed imports
* import fixes
* isort
* mv f
* changelog
* format
* move helper to parallel plugin
* d
* add world size
* clean up
* duplicate
* activate ddp_sharded and tpu
* set nvidia flags
* remove unused colab var
* use_tpu <-> on_tpu attrs
* make some ddp_cpu and clusterplugin tests pass
* Ref/accelerator connector (#5742 )
* final cleanup
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* connector cleanup
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* trainer cleanup
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* accelerator cleanup + missing logic in accelerator connector
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add missing changes to callbacks
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* reflect accelerator changes to lightning module
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* clean cluster envs
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* cleanup plugins
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add broadcasting
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* yapf
* remove plugin connector
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* plugins
* manual optimization
* update optimizer routing
* add rank to torchelastic
* fix memory mixed precision
* setstate on trainer for pickling in ddp spawn
* add predict method
* add back commented accelerator code
* adapt test for sync_batch_norm to new plugin
* fix deprecated tests
* fix ddp cpu choice when no num_processes are given
* yapf format
* skip a memory test that cannot pass anymore
* fix pickle error in spawn plugin
* x
* avoid
* x
* fix cyclic import in docs build
* add support for sharded
* update typing
* add sharded and sharded_spawn to distributed types
* make unwrap model default
* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel
* update sharded spawn to reflect changes
* update sharded to reflect changes
* Merge 1.1.5 changes
* fix merge
* fix merge
* yapf isort
* fix merge
* yapf isort
* fix indentation in test
* copy over reinit scheduler implementation from dev1.2
* fix apex tracking calls with dev_debugger
* reduce diff to dev1.2, clean up
* fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu
* sort plugin tests legacy/new
* fix error handling for amp on cpu
* fix merge
fix merge
fix merge
* [Feat] Resolve manual_backward (#5837 )
* resolve manual_backward
* resolve flake8
* update
* resolve for ddp_spawn
* resolve flake8
* resolve flake8
* resolve flake8
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
* fix tests/accelerator tests on cpu
* [BugFix] Resolve manual optimization (#5852 )
* resolve manual_optimization
* update
* update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
* Remove copy trainer parameters to happen earlier within the loop and add safeguard to get ref model (#5856 )
* resolve a bug
* Accelerator refactor sharded rpc (#5854 )
* rpc branch
* merge
* update handling of rpc
* make devices etc. Optional in RPC
* set devices etc. later if necessary
* remove devices from sequential
* make devices optional in rpc
* fix import
* uncomment everything
* fix cluster selection
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
* resolve bug
* fix assert in rpc test
* resolve a test
* fix docs compilation
* accelerator refactor - fix for sharded parity test (#5866 )
* fix memory issue with ddp_spawn
* x
x
x
x
x
x
x
x
x
* x
* Remove DDP2 as this does not apply
* Add missing pre optimizer hook to ensure lambda closure is called
* fix apex docstring
* [accelerator][BugFix] Resolve some test for 1 gpu (#5863 )
* update
* revert init
* resolve a bug
* update
* resolve flake8
* update
* update
* update
* revert init
* resolve a bug
* update
* resolve flake8
* update
* update
* update
* update
* update
* revert init
* resolve a bug
* update
* resolve flake8
* update
* update
* update
* revert init
* update
* resolve flake8
* update
* update
* update
* update
* update
* all_gather
* update
* make plugins work, add misconfig for RPC
* update
* update
* remove breaking test
* resolve some tests
* resolve flake8
* revert to ddp_spawn
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
* yapf isort
* resolve flake8
* fix apex doctests
* fix apex doctests 2
* resolve docs
* update drone
* clean env
* update
* update
* update
* update
* merge
* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881 )
* Fix RPC related tests, clean out old API, update for new accelerator API
* Move tests out of legacy folder, update paths and names
* Update test_remove_1-4.py
* Expose properties for tpu cores/gpus/num_gpus
* Add root GPU property
* Move properties to properties.py
* move tests that were previously in drone
* Fix root GPU property (#5908 )
* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator
* Add missing tests back
* fix best model path transfer when no checkpoint callback available
* Fix setup hook order [wip] (#5858 )
* Call trainer setup hook before accelerator setup
* Add test case
* add new test
* typo
* fix callback order in test
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* rename ddp sequential -> rpc sequential for special test
* revert
* fix stupid merge problem
* abstract the cluster plugins
* default plugin
* integrate default environment
* fix property
* adapt tests
* adjust test
* fix world size access
* base cluster env
* revert rebase errors
* revert rebase errors
* missing import
* revert unrelated change
* remove unused cluster local rank
* remove unrelated changes
* fix unrelated changes
* fix pep8
* remove unused var
* reset permissions
* yapf
* test default environment
* test torchelastic environment
* world size as int
* tests for slurm environment
* changelog
* test comments
* remove unintended change
* keep master port fixed after it is generated
* test random master port
* yapf
* add missing default environment
* move helper function
* rename default environment
* rename
* rename
* yapf
* Update pytorch_lightning/plugins/environments/lightning_environment.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update CHANGELOG.md
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* spawn -> create
Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-05 01:47:29 +00:00
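The default environment introduced here is selected automatically for Lightning-managed DDP; passing one explicitly through `plugins` overrides that selection (the class lives at the `lightning_environment.py` path reviewed above):

```python
import pytorch_lightning as pl
from pytorch_lightning.plugins.environments import LightningEnvironment

trainer = pl.Trainer(accelerator="ddp", gpus=2,
                     plugins=[LightningEnvironment()])
```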