Sean Naren
5157ba5509
Add openmpi to our base cuda container for MPI support ( #6026 )
...
* Add openmpi to our base container for DeepSpeed MPI support
* conda
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 12:15:49 +00:00
Rohit Gupta
99da0d92a5
update lr_finder to check for attribute if not running fast_dev_run ( #5990 )
...
* ref lr_finder a bit
* chlog
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-17 07:15:29 -05:00
Jirka Borovec
c0ee1f19fc
fix install dtrun ( #6025 )
2021-02-17 11:43:51 +00:00
Sean Naren
ba6290029a
Re-introduce fix for Hydra directory sync with multiple process ( #5993 )
...
* Add hydra fix that was missing from master
* Remove error commas
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-17 11:24:57 +00:00
manipopopo
6a9cec41a3
Add `dim` to `pytorch_lightning.metrics.PSNR` ( #5957 )
...
* Add dim to PSNR
* Update CHANGELOG.md
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Add reduction tests
* Recover warnings on reduction and add tests
* Add copyright texts
* Refactor PSNR
* Change warnings
* Update pytorch_lightning/metrics/functional/psnr.py
Change functional.psnr dim doc
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
* Change PSNR dim docs
* Apply suggestions from code review
* tests
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 11:55:40 +01:00
Carlos Mocholí
7aae589167
Add deprecation warning to ModelCheckpoint when logging val_loss with no monitor ( #6012 )
...
* Add deprecation warning when logging val_loss with no monitor
* EOF
* Update CHANGELOG
* Clear warning cache before testing
* pep8
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 10:46:58 +00:00
Somshubra Majumdar
6e8721e7ae
Attempt slurm auto resume call when non-shell call fails ( #6002 )
...
Signed-off-by: smajumdar <titu1994@gmail.com>
2021-02-17 10:43:06 +00:00
Alessia Marcolini
1554a59ef7
Fix typo and code rendering in docs ( #5940 )
2021-02-17 10:33:15 +00:00
Jirka Borovec
f655f974eb
hotfix: move process_dataloader to plugins ( #6023 )
2021-02-17 05:52:07 +00:00
David Bankson
ec0eac1197
changed spelling of "licence" ( #5937 )
...
Normalized spelling to "license" as in the URL "https://github.com/PytorchLightning/pytorch-lightning/blob/master/LICENSE"
2021-02-17 00:54:04 +01:00
Carlos Mocholí
0815e2a8c5
Remove torch<=1.4.0 checks ( #5998 )
...
* Remove torch<=1.4.0 checks
* Update pytorch_lightning/utilities/data.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-02-16 17:53:40 -05:00
chaton
e982800b81
Add PredictLoop ( #5752 )
...
* integrate distrib_type
* sync changes
* sync
* fixes
* add forgotten generators
* add missing logic
* update
* import
* missed imports
* import fixes
* isort
* mv f
* changelog
* format
* move helper to parallel plugin
* d
* add world size
* clean up
* duplicate
* activate ddp_sharded and tpu
* set nvidia flags
* remove unused colab var
* use_tpu <-> on_tpu attrs
* make some ddp_cpu and clusterplugin tests pass
* Ref/accelerator connector (#5742 )
* final cleanup
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* connector cleanup
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* trainer cleanup
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* accelerator cleanup + missing logic in accelerator connector
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add missing changes to callbacks
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* reflect accelerator changes to lightning module
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* clean cluster envs
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* cleanup plugins
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add broadcasting
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* yapf
* remove plugin connector
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* plugins
* add predict_loop
* manual optimization
* clean predictloop
* update optimizer routing
* add predict loop on new accelerator
* resolve a bug
* add rank to torchelastic
* add predict_loop
* add predict loop on new accelerator
* resolve a bug
* fix memory mixed precision
* update
* setstate on trainer for pickling in ddp spawn
* add predict_loop
* clean predictloop
* add predict loop on new accelerator
* resolve a bug
* add predict_loop
* add predict loop on new accelerator
* resolve a bug
* add predict_loop
* add predict loop on new accelerator
* resolve a bug
* add predict_loop
* add predict loop on new accelerator
* resolve a bug
* add predict_loop
* clean predictloop
* add predict loop on new accelerator
* resolve a bug
* add predict_loop
* add predict loop on new accelerator
* resolve a bug
* resolve tests
* add predict method
* add back commented accelerator code
* adapt test for sync_batch_norm to new plugin
* fix deprecated tests
* fix ddp cpu choice when no num_processes are given
* yapf format
* skip a memory test that cannot pass anymore
* remove sanetize
* rename train to run_train
* remove useless hooks
* add misconfigurationException
* remove wrong naming
* resolve some legacy
* update docstring
* fix pickle error in spawn plugin
* x
* avoid
* x
* fix cyclic import in docs build
* add support for sharded
* update typing
* add sharded and sharded_spawn to distributed types
* make unwrap model default
* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel
* update sharded spawn to reflect changes
* update sharded to reflect changes
* Merge 1.1.5 changes
* fix merge
* fix merge
* yapf isort
* fix merge
* yapf isort
* fix indentation in test
* copy over reinit scheduler implementation from dev1.2
* fix apex tracking calls with dev_debugger
* reduce diff to dev1.2, clean up
* fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu
* sort plugin tests legacy/new
* fix error handling for amp on cpu
* fix merge
fix merge
fix merge
* [Feat] Resolve manual_backward (#5837 )
* resolve manual_backward
* resolve flake8
* update
* resolve for ddp_spawn
* resolve flake8
* resolve flake8
* resolve flake8
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
* fix tests/accelerator tests on cpu
* [BugFix] Resolve manual optimization (#5852 )
* resolve manual_optimization
* update
* update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856 )
* resolve a bug
* Accelerator refactor sharded rpc (#5854 )
* rpc branch
* merge
* update handling of rpc
* make devices etc. Optional in RPC
* set devices etc. later if necessary
* remove devices from sequential
* make devices optional in rpc
* fix import
* uncomment everything
* fix cluster selection
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
* resolve bug
* fix assert in rpc test
* resolve a test
* fix docs compilation
* accelerator refactor - fix for sharded parity test (#5866 )
* fix memory issue with ddp_spawn
* x
x
x
x
x
x
x
x
x
* x
* Remove DDP2 as this does not apply
* Add missing pre optimizer hook to ensure lambda closure is called
* fix apex docstring
* [accelerator][BugFix] Resolve some test for 1 gpu (#5863 )
* update
* revert init
* resolve a bug
* update
* resolve flake8
* update
* update
* update
* revert init
* resolve a bug
* update
* resolve flake8
* update
* update
* update
* update
* update
* revert init
* resolve a bug
* update
* resolve flake8
* update
* update
* update
* revert init
* update
* resolve flake8
* update
* update
* update
* update
* update
* all_gather
* update
* make plugins work, add misconfig for RPC
* update
* update
* remove breaking test
* resolve some tests
* resolve flake8
* revert to ddp_spawn
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
* yapf isort
* resolve flake8
* fix apex doctests
* fix apex doctests 2
* resolve docs
* update drone
* clean env
* update
* update
* update
* update
* merge
* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881 )
* Fix RPC related tests, clean out old API, update for new accelerator API
* Move tests out of legacy folder, update paths and names
* Update test_remove_1-4.py
* Expose properties for tpu cores/gpus/num_gpus
* Add root GPU property
* Move properties to properties.py
* move tests that were previously in drone
* Fix root GPU property (#5908 )
* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator
* Add missing tests back
* fix best model path transfer when no checkpoint callback available
* Fix setup hook order [wip] (#5858 )
* Call trainer setup hook before accelerator setup
* Add test case
* add new test
* typo
* fix callback order in test
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* rename ddp sequential -> rpc sequential for special test
* revert
* fix stupid merge problem
* Use property in connector for sampler (#5913 )
* merge the import conflicts
* fix spawning of processes in slurm
* [wip] Fix some bugs for TPU [skip ci] (#5878 )
* fixed for single tpu
* fixed spawn
* fixed spawn
* update
* update
* wip
* resolve bugs
* resolve bug
* update on comment
* removed decorator
* resolve comments
* set to 4
* update
* update
* need cleaning
* update
* update
* update
* resolve flake8
* resolve bugs
* exclude broadcast
* resolve bugs
* change test
* update
* update
* skip if meet fails
* properly raise trace
* update
* add catch
* wrap test
* resolve typo
* update
* typo
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
* resolve some tests
* update
* fix imports
* update
* resolve flake8
* update azure pipeline
* skip a sharded test on cpu that requires a gpu
* resolve tpus
* resolve bug
* resolve flake8
* update
* update utils
* revert permission change on files
* suggestions from carlos
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* remove unrelated formatting changes
* remove incomplete comment
* Update pytorch_lightning/accelerators/__init__.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* remove unrelated formatting change
* add types
* warn 1.7 ddp manual backward only if ddp kwarg unset
* yapf + isort
* pep8 unused imports
* fix cyclic import in docs
* Apply suggestions from code review
* typer in accelerator.py
* typo
* resolve flake8
* update code
* update
* Update pytorch_lightning/trainer/predict_loop.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/trainer/predict_loop.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* fix merge
* fix merge
* reset legacy accelerator
* add missing rename dispatch
* rename post training
* update code
* resolved comments
* typo
* typo
* add flow description
* resolve comments
* update on comments
* update flow
* add backticks
* resolve tpu
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 17:11:56 -05:00
chaton
a52be5bb07
[Hot Fix] Ensure process_dataloader is called when tpu_cores > 1 to use Parallel DataLoader ( #6015 )
...
* hotfix for tpu
* update changelog
* Update CHANGELOG.md
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2021-02-16 17:02:25 -05:00
Dusan Drevicky
c9fde04947
Make move_metrics_to_cpu work recursively ( #6007 )
...
* Propagate to_cpu flag down the recursion chain
* Refactor
* Add test
* Update CHANGELOG
* Update tests/utilities/test_memory.py
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-02-16 21:52:42 +00:00
Thien Bui
a0494aba72
MlflowLogger limit parameter value length to 250 char ( #5893 )
2021-02-16 21:22:06 +00:00
Nicki Skafte
4062c6246c
[Docs] Explain metric internals ( #5899 )
...
* correct docs
* fix levels
2021-02-16 16:14:30 -05:00
chaton
141316fb29
[BugFix] Resolve bugs in computer_vision_fine_tuning.py example ( #5985 )
...
* update the script to use DataModule
* add message for the frozen parameters
* add message about trainable parameters
* resolve flake8
2021-02-16 21:01:04 +00:00
chaton
6e79bef996
[accelerator][FeatBugFix] Improve manual optimization API ( #5771 )
...
* fix trainer.model access
* move properties
* fix test_transfer_batch_hook
* fix auto_select_gpus
* fix omegaconf test
* fix test that needs to simulate slurm ddp
* add horovod plugin
* fix test with named arguments
* clean up whitespace
* fix datamodules test
* remove old accelerators
* fix naming
* move old plugins
* move to plugins
* create precision subpackage
* create training_type subpackage
* fix all new import errors
* fix wrong arguments order passed to test
* fix LR finder
* Added sharded training type and amp plugin
* Move clip grad to precision plugin
* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically
* Fix import issue, attempting to fix tests
* Fix initial test
* Reflect hook logic from master, should wrap model after move to device
* Optional state consolidation, since master has optimizers not wrapped
* change attribute for instance test
* reset optimizers
optimizers are not used in main process, so state would be wrong.
* legacy
* imports in accel
* legacy2
* trainer imports
* fix import errors after rebase
* move hook to new setup location
* provide unwrapping logic
* fix trainer callback system
* added ddp2 implementation
* fix imports .legacy
* move plugins
* restore legacy
* drop test.py from root
* add tpu accelerator and plugins
* fixes
* fix lightning optimizer merge
* reset bugreportmodel
* unwrapping
* step routing forward
* model access
* unwrap
* opt
* integrate distrib_type
* sync changes
* sync
* fixes
* add forgotten generators
* add missing logic
* update
* import
* missed imports
* import fixes
* isort
* mv f
* changelog
* format
* move helper to parallel plugin
* d
* add world size
* clean up
* duplicate
* activate ddp_sharded and tpu
* set nvidia flags
* remove unused colab var
* use_tpu <-> on_tpu attrs
* make some ddp_cpu and clusterplugin tests pass
* Ref/accelerator connector (#5742 )
* final cleanup
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* connector cleanup
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* trainer cleanup
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* accelerator cleanup + missing logic in accelerator connector
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add missing changes to callbacks
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* reflect accelerator changes to lightning module
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* clean cluster envs
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* cleanup plugins
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add broadcasting
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* yapf
* remove plugin connector
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* plugins
* manual optimization
* update optimizer routing
* add rank to torchelastic
* fix memory mixed precision
* setstate on trainer for pickling in ddp spawn
* add predict method
* add back commented accelerator code
* adapt test for sync_batch_norm to new plugin
* fix deprecated tests
* fix ddp cpu choice when no num_processes are given
* yapf format
* skip a memory test that cannot pass anymore
* update on comments
* fix pickle error in spawn plugin
* x
* avoid
* x
* fix cyclic import in docs build
* add support for sharded
* update typing
* add sharded and sharded_spawn to distributed types
* make unwrap model default
* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel
* update sharded spawn to reflect changes
* update sharded to reflect changes
* Merge 1.1.5 changes
* fix merge
* fix merge
* yapf isort
* fix merge
* yapf isort
* fix indentation in test
* copy over reinit scheduler implementation from dev1.2
* fix apex tracking calls with dev_debugger
* reduce diff to dev1.2, clean up
* fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu
* sort plugin tests legacy/new
* fix error handling for amp on cpu
* fix merge
fix merge
fix merge
* [Feat] Resolve manual_backward (#5837 )
* resolve manual_backward
* resolve flake8
* update
* resolve for ddp_spawn
* resolve flake8
* resolve flake8
* resolve flake8
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
* fix tests/accelerator tests on cpu
* [BugFix] Resolve manual optimization (#5852 )
* resolve manual_optimization
* update
* update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856 )
* resolve a bug
* Accelerator refactor sharded rpc (#5854 )
* rpc branch
* merge
* update handling of rpc
* make devices etc. Optional in RPC
* set devices etc. later if necessary
* remove devices from sequential
* make devices optional in rpc
* fix import
* uncomment everything
* fix cluster selection
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
* resolve bug
* fix assert in rpc test
* resolve a test
* fix docs compilation
* accelerator refactor - fix for sharded parity test (#5866 )
* fix memory issue with ddp_spawn
* x
x
x
x
x
x
x
x
x
* x
* Remove DDP2 as this does not apply
* Add missing pre optimizer hook to ensure lambda closure is called
* fix apex docstring
* [accelerator][BugFix] Resolve some test for 1 gpu (#5863 )
* update
* revert init
* resolve a bug
* update
* resolve flake8
* update
* update
* update
* revert init
* resolve a bug
* update
* resolve flake8
* update
* update
* update
* update
* update
* revert init
* resolve a bug
* update
* resolve flake8
* update
* update
* update
* revert init
* update
* resolve flake8
* update
* update
* update
* update
* update
* all_gather
* update
* make plugins work, add misconfig for RPC
* update
* update
* remove breaking test
* resolve some tests
* resolve flake8
* revert to ddp_spawn
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
* yapf isort
* resolve flake8
* fix apex doctests
* fix apex doctests 2
* resolve docs
* update drone
* clean env
* update
* update
* update
* update
* merge
* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881 )
* Fix RPC related tests, clean out old API, update for new accelerator API
* Move tests out of legacy folder, update paths and names
* Update test_remove_1-4.py
* Expose properties for tpu cores/gpus/num_gpus
* Add root GPU property
* Move properties to properties.py
* move tests that were previously in drone
* Fix root GPU property (#5908 )
* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator
* Add missing tests back
* fix best model path transfer when no checkpoint callback available
* Fix setup hook order [wip] (#5858 )
* Call trainer setup hook before accelerator setup
* Add test case
* add new test
* typo
* fix callback order in test
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* rename ddp sequential -> rpc sequential for special test
* revert
* fix stupid merge problem
* Use property in connector for sampler (#5913 )
* merge the import conflicts
* fix spawning of processes in slurm
* [wip] Fix some bugs for TPU [skip ci] (#5878 )
* fixed for single tpu
* fixed spawn
* fixed spawn
* update
* update
* wip
* resolve bugs
* resolve bug
* update on comment
* removed decorator
* resolve comments
* set to 4
* update
* update
* need cleaning
* update
* update
* update
* resolve flake8
* resolve bugs
* exclude broadcast
* resolve bugs
* change test
* update
* update
* skip if meet fails
* properly raise trace
* update
* add catch
* wrap test
* resolve typo
* update
* typo
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
* resolve some tests
* update
* fix imports
* update
* resolve flake8
* update azure pipeline
* skip a sharded test on cpu that requires a gpu
* resolve tpus
* resolve bug
* resolve flake8
* update
* update utils
* revert permission change on files
* suggestions from carlos
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* remove unrelated formatting changes
* remove incomplete comment
* Update pytorch_lightning/accelerators/__init__.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* remove unrelated formatting change
* add types
* warn 1.7 ddp manual backward only if ddp kwarg unset
* yapf + isort
* pep8 unused imports
* fix cyclic import in docs
* Apply suggestions from code review
* typer in accelerator.py
* typo
* Apply suggestions from code review
* formatting
* update on comments
* update typo
* Update pytorch_lightning/trainer/properties.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update
* update on comments
* resolve some comments
* update on comments
* resolve test
* add toggle_model
* update
* update on comments
* update doc
* typo
* update
* typo
* remove space
* update
* update on comments
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 16:00:35 -05:00
Rohit Gupta
6d1e055a32
Prune EvalModelTemplate from callbacks and utilities ( #6018 )
...
* boring
* boring
2021-02-16 19:59:57 +00:00
Jirka Borovec
dbf2a54325
update XLA nightly version check ( #5989 )
...
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 19:47:08 +00:00
Eric Cousineau
62d3ec9613
doc: Add hint towards using ArgumentParser.add_argument_group ( #5911 )
...
* doc: Add hint towards using ArgumentParser.add_argument_group
Since pl adds many arguments, it is nice to distinguish these arguments
* fixup! address review
Co-authored-by: chaton <thomas@grid.ai>
2021-02-16 19:45:55 +00:00
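For readers unfamiliar with the hint in #5911 above: a minimal sketch of `ArgumentParser.add_argument_group`, using only the standard library. The group name `trainer` and the option names below are hypothetical stand-ins for the many arguments a framework might add. They are not taken from the commit or from the library's actual argument list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="training script")
    # Script-specific options stay in the parser's default group.
    parser.add_argument("--data_dir", type=str, default="./data")

    # Hypothetical framework options, placed in their own titled group so
    # that --help lists them under a separate "trainer" heading.
    trainer_group = parser.add_argument_group("trainer")
    trainer_group.add_argument("--max_epochs", type=int, default=10)
    trainer_group.add_argument("--gpus", type=int, default=0)
    return parser


parser = build_parser()
args = parser.parse_args(["--max_epochs", "3"])
help_text = parser.format_help()
```

Grouping changes only how the help text is laid out. Parsed values still land on the same flat namespace, so existing code that reads `args.max_epochs` keeps working.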
Jirka Borovec
960a60743f
fix fairscale compatible with PT 1.8 ( #5996 )
...
* try to extend fairscale available
* 1.2
2021-02-16 19:43:02 +00:00
Dusan Drevicky
c5919fde63
Basic examples fixes ( #5912 )
...
* Move pl_bolts assert to actually do something
* Define val, test steps, use _DATASETS_PATH
* Use DATASETS_PATH in DALI classifier
* Fix incorrect paths and style in example READMEs
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 19:31:07 +00:00
Jirka Borovec
1c87f1f6cd
remove legacy plugins ( #5950 )
...
* remove legacy plugins
* imports
* formatting
* fix docs references
* fix cluster environment inheritance
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-02-16 19:20:58 +00:00
Eric Cousineau
4531b1c796
wandb: Fix example rendering for docs ( #5905 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-02-16 20:14:01 +01:00
Jirka Borovec
936f42aa1c
clean AMP logic ( #5994 )
...
* clean AMP logic
* cleaning
* ...
* ...
* Even apex
2021-02-16 19:06:47 +00:00
Sean Naren
b40d414463
Fix error in pre-optim logic ( #5995 )
2021-02-16 19:05:44 +00:00
Jirka Borovec
b5d7d08da5
fix nightly releases & readme ( #5922 )
...
* fix nightly releases
* readme
* cuda
* docker
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* revert
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-02-16 13:46:28 -05:00
Jirka Borovec
a22ec15251
temporary suspend master update ( #6017 )
2021-02-16 13:15:41 -05:00
Jirka Borovec
fcfa7fabbf
move TPU cleaning to GH actions ( #5991 )
...
* move TPU cleaning to GH actions
* test
* .
2021-02-16 18:01:22 +00:00
Jirka Borovec
27ab76923a
try random TPU config ( #5992 )
...
* try random TPU config
* random
* Apply suggestions from code review
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 17:19:41 +01:00
Carlos Mocholí
47101c2d54
Update PULL_REQUEST_TEMPLATE [skip ci] ( #6000 )
...
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 20:14:17 +05:30
Jirka Borovec
7379e43396
CI: drop code formatters ( #5971 )
...
Co-authored-by: chaton <thomas@grid.ai>
2021-02-16 12:32:07 +00:00
Jirka Borovec
e6a2ccc850
fix deprecated call ( #6005 )
2021-02-16 12:11:13 +01:00
Adrian Wälchli
6dba26666a
add missing typing to trainer properties ( #5974 )
...
* add typing
* clean up
* isort
* fix typing in log_dir
2021-02-15 23:54:12 +00:00
Adrian Wälchli
aa60c08641
move device-specific teardown logic from training loop to accelerator ( #5973 )
...
* on train end
* switch order
2021-02-15 17:38:03 -05:00
Jirka Borovec
ae4dca9725
Docs: fix failing make ( #5988 )
2021-02-15 16:03:57 -05:00
Adrian Wälchli
d422ef2c89
clean up unused distributed sampler logic in trainer ( #5975 )
...
* clean up sampler unused logic
* undo cached
* imports
2021-02-15 14:48:35 -05:00
Eric Cousineau
4f63942c4d
Makefile: Refer to CONTRIBUTING doc, reword `test` to avoid "example" ( #5910 )
...
CONTRIBUTING: Add concrete example for running single test
2021-02-15 19:05:49 +00:00
Kaushik B
b5d29df646
Fix: hparams.yaml saved twice when using TensorBoardLogger ( #5953 )
2021-02-15 22:31:31 +05:30
Jirka Borovec
ba806c8ee0
enable testing DDP examples ( #4995 )
...
* enable testing DDP examples
* args
* ddp_spawn
* ddp as extra script
* path
# Conflicts:
# .drone.yml
* install
* -u
* q
2021-02-15 15:36:13 +00:00
William Falcon
b2950296d5
fixed TPU docs ( #5958 )
2021-02-15 13:58:15 +00:00
Akihiro Nitta
0a2fb05aac
Document exceptions in callbacks ( #5541 )
...
* Add Raises: section to docstring
* Add Raises section to the docs
* Add raises section to the docs
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix
* Remove unnecessary instance check
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-02-15 10:24:36 +00:00
takahashi
52c07f2f03
Docs: Fix broken get-started link ( #5960 )
2021-02-15 01:00:19 +01:00
Adrian Wälchli
c912c4b729
remove legacy accelerators ( #5949 )
...
* remove legacy accelerators
* update imports
* formatting
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-14 16:03:45 +00:00
Adrian Wälchli
a3d4e7c86a
move accelerator legacy tests ( #5948 )
...
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-13 19:42:18 -05:00
William Falcon
0345fcfaad
Update README.md
2021-02-13 14:44:19 -05:00
William Falcon
194f048263
Update README.md
2021-02-13 14:41:38 -05:00
William Falcon
d924dd6a41
Update README.md
2021-02-13 14:01:34 -05:00
William Falcon
11942558d0
Update README.md
2021-02-13 13:50:18 -05:00