Commit Graph

842 Commits

Author SHA1 Message Date
Kaushik B 42a7b70585
Add DDP Spawn being default for Multi GPUs (#6292) 2021-03-21 22:10:54 +01:00
thomas chaton 8853a36d45
[doc] Update Dict Train Loader doc. (#6579)
* update doc

* update example
2021-03-18 17:14:38 +00:00
thomas chaton 00cd918177
[doc] Add Zero Grad `set_to_none=True` trick (#6548)
* add trick to doc

* update

* update path

* Update docs/source/benchmarking/performance.rst

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-03-16 23:40:14 +00:00
Carlos Mocholí 9c5973357e
Update hook lifecycle (#6538)
* Update hook lifecycle

* Update docs/source/common/lightning_module.rst
2021-03-15 19:16:31 +00:00
Sean Naren 383565d225
Update DeepSpeed docs (#6528)
* Clean up docs and add some explicitness around stages

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-03-15 18:00:21 +00:00
Jirka Borovec b341b53f70
deprecate metrics pkg (#6505)
* deprecate metrics

* examples

* req

* docs

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* pep8

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-03-15 14:39:38 +00:00
Luca Di Liello 5d73fbbd81
Mean Average Precision metric for Information Retrieval (1/5) (#5032)
* init information retrieval metrics

* changed retrieval metrics names, expanded arguments and fixed typo

* added 'Retrieval' prefix to metrics and fixed conflict with already-present 'average_precision' file

* improved code formatting

* pep8 code compatibility

* features/implemented new Mean Average Precision metrics for Information Retrieval + doc

* fixed pep8 compatibility

* removed threshold parameter and fixed typo on types in RetrievalMAP and improved doc

* improved doc, put first class-specific args in RetrievalMetric and transformed RetrievalMetric in abstract class

* implemented tests for functional and class metric. fixed typo when input tensors are empty or when all targets are False

* fixed typos in doc and changed torch.true_divide to torch.div

* fixed typos pep8 compatibility

* fixed types in long division in ir_average_precision and example in mean_average_precision

* RetrievalMetric states are not lists and _metric method accepts predictions and targets for easier extension

* updated CHANGELOG file

* added '# noqa: F401' flag to not used imports

* added double space before '# noqa: F401' flag

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* change get_mini_groups in get_group_indexes

* added checks on target inputs

* minor refactoring for code cleanness

* split tests over exception raising in separate function && refactored test code into multiple functions

* fixed pep8 compatibility

* implemented suggestions of @SkafteNicki

* fixed imports for isort and added types annontations to functions in test_map.py

* isort on test_map and fixed typing

* isort on retrieval and on __init__.py and utils.py in metrics package

* fixed typo in pytorch_lightning/metrics/__init__.py regarding code style

* fixed yapf compatibility

* fixed yapf compatibility

* fixed typo in doc

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-03-15 12:18:43 +01:00
Akihiro Nitta 680e83adab
[doc] Update the order of zero_grad and backward (#6478)
* Fix zero_grad in docs

* Fix zero_grad in docs
2021-03-12 09:00:23 +00:00
Eric Cousineau e886d55ac1
argparse: Add use_argument_group=True (#6088)
* argparse: Add inplace option

Replicate in GAN model

* datamodule: Deduplicate logic w/ argparser utilities

* Update pl_examples/domain_templates/generative_adversarial_net.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Keep docstrings

* Correct name

* Whitespace

* Consistency

* fix weird type stuff

* try alt - use_argument_group

* fix syntax + lint

* fix ci errs

* fix ci

* change examples... still failing w/ "unrecognized arguments: --batch_size"

* address review

* mnist_datamodule: add some docstrings

* argparse: check cls or cls.__init__ for param

didn't capture issue, but meh

* fix lint

* fix no-doc edge case

* address review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-03-11 10:50:49 -05:00
Elia Cereda f4cc7451a9
Add Trainer.validate(…) method to run one validation epoch (#4948)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-11 03:46:37 +01:00
Sean Naren c81b2a8189
Set find unused parameters to True by default to fix breaking compatibility (#6438)
* Set find unused parameters to True by default to fix breaking models, add suggestion to re-enable

* Add changelog
2021-03-10 10:40:24 +01:00
Adrian Wälchli 75c6486ac7
update (#6403) 2021-03-09 09:47:51 +00:00
Carlos Mocholí efd272a3ca
Pass {fit,validate,test,predict} to setup() and teardown() (#6386) 2021-03-08 15:27:07 +01:00
Roger Shieh ff16104927
Update TBLogger docs (#6315)
* Update tensorboard.py

* Update logging.rst

* pep8

* Update logging.rst

* Update logging.rst

* Apply suggestions from code review

* add code sample

* Update logging.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-08 01:17:49 +00:00
Akihiro Nitta c7f30a204c
[doc] Fix closure in manual optimization (#6374)
* Fix manual optimization docs

* Fix typo. Thanks @import-antigravity
2021-03-07 13:34:51 +01:00
Oier Mees 2708c3993d
[doc] Improve Multiple Val/Test Dataloaders with simultaneous batches option (#6320)
* improve doc to describe how to combine batches of multiple test and val dataloaders simultaneously

* fix typo

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* use paramref

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-07 11:24:19 +00:00
Rohit Gupta 38a5fe7af1
Remove optimizer_idx arg in manual optimization (#6093)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2021-03-07 08:48:50 +01:00
thomas chaton 4f391bce7c
give a more complete GAN example (#6294) 2021-03-05 17:54:09 -05:00
Nicki Skafte 4f904556e4
Update docs on arg train_dataloader in fit (#6076)
* add to docs

* update docs

* Apply suggestions from code review

* Update pytorch_lightning/core/hooks.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* nested loaders

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* shorten text length

* Update pytorch_lightning/core/hooks.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-04 14:22:18 -05:00
Lezwon Castelino 577323c92a
leaving lezwon (#6347) 2021-03-04 18:37:58 +00:00
Adrian Wälchli bc577ca792
fix duplicate console logging bug v2 (#6275)
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-02 15:17:55 +05:30
Akihiro Nitta 412a7d812e
Remove opt from manual_backward in docs (#6267) 2021-03-01 18:15:43 +00:00
Akihiro Nitta 925f082572
Call `optimizer.zero_grad()` before backward inside closure in AutoOpt (#6147)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-03-01 14:36:46 +01:00
Adrian Wälchli ce0568700b
update (#6237) 2021-03-01 14:14:53 +01:00
Rohit Gupta c7130b7e1e
Update with GitHub Discussions (#6186) 2021-02-25 00:28:23 +05:30
Jirka Borovec 46617d9021
Prune deprecated checkpoint arguments (#6162)
* prune prefix

* prune mode=auto

* chlog
2021-02-24 06:58:53 -05:00
Jirka Borovec 1d9c553b86
prune deprecated Trainer arg `enable_pl_optimizer` (#6163)
* prune enable_pl_optimizer

* prune automatic_optimization
2021-02-24 10:01:24 +00:00
Jirka Borovec 09baf29ecb
prune deprecated profiler as bool (#6164)
* prune profiler

* chlog
2021-02-24 09:08:21 +00:00
Sean Naren 863a70c294
Add specifics around DeepSpeed docs (#6142)
* Be more specific with DeepSpeed compatibility

* Better wording
2021-02-23 00:08:39 +01:00
Akihiro Nitta 1d28d11a07
Minor fixes/improvements in Metric docs (#6114)
* Fix wrong render

* Improve classification metrics docs

* Improve other domain metrics docs

* Change the structure level in the docs
2021-02-22 16:50:59 +08:00
Jirka Borovec 4574023e31
v1.2.0 (#6065)
* v1.2.0

* docs
2021-02-18 15:14:39 -05:00
Rohit Gupta b0074a471a
Update auto-opt docs (#6037)
* fix docs

* update on comments

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* rm comment

* Update docs/source/common/lightning_module.rst

Co-authored-by: chaton <thomas@grid.ai>

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2021-02-18 19:51:56 +00:00
Sean Naren 2cf39dc442
Add warnings to on_before/after_batch_transfer hooks (#6059)
* Add warnings to hooks

* Add default idx to prevent signature change in the future

* Nothing to see here

* Add default val to transfer_batch_to_device hook

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Revert "Add default val to transfer_batch_to_device hook"

This reverts commit 5c6a68f2

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-02-18 14:24:19 -05:00
edenlightning 3449e2d79f
Docs for Pruning, Quantization, and SWA (#6041)
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2021-02-18 13:51:51 +00:00
Shreya Purohit f48a9330ed
Fix docs typo (#6055)
Put .test() in  code blocks
2021-02-18 13:47:50 +00:00
Adrian Wälchli 115e58af9b
clarify gpu / process (#6049) 2021-02-18 12:52:10 +00:00
Rohit Gupta bcc0004955
Add before_batch_transfer and after_batch_transfer hooks (#3671)
* add hooks

* comment

* docs

* add tests

* make it private

* fix tests

* docs

* chlog

* testcode

* codefactor

* fix doctest

* fix doctest

* suggestions

* is always overriden

* pep and BoringModel

* BoringModel

* docs

* docs

* docs

* fix

* rebase

* rebase

* suggestions

* docs

* suggestions

* try fix docs

* docs

* update name

* yapf

* docs

* rebase

* yapf
2021-02-18 06:58:12 -05:00
Lezwon Castelino d2cd7cb0f9
Add option for weight tying on TPU's (#5441)
* added on_post_move_to_device

* added tests

* docs and refactors

* Update tests/backends/test_tpu_backend.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update docs/source/tpu.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update docs/source/tpu.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update docs/source/tpu.rst

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/decorators.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/core/hooks.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* moved weight sharing module back to test

updated tpu available

* add count to warning

* fix doctest

* import trainer in doctest

* import trainer in doctest

* do not test code as no TPU device

* param count to layer count

* formatting

* update docs

* update import

* update

* resolve tests

* remove legacy accelerator

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Your Name <you@example.com>
2021-02-18 00:03:26 +00:00
Sean Naren 7189d673f6
DeepSpeed Integration (#5954)
* Add initial deepspeed changes

* Address code review

* Move static method outside of function

* Fixes

* Add missing annotation

* Remove seed setting

* Doc changes

* Doc changes, add address reviews

* Fix docs

* Try fixing issue by moving to torch adam

* Clean up check

* Changes, better APIs!

* Add wrapper, swap to git install revision

* Add special test

* Add warning

* Address review

* Add better disclaimer

* Turn off ZeRO for testing due to compilation

* Add description on modifying parameters via the plugin

* Doc strings clear

* Small doc fixes

* Fix hash, reduce test

* Added CI change

* Move to azure pipeline

* Fix test name

* Add missing flag

* Remove sudo...

* Try conda instead

* Swap to conda base

* Try suggested install

* Apply suggestions from code review

* Apply suggestions from code review

* Revert "Apply suggestions from code review"

This reverts commit 41cca05a

* Revert "Apply suggestions from code review"

This reverts commit e06ec29e

* Remove setter

* Address most review

* Move out function, remove DeepSpeed from requirements

* Install deepspeed/mpi4py within container

* Use special tests, move to master commit for deepspeed

* Export path

* Force compile to happen first

* Remove!

* Debugging ninja

* Fix error in optimizer step logic

* Attempt to fix symbolic link

* Reverse to aid debugging

* Export path again

* Clean up mess

* var

* Revert "var"

This reverts commit 3450eaca

* Address review, add todo

* Add note about unsupported functionality

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 15:23:42 -05:00
Adrian Wälchli 6a409c7f84
remove outdated info (#6032) 2021-02-17 20:57:57 +01:00
Adrian Wälchli ad36c7b9ce
Add hint in docs for how to use shared memory (#6036) 2021-02-18 00:28:41 +05:30
chaton 5700fd091f
[Feat] Add TORCH_DISTRIBUTED_BACKEND env variable (#5981)
* add backend support

* resolve flake8

* update changelog

* update

* Apply suggestions from code review

* Update docs/source/advanced/multi_gpu.rst

* add patch as context manager

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-02-17 16:37:39 +00:00
Nicki Skafte 4062c6246c
[Docs] Explain metric internals (#5899)
* correct docs

* fix levels
2021-02-16 16:14:30 -05:00
chaton 6e79bef996
[accelerator][FeatBugFix] Improve manual optimization API (#5771)
* fix trainer.model access

* move properties

* fix test_transfer_batch_hook

* fix auto_select_gpus

* fix omegaconf test

* fix test that needs to simulate slurm ddp

* add horovod plugin

* fix test with named arguments

* clean up whitespace

* fix datamodules test

* remove old accelerators

* fix naming

* move old plugins

* move to plugins

* create precision subpackage

* create training_type subpackage

* fix all new import errors

* fix wrong arguments order passed to test

* fix LR finder

* Added sharded training type and amp plugin

* Move clip grad to precision plugin

* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically

* Fix import issue, attempting to fix tests

* Fix initial test

* Reflect hook logic from master, should wrap model after move to device

* Optional state consolidation, since master has optimizers not wrapped

* change attribute for instance test

* reset optimizers

optimizers are not used in main process, so state would be wrong.

* legacy

* imports in accel

* legacy2

* trainer imports

* fix import errors after rebase

* move hook to new setup location

* provide unwrapping logic

* fix trainer callback system

* added ddp2 implementation

* fix imports .legacy

* move plugins

* restore legacy

* drop test.py from root

* add tpu accelerator and plugins

* fixes

* fix lightning optimizer merge

* reset bugreportmodel

* unwrapping

* step routing forward

* model access

* unwrap

* opt

* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* manual optimization

* update optimizer routing

* add rank to torchelastic

* fix memory mixed precision

* setstate on trainer for pickling in ddp spawn

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* update on comments

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


fix merge


fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resovle a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x


x


x


x


x


x


x


x


x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* updat utils

* revert permission change on files

* suggestions from carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typer in accelerator.py

* typo

* Apply suggestions from code review

* formatting

* update on comments

* update typo

* Update pytorch_lightning/trainer/properties.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* update on comments

* resolve some comments

* update on comments

* resolve test

* add toggle_model

* update

* update on comments

* update doc

* typo

* update

* typo

* remove space

* update

* update on comments

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 16:00:35 -05:00
Eric Cousineau 62d3ec9613
doc: Add hint towards using ArgumentParser.add_argument_group (#5911)
* doc: Add hint towards using ArgumentParser.add_argument_group

Since pl adds many arguments, it is nice to distinguish these arguments

* fixup! address review

Co-authored-by: chaton <thomas@grid.ai>
2021-02-16 19:45:55 +00:00
Jirka Borovec 960a60743f
fix fairscale compatible with PT 1.8 (#5996)
* try to extend fairscale available

* 1.2
2021-02-16 19:43:02 +00:00
Jirka Borovec 1c87f1f6cd
remove legacy plugins (#5950)
* remove legacy plugins

* imports

* formatting

* fix docs references

* fix cluster environment inheritance

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-02-16 19:20:58 +00:00
William Falcon b2950296d5
fixed TPU docs (#5958) 2021-02-15 13:58:15 +00:00
takahashi 52c07f2f03
Docs: Fix broken get-started link (#5960) 2021-02-15 01:00:19 +01:00
Adrian Wälchli b8619a695f
new LightningModule hook "configure_callbacks" (#5621)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-12 19:27:44 -05:00