Commit Graph

38 Commits

Author SHA1 Message Date
William Falcon b922409624
clean and organize fit (#3938)
* clean and organize fit

* clean and organize fit

* clean and organize fit

* clean and organize fit

* clean and organize fit
2020-10-07 11:04:10 -04:00
William Falcon 9c415d2c71
moves configure ddp to each backend (#3924)
* moves configure ddp to each backend

* moves configure ddp to each backend

* moves configure ddp to each backend

* added torch manual seed in test_mean_error

* test for complicated batch structure

* test for complicated batch structure

* test for complicated batch structure

Co-authored-by: ananyahjha93 <ananya@pytorchlightning.ai>
2020-10-07 00:50:16 -04:00
William Falcon e3007ffe0c
moves sync bn to each backend (#3925) 2020-10-06 22:42:33 -04:00
William Falcon af5887c0aa
fixed ddp flag crash (#3927) 2020-10-06 22:41:08 -04:00
Sean Naren e4a56fa5cf
Ensure global seed exists before passing into env subprocess.Popen call (#3904) 2020-10-06 12:31:49 -04:00
William Falcon 70e792344a
test selecting the correct backend. temp backends while slurm and TE are decoupled (#3848)
* test selecting the correct backend. temp backends while slurm and TE are decoupled

* test selecting the correct backend. temp backends while slurm and TE are decoupled
2020-10-04 15:44:50 -04:00
William Falcon 2c21f7d7e2
ref: adding compute environments (2/n) (#3842)
* ref: adding compute environments (2/n)

* ref: adding compute environments (2/n)

* ref: adding compute environments (2/n)

* ref: adding compute environments (2/n)
2020-10-04 08:48:46 -04:00
William Falcon 1f8ff7c48c
ref: callback system and init ddp (1/n) (#3836)
* refactored callback system and init ddp

* refactored callback system and init ddp

* refactored callback system and init ddp

* refactored callback system and init ddp
2020-10-03 23:39:17 -04:00
William Falcon 35d1111994
[WIP] ref: decoupled ddp, ddp spawn (finish 3733) (#3819)
* ref: finish #3733

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* Update pytorch_lightning/accelerators/ddp_backend.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* remove deprecated test

* remove deprecated test

* remove deprecated test

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2020-10-03 14:05:31 -04:00
William Falcon ed1450a293
ref: clean up ddp before final fix (#3817)
* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix
2020-10-03 12:01:02 -04:00
William Falcon a677833f84
ref: separate slurm from ddp (#3809)
* ref: separate slurm from ddp

* ref: separate te from ddp

* ref: merge

* ref: merge

* ref: merge
2020-10-02 23:08:34 -04:00
William Falcon afa43837a4
ref: part 8 of #3733 (#3806) 2020-10-02 18:46:18 -04:00
William Falcon 440f837f6d
ref: part a of #3733 (#3766)
* ref: part a of #3733

* ref: part a of #3733
2020-10-01 08:15:23 -04:00
William Falcon 890588a9ee
ref: precision plugins 1/n (#3504)
* ref: precision plugins 1/n

* ref: precision plugins 1/n
2020-09-15 09:56:12 -04:00
William Falcon 6bcfa8b068
ref: merge backends x/n (#3482) 2020-09-12 16:28:29 -04:00
William Falcon 00d155ae01
ref: merge backends x/n (#3477) 2020-09-12 12:36:55 -04:00
William Falcon dd324e4086
ref: accelerator connector methods x/n (#3470) 2020-09-11 22:25:48 -04:00
William Falcon ef20310873
ref: move specific accelerator code x/n (#3457)
* ref: organize args x/n

* ref: move specific accelerator code x/n

* ref: move specific accelerator code x/n

* ref: move specific accelerator code x/n
2020-09-11 10:56:21 -04:00
William Falcon 70af47db84
ref: organize args 4/n (#3456) 2020-09-10 21:58:47 -04:00
William Falcon 8f6b115511
ref: added model connector (#3407)
* ref: added model connector

* ref: added model connector

* ref: added model connector
2020-09-09 00:24:20 -04:00
William Falcon b0298cead8
ref: move train outside of setup training (#3297)
* ref: move train outside of setup training

* ref: move train outside of setup training

* ref: move train outside of setup training

* ref: move train outside of setup training
2020-08-31 20:36:52 -04:00
William Falcon bcd13f70b8
ref: run_pretrain_routine -> setup_training (#3294)
* ref: .tune()

* ref: run_pretrain_routine -> setup_training
2020-08-31 18:06:11 -04:00
William Falcon 3a26b4ff5c
ddp backend refactor (#3209) 2020-08-26 20:31:09 -04:00
William Falcon 6bae404bed
ref: ddp backend refactor (3) (#3208)
* ddp backend refactor

* ddp backend refactor
2020-08-26 20:03:09 -04:00
William Falcon a8daf914f8
ddp backend refactor (#3207) 2020-08-26 19:10:24 -04:00
William Falcon ff3c2f4cff
ddp backend refactor (#3204) 2020-08-26 18:43:28 -04:00
William Falcon 6c3cec3a3c
training amp scaling refactor (#3135) 2020-08-24 19:59:46 -04:00
William Falcon 0b3cb3c955
ref: moved ___step_end hooks (#3130)
* moved eval hooks

* moved eval hooks

* moved eval hooks

* moved eval hooks

* moved eval hooks

* moved eval hooks

* moved eval hooks
2020-08-24 17:50:47 -04:00
William Falcon 527b9dca36
refactored ddp backend forward (#3119) 2020-08-24 07:33:14 -04:00
Adrian Wälchli 188e06c261
ddp fix for trainer.test() + add basic ddp tests (#2997)
* add ddp script variations

* add ddp test

* rename

* shell

* test

* test

* try call

* try without subprocess

* test

* display the error

* list all variations

* try string

* try copy env

* debug

* pythonpath

* path

* update test

* change

* simple ddp test

* replace

* remove random port

* random port

* str

* clean up

* check run spawn

* clean up

* docs

* docs

* update test

* docs

* changelog

* changelog
2020-08-16 11:19:57 -04:00
William Falcon e7794eb79a
Fixes #2407 (#2981)
* fix gpus index error
2020-08-14 16:22:48 -04:00
Jirka Borovec 5bce06c050
nb. devices (#2973) 2020-08-14 11:37:21 +02:00
William Falcon 0c264689cb
Fixes #2942 (#2969)
* Fixes #2942

* doc fix
2020-08-13 21:54:57 -04:00
Jirka Borovec 4354690e55
add apex test (#2921)
* add apex test

* rename

* level

* events

* wrap

* evt

* miss

* apex

* apex

* apex

* apex

* apex

* apex

* Update tests/models/test_amp.py

Co-authored-by: William Falcon <waf2107@columbia.edu>

* notes

* notes

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-13 10:03:13 -04:00
Phil e3528afae3
Move optimizer creation after device placement for ddp backends. (#2904) 2020-08-12 06:34:59 -04:00
Jirka Borovec a6e7aa7796
allow using apex with any PT version (#2865)
* wip

* setup

* type

* name

* wip

* docs

* imports

* fix if

* fix if

* use_amp

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* fix tests

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* fix tests

* todos

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-08 11:07:32 +02:00
Jirka Borovec b7d72706c3
clean imports (#2867)
* clean imports

* miss
2020-08-08 00:33:51 +02:00
Jirka Borovec f8c058215f
simplify tests & cleaning (#2588)
* simplify

* tmpdir

* revert

* clean

* accel

* types

* test

* edit test acc

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update test acc

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-07 23:22:05 +02:00