103 Commits

Author SHA1 Message Date
William Falcon 4c4b090c66
depre (#4088) 2020-10-12 05:58:31 -04:00
William Falcon b9f2682b7d
clean docs, enable grad clip in manual mode (#4078)
* docs

* docs
2020-10-11 13:12:35 -04:00
William Falcon 7ffe05a3d1
ref: accelerator names (#4066)
* ref: accelerator names

* docs
2020-10-11 01:05:14 -04:00
William Falcon a4b9221fc5
ref: decouple apex second attemp part n/n (#4065)
* ref: decouple apex second attemp part n/n

* ref: decouple apex second attemp part n/n
2020-10-10 22:04:50 -04:00
William Falcon 0281b077d8
ref: decouple apex second attemp part 10/n (#4064)
* ref: decouple apex second attemp part 9/n

* ref: decouple apex second attemp part 9/n

* ref: decouple apex second attemp part 9/n
2020-10-10 20:05:05 -04:00
William Falcon dca86c310e
ref: decouple apex second attemp part 6/n (#4060)
* ref: decouple apex second attemp part 6/n

* ref: decouple apex second attemp part 6/n
2020-10-10 15:28:25 -04:00
William Falcon ce2edf1192
ref: decouple apex second attemp part 4/n (#4056)
* ref: decouple apex second attemp part 4/n

* ref: decouple apex second attemp part 4/n

* Update lightning.py

* ref: decouple apex second attemp part 4/n
2020-10-10 12:19:22 -04:00
William Falcon 3a6717ca34
ref: decouple apex second attemp part 3/n (#4055) 2020-10-10 11:05:57 -04:00
William Falcon 7285613974
ref: decouple apex second attemp part 2/n (#4054)
* ref: decouple apex second attemp part 2/n

* ref: decouple apex second attemp part 2/n
2020-10-10 10:24:20 -04:00
William Falcon e854d3744c
ref: decouple apex second attemp part 1/n (#4052) 2020-10-10 09:53:02 -04:00
William Falcon 5b261a230e
enable passing in custom accelerators (#4050)
* enable custom accelerators

* ref: finish decoupling apex, LM and backward

* ref: finish decoupling apex, LM and backward

* ref: finish decoupling apex, LM and backward
2020-10-10 09:21:08 -04:00
William Falcon 2b255a3df4
ref: enable custom clusters (1/n) (#4048)
* enable cluster plugins

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices
2020-10-10 08:09:29 -04:00
William Falcon 0c42aa03fd
enables plugins (#4041)
* plugin hardware

* plugin hardware

* plugin hardware
2020-10-09 22:03:46 -04:00
William Falcon 048a816be3
added tests for the training epoch end (#3967) 2020-10-07 22:27:36 -04:00
William Falcon b922409624
clean and organize fit (#3938)
* clean and organize fit

* clean and organize fit

* clean and organize fit

* clean and organize fit

* clean and organize fit
2020-10-07 11:04:10 -04:00
William Falcon 9c415d2c71
moves configure ddp to each backend (#3924)
* moves configure ddp to each backend

* moves configure ddp to each backend

* moves configure ddp to each backend

* added torch manual seed in test_mean_error

* test for complicated batch structure

* test for complicated batch structure

* test for complicated batch structure

Co-authored-by: ananyahjha93 <ananya@pytorchlightning.ai>
2020-10-07 00:50:16 -04:00
William Falcon e3007ffe0c
moves sync bn to each backend (#3925) 2020-10-06 22:42:33 -04:00
William Falcon af5887c0aa
fixed ddp flag crash (#3927) 2020-10-06 22:41:08 -04:00
Lezwon Castelino 69833dad5b
Added check to verify xla device is TPU (#3274)
* tpu device check

* replaced with xmp spawn

* Revert "replaced with xmp spawn"

This reverts commit 6835380f

* replaced all instances of XLA_AVAILABLE

* moved inner_f to global scope

* made refactors

* added changelog

* added TPU_AVAILABLE variable

* fix codefactor issues

* removed form trainer and early stopping

* add TORCHXLA_AVAILABLE check

* added tests

* refactoring

* Update pytorch_lightning/utilities/xla_device_utils.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* updated function names

* fixed bug

* updated CHANGELOG.md

* added todo

* added type hints

* isort and black

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-06 19:54:37 +02:00
Sean Naren e4a56fa5cf
Ensure global seed exists before passing into env subprocess.Popen call (#3904) 2020-10-06 12:31:49 -04:00
William Falcon 70e792344a
test selecting the correct backend. temp backends while slurm and TE are decoupled (#3848)
* test selecting the correct backend. tem backends while slurm and TE are decoupled

* test selecting the correct backend. tem backends while slurm and TE are decoupled
2020-10-04 15:44:50 -04:00
William Falcon 2c21f7d7e2
ref: adding compute environments (2/n) (#3842)
* ref: adding compute environments (2/n)

* ref: adding compute environments (2/n)

* ref: adding compute environments (2/n)

* ref: adding compute environments (2/n)
2020-10-04 08:48:46 -04:00
Lezwon Castelino 4da240ea1b
added broadcast option to tpu (#3814)
* added broadcast option to tpu

* add device

* moved tpu broadcast to tpu_backend

* removed Lightning dist

* decode bytes

* pep8 fix

* fix bug

* test for broadcast

* updated changelog
2020-10-04 07:47:33 -04:00
William Falcon 1f8ff7c48c
ref: callback system and init ddp (1/n) (#3836)
* refactored callback system and init ddp

* refactored callback system and init ddp

* refactored callback system and init ddp

* refactored callback system and init ddp
2020-10-03 23:39:17 -04:00
William Falcon 35d1111994
[WIP] ref: decoupled ddp, ddp spawn (finish 3733) (#3819)
* ref: finish #3733

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* remove deprecated test

* Update pytorch_lightning/accelerators/ddp_backend.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* remove deprecated test

* remove deprecated test

* remove deprecated test

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2020-10-03 14:05:31 -04:00
William Falcon ed1450a293
ref: clean up ddp before final fix (#3817)
* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix

* ref: clean up ddp before final fix
2020-10-03 12:01:02 -04:00
William Falcon 0838c6bfce
ref: decoupled ddp2 (#3816) 2020-10-03 09:02:35 -04:00
William Falcon a677833f84
ref: separate slurm from ddp (#3809)
* ref: separate slurm from ddp

* ref: separate te from ddp

* ref: merge

* ref: merge

* ref: merge
2020-10-02 23:08:34 -04:00
William Falcon 74484edecd
ref: separate te from ddp (#3810)
* ref: separate te from ddp

* ref: separate te from ddp

* ref: separate te from ddp
2020-10-02 21:00:51 -04:00
William Falcon a28528cc8b
ref: remove weight loading hack for ddp_cpu (#3808) 2020-10-02 19:28:50 -04:00
William Falcon afa43837a4
ref: part 8 of #3733 (#3806) 2020-10-02 18:46:18 -04:00
ananthsub 3ab730e316
Swap torch.load for fsspec load in ddp spawn backend (#3787)
* Update ddp_spawn_backend.py

* Update ddp_cpu_spawn_backend.py

* log

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-10-02 21:00:01 +02:00
William Falcon 7c6ed1fa28
ref: part 7 of #3733 (#3802)
* ref: part 7 of #3733

* ref: part 7 of #3733
2020-10-02 14:23:27 -04:00
Jirka Borovec 62eabdd535
revert backend types (#3788)
* revert backend types

* todo

* todo
2020-10-02 06:18:44 -04:00
Akihiro Nitta ebc1b23fa3
Use `raise .. from ..` to explicitly chain exceptions (#3750)
* Fix exception chaining

* names

* Change exception names for consistency

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Change exception names for consistency

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-01 21:45:44 +02:00
William Falcon 622c5c3982
ref: part 4 of #3733 (#3773)
* ref: part 4 of #3733

* ref: part 4 of #3733

* ref: part 4 of #3733

* ref: part 4 of #3733
2020-10-01 11:26:58 -04:00
William Falcon 440f837f6d
ref: part a of #3733 (#3766)
* ref: part a of #3733

* ref: part a of #3733
2020-10-01 08:15:23 -04:00
Lezwon Castelino 8be002ccc7
skip best_model_path if checkpoint_callback is None (#2962)
* skip best_model_path if checkpoint_callback is None

* removed test
2020-10-01 06:57:26 -04:00
William Falcon a38d108a68
add dist lib to enable syncing anything across devices (#3762)
* add dist lib to enable syncing anything across devices
2020-10-01 01:21:38 -04:00
Jirka Borovec 31a36f04df
define distributed as a type (#3740)
* define type

* miss

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* miss

* warn

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-30 08:33:01 -04:00
William Falcon c41ea86b35
ref: move backends back to individual files (1/5) (ddp_cpu) (#3712)
* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: test val epoch end

* ref: test val epoch end
2020-09-29 01:59:18 -04:00
Rohit Gupta 783750547d
disable optimizers setup during testing (#3059)
* disable configure_optimizers during testing

* minor changes

* hvd and ddp

* fix precision during testing

* fix ddp

* fix amp

* fix cpu

* update dp

* simplify optimizers

* add test

* codefactor

* ref optimizer setup

* chlog

* suggestions

* isort

* rebased with master
2020-09-29 01:09:04 +02:00
William Falcon 931995b55b
remove flake 8 (#3687) 2020-09-27 20:40:02 -04:00
William Falcon 031274c25d
fix dp issues + update examples and test examples (#3618)
* fix dp

* fix dp

* fix dp

* fix dp

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples
2020-09-23 00:19:46 -04:00
Adrian Wälchli a71d62d840
Fix deterministic behavior in ddp_spawn (#3573)
* docs

* set env variable

* fix

* changelog
2020-09-20 19:42:58 -04:00
William Falcon 890588a9ee
ref: precision plugins 1/n (#3504)
* ref: precision plugins 1/n

* ref: precision plugins 1/n
2020-09-15 09:56:12 -04:00
William Falcon 810b445097
ref: apex plugin (#3502)
* ref: apex plugin

* ref: apex plugin

* ref: apex plugin
2020-09-15 06:02:42 -04:00
William Falcon 6bcfa8b068
ref: merge backends x/n (#3482) 2020-09-12 16:28:29 -04:00
William Falcon 518a0c0e92
ref: merge backends x/n (#3480) 2020-09-12 15:27:11 -04:00
William Falcon 0045119b3f
ref: merge backends x/n (#3478)
* ref: merge backends x/n

* ref: merge backends x/n

* ref: merge backends x/n

* ref: merge backends x/n
2020-09-12 13:55:55 -04:00