43 Commits

Author SHA1 Message Date
Jerome Anand b2e98d6166
Run HPU tests only with yml (#12469) (#12478)
* Run HPU tests only with yml (#12469)

Execute supported tests serially

Signed-off-by: Jerome <janand@habana.ai>
2022-03-28 16:50:20 +09:00
Jerome Anand 812c2dc3d3
Add support for Habana accelerator (HPU) (#11808)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: jjenniferdai <89552168+jjenniferdai@users.noreply.github.com>
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-03-25 10:24:52 +00:00
Kaushik B 089fcb91a0
Collect and run all IPU tests (#11170)
* Collect and run all ipu tests

* Update azure pipeline

* Increase pytest verbosity

* Update RunIf

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-03-25 14:20:22 +09:00
Kaushik B 7b0d1183db
Update `gpus` flag with `accelerator` and `devices` flag (#12156)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-03-23 19:52:12 +00:00
Aki Nitta fa7aa0babe
Update nightly GPU benchmark pool (#12366)
2022-03-21 18:17:34 +01:00
Jirka Borovec fe940e195d
CI: update prune_pkgs (#12382)
2022-03-21 12:50:50 +00:00
four4fish 1eff3b53c1
Update fairscale version (#11567)
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-03-21 11:38:55 +00:00
Aki Nitta b8b855d411
Pin Docker image for testing on GPUs (#12368)
* Pin docker image sha
2022-03-18 01:16:54 +00:00
Jirka Borovec 1144673cd9
CI: sanity check for req. pkgs (#11819)
* CI: sanity check for req. pkgs
* scripts
* rename
* gcsfs ?
* rich !
* install extra
* move
* set -e

Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-03-11 09:20:47 +00:00
Jirka Borovec cadcc67386
add Azure HPU agent (#12258)
2022-03-08 19:20:43 +04:00
Louis Taylor 73bda54e63
CI: update poplar sdk version (#12226)
2022-03-04 23:49:30 +00:00
Jirka Borovec 7bc87015ea
Unblock GPU CI (#11934)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2022-02-16 21:15:44 +01:00
wangraying 8c07d8bf90
Add `Trainer(strategy="bagua")` (#11146)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-02-04 17:02:09 +00:00
Sean Naren c66cd12445
Remove partitioning of model in ZeRO 3 (#10655)
2021-12-17 12:36:53 +00:00
Carlos Mocholí 152eb57def
Rename special to standalone (#10779)
2021-11-26 17:13:14 +00:00
Carlos Mocholí 5788789f01
Move benchmarks into the test directory (#10614)
2021-11-19 03:07:33 +01:00
Carlos Mocholí 939a861853
Update Python testing (#10269)
2021-11-04 18:26:24 +01:00
thomas chaton 9e844d9db6
Lite Docs and Example Improvements (#10303)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-02 16:13:01 +01:00
Jirka Borovec edea0d4bc3
switch azure pool (#10266)
2021-11-01 11:42:11 +00:00
Carlos Mocholí 3a4e9970d6
Pin fairscale version (#10200)
2021-10-27 23:24:17 +00:00
Kaushik B 5e8829b97d
(1/n) tests: Use strategy flag instead of accelerator for training strategies (#9931)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-16 20:40:25 +05:30
Sean Naren 83acb8671d
Update DeepSpeed version, fix failing tests (#9898)
2021-10-11 22:35:33 +00:00
Kaushik B 3118480d60
Disable benchmark ci on PRs (#9430)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-09-10 17:00:44 +05:30
Carlos Mocholí 0dfc6a18bd
Call any trainer function from the `LightningCLI` (#7508)
2021-08-28 04:43:14 +00:00
Adrian Wälchli de22e40095
restrict deepspeed version in CI (#8951)
2021-08-17 14:02:27 +01:00
Carlos Mocholí 93ab24d1ee
Replace DataLoader sampler once for IPUs (#8858)
2021-08-16 11:28:05 +02:00
thomas chaton 9e61de2063
Torch Elastic DDP DeadLock bug fix (#8655)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-02 21:48:43 +02:00
thomas chaton 85bba06529
update (#8674)
2021-08-02 11:56:09 +02:00
Jirka Borovec 470842f5c8
CI: validate JSON & fix benchmark (#8567)
* CI: validate JSON

* as GHA

* PT1.8

* 32g

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-28 18:09:15 +02:00
Kaushik B 4c79b3a5b3
Parity test (#7832)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-07-21 02:53:53 +05:30
Adrian Wälchli 96729fc45a
update links for collect_env_details.py script (#8436)
2021-07-19 11:26:09 +00:00
Carlos Mocholí ae1fd6a201
Unblock GPU CI (#8456)
* Debug

* Increase SHM size

* Debug

* Refactor MNIST imports

* Undo debugging

* Prints
2021-07-19 09:41:18 +02:00
Carlos Mocholí 4184d7e738
Refactor GPU examples tests (#8294)
2021-07-06 13:14:04 +01:00
Stephen McGroarty f6a5bb2eee
Bump IPU version (#8290)
2021-07-06 10:27:34 +00:00
Adrian Wälchli e7139ab9f7
Support `DDPPlugin` to be used on CPU (#6208)
* Skip test due to 'Python bus error'

* Debug NCCL

* Remove NCCL_DEBUG statement

* Revert "Skip test due to 'Python bus error'"

This reverts commit e0a3e8785d.

* fix

* add test

* changelog

* yapf

* patch os environ

* make a special test

* destroy pg

* debug

* revert

* revert

* problematic test

* skip

* try the fixture

* test

* update sensitive test

* update changelog

* remove comment

* update wrong test

* update test name

* parameterization

* Revert "parameterization"

This reverts commit b0542f43f59c5ce66800883b5e2f0c66a97408cc.

* remove conftest

* ignore test

* teardown

* fix merge

* deep speed parameterization

* uncomment test

* update chlog

* update changelog

* split tests

* update test


update test


update test


update test

* update test comments

* unroll test

* unroll test

* unroll test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* increase shm

* sudo

* unroll ipu

* Revert "sudo"

This reverts commit 6cc68c1478.

* Revert "increase shm"

This reverts commit 8c27163483.

* x

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* find guilty test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* POPTORCH_WAIT_FOR_IPU=1

* move test

* redo parameterize for ipu

* de-comment test

* move chlog

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-02 12:00:24 +01:00
Carlos Mocholí 2c43bfc5ef
GPU CI - run torch 1.8 (LTS) (#8116)
2021-06-24 16:56:43 +00:00
Carlos Mocholí dd340a6598
Actually show deprecation warnings and their line level [2/2] (#8002)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-21 18:51:53 +02:00
Sean Naren f7459f5328
DeepSpeed Infinity Update (#7234)
* Update configs to match latest API

* Ensure we move the entire model to device before configure optimizer is called

* Add missing param

* Expose parameters

* Update references, drop local rank as it's now inferred from the environment variable

* Fix ref

* Force install deepspeed 0.3.16

* Add guard for init

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Revert type checking

* Install master for CI for testing purposes

* Update CI

* Fix tests

* Add check

* Update versions

* Set precision

* Fix

* See if i can force upgrade

* Attempt to fix

* Drop

* Add changelog

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-14 16:38:28 +00:00
Sean Naren 10839376e2
[IPU] Add special tests for IPUs 2/n (#7833)
* Add special tests for IPUs, run nvprof only if cuda available

* Add missing min_gpu
2021-06-04 23:23:09 +05:30
Carlos Mocholí e16d4fbdee
CI code cleaning (#7615)
2021-05-21 11:35:12 +00:00
Louis Taylor 1a62f7f5ff
ci: adjust torch version requirements in IPU pipeline (#7383)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-06 18:20:05 +05:30
Louis Taylor b64aea637c
CI: move azure-pipelines config to separate directory (#7276)
* CI: move azure pipelines to separate directory

This removes some extra clutter in the top level as we add more
pipelines.

* rename

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 10:50:16 -04:00
Louis Taylor d413bab5ac
Add initial IPU CI job (#7251)
This adds an azure-pipelines job so we can verify the runners are
connected correctly. Since the IPU branch isn't merged, it won't yet
give any actual IPU test coverage.
2021-05-04 08:19:41 +00:00