Akarsha Rao
f167d76508
CI: HPU support v1.6.0 release ( #14794 )
...
* Update hpu-tests.yml to support v1.6.0 release
* Update Dockerfile
2022-09-20 12:26:27 +02:00
Carlos Mocholí
dfa570ef9f
Run CircleCI with the HEAD sha, not the base ( #14625 )
...
* Run CircleCI with the HEAD sha, not the base
* Different solution
2022-09-12 11:25:54 -04:00
Rui Wang
40868f7f43
Add bagua support for CUDA 11.6 images ( #14529 )
...
* Add support for bagua-cuda116
* Remove bagua-cuda115 from installation
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-09-09 20:07:25 +00:00
Adrian Wälchli
291dc1b615
Standalone Lite CI setup ( #14451 )
...
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-01 22:13:12 +00:00
Carlos Mocholí
00aefa82b7
Cleanup TPU CI script error management ( #14389 )
2022-08-31 11:38:54 +00:00
Jirka Borovec
74304db6f8
CI: update TPU docker ( #14448 )
2022-08-31 00:47:38 +05:30
Carlos Mocholí
3ba0f56b18
Remove support for the deprecated torchtext legacy ( #14375 )
2022-08-26 20:01:51 +00:00
otaj
1ae14ca754
[CI] fix horovod tests ( #14382 )
2022-08-25 17:30:06 +00:00
Adrian Wälchli
34f98836fb
Fix silent TPU CI failures ( #14034 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-24 13:24:24 +00:00
otaj
0bd5703b81
[CI] Trick Bagua into installing appropriate wheel in GPU tests ( #14380 )
...
Bagua trick needs to be replicated everywhere applicable
2022-08-24 08:59:49 +00:00
otaj
bb634310e7
[CI] Bump CUDA in Docker images to 11.6.1 ( #14348 )
...
* bump cuda in docker images to 11.6.1
* PUSH TO HUB. REVERT THIS!
* conda forge for 11.6
* cuda 11.5
* revert conda changes
* 11.6 back again
* 11.6 back again, all of them
* maybe all passes now
* maybe all passes now
* final push
* Revert "PUSH TO HUB. REVERT THIS!"
This reverts commit 602bfce224.
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-23 12:10:52 -04:00
Akihiro Nitta
d5f35ece72
CI/CD: Add CUDA version to docker image tags ( #13831 )
...
* append cuda version to tags
* revertme: push to hub
* Update docker readme
* Build base-conda-py3.9-torch1.12-cuda11.3.1
* Use new images in conda tests
* revertme: push to hub
* Revert "revertme: push to hub"
This reverts commit 0f7d534b2a.
* Revert "revertme: push to hub"
This reverts commit 46a05fccbb.
* Run conda if workflow edited
* Run gpu testing if workflow edited
* Use new tags in release/Dockerfile
* Build base-cuda and PL release images with all combinations
* Update release docker
* Update conda from py3.9-torch1.12 to py3.10-torch.1.12
* Fix ubuntu version
* Revert conda
* revertme: push to hub
* Don't build Python 3.10 for now...
* Fix pl release builder
* updating version contribute to the error? https://github.com/docker/buildx/issues/456
* Update actions' versions
* Update slack user to notify
* Don't use 11.6.0 to avoid bagua incompatibility
* Don't use 11.1, and use 11.1.1
* Update .github/workflows/ci-pytorch_test-conda.yml
Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>
* Update trigger
* Ignore artifacts from tutorials
* Trim docker images to distribute
* Add an image for tutorials
* Update conda image 3.8x1.10
* Try different conda variants
* No need to set cuda for conda jobs
* Update who to notify ipu failure
* Don't push
* update filename
Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>
2022-08-10 10:37:50 +00:00
Akihiro Nitta
0883971ccb
CI: Update XLA from 1.9 to 1.12 ( #14013 )
2022-08-05 05:04:45 -04:00
Adrian Wälchli
caaf35689c
Improvements to standalone scripts ( #13840 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-07-28 23:33:22 +00:00
Carlos Mocholí
1299e4f984
Run GPU tests with PyTorch 1.12 ( #13716 )
...
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-07-28 19:37:57 +05:30
Adrian Wälchli
fff62f0ae5
Fix TPU testing and collect all tests ( #11098 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-07-27 15:40:40 +00:00
Adrian Wälchli
a8d7b4476c
Fix PyTorch spelling errors ( #13774 )
...
* Fix PyTorch spelling errors
* more
2022-07-25 12:51:16 -04:00
Jirka Borovec
64e8e8eb4b
CI: debug HPU flow ( #13419 )
...
* Update the hpu-tests.yml to pull docker from vault
* fire & sudo
* habana-gaudi-hpus
* Check the driver status on gaudi server (#13718 )
Co-authored-by: arao <arao@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>
2022-07-20 12:35:01 +02:00
Jirka Borovec
e23756b15d
CI: debug TPU failing tests ( #13679 )
...
* list pytest
* docs
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* list
* test
* fix GK
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-07-15 17:40:04 -04:00
Jirka Borovec
954fd7e5a3
bump base NGC image ( #13346 )
2022-07-15 21:36:19 +00:00
Jirka Borovec
aa62fe36df
add testing PT 1.12 ( #13386 )
...
* add testing PT 1.12
* Fix quantization tests
* Fix another set of tests
* Fix check since https://github.com/pytorch/pytorch/pull/80139 is only going to be available for 1.13
* Skip this test for now for 1.12
Co-authored-by: SeanNaren <sean@grid.ai>
2022-07-15 19:41:23 +02:00
Adrian Wälchli
bb5e8be2e8
Simplify TPUSpawn rank management ( #11163 )
...
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-07-14 15:43:41 +00:00
Kaushik B
56ff89743b
Fix TPU circleci tests ( #13432 )
...
* Fix TPU circleci tests
* Fix TPU circleci tests
* Fix TPU circleci tests
* Fix TPU circleci tests
* Fix TPU circleci tests
* Fix rank issue
* Fix rank issue
* debug alternative fix
* Revert properties
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-07-11 13:25:32 -04:00
Jirka Borovec
30dce29005
fix PL release docker ( #13439 )
2022-06-29 19:36:36 +02:00
Jirka Borovec
b137ef7134
CI: fix requirements freeze ( #13441 )
...
* allow freeze
* ci
* typo
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ipu
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-06-29 09:35:57 -04:00
awaelchli
511f1a6515
Reroute profiler to profilers ( #12308 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-06-22 20:55:39 -04:00
Adrian Wälchli
b08259d536
Add `XLAEnvironment` plugin ( #11330 )
...
* add xla environment class
* add api reference
* integrate
* use xenv
* remove properties
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-06-22 10:57:50 +02:00
Carlos Mocholí
ad87d2cad0
Future 5/n: Move requirements ( #13306 )
...
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-06-21 17:11:33 +02:00
Akarsha Rao
388ea92386
Update HPU Dockerfile to latest version ( #13344 )
2022-06-21 17:08:44 +02:00
Jirka Borovec
8ceab223c0
Fix repository links ( #13304 )
...
* GH org rename Lightning-AI
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* repo name
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-06-15 19:33:43 -04:00
Jirka Borovec
ab59f308b1
Future 4/n: test & legacy in test/ folder ( #13295 )
...
* move: legacy >> test/
* move: tests >> test/
* rename unittests
* update CI
* tests4pl
* tests_pytorch
* proxi
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ci
* link
* cli
* standalone
* fixing
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* .
* Apply suggestions from code review
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* alone
* test -> tests
* Standalone fixes
* ci
* Update
* More fixes
* Fix coverage
* Fix mypy
* mypy
* Empty-Commit
* Fix
* mypy just for pl
* Fix standalone
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-06-15 18:10:49 -04:00
Jirka Borovec
9cc714cdd1
Future 2/n: stand-alone examples ( #13294 )
...
* move: pl_examples >> src/
* convert pl_examples package to plain examples
* update CI for examples
* ci
* missing
* install
2022-06-15 08:53:51 -04:00
Jirka Borovec
759e89df21
Future 1/n: package in src/ folder ( #13293 )
...
* move: pytorch_lightning >> src/
* update setup & install
* update CI
* ci
* update CI for examples
* Self review
* mypy
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* ci
* make
* docs
* typo
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ci: gpu
* .
* hpu
* typing
* docs
* tpu
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-06-14 20:54:55 -04:00
Carlos Mocholí
0cf9d73d28
Drop PyTorch 1.8 support ( #13155 )
...
* Drop PyTorch 1.8 support
* Missed update
* Skip profiler test until supported
* Upgrade ipu dockerfile pytorch version
* Update XLA version
2022-06-14 20:46:44 -04:00
Jirka Borovec
78ff201c7e
Update CI setup ( #13291 )
...
* drop mamba
* use legacy GPU machines
2022-06-14 17:11:54 +00:00
Akarsha Rao
bfa8b7be2d
Create hpu-ci-runner Dockerfile ( #13239 )
...
* Create hpu-ci-runner Dockerfile
* Add ENTRYPOINT script 'start.sh' to hpu-ci-runner
* rename dirs
* ci
* add docker
* Fix build failure
* Fix build failure
* Fix title of nightly ci runner build
* Fix comments
* Fix comments
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-06-08 16:02:16 -04:00
Akihiro Nitta
3c5a8a833e
Decouple pulling legacy checkpoints from existing GHA workflows and docker files ( #13185 )
...
* Add pull-legacy-checkpoints action
* Replace pulls with the new action and script
* Simplify
2022-06-02 15:39:14 +02:00
Jirka Borovec
de4ab1c027
update NGC docker ( #13136 )
...
* update docker
* Apply suggestions from code review
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-06-02 12:54:13 +00:00
Jirka Borovec
fab2ff35ad
CI: Azure - multiple configs ( #12984 )
...
* CI: Azure - multiple configs
* names
* benchmark
* Apply suggestions from code review
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-05-14 01:59:03 +00:00
Jirka Borovec
fec9a09672
add freeze for development and full range for install ( #12994 )
...
* freeze versions
* unfreeze
* dependabot
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* fix all req
* ...
* use base
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix refs
* Apply suggestions from code review
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
* Apply suggestions from code review
* dockers
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-05-12 09:14:18 -04:00
Eric Wiener
3f78c4ca7a
Track CPU stats with DeviceStatsMonitor ( #11795 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-05-10 10:57:38 +00:00
Jirka Borovec
783ec43a85
parse strategies as own extras ( #12975 )
...
* parse strategies as own extras
* prune devel
* Update Makefile
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* revert parse_requirements
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-05-09 09:25:53 -04:00
Jirka Borovec
7ce948edb6
Unpin CUDA docker image for GPU CI ( #12373 )
...
* unpin CUDA docker image for GPU CI
* Apply suggestions from code review
Co-authored-by: Aki Nitta <nitta@akihironitta.com>
Co-authored-by: Akihiro Nitta <akihiro@pytorchlightning.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-05-06 02:56:57 +00:00
Jirka Borovec
bb51e2a55b
Merge pull request #12723 from PyTorchLightning/req/strategies
...
Separate strategies' requirements
2022-05-04 10:06:02 -04:00
Akihiro Nitta
ecd135e939
Update nvidia gpg key to fix nightly docker builds ( #12930 )
...
* Update gpg key
* Use curl instead of wget
* Install key manually
2022-05-02 09:00:44 +02:00
Akihiro Nitta
98b206e836
Use cmake installed with apt ( #12907 )
2022-04-28 07:44:52 +00:00
Akihiro Nitta
ace6a5827b
Update building docker images ( #12837 )
...
Co-authored-by: Akihiro Nitta <akihiro@pytorchlightning.ai>
2022-04-21 22:10:42 +00:00
Jirka Borovec
16b9580958
build more dockers & slack fails ( #12675 )
...
* build dockers
* add slack
* Apply suggestions from code review
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-04-13 17:24:08 +02:00
Jirka Borovec
f9b69ce5b0
CI: check docker requires ( #12677 )
...
* check docker requires
* ci update
* bagua
* conda
* cuda
2022-04-12 00:29:54 +09:00
Kaushik B
bd035af78a
Fix TPU CI ( #12419 )
2022-03-23 11:35:38 +05:30