lightning

Commit Graph

Author	SHA1	Message	Date
Carlos Mocholí	375ab53861	Migrate TPU tests to GitHub actions (#14687 ) * Migrate TPU tests to GitHub actions * No working dir * Keep _target * Dont skip draft * CHECK_SLEEP * Not yet * Remove recurrent cleanup script * Set secrets * a step cannot have both the `uses` and `run` keys * Version $PYTHON_VER was not found in the local cache * can't load package ... ($GOPATH not set) * The `set-env` command is disabled * Try updating go * Match timeout * simplify path * More cleanup * Install coverage. Unmark draft * Update .github/workflows/ci-pytorch-test-tpu.yml * DEBUG echo * Revert "DEBUG echo" This reverts commit `4011856e6e`. * More debug * SSH * Im stupid * Remove always() * Forgot some Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Luca Antiga <luca.antiga@gmail.com>	2022-10-21 20:01:39 +02:00
otaj	099580cf2b	Assistant fixes (#15221 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-10-20 18:23:47 +00:00
Justus Schock	775e9ebc0f	Assistant for Unified Package (#15207 ) * Update assistant and workflow files * Update .actions/assistant.py Co-authored-by: otaj <6065855+otaj@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: otaj <ota@lightning.ai>	2022-10-20 14:17:27 +00:00
Jirka Borovec	4b9d028541	CI: enable CI run for PT 1.13 (#15128 ) * Apply suggestions from code review * enable CI to run for PT 1.13 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-10-20 08:33:56 +00:00
ver217	2fef6d9403	Add ColossalAI strategy (#14224 ) Co-authored-by: HELSON <c2h214748@gmail.com> Co-authored-by: rohitgr7 <rohitgr1998@gmail.com> Co-authored-by: otaj <ota@lightning.ai> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-10-11 13:59:09 +02:00
Jirka Borovec	5f106957f7	CI: Use self-hosted Azure GPU runners (#14632 ) * move config * Apply suggestions from code review Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>	2022-10-05 10:43:54 +00:00
Carlos Mocholí	7ef87464dd	Refactor XLA and TPU checks across codebase (#14550 )	2022-10-04 22:54:14 +00:00
Carlos Mocholí	3028fd287d	Fix TPU test CI (#14926 ) * Fix TPU test CI * +x first * Lite first to uncovert errors faster * Fixes * One more * Simplify XLALauncher wrapping to avoid pickle error * debug * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Debug commit successful. Trying local definitions * Require tpu for mock test * ValueError: The number of devices must be either 1 or 8, got 4 instead * Fix mock test * Simplify call, rely on defaults * Skip OSError for now. Maybe upgrading will help * Simplify launch tests, move some to lite * Stricter typing * RuntimeError: Accessing the XLA device before processes have spawned is not allowed. * Revert "RuntimeError: Accessing the XLA device before processes have spawned is not allowed." This reverts commit `f65107ebf3`. * Alternative boring solution to the reverted commit * Fix failing test on CUDA machine * Workarounds * Try latest mkl * Revert "Try latest mkl" This reverts commit `d06813aa67`. * Wrong exception * xfail * Mypy * Comment change * Spawn launch refactor * Accept that we cannot lazy init now * Fix mypy and launch test failures * The base dockerfile already includes mkl-2022.1.0 - what if we use it? * try a different mkl version * Revert mkl version changes Co-authored-by: awaelchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>	2022-10-03 09:13:33 -04:00
Akihiro Nitta	e47d5a2376	CI: Combine conda and full testing into a single workflow (#14387 ) * Remove conda job * Remove conda job from readme * Remove conda jobs from checkgroup * Remove conda from docker builds * Remove base-conda dockerfile * Rewrite the strategy matrix while keeping equivalent * Run the workflow on this branch * Revert "Rewrite the strategy matrix while keeping equivalent" This reverts commit e54298d60e57cffbf8107890987be3fe4a006c77. * Add PyTorch versions * Run on draft and disable unrelated costly CI * Revert "Run the workflow on this branch" This reverts commit 51ed8b905d8926b630dce4817124bd486135d3ec. * tmp: Lightweight relevant CI * Fix CI pathfilter * Update matrix * Drop skipping logic * pip list * reorder pip list * tmp: lightweight ci * Install specified pytorch * Fix torch installation * Uncomment steps * Increase timeout * bad merge * Revert "Run on draft and disable unrelated costly CI" This reverts commit `eb5dc5e6bd`. * Update checkgroup * Update docs and remove Python/PyTorch versions * Remove pip-list * Fail if wrong pytorch version installed * Add Python 3.8, PyTorch 1.9 job * tmp: remove azure jobs * tmp: remove dockers * tmp: remove others * Run all combinations * Include oldest * Exclude no Python 3.10 distributions * tmp: no concurrency * tmp: double timeout * Add pytest log reporter * Add pytest-reportlog * Fewer jobs * Revert "tmp: no concurrency" This reverts commit `4a7978dcb3`. * fix artifact name * Revert test reports * Revert unrelated changes * Revert unrelated changes * Add the combination of ex-conda jobs * Update checkgroup * revert timeout * remove conda job * revert docker build workflow file Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-09-29 22:39:04 -04:00
Jerome Anand	136d57312d	Upgrade HPU image to release 1.6.1 (#14932 )	2022-09-29 11:22:27 +00:00
otaj	b06f9b7468	Improve building times of IPU docker image (#14934 )	2022-09-29 09:55:12 +00:00
Akarsha Rao	f167d76508	CI: HPU support v1.6.0 release (#14794 ) * Update hpu-tests.yml to support v1.6.0 release * Update Dockerfile	2022-09-20 12:26:27 +02:00
Carlos Mocholí	dfa570ef9f	Run CircleCI with the HEAD sha, not the base (#14625 ) * Run CircleCI with the HEAD sha, not the base * Different solution	2022-09-12 11:25:54 -04:00
Rui Wang	40868f7f43	Add bagua support for CUDA 11.6 images (#14529 ) * Add support for bagua-cuda116 * Remove bagua-cuda115 from installation Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>	2022-09-09 20:07:25 +00:00
Adrian Wälchli	291dc1b615	Standalone Lite CI setup (#14451 ) Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-09-01 22:13:12 +00:00
Carlos Mocholí	00aefa82b7	Cleanup TPU CI script error management (#14389 )	2022-08-31 11:38:54 +00:00
Jirka Borovec	74304db6f8	CI: update TPU docker (#14448 )	2022-08-31 00:47:38 +05:30
Carlos Mocholí	3ba0f56b18	Remove support for the deprecated torchtext legacy (#14375 )	2022-08-26 20:01:51 +00:00
otaj	1ae14ca754	[CI] fix horovod tests (#14382 )	2022-08-25 17:30:06 +00:00
Adrian Wälchli	34f98836fb	Fix silent TPU CI failures (#14034 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-08-24 13:24:24 +00:00
otaj	0bd5703b81	[CI] Trick Bagua into installing appropriate wheel in GPU tests (#14380 ) Bagua trick needs to be replicated on everywhere applicable	2022-08-24 08:59:49 +00:00
otaj	bb634310e7	[CI] Bump CUDA in Docker images to 11.6.1 (#14348 ) * bump cuda in docker images to 11.6.1 * PUSH TO HUB. REVERT THIS! * conda forge for 11.6 * cuda 11.5 * revert conda changes * 11.6 back again * 11.6 back again, all of them * maybe all passes now * maybe all passes now * final push * Revert "PUSH TO HUB. REVERT THIS!" This reverts commit `602bfce224`. * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-08-23 12:10:52 -04:00
Akihiro Nitta	d5f35ece72	CI/CD: Add CUDA version to docker image tags (#13831 ) * append cuda version to tags * revertme: push to hub * Update docker readme * Build base-conda-py3.9-torch1.12-cuda11.3.1 * Use new images in conda tests * revertme: push to hub * Revert "revertme: push to hub" This reverts commit `0f7d534b2a`. * Revert "revertme: push to hub" This reverts commit `46a05fccbb`. * Run conda if workflow edited * Run gpu testing if workflow edited * Use new tags in release/Dockerfile * Build base-cuda and PL release images with all combinations * Update release docker * Update conda from py3.9-torch1.12 to py3.10-torch.1.12 * Fix ubuntu version * Revert conda * revertme: push to hub * Don't build Python 3.10 for now... * Fix pl release builder * updating version contribute to the error? https://github.com/docker/buildx/issues/456 * Update actions' versions * Update slack user to notify * Don't use 11.6.0 to avoid bagua incompatibility * Don't use 11.1, and use 11.1.1 * Update .github/workflows/ci-pytorch_test-conda.yml Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com> * Update trigger * Ignore artfacts from tutorials * Trim docker images to distribute * Add an image for tutorials * Update conda image 3.8x1.10 * Try different conda variants * No need to set cuda for conda jobs * Update who to notify ipu failure * Don't push * update filenaem Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>	2022-08-10 10:37:50 +00:00
Akihiro Nitta	0883971ccb	CI: Update XLA from 1.9 to 1.12 (#14013 )	2022-08-05 05:04:45 -04:00
Adrian Wälchli	caaf35689c	Improvements to standalone scripts (#13840 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-07-28 23:33:22 +00:00
Carlos Mocholí	1299e4f984	Run GPU tests with PyTorch 1.12 (#13716 ) Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2022-07-28 19:37:57 +05:30
Adrian Wälchli	fff62f0ae5	Fix TPU testing and collect all tests (#11098 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2022-07-27 15:40:40 +00:00
Adrian Wälchli	a8d7b4476c	Fix PyTorch spelling errors (#13774 ) * Fix PyTorch spelling errors * more	2022-07-25 12:51:16 -04:00
Jirka Borovec	64e8e8eb4b	CI: debug HPU flow (#13419 ) * Update the hpu-tests.yml to pull docker from vault * fire & sudo * habana-gaudi-hpus * Check the driver status on gaudi server (#13718) Co-authored-by: arao <arao@habana.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>	2022-07-20 12:35:01 +02:00
Jirka Borovec	e23756b15d	CI: debug TPU failing tests (#13679 ) * list pytest * docs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * list * test * fix GK Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-07-15 17:40:04 -04:00
Jirka Borovec	954fd7e5a3	bump base NGC image (#13346 )	2022-07-15 21:36:19 +00:00
Jirka Borovec	aa62fe36df	add testing PT 1.12 (#13386 ) * add testing PT 1.12 * Fix quantization tests * Fix another set of tests * Fix check since https://github.com/pytorch/pytorch/pull/80139 is only going to be available for 1.13 * Skip this test for now for 1.12 Co-authored-by: SeanNaren <sean@grid.ai>	2022-07-15 19:41:23 +02:00
Adrian Wälchli	bb5e8be2e8	Simplify TPUSpawn rank management (#11163 ) Co-authored-by: Kaushik B <kaushikbokka@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2022-07-14 15:43:41 +00:00
Kaushik B	56ff89743b	Fix TPU circleci tests (#13432 ) * Fix TPU circleci tests * Fix TPU circleci tests * Fix TPU circleci tests * Fix TPU circleci tests * Fix TPU circleci tests * Fix rank issue * Fix rank issue * debug alternative fix * Revert properties Co-authored-by: awaelchli <aedu.waelchli@gmail.com>	2022-07-11 13:25:32 -04:00
Jirka Borovec	30dce29005	fix PL release docker (#13439 )	2022-06-29 19:36:36 +02:00
Jirka Borovec	b137ef7134	CI: fix requirements freeze (#13441 ) * allow freeze * ci * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ipu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-06-29 09:35:57 -04:00
awaelchli	511f1a6515	Reroute profiler to profilers (#12308 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>	2022-06-22 20:55:39 -04:00
Adrian Wälchli	b08259d536	Add `XLAEnvironment` plugin (#11330 ) * add xla environment class * add api reference * integrate * use xenv * remove properties Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2022-06-22 10:57:50 +02:00
Carlos Mocholí	ad87d2cad0	Future 5/n: Move requirements (#13306 ) Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2022-06-21 17:11:33 +02:00
Akarsha Rao	388ea92386	Update HPU Dockerfile to latest version (#13344 )	2022-06-21 17:08:44 +02:00
Jirka Borovec	8ceab223c0	Fix repository links (#13304 ) * GH org rename Lightning-AI * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * repo name Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-06-15 19:33:43 -04:00
Jirka Borovec	ab59f308b1	Future 4/n: test & legacy in test/ folder (#13295 ) * move: legacy >> test/ * move: tests >> test/ * rename unittests * update CI * tests4pl * tests_pytorch * proxi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci * link * cli * standalone * fixing * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * . * Apply suggestions from code review Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * alone * test -> tests * Standalone fixes * ci * Update * More fixes * Fix coverage * Fix mypy * mypy * Empty-Commit * Fix * mypy just for pl * Fix standalone Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-06-15 18:10:49 -04:00
Jirka Borovec	9cc714cdd1	Future 2/n: stand-alone examples (#13294 ) * move: pl_examples >> src/ * convert pl_examples package to plain examples * update CI for examples * ci * missing * install	2022-06-15 08:53:51 -04:00
Jirka Borovec	759e89df21	Future 1/n: package in src/ folder (#13293 ) * move: pytorch_lightning >> src/ * update setup & install * update CI * ci * update CI for examples * Self review * mypy Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * ci * make * docs * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci: gpu * . * hpu * typing * docs * tpu Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-06-14 20:54:55 -04:00
Carlos Mocholí	0cf9d73d28	Drop PyTorch 1.8 support (#13155 ) * Drop PyTorch 1.8 support * Missed update * Skip profiler test until supported * Upgrade ipu dockerfile pytorch version * Update XLA version	2022-06-14 20:46:44 -04:00
Jirka Borovec	78ff201c7e	Update CI setup (#13291 ) * drop mamba * use legacy GPU machines	2022-06-14 17:11:54 +00:00
Akarsha Rao	bfa8b7be2d	Create hpu-ci-runner Dockerfile (#13239 ) * Create hpu-ci-runner Dockerfile * Add ENTRYPOINT script 'start.sh' to hpu-ci-runner * rename dirs * ci * add docker * Fix build failure * Fix build failure * Fix title of nightly ci runner build * Fix comments * Fix comments Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>	2022-06-08 16:02:16 -04:00
Akihiro Nitta	3c5a8a833e	Decouple pulling legacy checkpoints from existing GHA workflows and docker files (#13185 ) * Add pull-legacy-checkpoints action * Replace pulls with the new action and script * Simplify	2022-06-02 15:39:14 +02:00
Jirka Borovec	de4ab1c027	update NGC docker (#13136 ) * update docker * Apply suggestions from code review Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-06-02 12:54:13 +00:00
Jirka Borovec	fab2ff35ad	CI: Azure - multiple configs (#12984 ) * CI: Azure - multiple configs * names * benchmark * Apply suggestions from code review Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-05-14 01:59:03 +00:00

165 Commits