lightning

Commit Graph

Author	SHA1	Message	Date
Adrian Wälchli	e87c11a592	Upgrade GPU CI to PyTorch 1.13 (#15583 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2022-11-12 14:58:37 +00:00
Carlos Mocholí	a3edbec501	Delete unused TPU CI files (#15611 ) Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>	2022-11-11 18:30:02 +00:00
Carlos Mocholí	6ba00af1e0	Drop PyTorch 1.9 support (#15347 ) * Drop 1.9 * Everything else * READMEs * Missed some * IPU skips * Remove exception type * Add back	2022-11-10 08:59:13 -05:00
Jerome Anand	e79a69a9ee	Upgrade to HPU release 1.7.0 (#15616 ) Signed-off-by: Jerome <janand@habana.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2022-11-10 10:47:17 +01:00
Jirka Borovec	fb9dae8df3	ci: update install lite & cut pkg dependency (#14517 ) * ci: update install lite * try without lite in req file * ci: install * app * init * Revert "app" This reverts commit `f3f09e7888`. * ci: cpu * ci: gpu * pkg * env * bench * trigger * notes * prune * set version * fix version * git reset * hpu, ipu * adjust * --hard * git checkout * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> * rc2 * L * docs * hpu Co-authored-by: awaelchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Luca Antiga <luca.antiga@gmail.com>	2022-10-31 20:50:51 +01:00
Carlos Mocholí	7f3e9de726	Fix TPU tests on master builds (#15349 )	2022-10-31 15:58:02 +00:00
Jirka Borovec	95ae393ca8	LAI: creating mirror package (#15105 ) * placeholder * mirror + prune * makedir * setup * ci * ci * name * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci clean * empty * py * parallel * doctest * flake8 * ci * typo * replace * clean * Apply suggestions from code review * re.sub * fix UI path * full replace * ui path? * replace * updates * regex * ci * fix * ci * path * ci * replace * Update .actions/setup_tools.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * also convert lightning_lite tests for PL tests to adapt mocking paths * fix app example test * update logger propagation for PL tests * update logger propagation for PL tests * Apply suggestions from code review * Revert "update logger propagation for PL tests" This reverts commit `c1a5e119c7`. * playwright * py * update import in tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try edit import in overwrite * debug code * rev playwright * Revert "try edit import in overwrite" This reverts commit `c02f766521`. * ci: adjust examples * adjust examples cloud * mock lightning_app * Install assistant dependencies * lightning * setup * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Apply suggestions from code review * disable cache * move doctest to install * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ) * echo ./ * ci * lru * revert disabling cache, prints * ci * prune ci jobs * prune ci jobs * training loop standalone tests * add sys modules cleanup fixture * make use of fixture * revert standalone * ci e2e * fix imports in lightning * fix imports of lightning in tests * Revert "make use of fixture" This reverts commit `c15efdd205`. 
* Revert other commits for fixtures * revert use of fixture * py3.9 * fix mocking * fix paths * hack mocking * docs * Apply suggestions from code review * rev suggestion * Minor changes to the parametrizations * Update checkgroup with the new and changed jobs * include frontend dir * cli * fix imports and entry point * Revert standalone * rc1 * e2e on staging * Revert "Revert standalone" This reverts commit `9df96685b8`. * groups * to * ci: pt ver * docker * Apply suggestions from code review * Copy over changes from previous commit to other groups * Add back changes from bad merge * Uppercase step name everywhere * update * ci * ci: lai oldest Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Justus Schock <justus.schock@posteo.de> Co-authored-by: manskx <ahmed.mansy156@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Luca Antiga <luca.antiga@gmail.com>	2022-10-27 12:32:49 +02:00
Carlos Mocholí	375ab53861	Migrate TPU tests to GitHub actions (#14687 ) * Migrate TPU tests to GitHub actions * No working dir * Keep _target * Dont skip draft * CHECK_SLEEP * Not yet * Remove recurrent cleanup script * Set secrets * a step cannot have both the `uses` and `run` keys * Version $PYTHON_VER was not found in the local cache * can't load package ... ($GOPATH not set) * The `set-env` command is disabled * Try updating go * Match timeout * simplify path * More cleanup * Install coverage. Unmark draft * Update .github/workflows/ci-pytorch-test-tpu.yml * DEBUG echo * Revert "DEBUG echo" This reverts commit `4011856e6e`. * More debug * SSH * Im stupid * Remove always() * Forgot some Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Luca Antiga <luca.antiga@gmail.com>	2022-10-21 20:01:39 +02:00
otaj	099580cf2b	Assistant fixes (#15221 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-10-20 18:23:47 +00:00
Justus Schock	775e9ebc0f	Assistant for Unified Package (#15207 ) * Update assistant and workflow files * Update .actions/assistant.py Co-authored-by: otaj <6065855+otaj@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: otaj <ota@lightning.ai>	2022-10-20 14:17:27 +00:00
Jirka Borovec	4b9d028541	CI: enable CI run for PT 1.13 (#15128 ) * Apply suggestions from code review * enable CI to run for PT 1.13 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-10-20 08:33:56 +00:00
ver217	2fef6d9403	Add ColossalAI strategy (#14224 ) Co-authored-by: HELSON <c2h214748@gmail.com> Co-authored-by: rohitgr7 <rohitgr1998@gmail.com> Co-authored-by: otaj <ota@lightning.ai> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-10-11 13:59:09 +02:00
Jirka Borovec	5f106957f7	CI: Use self-hosted Azure GPU runners (#14632 ) * move config * Apply suggestions from code review Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>	2022-10-05 10:43:54 +00:00
Carlos Mocholí	7ef87464dd	Refactor XLA and TPU checks across codebase (#14550 )	2022-10-04 22:54:14 +00:00
Carlos Mocholí	3028fd287d	Fix TPU test CI (#14926 ) * Fix TPU test CI * +x first * Lite first to uncovert errors faster * Fixes * One more * Simplify XLALauncher wrapping to avoid pickle error * debug * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Debug commit successful. Trying local definitions * Require tpu for mock test * ValueError: The number of devices must be either 1 or 8, got 4 instead * Fix mock test * Simplify call, rely on defaults * Skip OSError for now. Maybe upgrading will help * Simplify launch tests, move some to lite * Stricter typing * RuntimeError: Accessing the XLA device before processes have spawned is not allowed. * Revert "RuntimeError: Accessing the XLA device before processes have spawned is not allowed." This reverts commit `f65107ebf3`. * Alternative boring solution to the reverted commit * Fix failing test on CUDA machine * Workarounds * Try latest mkl * Revert "Try latest mkl" This reverts commit `d06813aa67`. * Wrong exception * xfail * Mypy * Comment change * Spawn launch refactor * Accept that we cannot lazy init now * Fix mypy and launch test failures * The base dockerfile already includes mkl-2022.1.0 - what if we use it? * try a different mkl version * Revert mkl version changes Co-authored-by: awaelchli <aedu.waelchli@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>	2022-10-03 09:13:33 -04:00
Akihiro Nitta	e47d5a2376	CI: Combine conda and full testing into a single workflow (#14387 ) * Remove conda job * Remove conda job from readme * Remove conda jobs from checkgroup * Remove conda from docker builds * Remove base-conda dockerfile * Rewrite the strategy matrix while keeping equivalent * Run the workflow on this branch * Revert "Rewrite the strategy matrix while keeping equivalent" This reverts commit e54298d60e57cffbf8107890987be3fe4a006c77. * Add PyTorch versions * Run on draft and disable unrelated costly CI * Revert "Run the workflow on this branch" This reverts commit 51ed8b905d8926b630dce4817124bd486135d3ec. * tmp: Lightweight relevant CI * Fix CI pathfilter * Update matrix * Drop skipping logic * pip list * reorder pip list * tmp: lightweight ci * Install specified pytorch * Fix torch installation * Uncomment steps * Increase timeout * bad merge * Revert "Run on draft and disable unrelated costly CI" This reverts commit `eb5dc5e6bd`. * Update checkgroup * Update docs and remove Python/PyTorch versions * Remove pip-list * Fail if wrong pytorch version installed * Add Python 3.8, PyTorch 1.9 job * tmp: remove azure jobs * tmp: remove dockers * tmp: remove others * Run all combinations * Include oldest * Exclude no Python 3.10 distributions * tmp: no concurrency * tmp: double timeout * Add pytest log reporter * Add pytest-reportlog * Fewer jobs * Revert "tmp: no concurrency" This reverts commit `4a7978dcb3`. * fix artifact name * Revert test reports * Revert unrelated changes * Revert unrelated changes * Add the combination of ex-conda jobs * Update checkgroup * revert timeout * remove conda job * revert docker build workflow file Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-09-29 22:39:04 -04:00
Jerome Anand	136d57312d	Upgrade HPU image to release 1.6.1 (#14932 )	2022-09-29 11:22:27 +00:00
otaj	b06f9b7468	Improve building times of IPU docker image (#14934 )	2022-09-29 09:55:12 +00:00
Akarsha Rao	f167d76508	CI: HPU support v1.6.0 release (#14794 ) * Update hpu-tests.yml to support v1.6.0 release * Update Dockerfile	2022-09-20 12:26:27 +02:00
Carlos Mocholí	dfa570ef9f	Run CircleCI with the HEAD sha, not the base (#14625 ) * Run CircleCI with the HEAD sha, not the base * Different solution	2022-09-12 11:25:54 -04:00
Rui Wang	40868f7f43	Add bagua support for CUDA 11.6 images (#14529 ) * Add support for bagua-cuda116 * Remove bagua-cuda115 from installation Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>	2022-09-09 20:07:25 +00:00
Adrian Wälchli	291dc1b615	Standalone Lite CI setup (#14451 ) Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-09-01 22:13:12 +00:00
Carlos Mocholí	00aefa82b7	Cleanup TPU CI script error management (#14389 )	2022-08-31 11:38:54 +00:00
Jirka Borovec	74304db6f8	CI: update TPU docker (#14448 )	2022-08-31 00:47:38 +05:30
Carlos Mocholí	3ba0f56b18	Remove support for the deprecated torchtext legacy (#14375 )	2022-08-26 20:01:51 +00:00
otaj	1ae14ca754	[CI] fix horovod tests (#14382 )	2022-08-25 17:30:06 +00:00
Adrian Wälchli	34f98836fb	Fix silent TPU CI failures (#14034 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-08-24 13:24:24 +00:00
otaj	0bd5703b81	[CI] Trick Bagua into installing appropriate wheel in GPU tests (#14380 ) Bagua trick needs to be replicated on everywhere applicable	2022-08-24 08:59:49 +00:00
otaj	bb634310e7	[CI] Bump CUDA in Docker images to 11.6.1 (#14348 ) * bump cuda in docker images to 11.6.1 * PUSH TO HUB. REVERT THIS! * conda forge for 11.6 * cuda 11.5 * revert conda changes * 11.6 back again * 11.6 back again, all of them * maybe all passes now * maybe all passes now * final push * Revert "PUSH TO HUB. REVERT THIS!" This reverts commit `602bfce224`. * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2022-08-23 12:10:52 -04:00
Akihiro Nitta	d5f35ece72	CI/CD: Add CUDA version to docker image tags (#13831 ) * append cuda version to tags * revertme: push to hub * Update docker readme * Build base-conda-py3.9-torch1.12-cuda11.3.1 * Use new images in conda tests * revertme: push to hub * Revert "revertme: push to hub" This reverts commit `0f7d534b2a`. * Revert "revertme: push to hub" This reverts commit `46a05fccbb`. * Run conda if workflow edited * Run gpu testing if workflow edited * Use new tags in release/Dockerfile * Build base-cuda and PL release images with all combinations * Update release docker * Update conda from py3.9-torch1.12 to py3.10-torch.1.12 * Fix ubuntu version * Revert conda * revertme: push to hub * Don't build Python 3.10 for now... * Fix pl release builder * updating version contribute to the error? https://github.com/docker/buildx/issues/456 * Update actions' versions * Update slack user to notify * Don't use 11.6.0 to avoid bagua incompatibility * Don't use 11.1, and use 11.1.1 * Update .github/workflows/ci-pytorch_test-conda.yml Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com> * Update trigger * Ignore artfacts from tutorials * Trim docker images to distribute * Add an image for tutorials * Update conda image 3.8x1.10 * Try different conda variants * No need to set cuda for conda jobs * Update who to notify ipu failure * Don't push * update filenaem Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>	2022-08-10 10:37:50 +00:00
Akihiro Nitta	0883971ccb	CI: Update XLA from 1.9 to 1.12 (#14013 )	2022-08-05 05:04:45 -04:00
Adrian Wälchli	caaf35689c	Improvements to standalone scripts (#13840 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-07-28 23:33:22 +00:00
Carlos Mocholí	1299e4f984	Run GPU tests with PyTorch 1.12 (#13716 ) Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2022-07-28 19:37:57 +05:30
Adrian Wälchli	fff62f0ae5	Fix TPU testing and collect all tests (#11098 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2022-07-27 15:40:40 +00:00
Adrian Wälchli	a8d7b4476c	Fix PyTorch spelling errors (#13774 ) * Fix PyTorch spelling errors * more	2022-07-25 12:51:16 -04:00
Jirka Borovec	64e8e8eb4b	CI: debug HPU flow (#13419 ) * Update the hpu-tests.yml to pull docker from vault * fire & sudo * habana-gaudi-hpus * Check the driver status on gaudi server (#13718) Co-authored-by: arao <arao@habana.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Akarsha Rao <94624926+raoakarsha@users.noreply.github.com>	2022-07-20 12:35:01 +02:00
Jirka Borovec	e23756b15d	CI: debug TPU failing tests (#13679 ) * list pytest * docs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * list * test * fix GK Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-07-15 17:40:04 -04:00
Jirka Borovec	954fd7e5a3	bump base NGC image (#13346 )	2022-07-15 21:36:19 +00:00
Jirka Borovec	aa62fe36df	add testing PT 1.12 (#13386 ) * add testing PT 1.12 * Fix quantization tests * Fix another set of tests * Fix check since https://github.com/pytorch/pytorch/pull/80139 is only going to be available for 1.13 * Skip this test for now for 1.12 Co-authored-by: SeanNaren <sean@grid.ai>	2022-07-15 19:41:23 +02:00
Adrian Wälchli	bb5e8be2e8	Simplify TPUSpawn rank management (#11163 ) Co-authored-by: Kaushik B <kaushikbokka@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2022-07-14 15:43:41 +00:00
Kaushik B	56ff89743b	Fix TPU circleci tests (#13432 ) * Fix TPU circleci tests * Fix TPU circleci tests * Fix TPU circleci tests * Fix TPU circleci tests * Fix TPU circleci tests * Fix rank issue * Fix rank issue * debug alternative fix * Revert properties Co-authored-by: awaelchli <aedu.waelchli@gmail.com>	2022-07-11 13:25:32 -04:00
Jirka Borovec	30dce29005	fix PL release docker (#13439 )	2022-06-29 19:36:36 +02:00
Jirka Borovec	b137ef7134	CI: fix requirements freeze (#13441 ) * allow freeze * ci * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ipu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-06-29 09:35:57 -04:00
awaelchli	511f1a6515	Reroute profiler to profilers (#12308 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>	2022-06-22 20:55:39 -04:00
Adrian Wälchli	b08259d536	Add `XLAEnvironment` plugin (#11330 ) * add xla environment class * add api reference * integrate * use xenv * remove properties Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2022-06-22 10:57:50 +02:00
Carlos Mocholí	ad87d2cad0	Future 5/n: Move requirements (#13306 ) Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2022-06-21 17:11:33 +02:00
Akarsha Rao	388ea92386	Update HPU Dockerfile to latest version (#13344 )	2022-06-21 17:08:44 +02:00
Jirka Borovec	8ceab223c0	Fix repository links (#13304 ) * GH org rename Lightning-AI * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * repo name Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-06-15 19:33:43 -04:00
Jirka Borovec	ab59f308b1	Future 4/n: test & legacy in test/ folder (#13295 ) * move: legacy >> test/ * move: tests >> test/ * rename unittests * update CI * tests4pl * tests_pytorch * proxi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci * link * cli * standalone * fixing * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * . * Apply suggestions from code review Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * alone * test -> tests * Standalone fixes * ci * Update * More fixes * Fix coverage * Fix mypy * mypy * Empty-Commit * Fix * mypy just for pl * Fix standalone Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-06-15 18:10:49 -04:00
Jirka Borovec	9cc714cdd1	Future 2/n: stand-alone examples (#13294 ) * move: pl_examples >> src/ * convert pl_examples package to plain examples * update CI for examples * ci * missing * install	2022-06-15 08:53:51 -04:00
Jirka Borovec	759e89df21	Future 1/n: package in src/ folder (#13293 ) * move: pytorch_lightning >> src/ * update setup & install * update CI * ci * update CI for examples * Self review * mypy Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * ci * make * docs * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci: gpu * . * hpu * typing * docs * tpu Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-06-14 20:54:55 -04:00
Carlos Mocholí	0cf9d73d28	Drop PyTorch 1.8 support (#13155 ) * Drop PyTorch 1.8 support * Missed update * Skip profiler test until supported * Upgrade ipu dockerfile pytorch version * Update XLA version	2022-06-14 20:46:44 -04:00
Jirka Borovec	78ff201c7e	Update CI setup (#13291 ) * drop mamba * use legacy GPU machines	2022-06-14 17:11:54 +00:00
Akarsha Rao	bfa8b7be2d	Create hpu-ci-runner Dockerfile (#13239 ) * Create hpu-ci-runner Dockerfile * Add ENTRYPOINT script 'start.sh' to hpu-ci-runner * rename dirs * ci * add docker * Fix build failure * Fix build failure * Fix title of nightly ci runner build * Fix comments * Fix comments Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>	2022-06-08 16:02:16 -04:00
Akihiro Nitta	3c5a8a833e	Decouple pulling legacy checkpoints from existing GHA workflows and docker files (#13185 ) * Add pull-legacy-checkpoints action * Replace pulls with the new action and script * Simplify	2022-06-02 15:39:14 +02:00
Jirka Borovec	de4ab1c027	update NGC docker (#13136 ) * update docker * Apply suggestions from code review Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-06-02 12:54:13 +00:00
Jirka Borovec	fab2ff35ad	CI: Azure - multiple configs (#12984 ) * CI: Azure - multiple configs * names * benchmark * Apply suggestions from code review Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-05-14 01:59:03 +00:00
Jirka Borovec	fec9a09672	add freeze for development and full range for install (#12994 ) * freeze versions * unfreeze * dependabot * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * fix all req * ... * use base * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix refs * Apply suggestions from code review Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> * Apply suggestions from code review * dockers Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>	2022-05-12 09:14:18 -04:00
Eric Wiener	3f78c4ca7a	Track CPU stats with DeviceStatsMonitor (#11795 ) Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-05-10 10:57:38 +00:00
Jirka Borovec	783ec43a85	parse strategies as own extras (#12975 ) * parse strategies as own extras * prune devel * Update Makefile Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * revert parse_requirements Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-05-09 09:25:53 -04:00
Jirka Borovec	7ce948edb6	Unpin CUDA docker image for GPU CI (#12373 ) * unpin CUDA docker image for GPU CI * Apply suggestions from code review Co-authored-by: Aki Nitta <nitta@akihironitta.com> Co-authored-by: Akihiro Nitta <akihiro@pytorchlightning.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-05-06 02:56:57 +00:00
Jirka Borovec	bb51e2a55b	Merge pull request #12723 from PyTorchLightning/req/strategies Separate strategies' requirements	2022-05-04 10:06:02 -04:00
Akihiro Nitta	ecd135e939	Update nvidia gpg key to fix nightly docker builds (#12930 ) * Update gpg key * Use curl instead of wget * Install key manually	2022-05-02 09:00:44 +02:00
Akihiro Nitta	98b206e836	Use cmake installed with apt (#12907 )	2022-04-28 07:44:52 +00:00
Akihiro Nitta	ace6a5827b	Update building docker images (#12837 ) Co-authored-by: Akihiro Nitta <akihiro@pytorchlightning.ai>	2022-04-21 22:10:42 +00:00
Jirka Borovec	16b9580958	build more dockers & slack fails (#12675 ) * build dockers * add slack * Apply suggestions from code review Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>	2022-04-13 17:24:08 +02:00
Jirka Borovec	f9b69ce5b0	CI: check docker requires (#12677 ) * check docker requires * ci update * bagua * conda * cuda	2022-04-12 00:29:54 +09:00
Kaushik B	bd035af78a	Fix TPU CI (#12419 )	2022-03-23 11:35:38 +05:30
Jirka Borovec	fe940e195d	CI: update prune_pkgs (#12382 )	2022-03-21 12:50:50 +00:00
four4fish	1eff3b53c1	Update fairscale version (#11567 ) Co-authored-by: Aki Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2022-03-21 11:38:55 +00:00
Jirka Borovec	efa870eebc	Docker: fix NCCL building Horovod (#12318 ) * Horovod w. MPI * nccl_built * fix	2022-03-18 14:23:19 +00:00
Jirka Borovec	7ee690758c	CI: fix running PT 1.11 (#12304 ) * fix fire * horovod * assistant * cmake * u20 * cuda * -j2 * fix mypy Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2022-03-12 09:00:20 +00:00
Jirka Borovec	bc8172856f	aggregate multiple helper scripts to single CLI (#11147 ) * nightly release * min version * fire	2022-03-11 11:13:43 +00:00
Jirka Borovec	1144673cd9	CI: sanity check for req. pkgs (#11819 ) * CI: sanity check for req. pkgs * scripts * rename * gcsfs ? * rich ! * install extra * move * set -e Co-authored-by: Aki Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2022-03-11 09:20:47 +00:00
Jirka Borovec	3b4061f39a	CI: enable testing for PT 1.11 (#11792 ) * enable PT 1.11 * horovod * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Aki Nitta <nitta@akihironitta.com>	2022-03-10 18:38:47 +00:00
Jirka Borovec	8577ef7bba	Skip horovod 0.24.0 only (#12248 ) * try skip horovod 0.24.0 only * HOROVOD_BUILD_CUDA_CC_LIST * fix test Co-authored-by: Aki Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-03-10 16:01:08 +00:00
wangraying	a0655611de	Add bagua installation in dockerfile (#11283 ) Co-authored-by: Aki Nitta <nitta@akihironitta.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2022-02-24 15:17:31 +01:00
Jirka Borovec	7bc87015ea	Unblock GPU CI (#11934 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2022-02-16 21:15:44 +01:00
Aki Nitta	0a1b8b880d	Fix horovod installation `base-cuda` Dockerfile (#11811 ) * pip install --user * add checks * rm unrelated comment * consistent format * Fail if horovod not found Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2022-02-10 16:48:33 +09:00
Aki Nitta	86b177ebe5	Fix `apex` installation path in Dockerfile (#11596 ) * empty commit * Specify apex installation target directory * pip install --user	2022-01-27 20:14:16 -05:00
Kaushik B	650c710efa	Rename training plugin test files & names to strategy (#11303 )	2022-01-04 14:32:45 +01:00
Carlos Mocholí	3692eba807	Drop Python 3.6 support (#11117 )	2021-12-21 17:06:15 +00:00
Kaushik B	2a5d05b562	Fix tpu spawn plugin test (#11131 )	2021-12-18 02:53:37 +00:00
Sean Naren	c66cd12445	Remove partitioning of model in ZeRO 3 (#10655 )	2021-12-17 12:36:53 +00:00
Jirka Borovec	e8659bd40e	update NGC (#10770 )	2021-11-29 14:14:37 +00:00
Carlos Mocholí	d2aaf6b4cc	Upgrade CI after the 1.10 release (#10075 )	2021-11-10 17:59:10 +01:00
Carlos Mocholí	939a861853	Update Python testing (#10269 )	2021-11-04 18:26:24 +01:00
Carlos Mocholí	70570f9eaa	Minimize the number of docker jobs (#10202 ) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-10-29 07:48:05 +01:00
Carlos Mocholí	3a4e9970d6	Pin fairscale version (#10200 )	2021-10-27 23:24:17 +00:00
Carlos Mocholí	a0e45dc071	Some minor CI cleanup (#10088 )	2021-10-26 13:58:20 +02:00
Kaushik B	af4a8f1950	Refactor tests for TPU Accelerator (#9718 ) Co-authored-by: tchaton <thomas@grid.ai>	2021-10-14 19:45:15 +00:00
Danielle Pintz	940b910d27	[2/4] Add DeviceStatsMonitor callback (#9712 ) Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-10-13 18:29:36 +00:00
edwardpwtsoi	7c6efbc8a8	Resolved wrong mv usage for extracted directory (#9678 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-10-05 12:56:33 +00:00
Jirka Borovec	0e6ee9c39d	CI: add mdformat (#8673 ) * add mdformat * exclude chlog * fix *** Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-08-03 18:19:09 +00:00
Jirka Borovec	66cc505339	update NGC (#8652 ) * update NGC Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-08-02 16:05:36 +00:00
Jirka Borovec	abbcfa1ab7	fix CI for PT 1.10 (#8526 ) * fix CI for PT 1.10 * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-07-23 19:24:31 +02:00
thomas chaton	8d0df6fad2	[Feat] Improve TPU CI (#6078 ) * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * i * update * update ci * i * i * i * i	2021-07-19 19:43:21 +05:30
Jirka Borovec	74a09a23f1	CI: support PT 1.10 (#8133 ) * prepare PT 1.10 * dockers * fixes * readme	2021-07-14 18:04:33 +03:00
Carlos Mocholí	6ce77a102b	Set minimum PyTorch version to 1.6 (#8288 ) Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2021-07-13 17:12:49 +00:00
Jirka Borovec	ed6d4baea2	ngc (#8242 )	2021-07-02 13:12:45 +01:00

222 Commits