Carlos Mocholí
5edb3b5126
[TPU] Do not delete jobs with "keepalive" in the name ( #17411 )
...
[TPU] Do not delete jobs with "keepalive" in the name
2023-04-19 14:02:02 +02:00
Jirka Borovec
26a549e87f
drop failing e2e quick app ( #17409 )
...
* drop failing e2e quick app
* codeowners
* Apply suggestions from code review
2023-04-19 12:31:56 +02:00
Adrian Wälchli
21ae19c69f
Add dynamo RunIf skip condition ( #17404 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-19 01:09:42 +02:00
Liyang90
47726391ad
[TPU] Add support for PJRT from PyTorch/XLA 2.0 ( #17352 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-04-18 18:52:36 +02:00
Carlos Mocholí
b0b5b9ce57
[TPU] Fix workflow ( #17406 )
...
* [TPU] Fix workflow
* Whitespace
2023-04-18 18:11:03 +02:00
Carlos Mocholí
90ad36795a
[TPU] Refactor availability check ( #17384 )
2023-04-18 17:52:13 +02:00
Dmitry Frolov
7893d22b6f
Bump default E2E tests image version ( #17403 )
...
Bump default E2E image version
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-04-18 10:54:59 -04:00
Carlos Mocholí
3fd4d96cf3
Fix PyTorch MPS test failure in master ( #17405 )
2023-04-18 16:53:53 +02:00
Ryan Smith
8d5a91a2dd
Update Fabric CPU tests to work on GPU machines ( #17391 )
2023-04-18 14:03:40 +00:00
Carlos Mocholí
369ae61f85
[TPU] Fix workflow condition ( #17379 )
2023-04-18 08:30:56 -04:00
Carlos Mocholí
ae8838b0a1
GPU suggestion does not require devices anymore ( #17217 )
2023-04-18 13:12:16 +02:00
Carlos Mocholí
3511de3693
Replace range with RandomDataset in example ( #17325 )
2023-04-18 13:10:31 +02:00
belerico
cf5e63dcfa
Fabric PPO example `share_data` flag ( #17397 )
2023-04-18 12:56:12 +02:00
Ruslan Mukhametshin
7312b3f321
Add `header_style` argument to `RichModelSummary` callback ( #16788 )
2023-04-18 12:55:55 +02:00
Jirka Borovec
6da4b0f490
skip some App tests ( #17401 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-18 11:20:07 +02:00
Carlos Mocholí
c22276f945
Update `from_datasets` to support arbitrary iterables ( #17402 )
2023-04-17 23:27:24 +00:00
Egor Spirin
bb4e495627
Wraps sharded model for proper access to its `state_dict` in `FSDP` strategy ( #16558 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-04-18 01:03:43 +02:00
Adrian Wälchli
affe72cc3e
Add test for compiling FSDP model in Fabric ( #17394 )
2023-04-17 15:34:23 -04:00
Carlos Mocholí
16377339cb
Remove reference to outdated Kaggle tutorial ( #17390 )
2023-04-17 19:12:44 +02:00
Adrian Wälchli
71df9962b6
Update pip upgrade command in CI ( #17395 )
2023-04-17 17:05:57 +02:00
Carlos Mocholí
b02d9c1590
Cleanup the `is_distributed` property [TPU] ( #17381 )
2023-04-17 16:14:07 +02:00
Adrian Wälchli
0dc42f523e
Save and load sharded checkpoints with FSDP in Fabric ( #17323 )
...
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-16 14:11:49 -04:00
Carlos Mocholí
13905a3464
Support all `CombinedLoader` modes during evaluation ( #17163 )
2023-04-16 20:01:52 +02:00
Ishan Dutta
e9d6856355
NumPy to Torch for lightning/fabric ( #17291 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-04-15 15:21:56 +00:00
Carlos Mocholí
97a61868fb
Sync module states during non-fit ( #17370 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-04-15 02:35:51 +00:00
Ishan Dutta
9becc15ddf
Remove numpy from src/lightning/pytorch and use torch only ( #17278 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-15 02:26:56 +00:00
Arturo Ghinassi
b5d6bb6e8d
Added return in `convert_zero_checkpoint_to_fp32_state_dict` ( #17342 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-15 02:18:16 +00:00
OneSixth
a69fbcf71b
Add a warning to the detect_anomaly flag. ( #17380 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-04-15 02:08:48 +00:00
Ryan Smith
e1ce887fde
Fix `load_from_checkpoint` to return model on correct device ( #17308 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-04-15 02:08:00 +00:00
Alican Bozkurt
84eb82a595
change STEP_OUTPUT type from dict to mapping ( #17387 )
2023-04-15 02:02:11 +00:00
Jirka Borovec
8e7b949ef3
App: drop flaky doctest example ( #17366 )
...
* skip app doctest
* Proposed change
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* delete
---------
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-14 23:48:03 +02:00
Carlos Mocholí
da883e73b9
[TPU] Use `pull_request_target` event ( #17377 )
2023-04-14 17:10:41 +02:00
Carlos Mocholí
05b481e3ae
[TPU] Add testing matrix with PJRT ( #17368 )
...
* Replace GKE in CI with manual gcloud usage
* Fix XRT test
* Reduce timeout to 35 minutes
* [TPU] Run tests with PJRT
* runtime as part of the job name
* CHANGELOG
* Update for app too
2023-04-14 16:39:13 +02:00
Carlos Mocholí
856b29fc72
[TPU] Replace GKE in CI with manual gcloud usage ( #17362 )
2023-04-14 12:47:31 +00:00
Adrian Wälchli
50662eb078
Fixes around `Strategy.set_world_ranks` ( #16966 )
...
* don't call set_world_ranks in xla strategy
* update
* fabric and other strategies
* CHANGELOG
* Typos
* Reuse test
---------
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-04-13 17:45:42 +02:00
Adrian Wälchli
17548d5cfd
Update link to image in Fabric PPO example ( #17360 )
2023-04-13 13:08:55 +02:00
Jirka Borovec
f14ee9edbc
docker: fix building PL image ( #17353 )
2023-04-12 17:52:42 -04:00
Carlos Mocholí
afd3123486
Update CHANGELOG after the 2.0.1 release ( #17235 )
2023-04-12 17:44:17 +02:00
Carlos Mocholí
04fb30bd97
Update fastapi dependency pins ( #17173 )
...
* Update fastapi dependency pins
* Apply suggestions from code review
* Update test.txt
* Update requirements/app/base.txt
* Revert "Update requirements/app/base.txt"
This reverts commit 59918ffc6c.
* cloud update
* Bad merge
* fastapi 0.69.0 which pins starlette 0.15.0
* https://github.com/pydantic/pydantic/issues/1985
* Avoid CVE: https://github.com/tiangolo/fastapi/pull/3213
* Strict trio
* Skip windows test
---------
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-04-12 17:33:56 +02:00
Carlos Mocholí
472333c9b8
Replace codecov pip package with codecov uploader ( #17349 )
2023-04-12 17:24:48 +02:00
Carlos Mocholí
1aa23267ab
Various Fabric documentation updates ( #17236 )
2023-04-11 23:05:57 +00:00
Carlos Mocholí
0489f2efed
[TPU] v4 support ( #17227 )
2023-04-11 22:24:11 +00:00
Gerson Kroiz
7b8fd85e01
[TPU] Remove error check for IterableDatasets ( #17331 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-04-11 22:04:17 +00:00
Adrian Wälchli
0c02c44c6d
Simplified setup of optimizers in FSDP ( #17309 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-11 20:13:41 +00:00
Adrian Wälchli
51697a8bd6
Combined setup of model and optimizer with FSDP ( #17305 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-11 19:58:53 +00:00
Jirka Borovec
86dbe38913
fix missing tutorials ( #17311 )
2023-04-11 15:18:18 -04:00
dependabot[bot]
78cd51a2ce
Bump peter-evans/create-pull-request from 4 to 5 ( #17313 )
...
Bumps [peter-evans/create-pull-request](https://github.com/peter-evans/create-pull-request) from 4 to 5.
- [Release notes](https://github.com/peter-evans/create-pull-request/releases)
- [Commits](https://github.com/peter-evans/create-pull-request/compare/v4...v5)
---
updated-dependencies:
- dependency-name: peter-evans/create-pull-request
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-11 20:56:40 +02:00
Ethan Harris
621644d0bc
App: Fix frontends when using multiprocessing in the cloud ( #17324 )
...
* App: Fix frontends when using multiprocessing in the cloud
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update CHANGELOG.md
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-11 12:28:44 -04:00
Carlos Mocholí
b2717f6878
[TPU] Improve TPU workflow ( #17237 )
...
* Trigger TPU tests if [TPU] is in the PR title
* Remove TODO
* checkgroup
* DEBUG
* Update
* 1h timeout
* Update
* Update
* Update
* Update
* Remove DEBUG
2023-04-11 16:33:32 +02:00
Carlos Mocholí
ebb62214c6
checkgroup
2023-04-11 16:30:58 +02:00