thomas chaton
71f44775c9
Resolve boto3 unable to locate credentials ( #19472 )
2024-02-14 10:54:01 +00:00
thomas chaton
b097a4df3f
Improve data processing to enable downloading LAION 400M ( #19452 )
2024-02-13 13:23:39 +00:00
Xinyu Yang
47c8f4cba0
bugfix: skip writing index.json if no data is written. ( #19439 )
2024-02-09 17:08:28 +00:00
Xinyu Yang
7b867c7d91
bugfix: correct node rank ( #19437 )
2024-02-09 15:21:28 +00:00
thomas chaton
4c2fc3b0cb
Add DNS optimize support ( #19429 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2024-02-08 11:14:57 +00:00
thomas chaton
ac9d63f4eb
Lightning Data: Refactor files ( #19424 )
2024-02-08 08:02:08 +00:00
thomas chaton
28a80238a4
Add support for tif ( #19421 )
2024-02-06 15:23:40 +00:00
thomas chaton
7dfc279b3f
Add support for parallelizing processing parquet files across workers and nodes. ( #19400 )
2024-02-05 23:21:25 +00:00
thomas chaton
af7e79a84a
Data Processing: Tiny optimization ( #19389 )
2024-02-01 18:21:54 +00:00
thomas chaton
8280519642
Data Processor: Add is_last argument to know when the last item for the current worker is being processed ( #19383 )
2024-02-01 12:09:06 +00:00
thomas chaton
5a0d2eff8c
map operator: Add support for non absolute input_dir and output_dir ( #19378 )
2024-02-01 08:25:47 +00:00
thomas chaton
28b380610f
StreamingDataloader: Resolve typo ( #19370 )
2024-01-30 16:52:47 +00:00
thomas chaton
322f474978
JPEGSerializer: Fix serializer io.bytes image ( #19369 )
2024-01-30 16:52:25 +00:00
thomas chaton
10c3a71dbd
Bump Lightning Cloud 0.5.64 ( #19372 )
2024-01-30 14:57:11 +00:00
thomas chaton
b0e1ee2469
map operator: Add support for nested folders ( #19366 )
2024-01-29 19:17:28 +00:00
thomas chaton
37a521cad2
map operator: Add weights to evenly distribute work among workers ( #19365 )
2024-01-29 18:27:37 +00:00
thomas chaton
c10fd22c74
BC: Switch map operator arguments order ( #19345 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2024-01-25 09:37:28 +00:00
thomas chaton
012f68dcfd
StreamingDataloader: Add profiling support ( #19338 )
2024-01-24 20:30:55 +00:00
thomas chaton
0a75d3b7e6
tiny improvement ( #19341 )
2024-01-24 17:58:30 +00:00
Andy McSherry
577bd85654
Allow any AWS authentication method in studios ( #19336 )
2024-01-24 16:20:53 +00:00
thomas chaton
ed367ca675
StreamingDataLoader: Resolve fault tolerance with the CombinedStreamingDataset and multiple workers ( #19326 )
2024-01-23 17:54:10 +00:00
thomas chaton
d08e6cd916
Add walk operator ( #19333 )
2024-01-23 14:21:08 +00:00
thomas chaton
75510dd9f8
StreamingDataset: Add intra node shuffling to accelerate second epoch ( #19296 )
2024-01-19 17:08:32 +00:00
thomas chaton
97d71aba0b
Data Processor: Resolve several bugs found while publishing a Studio ( #19309 )
2024-01-18 20:46:06 +00:00
thomas chaton
19d9eabbc5
Enable map over inputs without files input ( #19285 )
2024-01-16 12:19:01 +00:00
thomas chaton
564be3b521
Streaming Dataset: Resolve chunks eviction ( #19214 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2024-01-01 18:58:58 -05:00
thomas chaton
91ef1902ec
StreamingDataset: Fault Tolerance v2 2/n ( #19201 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-12-24 09:43:58 +09:00
thomas chaton
c989a97aa1
feat(fr) StreamingDataset: Fault Tolerance v2 1/n ( #19196 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-12-21 17:01:23 +00:00
thomas chaton
12847132b1
lightning.data: Remove torch distributed for the Dataset Optimizer ( #19182 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-12-20 13:57:07 +00:00
thomas chaton
0a5cca6711
StreamingDataset: Cleanup chunks right away if the dataset doesn't fit within the cache ( #19168 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-12-18 23:01:55 +00:00
thomas chaton
7bd75778a6
Fix: Resolve checkpointing for the Streaming Dataset ( #19123 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-12-08 11:10:07 +00:00
thomas chaton
e6b79d984d
StreamingDataset improve deletion strategy ( #19118 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-12-07 08:48:08 -05:00
thomas chaton
4d15468555
Improve StreamingDataset Speed ( #19114 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-12-05 19:50:27 +00:00
thomas chaton
08c9e51335
Resolve path for StreamingDataset ( #19094 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-11-30 16:13:06 +00:00
thomas chaton
a6da1e3351
Add fault tolerance Streaming Dataset 2/n ( #19052 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-11-23 17:40:04 +00:00
thomas chaton
7eca9c1642
Add numpy support for the StreamingDataset 1/2 ( #19050 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-11-22 18:00:15 +00:00
thomas chaton
1073276a58
Add fault tolerance for the StreamingDataset 1/n ( #19049 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-11-22 17:22:00 +00:00
thomas chaton
bc1658039f
Add direct s3 support to the streaming dataset ( #19044 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-11-22 01:17:49 +00:00
thomas chaton
d3df1273b6
Add disk usage check before downloading files ( #19041 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-11-21 20:10:18 +00:00
thomas chaton
6e517bd55b
Resolve Item Loader bugs ( #19017 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-16 18:06:58 -05:00
thomas chaton
792cb73fc6
Remove the LightningDataset relying on un-maintained torchdata ( #19019 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-11-16 16:08:15 -05:00
thomas chaton
7288302186
Add multiple uploaders to the map, optimize ( #18989 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-11-13 14:27:50 -05:00
thomas chaton
1c86011dab
Add Video/Audio support ( #18977 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-11-09 18:37:37 +00:00
thomas chaton
1b3a3fbaad
Prevent downloading more chunks than needed ( #18964 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-11-07 19:40:21 +00:00
thomas chaton
20f58f63ef
Bump Lightning Cloud to 0.5.51 ( #18962 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-11-07 17:30:39 +00:00
Adrian Wälchli
8a5d3423a7
Cache directory per worker to avoid collisions ( #18957 )
2023-11-07 10:19:03 -05:00
thomas chaton
529f07f254
Add support for deleting chunks ( #18959 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-11-07 09:46:13 +00:00
Adrian Wälchli
62771f3932
Greedily select files for data processor workers based on size ( #18907 )
...
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
2023-11-06 19:33:50 -05:00
thomas chaton
e79ac21415
Add the input_dir in the cache_dir to avoid overlapping downloads ( #18960 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-11-06 19:01:37 -05:00
Adrian Wälchli
c4af18b2c5
Create cache dir if it doesn't exist ( #18955 )
2023-11-06 11:02:05 -05:00