Commit Graph

3537 Commits

Author SHA1 Message Date
Adrian Wälchli 9d136a9fc5
Lightning Lite core and tests (#10175) 2021-10-29 21:46:39 +00:00
Adrian Wälchli b4f43b1695
Update docs for sync_dist logging option (#10186)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-29 20:44:23 +00:00
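As context for the `sync_dist` option documented in the commit above, here is a minimal sketch of how it is typically passed to `LightningModule.log`; the module, layer sizes, and metric name are illustrative and not taken from the PR.

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def forward(self, x):
        return self.layer(x)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self(x), y)
        # sync_dist=True reduces the logged value across processes
        # when running under a distributed strategy such as DDP.
        self.log("val_loss", loss, sync_dist=True)
        return loss
```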
Kaushik B cedaebfcbb
Add `auto_device_count` method to `Accelerators` (#10222)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-29 22:31:32 +02:00
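A hedged, partial sketch of what the `auto_device_count` hook named in the commit above could look like on a custom accelerator; the method name comes from the commit title, while the GPU-counting body and class name are illustrative assumptions, not code from the PR.

```python
import torch
from pytorch_lightning.accelerators import Accelerator


class MyGPUAccelerator(Accelerator):
    @staticmethod
    def auto_device_count() -> int:
        # Illustrative assumption: report how many devices are available
        # when the user does not specify a device count explicitly.
        return torch.cuda.device_count()
```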
Danielle Pintz 848ad3f41d
Remove `training_tricks_connector.py` (#10112)
* deprecate training tricks connector

* fixes
2021-10-29 18:20:17 +00:00
Gili Tzabari a967b6eba0
del iterator on_run_end() (#9915) 2021-10-29 16:29:44 +00:00
Carlos Mocholí e4eb61d812
Raise exception for `strategy=ddp_cpu|tpu_spawn` (#10185) 2021-10-29 16:15:24 +00:00
Carlos Mocholí 81d15c5986
Implement double optimizer closure for hook structure consistency (#10167) 2021-10-29 13:03:04 +00:00
Danielle Pintz c211adb579
Mark `callback_connector` as protected (#10121)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-29 12:58:47 +00:00
thomas chaton bd77f65463
Resolve batch_size in ResultCollection not reset to 1 on epoch end (#10242) 2021-10-29 13:55:11 +01:00

thomas chaton 843bf26297
Fix `log(sync_dist=True, on_epoch=True, on_step=True)` not reducing on step (#10227)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-29 12:08:32 +00:00
Carlos Mocholí 4bc73b2b76
Avoid deprecated usage in accelerator connector tests (#10184) 2021-10-29 12:36:21 +01:00
Ning dbfadedfe7
Revert "Add support for `len(datamodule)` (#9895)" (#10072)
This reverts commit 6429de8944.
2021-10-29 13:33:51 +02:00
Rohit Gupta 6a9adf26f7
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` (#10240) 2021-10-29 10:36:02 +00:00
thomas chaton 5f4ffdee41
cleanup (#10081) 2021-10-29 08:40:43 +00:00
Adrian Wälchli 3f9dfe4949
Fix iterating over a DummyLogger when `fast_dev_run > 0` (#10232) 2021-10-29 07:22:59 +00:00
Adrian Wälchli 6ed7a0c172
Fix sigterm signal handling (#10189)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-29 00:01:39 +00:00
Carlos Mocholí 03f01fb5ec
Fix gradient norm tracking and gradient clipping (#9287)
* WIP

* Progress

* Undo test change

* Fix plugin closure execution order

* Update CHANGELOG

* Fix manual optimization on AMP and skipping backward

* Fix for deepspeed

* Typo

* Hook test for manual closure

* Add skipping test with AMP

* You are hideous, apex

* Add deepspeed test

* Update CHANGELOG

* Fix for broken master

* Add RunIf

* FIXMEs

* Rename

* Fix grad norm

* add a simple test

* update test

* update  test

* update test

* fix merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sea of changes

* Undo change

* Introduce TPUPrecisionPlugin

* Undo changes

* Undo changes

* Resolve FIXME

* Undo change

* Undo change

* Undo change

* Fix FIXMEs

* Fix FIXME

* Correct value

* Bad merge

* Fix circular imports

* WIP

* Fixing clipping

* Fixes

* Bad merge

* Move optimizer step and clipping into the `PrecisionPlugin`

* Fix AMP

* Update CHANGELOG

* Fix tests

* Underscore

* Progress

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove pre_optimizer_step

* Missed one

* Progress

* Progress

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FIXMEs

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix test

* DeepSpeed warning. mypy

* Rename

* Finish tests

* Update CHANGELOG

* Dumb fixes

* accelerator=auto

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update on comments

* Use ClassifModule

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
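For the gradient clipping and gradient-norm tracking touched by the PR above, a minimal usage sketch from the Trainer side; the specific values are arbitrary examples, not settings from the PR.

```python
import pytorch_lightning as pl

# Clip gradients by norm and track the gradient 2-norm during training;
# the values below are arbitrary example settings.
trainer = pl.Trainer(
    gradient_clip_val=0.5,
    gradient_clip_algorithm="norm",
    track_grad_norm=2,
)
```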
Carlos Mocholí 5262b63dff
Pass the scaler as an input to `NativeMixedPrecisionPlugin` (#10055)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 14:13:53 +00:00
Low Weng Fei 83d74bb385
Fix `reset_seed()` converting the `PL_SEED_WORKERS` environment variable `str` read to `bool` (#10099)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-28 12:57:41 +00:00
Rohit Gupta 9af1dd7443
Deprecate `lr_sch_names` from `LearningRateMonitor` (#10066) 2021-10-28 12:57:04 +00:00
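For reference, the `LearningRateMonitor` callback named in the deprecation above is normally used as below; the logging interval shown is just an example.

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor

# Log the learning rate of each configured scheduler on every optimizer step.
lr_monitor = LearningRateMonitor(logging_interval="step")
trainer = pl.Trainer(callbacks=[lr_monitor])
```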
Rohit Gupta 85eb17cde5
initialize poptorch_models based on trainer_fn (#10149) 2021-10-28 11:59:52 +00:00
Adrian Wälchli 63015b5c87
Let `DDPSpawnPlugin.spawn` return a result from rank 0 (#10162)
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-10-28 11:39:13 +02:00
Adrian Wälchli 07b1b56d5c
Fix setting device when creating "inf" monitor value in `ModelCheckpoint` (#10118)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-28 09:10:55 +00:00
Adrian Wälchli afd1ae124e
Update deepspeed precision plugin for Lite (#10164) 2021-10-28 08:33:56 +00:00
Carlos Mocholí dbe1662dc3
Replace `_TORCH_GREATER_EQUAL_DEV_1_10` with `_TORCH_GREATER_EQUAL_1_10` (#10157) 2021-10-27 13:38:39 +01:00
Adrian Wälchli 808edcdebf
update type (#10163) 2021-10-27 11:16:09 +00:00
Kaushik B c33df2639f
Set `dataset` attribute to `MpDeviceLoader` used in TPU Spawn (#10151) 2021-10-27 01:23:01 +05:30
Carlos Mocholí 48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` (#10143) 2021-10-26 17:26:26 +02:00
Rohit Gupta 93266e2c22
Avoid deprecated warnings from accelerator and checkpoint connector (#10142) 2021-10-26 14:10:30 +02:00
Danielle Pintz 38090e47d7
Small code simplification in `training_epoch_loop.py` (#10146)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-26 13:22:36 +02:00
twsl 971281d27d
Make sure file and folder exists in Profiler (#10073)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-26 11:13:31 +00:00
Danielle Pintz a5235d5b01
Remove `model_connector.py` (#10111) 2021-10-26 11:52:14 +02:00
Adrian Wälchli 871a96701a
Rename `master_params` to `main_params` (#10105)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-26 11:17:32 +02:00
Rohit Gupta 34d5980df6
Raise `MisconfigurationException` if `trainer.eval` is missing required methods (#10016) 2021-10-25 23:12:08 -07:00
Danielle Pintz 13d6d7bad1
Remove `optimizer_connector.py` (#10120) 2021-10-26 00:52:43 +00:00
Adrian Wälchli 21a5867dad
Rename `ClusterEnvironment.creates_processes` (#10106)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 23:15:41 +00:00
Rajat Goel 47e7a2860f
Fix Enums parsing in generated hparams yaml (#9170)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 21:23:20 +00:00
Eric Wiener 0e20119d24
Change default value of the `max_steps` Trainer argument from `None` to `-1` (#9460)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 20:21:33 +00:00
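A quick illustration of the `max_steps` default change described above: `-1` (the new default) means no step limit, while a positive value caps training; the numbers are arbitrary.

```python
import pytorch_lightning as pl

# -1 disables the step limit; training is bounded by max_epochs
# or other stopping conditions instead.
trainer_unbounded = pl.Trainer(max_steps=-1, max_epochs=3)

# A positive value stops training after that many optimizer steps.
trainer_capped = pl.Trainer(max_steps=1000)
```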
Danielle Pintz 1f7bd6650c
Mark accelerator connector as protected (#10032) 2021-10-25 19:24:54 +00:00
jjenniferdai 6d79184ec5
Unify checkpoint load paths [redo #9693] (#10061) 2021-10-25 19:05:31 +00:00
Adrian Wälchli 76081fb846
Mark SLURM detection methods in `AcceleratorConnector` as protected (#10101)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-10-25 17:52:15 +00:00
Carlos Mocholí 2ee3127661
Use `torch.autocast` (#10053) 2021-10-25 17:33:52 +00:00
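`torch.autocast`, referenced in the commit above, is the autocast context manager available in PyTorch 1.10+; a minimal standalone usage sketch (assuming a CUDA device), unrelated to the internals of the PR:

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
x = torch.randn(4, 8, device="cuda")

# Run the forward pass in mixed precision on CUDA; eligible ops are
# automatically cast to float16 inside the context.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(x)
```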
Carlos Mocholí 43c70ece17
Fix `optimizers` overloads typing annotation (#10069) 2021-10-25 16:51:46 +00:00
Carlos Mocholí b376799430
Minor fixes related to clipping (#10130)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Adrian Wälchli d3e5a43546
Restrict setup methods to accept a single model (#10064) 2021-10-25 16:32:57 +00:00
manipopopo cfb2d87765
Disable quantization aware training observers (#8540)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 15:46:09 +00:00
Adrian Wälchli aff80477b7
Remove dead code in accelerator connector (#10100)
* remove dead code in accelerator connector

* remove slurm "fake_slurm_managing_tasks" dead code
2021-10-25 13:37:40 +00:00
Kaushik B 64fc0d4257
Add method to TPUSpawn plugin to override how models are setup (#10039) 2021-10-25 11:44:32 +00:00
Danielle Pintz e94dcf6936
Mark `trainer.data_connector` as protected (#10031)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Carlos Mocholí f95ba20012
Do not use the base version by default in `_compare_version` (#10051) 2021-10-25 16:41:32 +05:30