Commit Graph

153 Commits

Author SHA1 Message Date
Danielle Pintz 06c5903600
Simplify several profile calls (#11031) 2021-12-14 19:49:19 +00:00
Danielle Pintz 3fcfd0214c
Remove `_call_accelerator_hook` Trainer method (#10999) 2021-12-09 02:27:13 +01:00
Carlos Mocholí 99adc45af1
Follow-up changes to #10575 (#10957)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-07 15:27:52 +01:00
Rajath Bharadwaj 7914e5c157
Added UserWarning if `max_epochs` is not set in the Trainer class (#10700) 2021-12-06 09:44:25 +00:00
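A minimal sketch of the behavior referenced in #10700, assuming the 1.5-era defaults (the fallback of 1000 epochs when no limit is given is an assumption here):

    import pytorch_lightning as pl

    # Assumption: with neither `max_epochs` nor `max_steps` set, the Trainer
    # falls back to a default epoch limit (1000) and now emits a UserWarning.
    trainer = pl.Trainer()  # warns that `max_epochs` was not set

    # Passing an explicit bound avoids the warning.
    trainer = pl.Trainer(max_epochs=10)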
Danielle Pintz 6043179931
Re-design `call_hook` interface (#10575) 2021-12-04 16:39:55 -05:00
Carlos Mocholí a28b4cd0c0
Sort out the dataloader idx logic for evaluation (#10923)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-03 20:01:46 +00:00
four4fish 6fe3211573
Unroll dict input before calling Accelerator X_steps (#10908)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-03 17:00:52 +00:00
Adrian Wälchli c55bc433ce
Fix retrieval of batch indices when dataloader num_workers > 0 (#10870)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-02 10:36:10 +00:00
Rohit Gupta 5b9995da04
Fix schedule reset logic in pytorch profiler (#10837) 2021-12-02 14:22:49 +05:30
Carlos Mocholí 0061619e0a
Improve typing for loops (#10780) 2021-11-30 20:28:55 +00:00
Carlos Mocholí 1b43e43e9f
Minor changes in preparation for saving the loops state (#10783) 2021-11-30 19:37:04 +05:30
four4fish 1d2878523a
2/n Move Precision Plugin into strategy - move optimizer-related logic (#10596)
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-11-30 08:31:23 +00:00
four4fish 8bf7f9cce7
1/n Move Accelerator into strategy - move batch_to_device to strategy (#10649)
* 1/n Integrate Device Specific Accelerator Logic with strategy - move batch_to_device to strategy

* add changelog

* add model is not none check

* Apply suggestions from code review

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update CHANGELOG.md

* Update test_datamodules.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_hooks.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dp.py

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-29 12:11:21 -08:00
Carlos Mocholí 724a92b065
Mark outputs as protected in the evaluation loops (#10781)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-11-28 20:09:30 +00:00
Carlos Mocholí 3089dc3829
Improve typing for loops (#10749)
* Improve typing for loops

* Free memory
2021-11-26 18:39:09 +00:00
Carlos Mocholí 31bb6e69ca
Avoid optional instances in Loops (#10735)
* Avoid optional instances in Loops

* More cleanup
2021-11-26 18:00:18 +00:00
Carlos Mocholí ae53562c97
Remove dead code in `TrainingEpochLoop` (#10750) 2021-11-26 17:49:00 +00:00
thomas chaton 3d6262b7a9
Fault Tolerant Manual: Add support for DDP (#10638) 2021-11-25 18:31:53 +01:00
Kaushik B e0b4bb2ea3
Deprecate `DeviceType` in favor of `_AcceleratorType` (#10503)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-25 16:41:03 +01:00
thomas chaton b28ab34ff5
Fault Tolerant Manual: Add loading to reload the states (#10699)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-23 17:18:36 +00:00
Carlos Mocholí a6dedcf492
Fix `move_metrics_to_cpu` with evaluation (#10631) 2021-11-22 15:58:21 +00:00
Rohit Gupta ec27313be2
Fix batch size extraction when set by the user in `LightningModule.log` (#10408)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-19 16:48:26 +00:00
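For #10408 above, a hedged sketch of passing the batch size explicitly to `LightningModule.log`, assuming the 1.5-era signature where `batch_size` is an optional keyword argument used for epoch-level aggregation:

    import torch
    import pytorch_lightning as pl

    class LitModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(8, 1)

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = torch.nn.functional.mse_loss(self.layer(x), y)
            # Assumption: an explicit `batch_size` overrides the size that would
            # otherwise be extracted from `batch` when reducing across the epoch.
            self.log("train_loss", loss, on_epoch=True, batch_size=x.size(0))
            return loss

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.01)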
Carlos Mocholí 069ec1005a
Do not autodetach extras (#10424)
* Do not autodetach extras

* Update CHANGELOG

* Use foo
2021-11-09 16:07:16 +00:00
Gili Tzabari a967b6eba0
Delete the iterator in `on_run_end()` (#9915) 2021-10-29 16:29:44 +00:00
Carlos Mocholí 03f01fb5ec
Fix gradient norm tracking and gradient clipping (#9287)
* WIP

* Progress

* Undo test change

* Fix plugin closure execution order

* Update CHANGELOG

* Fix manual optimization on AMP and skipping backward

* Fix for deepspeed

* Typo

* Hook test for manual closure

* Add skipping test with AMP

* You are hideous, apex

* Add deepspeed test

* Update CHANGELOG

* Fix for broken master

* Add RunIf

* FIXMEs

* Rename

* Fix grad norm

* add a simple test

* update test

* update test

* update test

* fix merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sea of changes

* Undo change

* Introduce TPUPrecisionPlugin

* Undo changes

* Undo changes

* Resolve FIXME

* Undo change

* Undo change

* Undo change

* Fix FIXMEs

* Fix FIXME

* Correct value

* Bad merge

* Fix circular imports

* WIP

* Fixing clipping

* Fixes

* Bad merge

* Move optimizer step and clipping into the `PrecisionPlugin`

* Fix AMP

* Update CHANGELOG

* Fix tests

* Underscore

* Progress

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove pre_optimizer_step

* Missed one

* Progress

* Progress

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FIXMEs

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix test

* DeepSpeed warning. mypy

* Rename

* Finish tests

* Update CHANGELOG

* Dumb fixes

* accelerator=auto

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update on comments

* Use ClassifModule

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
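A short illustration of the two Trainer knobs touched by #9287 above, assuming the 1.5-era argument names (`track_grad_norm`, `gradient_clip_val`, `gradient_clip_algorithm`):

    import pytorch_lightning as pl

    # Assumption: track_grad_norm=2 logs the 2-norm of the gradients each step,
    # while the clipping arguments configure the behavior this commit moves
    # into the PrecisionPlugin.
    trainer = pl.Trainer(
        track_grad_norm=2,
        gradient_clip_val=0.5,
        gradient_clip_algorithm="norm",
    )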
Danielle Pintz 38090e47d7
Small code simplification in `training_epoch_loop.py` (#10146)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-26 13:22:36 +02:00
Danielle Pintz 13d6d7bad1
Remove `optimizer_connector.py` (#10120) 2021-10-26 00:52:43 +00:00
Eric Wiener 0e20119d24
Change default value of the `max_steps` Trainer argument from `None` to `-1` (#9460)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2021-10-25 20:21:33 +00:00
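A before/after sketch for #9460, assuming `-1` now plays the "no step limit" role that `None` did previously:

    import pytorch_lightning as pl

    # Before this change, max_steps=None meant "no limit on optimizer steps";
    # the sentinel is now -1, and it is also the default.
    trainer = pl.Trainer(max_steps=-1)     # unlimited steps (default)
    trainer = pl.Trainer(max_steps=1000)   # stop after 1000 optimizer steps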
Carlos Mocholí b376799430
Minor fixes related to clipping (#10130)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Danielle Pintz e94dcf6936
Mark `trainer.data_connector` as protected (#10031)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-25 12:29:09 +01:00
Alessio Bonfiglio 2a2fa5a56a
Group all the logged gradients under the same sub-folder (#7756) 2021-10-20 15:48:36 +00:00
Carlos Mocholí e44921ee21
Fix `self.log(on_epoch=True, reduce_fx=sum)` in `on_batch_start` (#9791) 2021-10-20 01:56:37 +02:00
Ning 0b68f2abf8
Remove `reset_train_val_dataloaders` from Trainer and move data reloading logic to loop (#9671)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-10-19 21:45:52 +02:00
Carlos Mocholí e95f9b71c1
Set the optimization output result class as a class attribute (#9977) 2021-10-19 16:33:08 +01:00
Carlos Mocholí bb2dc68792
Simplify track grad norm condition (#9992) 2021-10-19 15:00:16 +02:00
Adrian Wälchli 65150cdb42
Update docs for base Loop class with examples (#9993)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-18 15:37:23 +00:00
Carlos Mocholí c69a79c86f
Fix `self.log(on_epoch=True)` in `on_batch_start` (#9780) 2021-10-18 14:02:16 +02:00
Adrian Wälchli 7a9151637c
Loop customization docs (#9609)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
2021-10-18 09:43:11 +00:00
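Tied to the two loop-documentation commits above (#9993, #9609), a hedged sketch of the base `Loop` interface from that era: subclasses implement `done`, `reset`, and `advance`, and `run()` drives the iteration (the standalone usage below is an assumption, not taken from those docs):

    from pytorch_lightning.loops import Loop

    class CountingLoop(Loop):
        """Toy loop: advance a counter until a limit is reached."""

        def __init__(self, limit: int):
            super().__init__()
            self.limit = limit
            self.count = 0

        @property
        def done(self) -> bool:
            return self.count >= self.limit

        def reset(self) -> None:
            self.count = 0

        def advance(self) -> None:
            # one unit of work per iteration of run()
            self.count += 1

    loop = CountingLoop(limit=3)
    loop.run()  # assumption: run() calls reset(), then advance() until done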
four4fish a002f872ea
[2/n] Directly call TrainingTypePlugin APIs instead of going through the Accelerator (#9901)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 17:38:22 +02:00
Rohit Gupta 23e8b59ae7
Add `configure_gradient_clipping` hook in `LightningModule` (#9584)
* init hook

* docs

* dep train args

* update tests

* doc

* doc

* .gitignore

* not dep

* add trainer args

* add & update tests

* fix tests

* pre-commit

* docs

* add docs

* add exception

* code review

* deepspeed

* update tests

* not

* try fix

* Apply suggestions from code review

* update deepspeed

* disable some tests

* disable some tests

* enable all tests
2021-10-13 20:15:13 +05:30
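A hedged sketch of the `configure_gradient_clipping` hook added in #9584 above, assuming the 1.5-era signature and the `self.clip_gradients` helper:

    import pytorch_lightning as pl

    class LitModel(pl.LightningModule):
        def configure_gradient_clipping(
            self, optimizer, optimizer_idx, gradient_clip_val=None, gradient_clip_algorithm=None
        ):
            # Assumption: the default implementation just calls clip_gradients();
            # overriding lets you clip conditionally, e.g. only for optimizer 0.
            if optimizer_idx == 0:
                self.clip_gradients(
                    optimizer,
                    gradient_clip_val=gradient_clip_val,
                    gradient_clip_algorithm=gradient_clip_algorithm,
                )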
ananthsub 4610fddb19
Mark `Trainer.terminate_on_nan` protected and deprecate public property (#9849)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-12 20:23:22 +00:00
Adrian Wälchli 6a0c47a014
Remove redundant accumulation normalization in manual optimization (#9769) 2021-10-11 15:26:12 +00:00
Rohit Gupta 4decbc0d95
Deprecate `dataloader_idx` from `on_train_batch_start/end` (#9816)
* deprecate hooks

* dep todo

* explicit

* Apply suggestions from code review

* Apply suggestions from code review

* code review

* base
2021-10-07 10:18:11 +00:00
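For #9816 above, a sketch of the updated hook signatures, assuming the deprecation removes the trailing `dataloader_idx` parameter from the training-batch hooks:

    import pytorch_lightning as pl

    class LitModel(pl.LightningModule):
        # deprecated form: on_train_batch_start(self, batch, batch_idx, dataloader_idx)
        def on_train_batch_start(self, batch, batch_idx):
            ...

        # deprecated form: on_train_batch_end(self, outputs, batch, batch_idx, dataloader_idx)
        def on_train_batch_end(self, outputs, batch, batch_idx):
            ...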
thomas chaton 5841ca9782
[Feat] Add auto_restart for fault tolerant training (#9722) 2021-10-01 16:37:17 +00:00
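A hedged note on #9722: fault-tolerant (auto-restart) training in this series was experimental; the environment-variable opt-in shown below is an assumption about how it is enabled:

    import os
    import pytorch_lightning as pl

    # Assumption: the experimental feature is toggled through an environment
    # variable rather than a Trainer flag.
    os.environ["PL_FAULT_TOLERANT_TRAINING"] = "1"

    trainer = pl.Trainer(max_epochs=5)
    # trainer.fit(model)  # after a failure, rerunning can resume mid-epoch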
Carlos Mocholí 6ef4e5ac76
Remove return value from the backward closure (#9770) 2021-10-01 16:53:00 +02:00
Carlos Mocholí 44aed17aff
Remove duplicated native AMP + LBFGS check (#9748) 2021-09-29 13:14:03 +00:00
thomas chaton fa44dbcd9e
[Refactor] Simplify data loading logic around replacing sampler to prevent confusion (#9721)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-28 17:04:02 +00:00
Carlos Mocholí 198aa852ef
Remove `training_epoch_end` outputs check (#9719)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-28 14:21:46 +00:00
Carlos Mocholí bc50591d49
Reduce loop structure leakage into the `TrainingEpochLoop` (#9490)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-28 13:22:22 +00:00
thomas chaton 64bbebc869
[bugfix] Resolve metrics not being properly reset on validation epoch end (#9717)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-09-27 16:16:45 +00:00