Commit Graph

452 Commits

Author SHA1 Message Date
Ethan Harris b9bc77293b
Fix inconsistent outputs in `on_*_end` and `*_end` (#6969)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 15:16:21 +01:00
ananthsub e891ceb836
Remove evaluation loop legacy dict returns for `*_epoch_end` hooks (#6973)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-13 12:37:54 +01:00
Roger Shieh e35192dfcd
Update `DataLoader.persistent_workers` warnings in ddp_spawn (#6762)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-09 11:38:13 +02:00
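
The entry above updates the warnings Lightning emits about `DataLoader.persistent_workers` under the `ddp_spawn` accelerator. A minimal sketch of the relevant configuration, assuming a multi-GPU machine; the `accelerator="ddp_spawn"` flag reflects the Trainer API of this era and the dataset/model are placeholders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))

# persistent_workers keeps worker processes alive between epochs (torch >= 1.7),
# which matters under ddp_spawn because workers are otherwise re-created each epoch.
train_loader = DataLoader(dataset, batch_size=8, num_workers=2, persistent_workers=True)

trainer = pl.Trainer(gpus=2, accelerator="ddp_spawn", max_epochs=1)
# trainer.fit(model, train_loader)  # model: any LightningModule
```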
Akihiro Nitta 5e4dfd75d2
[RFC] Add `self.lr_schedulers()` to LightningModule for manual optimization (#6567)
* Add test for lr_schedulers()

* Add lr_schedulers to LightningModule

* Update test comment

* Update CHANGELOG
2021-04-09 10:32:14 +01:00
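
The entry above adds `self.lr_schedulers()` for manual optimization. A minimal sketch of how the accessor pairs with `self.optimizers()` in a manually optimized module; layer sizes and the loss are placeholders:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class ManualOptModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # opt into manual optimization
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        sch = self.lr_schedulers()  # accessor introduced by the PR above

        x, y = batch
        loss = nn.functional.cross_entropy(self.layer(x), y)

        opt.zero_grad()
        self.manual_backward(loss)
        opt.step()
        sch.step()  # in manual optimization, the user decides when to step the scheduler

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
        return [optimizer], [scheduler]
```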
Adrian Wälchli 9c9e2a0325
fix gpus default for Trainer.add_argparse_args (#6898) 2021-04-09 11:20:43 +02:00
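
The fix above concerns the default of `--gpus` when a parser is built via `Trainer.add_argparse_args`. A minimal sketch of that argparse integration:

```python
from argparse import ArgumentParser
from pytorch_lightning import Trainer

parser = ArgumentParser()
# Adds every Trainer constructor argument (including --gpus) to the parser.
parser = Trainer.add_argparse_args(parser)
args = parser.parse_args([])  # empty argv: exercise the defaults the fix above targets

trainer = Trainer.from_argparse_args(args)
```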
ananthsub 851f9e3997
Move NaN/Inf detection to a separate utilities file (#6834)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-04-09 01:47:02 +02:00
Sean Naren 742c48e994
[Fix] Ensure we set the eval/train flag correctly on accelerator model (#6877)
* Ensure we move the model to eval mode before running evaluation

* Ensure we set the flag appropriately across all stages

* Add test, move hooks logic

* Apply same fix to the validate loop

* Update pytorch_lightning/trainer/trainer.py

* Fix function name

* Fix order, add predict

* Shorten the name

* Fix input dm, drop duplicate on predict start hook call, as it's called in the setup function

* Use hook, remove double call
2021-04-08 14:04:26 -04:00
Anthony Kim 7f6154fcad
Add `Trainer(gradient_clip_algorithm='value'|'norm')` (#6123)
* add changelog

* add clip by value

* fix bug in training tricks.rst

* fix bug in trainer.rst

* Update trainer.rst

* Update trainer.rst

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/precision/deepspeed_precision.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/utilities/enums.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* yapf formatting

* update training tricks

* update based on comment

* update based on comment

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* update based on comment

* pep8

* mypy

* mypy

* Update docs/source/advanced/training_tricks.rst

Co-authored-by: thomas chaton <thomas@grid.ai>

* Update sharded_native_amp.py

* Update test_sharded_parity.py

* update test codes

* Update test_tpu.py

* Update pytorch_lightning/trainer/connectors/training_trick_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update test_trainer.py

* Update enums.py

* Update enums.py

* add super-class initialization to precision plugins.

* add clip_grad horovod cpu test

* add clip_grad horovod cpu test

* use subprocess check_call

* change order of horovod tests

* set max_epochs 2 in horovod test

* remove clip_grad_val test from horovod-cpu

* remove "type: ignore"

* divide clip grad val test in horovod

* update based on comments

* add super-class initialization to precision plugins.

* bugfix

* bugfix

* revert some changes

* revert some changes

* Update tests/models/test_horovod.py

* merge master

* Delete signature test

No point in testing a signature

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-04-06 08:27:37 -05:00
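
The entry above introduces `gradient_clip_algorithm`. A minimal sketch of the two clipping modes as exposed on the Trainer; the clip values are placeholders:

```python
from pytorch_lightning import Trainer

# Clip the gradient norm (the pre-existing default behaviour).
trainer_norm = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="norm")

# Clip each gradient element by value, enabled by the PR above.
trainer_value = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")
```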
Adrian Wälchli 127c52af74
Fix EarlyStopping logic when min_epochs not met (#6705)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-06 12:41:07 +01:00
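
The fix above makes EarlyStopping respect `min_epochs`. A minimal sketch of the interaction; the monitored key and thresholds are placeholders:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss", patience=3, mode="min")

# Even if `val_loss` stops improving immediately, training should not end
# before `min_epochs` have run; early stopping only takes effect afterwards.
trainer = Trainer(min_epochs=10, max_epochs=100, callbacks=[early_stop])
```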
Adrian Wälchli 264aa689de
fix boolean check on iterable dataset when len not defined (#6828)
* fix iterable dataset len check

* update predict and validate

* add validate to test

* add changelog

* add predict
2021-04-05 17:47:21 +01:00
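
The fix above concerns how the loops check dataloader length when an `IterableDataset` defines no `__len__`. A minimal sketch of such a dataset, where the Trainer cannot rely on `len(dataloader)`; sizes are placeholders:

```python
import torch
from torch.utils.data import DataLoader, IterableDataset

class StreamingDataset(IterableDataset):
    """An iterable dataset that intentionally defines no __len__."""

    def __iter__(self):
        for _ in range(100):
            yield torch.randn(32), torch.randint(0, 2, (1,)).item()

loader = DataLoader(StreamingDataset(), batch_size=8)

# len(loader) raises TypeError here, so Lightning must detect the missing
# length with a boolean check rather than assuming a length exists.
```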
Yuan-Hang Zhang 1bd5f36a5b
Fix validation progress counter with check_val_every_n_epoch > 1 (#5952)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-02 17:40:41 +09:00
Carlos Mocholí 0dd2deebea
Remove legacy support for the magic `log`/`progress_bar` keys in dict returns (#6734) 2021-03-31 00:28:04 +02:00
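
The removal above drops the old magic `{'log': ..., 'progress_bar': ...}` dict returns from step methods. A minimal sketch of the replacement pattern using `self.log`; model and loss are placeholders:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.layer(x), y)
        # Instead of `return {"loss": loss, "log": {"train_loss": loss}}`,
        # log explicitly and return the loss (or a dict without magic keys).
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```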
thomas chaton 1302766f83
DeepSpeed ZeRO Update (#6546)
* Add context to call hook to handle all modules defined within the hook

* Expose some additional parameters

* Added docs, exposed parameters

* Make sure we only configure if necessary

* Setup activation checkpointing regardless, saves the user having to do it manually

* Add some tests that fail currently

* update

* update

* update

* add tests

* change docstring

* resolve accumulate_grad_batches

* resolve flake8

* Update DeepSpeed to use latest version, add some comments

* add metrics

* update

* Small formatting fixes, clean up some code

* Few cleanups

* No need for default state

* Fix tests, add some boilerplate that should move eventually

* Add hook removal

* Add a context manager to handle hook

* Small naming cleanup

* wip

* move save_checkpoint responsibility to accelerator

* resolve flake8

* add BC

* Change recommended scale to 16

* resolve flake8

* update test

* update install

* update

* update test

* update

* update

* update test

* resolve flake8

* update

* update

* update on comments

* Push

* pull

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* Apply suggestions from code review

* Swap to using world size defined by plugin

* update

* update todo

* Remove deepspeed from extra, keep it in the base cuda docker install

* Push

* pull

* update

* update

* update

* update

* Minor changes

* duplicate

* format

* format2

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-30 13:39:02 -04:00
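
The update above reworks the DeepSpeed ZeRO integration. A minimal sketch of selecting the plugin; it assumes a multi-GPU machine, and the explicit constructor arguments (ZeRO stage, offloading flag) are assumptions that may vary by version:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DeepSpeedPlugin

# Shorthand: pick the plugin by name and rely on its defaults.
trainer = Trainer(gpus=2, precision=16, plugins="deepspeed")

# Or configure it explicitly, e.g. ZeRO stage 2 with CPU offloading.
trainer = Trainer(gpus=2, precision=16, plugins=DeepSpeedPlugin(stage=2, cpu_offload=True))
```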
Carlos Mocholí 90444706b2
Remove logger_connector legacy code (#6733) 2021-03-30 12:33:33 +02:00
Kaushik B f79a13e495
[Model Parallel] Add configure sharded model hook (#6679)
* Add base hook for model parallel

* fix callback signature

* Simplify hook

* Add hook logic

* add tests

* add property setter

* add logic for being called once

* Update changelog

* Fix

* fix return type

* fix lambda callback test

* Fix tests

* Apply code suggestions

* add logic for setup_optimizers_predispatch

* add common dummy model

* Swap call order

* Remove test that isn't needed anymore

* Update tests

* Add a bit more doc

* Few code review fixes

* Update pytorch_lightning/accelerators/accelerator.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Change hook name

* Fix test

* Test setup hook, refactor names

* Swap call order of callbacks and model initialization

* Change name of context manager

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-29 14:50:51 -06:00
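
The entry above adds the `configure_sharded_model` hook so large layers can be instantiated once the sharded context is active. A minimal sketch; the layer sizes are placeholders:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class ShardedModel(pl.LightningModule):
    def configure_sharded_model(self):
        # Called by model-parallel plugins inside their sharding context, so this
        # large block is created in sharded form instead of being fully
        # materialized in __init__.
        self.block = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

    def forward(self, x):
        return self.block(x)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```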
Łukasz Zalewski cca0eca5f3
More explicit exception message when testing with fast_dev_run=True (#6667)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-29 13:29:54 +00:00
Carlos Mocholí f0c5479de9
Remove legacy `Result` parameters (#6016) 2021-03-28 11:55:08 +02:00
thomas chaton 0e45220263
[warning] Add warning when values are not being reduced (#6417)
* add warning non reduced

* add test

* update test

* update changelog

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* update

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-03-26 18:33:11 +00:00
Carlos Mocholí bc613611e2
Do not add return dict items to callback_metrics (#6682) 2021-03-26 14:05:20 +01:00
Jirka Borovec 217c12a4e7
Simplify deprecations (#6620)
* use external deprecate

* simplify

* simplify

* simplify

* flake8

* .

* others

* .
2021-03-25 15:26:38 +01:00
Rohit Gupta 9be092dbdb
Add on_epoch_start to run at the beginning of every loop irrespective of train/val/test (#6498)
* update docs

* add hook and update docs

* update tests

* chlog

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* chlog

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-25 14:20:49 +01:00
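
The change above makes `on_epoch_start` fire at the start of every loop (train, validation, and test), not just training. A minimal sketch of a callback relying on that behaviour:

```python
import pytorch_lightning as pl

class EpochAnnouncer(pl.Callback):
    def on_epoch_start(self, trainer, pl_module):
        # After the change above, this runs for train, validation and test epochs
        # alike; branch on the trainer's running stage if loop-specific behaviour
        # is needed.
        print(f"epoch {trainer.current_epoch} starting")
```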
Carlos Mocholí 2dd6f9e09d
`MetricsHolder` clean-up + typing (#6645)
* Metrics holder cleanup and better error message

* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

* _VALUE -> _METRIC_TYPE
2021-03-24 20:34:46 +01:00
Ethan Harris 741c452551
Fix disabled grads after call to predict (#6657) 2021-03-23 23:07:48 +01:00
Carlos Mocholí 51b10f78f4
Refactor PyTorch profiler 4/5 (#6349)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-03-23 18:13:29 +01:00
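
The refactor above (part 4/5) reworks the PyTorch profiler integration. A minimal sketch of enabling it from the Trainer; constructor arguments of the class form are omitted because they differ between versions:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.profiler import PyTorchProfiler

# Shorthand: select the profiler by name.
trainer = Trainer(profiler="pytorch")

# Or pass an instance for finer control over its options.
trainer = Trainer(profiler=PyTorchProfiler())
```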
thomas chaton 0995d30fab
Flash predict step (#6577)
* add predict_step

* Update predict_loop.py

* Update trainer.py

* Update trainer.py

* resolve bugs

* update

* update

* update

* resolve bug

* resolve some failing tests

* update tests

* update

* resolve tests

* add a test

* remove typo

* add a test for attachment

* update

* changed to on_train_dataloader

* remove __flash_special_attr__

* resolve tests

* update

* update

* update

* update on comments

* Update pytorch_lightning/trainer/data_loading.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-23 11:13:13 -04:00
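
The entry above wires up a dedicated `predict_step`. A minimal sketch of overriding it and invoking `trainer.predict`; signature details may differ slightly across versions, and the model and data are placeholders:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitPredictor(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def predict_step(self, batch, batch_idx, dataloader_idx=None):
        # Called by trainer.predict(); without an override it falls back to forward().
        x, _ = batch
        return torch.softmax(self(x), dim=-1)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

predict_loader = DataLoader(TensorDataset(torch.randn(16, 32), torch.zeros(16)), batch_size=4)
trainer = pl.Trainer()
predictions = trainer.predict(LitPredictor(), dataloaders=predict_loader)
```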
Carlos Mocholí 36d180e532
Refactor base profilers 3/5 (#6621)
Co-authored-by: tchaton <thomas@grid.ai>
2021-03-23 10:07:35 +00:00
Carlos Mocholí 51c9260fad
Move profiler tests (#6619) 2021-03-21 23:39:55 +00:00
Kaushik B 37f22c99ff
Add trainer.predict config validation (#6543)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-21 21:07:54 +00:00
Kaushik B b190403e28
Add outputs param for `on_val/test_epoch_end` hooks (#6120)
* add outputs param for on_val/test_epoch_end hooks

* update changelog

* fix warning message

* add custom call hook

* cache logged metrics

* add args to docstrings

* use warning cache

* add utility method for param in sig check

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update docstring

* add test for eval epoch end hook

* add types and replace model ref

* add deprecation test

* fix test fx name

* add model hooks warning

* add old signature model to tests

* add clear warning cache

* support args param

* update tests

* add tests for model hooks

* code suggestions

* add signature utils

* fix pep8 issues

* fix pep8 issues

* fix outputs issue

* fix tests

* code fixes

* fix validate test

* test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-16 12:15:16 -04:00
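
The entry above threads the collected step outputs through to the `on_validation/test_epoch_end` hooks. For illustration, a minimal sketch of aggregating step outputs at epoch end via the long-standing `validation_epoch_end` module hook; the exact signature the PR gives the `on_*_epoch_end` hooks is not reproduced here, and the model is a placeholder:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.layer(x), y)
        return {"val_loss": loss}

    def validation_epoch_end(self, outputs):
        # `outputs` is the list of dicts returned by validation_step above.
        avg_loss = torch.stack([o["val_loss"] for o in outputs]).mean()
        self.log("avg_val_loss", avg_loss)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```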
Jirka Borovec a312219d42
Prune metric: helpers and inputs 3/n (#6547)
* _basic_input_validation

* _check_shape_and_type_consistency

* _check_num_classes_binary

* _check_num_classes_mc

* _check_num_classes_ml

* _check_top_k

* _check_classification_inputs

* _input_format_classification

* _reduce_stat_scores

* DataType

* rest

* flake8

* chlog
2021-03-16 13:54:06 +01:00
Jirka Borovec 0f07eaf51a
refactor reading env defaults (#6510)
* change tests

* fix

* test

* _defaults_from_env_vars

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-16 10:10:17 +00:00
Roger Shieh c48fc6a2ce
[test] lr_find with bs_scale (#6422)
* init test: test_lr_find_with_bs_scale

* Update test_lr_finder.py

* remove gpu req

* try boring model

* custom boring model

* pep8

* fix typo

* Update test_lr_finder.py

* typo

* typo
2021-03-15 22:43:35 +05:30
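
The test above exercises the learning-rate finder together with the batch-size scaler. A minimal sketch of enabling both tuners; it assumes the model exposes `lr` and `batch_size` attributes (or matching hparams) that the tuners can overwrite:

```python
from pytorch_lightning import Trainer

trainer = Trainer(auto_lr_find=True, auto_scale_batch_size=True, max_epochs=3)
# trainer.tune(model) runs the batch-size scaler and the lr finder before fitting.
```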
Jirka Borovec b341b53f70
deprecate metrics pkg (#6505)
* deprecate metrics

* examples

* req

* docs

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* pep8

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-03-15 14:39:38 +00:00
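
The deprecation above points users to the standalone torchmetrics package. A minimal sketch of the migration:

```python
import torch

# Before: from pytorch_lightning.metrics import Accuracy  (now deprecated)
from torchmetrics import Accuracy

accuracy = Accuracy()
preds = torch.tensor([0, 1, 1, 0])
target = torch.tensor([0, 1, 0, 0])
print(accuracy(preds, target))  # tensor(0.7500)
```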
Eric Cousineau e886d55ac1
argparse: Add use_argument_group=True (#6088)
* argparse: Add inplace option

Replicate in GAN model

* datamodule: Deduplicate logic w/ argparser utilities

* Update pl_examples/domain_templates/generative_adversarial_net.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Keep docstrings

* Correct name

* Whitespace

* Consistency

* fix weird type stuff

* try alt - use_argument_group

* fix syntax + lint

* fix ci errs

* fix ci

* change examples... still failing w/ "unrecognized arguments: --batch_size"

* address review

* mnist_datamodule: add some docstrings

* argparse: check cls or cls.__init__ for param

didn't capture issue, but meh

* fix lint

* fix no-doc edge case

* address review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-03-11 10:50:49 -05:00
Elia Cereda f4cc7451a9
Add Trainer.validate(…) method to run one validation epoch (#4948)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-11 03:46:37 +01:00
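
The entry above adds `Trainer.validate(...)` for running a single validation epoch outside of `fit`. A minimal sketch where the model supplies its own `val_dataloader`; names and sizes are placeholders:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log("val_loss", nn.functional.cross_entropy(self.layer(x), y))

    def val_dataloader(self):
        data = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
        return DataLoader(data, batch_size=8)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

trainer = pl.Trainer()
# Runs one validation epoch without any training and returns the logged metrics.
results = trainer.validate(LitModel())
print(results)
```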
Kaushik B 74d79e7e0e
Raise an exception if check_val_every_n_epoch is not an integer (#6411)
* raise an exception if check_val_every_n_epoch is not an integer

* remove unused object

* add type hints

* add return type

* update exception message

* update exception message
2021-03-10 12:08:53 +05:30
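
The change above validates the type of `check_val_every_n_epoch`. A minimal sketch of valid and invalid configurations; the exact exception type is an assumption based on Lightning's usual misconfiguration errors:

```python
from pytorch_lightning import Trainer

# Valid: run validation every second training epoch.
trainer = Trainer(max_epochs=10, check_val_every_n_epoch=2)

# Invalid after the change above: a non-integer value now raises immediately
# (presumably a MisconfigurationException) instead of failing later.
# Trainer(check_val_every_n_epoch=1.5)
```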
Adrian Wälchli 615b2f7363
Improve DummyLogger (#6398)
* fix dummy logger

* docs

* update docs

* add changelog

* add none return annotation

* return empty string for name, version
2021-03-09 23:18:38 +00:00
Jirka Borovec 55dd3a4c64
Typing for tests 1/n (#6313)
* typing

* yapf

* typing
2021-03-09 11:27:15 +00:00
Adrian Wälchli fc6d402733
fix logger creating directory structure too early in DDP (#6380)
* fix

* add simple test

* fix imports

* add changelog

* tighter test with on_fit_start hook closer to the dispatch call

* move class inside test function

* add a comment
2021-03-09 09:49:59 +00:00
Adrian Wälchli 718074b99a
Fix trainer not resetting lightning_optimizers (#6372)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-08 09:58:03 +08:00
Rohit Gupta 38a5fe7af1
Remove optimizer_idx arg in manual optimization (#6093)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2021-03-07 08:48:50 +01:00
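
The removal above drops the `optimizer_idx` argument from `training_step` under manual optimization; with `automatic_optimization = False`, all optimizers are instead fetched from `self.optimizers()`. A minimal two-optimizer sketch; the networks and losses are placeholders:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class TwoOptimizerModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False
        self.net_a = nn.Linear(32, 2)
        self.net_b = nn.Linear(32, 2)

    # No optimizer_idx argument: the step owns both optimizers explicitly.
    def training_step(self, batch, batch_idx):
        opt_a, opt_b = self.optimizers()
        x, y = batch

        loss_a = nn.functional.cross_entropy(self.net_a(x), y)
        opt_a.zero_grad()
        self.manual_backward(loss_a)
        opt_a.step()

        loss_b = nn.functional.cross_entropy(self.net_b(x), y)
        opt_b.zero_grad()
        self.manual_backward(loss_b)
        opt_b.step()

    def configure_optimizers(self):
        return torch.optim.Adam(self.net_a.parameters()), torch.optim.Adam(self.net_b.parameters())
```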
Rohit Gupta facfda85f1
Remove no return warning from val/test step (#6139)
* remove warning

* auto_opt

* chlog

* auto_opt

* no_warning_call

* rm old code

* add warning for predict

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-06 17:15:21 +00:00
Elia Cereda d0596fac94
Refactor RunningStage usage in advance of implementing Trainer.validate() (#4945)
* Update code

Co-authored-by: EliaCereda

* More property updates

* Move properties. Introduce trainer._fitting

* Use trainer.fitting

* Fix reset dataloaders

* Unused code

* RunningStage.SANITY_CHECKING

* Use setters

* Fix bugs

* Fix bugs

* TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}

* Fix bugs

* Fix bugs

* Fix tests

* Update CHANGELOG. Add deprecation warning. Fix tests

* Unused imports

* Optional trainer

* More deprecation. More refactoring

* Correct version

* Use properties

* Address comments

* flake8

* Missed renamings

* Typo

* is -> ==

It is recommended to use `is` for Enums since they are singletons; however, since LightningEnum subclasses str, it's not a good idea in case a user sets the state/stage with a str

* Also for tests

* Typo

* Address @tchaton's comments

* PEP8

* Correct property

* Update CHANGELOG

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Remove called sanity check

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-06 12:40:19 +00:00
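
The `is -> ==` note in the entry above reflects that LightningEnum subclasses `str`, so comparisons against plain strings must use `==`. A small standalone illustration; the enum here is a stand-in, not the real TrainerState definition:

```python
from enum import Enum

class Stage(str, Enum):  # mimics a str-backed LightningEnum
    FITTING = "fitting"
    VALIDATING = "validating"

user_value = "fitting"  # e.g. a user setting the stage with a plain string

# `==` compares by value, so it works for members and raw strings alike.
assert Stage.FITTING == Stage.FITTING
assert Stage.FITTING == user_value

# `is` is identity: fine between singleton members, but False against a raw string.
assert Stage.FITTING is Stage.FITTING
assert Stage.FITTING is not user_value
```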
thomas chaton 2ec67a48b3
[bug] Fix PyTorch profiler with emit_nvtx (#6260)
* resolve bug

* update changelog

* Update tests/trainer/test_trainer.py

* Update pytorch_lightning/profiler/profilers.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* resolve comments

* resolve flake8

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-05 21:12:03 +01:00
thomas chaton 46540ee260
[bugfix] Resolve memory leak for evaluation (#6326)
* resolve bug

* resolve flake8

* revert name
2021-03-05 16:52:56 +09:00
Jirka Borovec b9cf1223b9
missing tests default_root_dir=tmpdir (#6314)
* default_root_dir=tmpdir

* miss
2021-03-04 19:23:12 +00:00
Jirka Borovec bf6ba83aef
prune duplicate test in optim (#6312) 2021-03-03 15:41:00 +09:00
Jirka Borovec d1a03153f3
Refactor: runif for spec 6/6 (#6307)
* special

* rpc
2021-03-02 18:57:13 +00:00
Jirka Borovec ac583781db
Refactor: Runif for TPU and Horovod 5/n (#6301)
* TPU

* horovod

* extra

* fix

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* doc

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-03-02 16:21:20 +00:00
Nicki Skafte 24c3a3fc3e
Add possibility for custom naming when using multiple dataloaders (#6274) 2021-03-02 17:03:36 +01:00