Commit Graph

600 Commits

Author SHA1 Message Date
Kaushik B f79a13e495
[Model Parallel] Add configure sharded model hook (#6679)
* Add base hook for model parallel

* fix callback signature

* Simplify hook

* Add hook logic

* add tests

* add property setter

* add logic for being called once

* Update changelog

* Fix

* fix return type

* fix lambda callback test

* Fix tests

* Apply code suggestions

* add logic for setup_optimizers_predispatch

* add common dummy model

* Swap call order

* Remove test that isn't needed anymore

* Update tests

* Add a bit more doc

* Few code review fixes

* Update pytorch_lightning/accelerators/accelerator.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Change hook name

* Fix test

* Test setup hook, refactor names

* Swap call order of callbacks and model initialization

* Change name of context manager

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-29 14:50:51 -06:00
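
The entry above introduces a hook for building very large models inside the training type plugin's sharding context. A minimal sketch of how a user module might override it, assuming the final hook name is `configure_sharded_model` as the PR title suggests:

```python
import torch
import pytorch_lightning as pl


class ShardedModel(pl.LightningModule):
    def configure_sharded_model(self):
        # Called inside the plugin's model-parallel context, so these layers
        # can be instantiated directly in sharded form.
        self.block = torch.nn.Sequential(
            torch.nn.Linear(32, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 2),
        )

    def forward(self, x):
        return self.block(x)
```
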
Łukasz Zalewski cca0eca5f3
More explicit exception message when testing with fast_dev_run=True (#6667)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-29 13:29:54 +00:00
Carlos Mocholí f0c5479de9
Remove legacy `Result` parameters (#6016) 2021-03-28 11:55:08 +02:00
thomas chaton 0e45220263
[warning] Add warning when values are not being reduced (#6417)
* add warning non reduced

* add test

* update test

* update changelog

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* update

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-03-26 18:33:11 +00:00
Carlos Mocholí bc613611e2
Do not add return dict items to callback_metrics (#6682) 2021-03-26 14:05:20 +01:00
Ethan Harris 6b990f3fa5
Add artifact_location arg to MLFlow logger (#6677)
* Add artifact_location arg to MLFlow logger

* Add CHANGELOG URL

* Update test
2021-03-26 00:12:03 +01:00
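
The entry above adds an `artifact_location` argument to the MLflow logger. A short sketch of how it might be passed through; the tracking URI and bucket path are placeholders:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import MLFlowLogger

# artifact_location is forwarded to MLflow when the experiment is created,
# e.g. to store artifacts in object storage instead of the local mlruns/ dir.
mlf_logger = MLFlowLogger(
    experiment_name="my_experiment",
    tracking_uri="file:./mlruns",
    artifact_location="s3://my-bucket/mlflow-artifacts",
)
trainer = Trainer(logger=mlf_logger)
```
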
Rohit Gupta 9be092dbdb
Add on_epoch_start to run at the beginning of every loop irrespective of train/val/test (#6498)
* update docs

* add hook and update docs

* update tests

* chlog

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* chlog

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-25 14:20:49 +01:00
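
The entry above makes `on_epoch_start` (and its counterpart `on_epoch_end`) fire at the start of the training, validation and test loops alike. A small sketch of a module using the hook; the print statements are only illustrative:

```python
import pytorch_lightning as pl


class EpochHooksModel(pl.LightningModule):
    def on_epoch_start(self):
        # Now runs at the beginning of every loop: train, validation and test.
        print("an epoch is starting")

    def on_train_epoch_start(self):
        # Loop-specific hooks remain available when only training matters.
        print("a training epoch is starting")
```
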
ananthsub 40976e4eba
Support teardown hook on DataModule (#4673)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2021-03-25 07:51:55 -05:00
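
The entry above adds a `teardown` hook to `LightningDataModule`, mirroring the one available on modules and callbacks. A minimal sketch; the dataset and cleanup logic are placeholders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class RandomDataModule(pl.LightningDataModule):
    def setup(self, stage=None):
        self.dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))

    def train_dataloader(self):
        return DataLoader(self.dataset, batch_size=8)

    def teardown(self, stage=None):
        # Called when the corresponding Trainer stage finishes, so temporary
        # files, connections or cached data can be released here.
        self.dataset = None
```
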
Kaushik B 2cbdc01256
Fix checkpoint callback & Trainer.test(_) issue for TPUs (#6654)
* Fix checkpoint callback issue for TPUs

* update changelog

* add barrier

* apply code suggestions

* update trainer test

* remove spaces

* fix tpu tests

* Apply suggestions from code review

* add comment

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-25 10:37:37 +00:00
Ethan Harris d02fe342c1
Feature/double precision (#6595)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-03-24 15:47:58 +05:30
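
The entry above adds double-precision support. A one-line sketch of enabling it via the existing `precision` flag; the model and datamodule are assumed to exist elsewhere:

```python
from pytorch_lightning import Trainer

# precision=64 runs the model parameters and the batches it receives in
# torch.float64 instead of the default float32.
trainer = Trainer(precision=64, max_epochs=1)
# trainer.fit(model, datamodule=dm)
```
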
Jirka Borovec 70beddfc13
Prune metrics: others 11/DoNe (#6659)
* classif

* grad_img

* nlp

* ssl

* format
2021-03-24 09:16:28 +01:00
Ethan Harris 741c452551
Fix disabled grads after call to predict (#6657) 2021-03-23 23:07:48 +01:00
thomas chaton fd5cb7fcc3
Add PyTorch 1.8 Profiler 5/5 (#6618)
* Refactor profilers

* Update PassThrough

* WIP - This is broken and will change

* Update pytorch_lightning/profiler/pytorch.py

Co-authored-by: thomas chaton <thomas@grid.ai>

* resolve tests

* resolve tests

* find output

* try something

* update

* add support for test and predict

* update

* update

* use getattr

* test

* test

* update

* tests

* update

* update

* update

* update

* update

* remove file

* update

* update

* update

* update

* update

* test

* update

* update

* update tests

* update

* add support for 1.8

* rename records

* add support for 1.8

* update

* resolve flake8

* resolve test

* Refactor basic profilers

* Fixes

* Unused import

* Introduce setup

* Profile on all ranks. Print to stdout on 0

* Introduce dirpath + filename

* CHANGELOG

* Add tests. Address comments

* add `on_run_stage_setup`

* add on_run_stage_setup function

* update

* add test for RegisterRecordFunction

* update lightning flow direction

* move variable to private

* remove trace

* Undo code that should be in 3/4

* Multi-stage multi-rank

* 2/5 changes

* Pass stage in __del__

* Remove TODOs

* Describe on_evaluation_end. Add tests

* Typo

* Address comments

* deepcopy tests

* Advanced teardown

* Fix teardown test

* Fix tests

* Minor change

* Update CHANGELOG.md

* Fix test

* Quick fixes

* Fix 6522

* resolve ddp tests

* resolve tests

* resolve some tests

* update tests

* resolve tests

* update

* resolve tests

* resolve some tests

* Missed fixes from 3/5

* Fixes

* resolve some tests

* resolve test for 1.7.1

* Broken refactor

* Missed stage

* Minor changes

* resolve tests

* Update CHANGELOG

* resolve bug

* remove print

* Typo

* Cleanup

* resolve ddp test

* remove barrier

* update profiler

* update

* Smaller model

* update

* resolve tests

* update

* Minor changes. CHANGELOG

* Minimize diff

* update to 1.8.1

* RunIf. Extra code. Check segfault

* resolve tests

* Typo. Bad merge

* Fixing a bad merge

* replace for kineto

* Update pytorch_lightning/profiler/pytorch.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Update pytorch_lightning/profiler/pytorch.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Minor changes

* Bad merge

* Use lists for flexibility

* Use sets

* predict_step

* Ananth's suggestion

* update

* Docs

* Update pl_examples/basic_examples/profiler_example.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update example

* update example

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-23 20:43:21 +00:00
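
The entry above lands the PyTorch 1.8 (kineto) profiler integration. A brief sketch of the two ways it is typically enabled; the `dirpath`/`filename` argument names are taken from the commit messages above:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.profiler import PyTorchProfiler

# Shorthand: ask for the profiler by name.
trainer = Trainer(profiler="pytorch")

# Explicit: control where each rank writes its report.
profiler = PyTorchProfiler(dirpath="profiler_logs", filename="perf")
trainer = Trainer(profiler=profiler, max_epochs=1)
```
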
Carlos Mocholí 51b10f78f4
Refactor PyTorch profiler 4/5 (#6349)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-03-23 18:13:29 +01:00
Jirka Borovec a74909affa
prune metrics: info retrieval (#6649) 2021-03-23 15:05:32 +00:00
Carlos Mocholí 36d180e532
Refactor base profilers 3/5 (#6621)
Co-authored-by: tchaton <thomas@grid.ai>
2021-03-23 10:07:35 +00:00
Jirka Borovec f93414d085
Prune metrics: regression 9/n (#6637)
* psnr

* r2score

* ssim

* chlog
2021-03-23 10:01:25 +00:00
Jirka Borovec efce2b7777
Prune metrics: regression 8/n (#6636)
* explained_variance

* tests

* mean_absolute_error

* mean_squared_error

* mean_relative_error

* mean_squared_log_error

* chlog
2021-03-23 09:35:51 +01:00
Jirka Borovec 8cd75a4dd5
fix comparing versions (#6434)
* fix comparing versions

* chlog

* .

* ...

* datasets
2021-03-23 07:51:45 +00:00
thomas chaton 2064ece582
[refactor] Add setup to profilers + _run_stage_setup to trainer 2/5 (#6633)
* add setup

* update

* updates on comment

* Minor changes

* Extra import

* Docs

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-03-22 14:32:31 -04:00
camruta e2e1de0fb7
Add teardown method to BaseProfiler. (#6370)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-03-22 11:49:06 +00:00
Kaushik B 37f22c99ff
Add trainer.predict config validation (#6543)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-21 21:07:54 +00:00
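
The entry above adds configuration validation for `trainer.predict`, so a missing dataloader or prediction step fails early with a clear error. A minimal end-to-end sketch under those assumptions:

```python
import torch
from torch.utils.data import DataLoader
import pytorch_lightning as pl


class Predictor(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)


# A plain tensor is a valid map-style dataset; batches arrive as (4, 32) tensors.
loader = DataLoader(torch.randn(16, 32), batch_size=4)
trainer = pl.Trainer()
predictions = trainer.predict(Predictor(), dataloaders=loader)
```
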
Justus Schock 634d83134f
Add AMP for validation, prediction and testing (#6565)
* Add Tests for val and test-steps

* Add native AMP

* pep8 tests

* pep8 plugin

* changelog
2021-03-20 23:15:49 +00:00
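
The entry above extends native AMP to the evaluation loops. A sketch assuming the usual `precision=16` flag on a GPU trainer is all that is required; the model and dataloaders are assumed to exist elsewhere:

```python
from pytorch_lightning import Trainer

# With this change, mixed precision is applied during validate/test/predict
# as well, not only inside the training loop.
trainer = Trainer(precision=16, gpus=1)
# trainer.validate(model)
# trainer.test(model)
# trainer.predict(model, dataloaders=predict_loader)
```
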
Jirka Borovec 3a56a6024e
Prune metrics: other classification 7/n (#6584)
* confusion_matrix

* iou

* f_beta

* hamming_distance

* stat_scores

* tests

* flake8

* chlog
2021-03-20 03:18:52 +05:30
Kaushik B 87c03b1038
Update Gradient Clipping for TPU Accelerator (#6576) 2021-03-20 01:02:57 +05:30
Ethan Harris 983a888f49
Fix all_gather for tpu_cores=8 (#6587) 2021-03-19 21:56:58 +05:30
Sean Naren 4e9b453854
[Fix] Move init dist connection into the setup function (#6506)
* Move connection setup into the setup function. Call setup hook after we set up the accelerator

* Added CHANGELOG.md

* fix setup order in callback test

* fix input arguments in test

* Mock distributed function, remove protection to turn into training type hook

* Remove import

* Add missing mock, ensure custom plugin does not create children process

* Skip test on windows

* Update deepspeed to init connection in setup

* Do not initialize distributed module

* Move DeepSpeed tests to special tests since dist communication is being set up

* Special the test to see if this fixes CI

* Delete accelerator connector test to see if its causing build to fail

* Delete deepspeed test

* Revert "Delete accelerator connector test to see if its causing build to fail"

This reverts commit edde60b8

* Revert "Delete deepspeed test"

This reverts commit 9d317429

* Reverse hook

* Reverse setup hooks to debug again

* Add todo so i know where i left off

* For single device move in pre_dispatch after setup function

* Add additional model to device hook if any additional parameters have been set

* See if we can enable deepspeed tests

* Revert "See if we can enable deepspeed tests"

This reverts commit b5450def

* See if this hook approach works

* Introduce new granular hooks

* Remove import, fix tpu spawn by moving the function to setup

* Added missing special test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-18 14:33:39 -07:00
Kaushik B b606171299
Update Changelog for v1.2.4 (#6581)
* Update changelog for v1.2.4

* legacy v1.2.4

* prune duplicates from changelog

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-18 20:13:54 +00:00
Jirka Borovec 38a2119359
Prune metrics: precision & recall 6/n (#6573)
* avg precision

* precision

* recall

* curve

* tests

* chlog

* isort

* fix
2021-03-18 13:21:59 -04:00
Jirka Borovec 9e35f979ea
Prune metrics: AUC & AUROC (#6572)
* class: AUC AUROC

* func: auc auroc

* format

* tests
2021-03-18 10:38:56 +01:00
Jirka Borovec 2f6ce1ae7f
prune metric: accuracy 4/n (#6515)
* prune accuracy

* chlog

* flake8

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* wrap

* test

* test

* fix

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-03-17 11:37:10 +00:00
Kaushik B b190403e28
Add outputs param for `on_val/test_epoch_end` hooks (#6120)
* add outputs param for on_val/test_epoch_end hooks

* update changelog

* fix warning message

* add custom call hook

* cache logged metrics

* add args to docstrings

* use warning cache

* add utility method for param in sig check

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update docstring

* add test for eval epoch end hook

* add types and replace model ref

* add deprecation test

* fix test fx name

* add model hooks warning

* add old signature model to tests

* add clear warning cache

* support args param

* update tests

* add tests for model hooks

* code suggestions

* add signature utils

* fix pep8 issues

* fix pep8 issues

* fix outputs issue

* fix tests

* code fixes

* fix validate test

* test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-16 12:15:16 -04:00
Jirka Borovec a312219d42
Prune metric: helpers and inputs 3/n (#6547)
* _basic_input_validation

* _check_shape_and_type_consistency

* _check_num_classes_binary

* _check_num_classes_mc

* _check_num_classes_ml

* _check_top_k

* _check_classification_inputs

* _input_format_classification

* _reduce_stat_scores

* DataType

* rest

* flake8

* chlog
2021-03-16 13:54:06 +01:00
Jirka Borovec 6453091b8a
Prune metrics base classes 2/n (#6530)
* base class

* extensions

* chlog

* _stable_1d_sort

* _check_same_shape

* _input_format_classification_one_hot

* utils

* to_onehot

* select_topk

* to_categorical

* get_num_classes

* reduce

* class_reduce

* tests
2021-03-15 19:28:18 +00:00
Jirka Borovec b341b53f70
deprecate metrics pkg (#6505)
* deprecate metrics

* examples

* req

* docs

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* pep8

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-03-15 14:39:38 +00:00
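
The entry above deprecates the bundled metrics package in favour of the standalone `torchmetrics` project. The migration is essentially an import change, sketched below:

```python
# Deprecated import path (still available, but emits a DeprecationWarning):
# from pytorch_lightning.metrics import Accuracy

# Preferred import going forward; the implementation now lives in torchmetrics:
from torchmetrics import Accuracy

accuracy = Accuracy()
```
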
Luca Di Liello 5d73fbbd81
Mean Average Precision metric for Information Retrieval (1/5) (#5032)
* init information retrieval metrics

* changed retrieval metrics names, expanded arguments and fixed typo

* added 'Retrieval' prefix to metrics and fixed conflict with already-present 'average_precision' file

* improved code formatting

* pep8 code compatibility

* features/implemented new Mean Average Precision metrics for Information Retrieval + doc

* fixed pep8 compatibility

* removed threshold parameter and fixed typo on types in RetrievalMAP and improved doc

* improved doc, put class-specific args first in RetrievalMetric and turned RetrievalMetric into an abstract class

* implemented tests for functional and class metric. fixed typo when input tensors are empty or when all targets are False

* fixed typos in doc and changed torch.true_divide to torch.div

* fixed typos pep8 compatibility

* fixed types in long division in ir_average_precision and example in mean_average_precision

* RetrievalMetric states are not lists and _metric method accepts predictions and targets for easier extension

* updated CHANGELOG file

* added '# noqa: F401' flag to not used imports

* added double space before '# noqa: F401' flag

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* change get_mini_groups into get_group_indexes

* added checks on target inputs

* minor refactoring for code cleanness

* split tests over exception raising in separate function && refactored test code into multiple functions

* fixed pep8 compatibility

* implemented suggestions of @SkafteNicki

* fixed imports for isort and added type annotations to functions in test_map.py

* isort on test_map and fixed typing

* isort on retrieval and on __init__.py and utils.py in metrics package

* fixed typo in pytorch_lightning/metrics/__init__.py regarding code style

* fixed yapf compatibility

* fixed yapf compatibility

* fixed typo in doc

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-03-15 12:18:43 +01:00
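
The entry above introduces the first information-retrieval metric, `RetrievalMAP`. A small usage sketch; the import path and the `indexes` keyword follow the later torchmetrics-style API and should be treated as assumptions:

```python
import torch
from pytorch_lightning.metrics import RetrievalMAP  # import path assumed from this PR

# `indexes` groups predictions by query: rows 0-2 score documents for query 0,
# rows 3-5 for query 1.
indexes = torch.tensor([0, 0, 0, 1, 1, 1])
preds = torch.tensor([0.9, 0.2, 0.4, 0.1, 0.8, 0.3])
target = torch.tensor([True, False, False, False, True, True])

rmap = RetrievalMAP()
rmap(preds, target, indexes=indexes)  # accumulate one batch
print(rmap.compute())                 # mean of the per-query average precisions
```
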
Adrian Wälchli 02fa32b7bc
Handle torch.jit scripted modules in layer summary (#6511) 2021-03-15 03:17:42 +01:00
thomas chaton 0544efd453
[bug] Update broadcast + reduce decision [ModelCheckpoint] (#6410)
* resolve bug

* update

* update changelog

* update PR

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* add todo

* resolve issues

* resolve flake8

* update

* add coverage for reduce

* wip

* restore back to broadcast

* remove test.py

* resolve flake8

* update

* check world size

* resolve test

* update

* use pytorch version when defined

* update on comments

* update on comments

* flake8

* resolve bugs

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update

* update

* update

* update

* remove test

* update

* resolve flake8

* update

* update

* update

* proxy

* update

* update

* resolve typo

* prune

* update parallel

* update

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-14 17:14:27 +00:00
Adrian Wälchli b2bcad1132
Fix tuner.scale_batch_size not finding batch size attribute when using datamodule (#5968) 2021-03-14 09:16:19 +01:00
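
The entry above fixes the batch-size finder so that it also locates a `batch_size` attribute on the datamodule. A sketch of the tuning call this affects; the model and datamodule are assumed to exist elsewhere:

```python
from pytorch_lightning import Trainer

# The finder now looks for `batch_size` on the datamodule as well as the model.
trainer = Trainer(auto_scale_batch_size=True)
# trainer.tune(model, datamodule=dm)  # dm defines a `batch_size` attribute
```
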
ananthsub cea170e011
[feat] Support iteration-based checkpointing in model checkpoint callback (#6146)
* Update model_checkpoint.py

* add tests

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* fix tests

* every_n_batches

* Update test_model_checkpoint.py

* defaults

* rm tests

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* Prune deprecated metrics for 1.3 (#6161)

* prune deprecated metrics for 1.3

* isort / yapf

* Update model_checkpoint.py

* add tests

* defaults

* Update CHANGELOG.md

* pre-commit

* Update model_checkpoint.py

* update defaults

* Update test_remove_1-5.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* fix tests

* Update test_model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* ckpt-callback

* Update test_model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* validation-end

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* Update test_model_checkpoint.py

* Update test_model_checkpoint.py

* Update test_model_checkpoint.py

* clarify-names

- Make names explicit as to which hooks they apply to
- Use step instead of batch for consistency with global step

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* mutual-exclusive

Make every_n_train_steps and every_n_val_epochs mutually exclusive

* fix-default-0

* Update CHANGELOG.md

* formatting

* make-private

make attributes private to the class

* rebase

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-11 14:44:29 -08:00
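
The entry above adds step-based triggers to `ModelCheckpoint`. A brief sketch using the `every_n_train_steps` argument named in the commit messages (per those messages, it is mutually exclusive with `every_n_val_epochs`):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Save a checkpoint every 1000 optimizer steps instead of once per epoch.
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints",
    filename="{epoch}-{step}",
    every_n_train_steps=1000,
)
trainer = Trainer(callbacks=[checkpoint_cb])
```
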
Rohit Gupta c53edce1a1
Disable batch transfer in DP mode (#6098)
* add exceptions and test

* hook

* fix

* clean up

* clean up

* regex

* regex

* docs

* rev

* comment and docs

* chlog

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Apply suggestions from code review

Co-authored-by: chaton <thomas@grid.ai>

* Monkey-patch device count

* docs

* pep

* api_change

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2021-03-11 10:51:10 -05:00
Max Frei 2ecda5df52
Allow user to disable the automatic formatting of checkpoint file names. (#6277)
* cleaning SWA (#6259)

* rename

* if

* test

* chlog

* Remove opt from manual_backward in docs (#6267)

* switch agents pool (#6270)

* Allow user to disable the automatic formatting of checkpoint file names.

* Added changelog entry.

* Made flake8 happy.

* Applied review suggestion: quotes for special characters in docstring

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Fixed example in docstring.

* Fixed syntax error in docstring.

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-11 16:40:23 +08:00
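
The entry above lets users turn off the automatic metric-name formatting in checkpoint file names. A sketch assuming the flag is called `auto_insert_metric_name`, which is not spelled out in the messages above:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# With automatic formatting enabled, "{val_loss:.2f}" in the filename is
# expanded to "val_loss=0.12"; disabling it renders just the value
# (e.g. "model-0.12.ckpt"), useful when metric names contain "/" or "=".
checkpoint_cb = ModelCheckpoint(
    monitor="val_loss",
    dirpath="checkpoints",
    filename="model-{val_loss:.2f}",
    auto_insert_metric_name=False,  # flag name assumed, not given in the commit
)
```
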
Elia Cereda f4cc7451a9
Add Trainer.validate(…) method to run one validation epoch (#4948)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-11 03:46:37 +01:00
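
The entry above adds a `Trainer.validate(...)` entry point that runs a single validation epoch, mirroring `Trainer.test(...)`. A minimal sketch; the model and datamodule are assumed to exist elsewhere:

```python
from pytorch_lightning import Trainer

trainer = Trainer()
# Runs one validation epoch and returns the logged metrics.
# results = trainer.validate(model, datamodule=dm)
```
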
Sean Naren 1c013b43e0
[Fix] Ensure we set the default device before initializing deepspeed (#6460)
* Ensure we set the default device before initializing deepspeed

* Add CHANGELOG.md

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-03-10 16:29:37 +00:00
thomas chaton 7d4e74c745
[bug] All_gather support tensor on cpu (#6416)
* add test

* update changelog

* update

* rename function
2021-03-10 14:19:07 +00:00
Sean Naren c81b2a8189
Set find unused parameters to True by default to fix breaking compatibility (#6438)
* Set find unused parameters to True by default to fix breaking models, add suggestion to re-enable

* Add changelog
2021-03-10 10:40:24 +01:00
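
The entry above flips DDP's `find_unused_parameters` default to `True` to avoid breaking models, at some performance cost. A sketch of the suggested opt-out via the DDP plugin; the plugin import path matches this release line but is an assumption here:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin

# Models that use all of their parameters each step can switch the flag back
# off to recover the faster DDP code path.
trainer = Trainer(
    gpus=2,
    accelerator="ddp",
    plugins=[DDPPlugin(find_unused_parameters=False)],
)
```
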
Adrian Wälchli 615b2f7363
Improve DummyLogger (#6398)
* fix dummy logger

* docs

* update docs

* add changelog

* add none return annotation

* return empty string for name, version
2021-03-09 23:18:38 +00:00
thomas chaton 30d649b9a7
[changelog] Update Changelog on release v1.2.3 (#6444)
* update changelog

* legacy 1.2.3

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-09 15:17:36 -08:00
Adrian Wälchli fc6d402733
fix logger creating directory structure too early in DDP (#6380)
* fix

* add simple test

* fix imports

* add changelog

* tighter test with on_fit_start hook closer to the dispatch call

* move class inside test function

* add a comment
2021-03-09 09:49:59 +00:00
David Palzer 523c59bfdd
fixed bug where tuner would not tune lr if also tuning batch_size (#4688)
* fixed bug where tuner would not tune lr if also tuning batch_size

* added a '+1' to computing the smoothed loss. This maintains the behavior for the smoothed loss as before the bug fix

* pep8 fix

* add changelog

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-09 08:30:06 +08:00