4572 Commits

Author SHA1 Message Date
Kaushik B 87c03b1038
Update Gradient Clipping for TPU Accelerator (#6576) 2021-03-20 01:02:57 +05:30
Ethan Harris 983a888f49
Fix all_gather for tpu_cores=8 (#6587) 2021-03-19 21:56:58 +05:30
Sean Naren 4e9b453854
[Fix] Move init dist connection into the setup function (#6506)
* Move connection setup into the setup function. Call setup hook after we set up the accelerator

* Added CHANGELOG.md

* fix setup order in callback test

* fix input arguments in test

* Mock distributed function, remove protection to turn into training type hook

* Remove import

* Add missing mock, ensure custom plugin does not create children process

* Skip test on windows

* Update deepspeed to init connection in setup

* Do not initialize distributed module

* Move DeepSpeed tests to special tests since dist communication is being set up

* Special the test to see if this fixes CI

* Delete accelerator connector test to see if its causing build to fail

* Delete deepspeed test

* Revert "Delete accelerator connector test to see if its causing build to fail"

This reverts commit edde60b8

* Revert "Delete deepspeed test"

This reverts commit 9d317429

* Reverse hook

* Reverse setup hooks to debug again

* Add todo so i know where i left off

* For single device move in pre_dispatch after setup function

* Add additional model to device hook if any additional parameters have been set

* See if we can enable deepspeed tests

* Revert "See if we can enable deepspeed tests"

This reverts commit b5450def

* See if this hook approach works

* Introduce new granular hooks

* Remove import, fix tpu spawn by moving the function to setup

* Added missing special test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-18 14:33:39 -07:00
Kaushik B b606171299
Update Changelog for v1.2.4 (#6581)
* Update changelog for v1.2.4

* legacy v1.2.4

* prune duplicates from changelog

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-18 20:13:54 +00:00
Jirka Borovec 38a2119359
Prune metrics: precision & recall 6/n (#6573)
* avg precision

* precision
* recall

* curve

* tests

* chlog

* isort

* fix
2021-03-18 13:21:59 -04:00
thomas chaton 8853a36d45
[doc] Update Dict Train Loader doc. (#6579)
* update doc

* update example
2021-03-18 17:14:38 +00:00
Jirka Borovec 9e35f979ea
Prune metrics: AUC & AUROC (#6572)
* class: AUC AUROC

* func: auc auroc

* format

* tests
2021-03-18 10:38:56 +01:00
Jirka Borovec 2f6ce1ae7f
prune metric: accuracy 4/n (#6515)
* prune accuracy

* chlog

* flake8

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* wrap

* test

* test

* fix

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-03-17 11:37:10 +00:00
Jirka Borovec 297e438153
fix deprecation wrapper & tests (#6553)
* fix deprecation wrapper & tests

* flake8
2021-03-17 10:41:08 +00:00
thomas chaton 00cd918177
[doc] Add Zero Grad `set_to_none=True` trick (#6548)
* add trick to doc

* update

* update path

* Update docs/source/benchmarking/performance.rst

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-03-16 23:40:14 +00:00
Kaushik B b190403e28
Add outputs param for `on_val/test_epoch_end` hooks (#6120)
* add outputs param for on_val/test_epoch_end hooks

* update changelog

* fix warning message

* add custom call hook

* cache logged metrics

* add args to docstrings

* use warning cache

* add utility method for param in sig check

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update docstring

* add test for eval epoch end hook

* add types and replace model ref

* add deprecation test

* fix test fx name

* add model hooks warning

* add old signature model to tests

* add clear warning cache

* support args param

* update tests

* add tests for model hooks

* code suggestions

* add signature utils

* fix pep8 issues

* fix pep8 issues

* fix outputs issue

* fix tests

* code fixes

* fix validate test

* test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-16 12:15:16 -04:00
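Note on #6120: the bullets above mention a utility for checking whether a parameter appears in a hook's signature, so that hooks written with the old signature keep working. A minimal sketch of that idea follows, using only the standard `inspect` module. The function name `accepts_param` and the choice to treat a `*args` hook as accepting the parameter are assumptions made for illustration and are not taken from the commit itself.

```python
import inspect
from typing import Callable


def accepts_param(fn: Callable, name: str) -> bool:
    """Hypothetical helper: report whether `fn` can receive a parameter called `name`."""
    params = inspect.signature(fn).parameters
    if name in params:
        return True
    # Assumption for this sketch: a hook declaring *args can absorb the value positionally.
    return any(p.kind is inspect.Parameter.VAR_POSITIONAL for p in params.values())


class OldStyleModule:
    def on_validation_epoch_end(self):  # old signature, no `outputs`
        pass


class NewStyleModule:
    def on_validation_epoch_end(self, outputs):  # new signature
        pass


# A caller could branch on the result and pass `outputs` only when it is accepted.
old_ok = accepts_param(OldStyleModule().on_validation_epoch_end, "outputs")
new_ok = accepts_param(NewStyleModule().on_validation_epoch_end, "outputs")
```

Because `inspect.signature` is applied to bound methods here, `self` is already excluded from the inspected parameters.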
Jirka Borovec 555a6fea21
prune warning & deprecation wrapper (#6540)
* docs

* wrapper

* test

* count

* flake8
2021-03-16 14:55:31 +00:00
Jirka Borovec a312219d42
Prune metric: helpers and inputs 3/n (#6547)
* _basic_input_validation

* _check_shape_and_type_consistency

* _check_num_classes_binary

* _check_num_classes_mc

* _check_num_classes_ml

* _check_top_k

* _check_classification_inputs

* _input_format_classification

* _reduce_stat_scores

* DataType

* rest

* flake8

* chlog
2021-03-16 13:54:06 +01:00
Jirka Borovec 0f07eaf51a
refactor reading env defaults (#6510)
* change tests

* fix

* test

* _defaults_from_env_vars

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-16 10:10:17 +00:00
Amog Kamsetty 6a14146811
Custom Plugin is_distributed (#6537)
* return from plugin

* dont return for tpu
2021-03-15 19:38:30 +00:00
Jirka Borovec 6453091b8a
Prune metrics base classes 2/n (#6530)
* base class

* extensions

* chlog

* _stable_1d_sort

* _check_same_shape

* _input_format_classification_one_hot

* utils

* to_onehot

* select_topk

* to_categorical

* get_num_classes

* reduce

* class_reduce

* tests
2021-03-15 19:28:18 +00:00
Carlos Mocholí 9c5973357e
Update hook lifecycle (#6538)
* Update hook lifecycle

* Update docs/source/common/lightning_module.rst
2021-03-15 19:16:31 +00:00
Adrian Wälchli ea36ee30b0
fix attribute access in LightningModule.toggle_optimizer (#6513) 2021-03-15 19:06:17 +01:00
Sean Naren 383565d225
Update DeepSpeed docs (#6528)
* Clean up docs and add some explicitness around stages

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-03-15 18:00:21 +00:00
Roger Shieh c48fc6a2ce
[test] lr_find with bs_scale (#6422)
* init test: test_lr_find_with_bs_scale

* Update test_lr_finder.py

* remove gpu req

* try boring model

* custom boring model

* pep8

* fix typo

* Update test_lr_finder.py

* typo

* typo
2021-03-15 22:43:35 +05:30
Jirka Borovec b341b53f70
deprecate metrics pkg (#6505)
* deprecate metrics

* examples

* req

* docs

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* pep8

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-03-15 14:39:38 +00:00
Jirka Borovec eb3ff413a9
CI: Azure publish results (#6514) 2021-03-15 14:38:40 +00:00
Luca Di Liello 5d73fbbd81
Mean Average Precision metric for Information Retrieval (1/5) (#5032)
* init information retrieval metrics

* changed retrieval metrics names, expanded arguments and fixed typo

* added 'Retrieval' prefix to metrics and fixed conflict with already-present 'average_precision' file

* improved code formatting

* pep8 code compatibility

* features/implemented new Mean Average Precision metrics for Information Retrieval + doc

* fixed pep8 compatibility

* removed threshold parameter and fixed typo on types in RetrievalMAP and improved doc

* improved doc, put first class-specific args in RetrievalMetric and transformed RetrievalMetric in abstract class

* implemented tests for functional and class metric. fixed typo when input tensors are empty or when all targets are False

* fixed typos in doc and changed torch.true_divide to torch.div

* fixed typos pep8 compatibility

* fixed types in long division in ir_average_precision and example in mean_average_precision

* RetrievalMetric states are not lists and _metric method accepts predictions and targets for easier extension

* updated CHANGELOG file

* added '# noqa: F401' flag to not used imports

* added double space before '# noqa: F401' flag

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* change get_mini_groups in get_group_indexes

* added checks on target inputs

* minor refactoring for code cleanness

* split tests over exception raising in separate function && refactored test code into multiple functions

* fixed pep8 compatibility

* implemented suggestions of @SkafteNicki

* fixed imports for isort and added types annotations to functions in test_map.py

* isort on test_map and fixed typing

* isort on retrieval and on __init__.py and utils.py in metrics package

* fixed typo in pytorch_lightning/metrics/__init__.py regarding code style

* fixed yapf compatibility

* fixed yapf compatibility

* fixed typo in doc

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-03-15 12:18:43 +01:00
Dipam Vasani 06756a84e6
document exceptions for metrics/functional (#6273)
* document exceptions for metrics/functional

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-03-15 12:07:52 +01:00
Jirka Borovec 156847bea7
CI: resume testing with py3.8 (#6516)
* testing on python 3.8

* req
2021-03-15 12:07:23 +01:00
Adrian Wälchli 02fa32b7bc
Handle torch.jit scripted modules in layer summary (#6511) 2021-03-15 03:17:42 +01:00
thomas chaton 0544efd453
[bug] Update broadcast + reduce decision [ModelCheckpoint] (#6410)
* resolve bug

* update

* update changelog

* update PR

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* add todo

* resolve issues

* resolve flake8

* update

* add coverage for reduce

* wip

* restore back to broadcast

* remove test.py

* resolve flake8

* update

* check world size

* resolve test

* update

* use pytorch version when defined

* update on comments

* update on comments

* flake8

* resolve bugs

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update

* update

* update

* update

* remove test

* update

* resolve flake8

* update

* update

* update

* proxy

* update

* update

* resolve typo

* prune

* update parallel

* update

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-14 17:14:27 +00:00
Rohit Gupta dcd9dd8338
Update docs for limit_predict_batches (#6507)
* add docs and minor updates

* docs

* fraction
2021-03-14 09:09:58 +00:00
Adrian Wälchli b2bcad1132
Fix tuner.scale_batch_size not finding batch size attribute when using datamodule (#5968) 2021-03-14 09:16:19 +01:00
Akihiro Nitta 680e83adab
[doc] Update the order of zero_grad and backward (#6478)
* Fix zero_grad in docs

* Fix zero_grad in docs
2021-03-12 09:00:23 +00:00
Carlos Mocholí 518c7e4b2d
Remove unused mixin attributes (#6487)
* Remove unused mixin attributes

* Missing import
2021-03-12 08:29:52 +00:00
Adrian Wälchli 6596447f16
update xla version (#6464) 2021-03-12 10:04:47 +08:00
ananthsub cea170e011
[feat] Support iteration-based checkpointing in model checkpoint callback (#6146)
* Update model_checkpoint.py

* add tests

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* fix tests

* every_n_batches

* Update test_model_checkpoint.py

* defaults

* rm tests

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* Prune deprecated metrics for 1.3 (#6161)

* prune deprecated metrics for 1.3

* isort / yapf

* Update model_checkpoint.py

* add tests

* defaults

* Update CHANGELOG.md

* pre-commit

* Update model_checkpoint.py

* update defaults

* Update test_remove_1-5.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* fix tests

* Update test_model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* ckpt-callback

* Update test_model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* validation-end

* Update model_checkpoint.py

* Update test_model_checkpoint.py

* Update test_model_checkpoint.py

* Update test_model_checkpoint.py

* Update test_model_checkpoint.py

* clarify-names

- Make names explicit as to which hooks they apply to
- Use step instead of batch for consistency with global step

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* Update model_checkpoint.py

* mutual-exclusive

Make every_n_train_steps and every_n_val_epochs mutually exclusive

* fix-default-0

* Update CHANGELOG.md

* formatting

* make-private

make attributes private to the class

* rebase

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-11 14:44:29 -08:00
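Note on #6146: the message above states that `every_n_train_steps` and `every_n_val_epochs` are made mutually exclusive. A minimal sketch of such a check is shown below in plain Python. The helper name `validate_checkpoint_triggers`, the use of `ValueError`, and the convention that `None` or `0` means "disabled" are assumptions made for illustration; the actual callback may validate its arguments differently.

```python
from typing import Optional


def validate_checkpoint_triggers(
    every_n_train_steps: Optional[int] = None,
    every_n_val_epochs: Optional[int] = None,
) -> None:
    """Hypothetical helper: reject configurations that enable both triggers."""
    for name, value in (
        ("every_n_train_steps", every_n_train_steps),
        ("every_n_val_epochs", every_n_val_epochs),
    ):
        if value is not None and value < 0:
            raise ValueError(f"{name} must be >= 0, got {value}")

    # Assumption for this sketch: None and 0 both mean "this trigger is disabled".
    steps_enabled = bool(every_n_train_steps)
    epochs_enabled = bool(every_n_val_epochs)
    if steps_enabled and epochs_enabled:
        raise ValueError(
            "every_n_train_steps and every_n_val_epochs are mutually exclusive; "
            "set at most one of them to a positive value"
        )


# Enabling a single trigger passes silently.
validate_checkpoint_triggers(every_n_train_steps=100)
validate_checkpoint_triggers(every_n_val_epochs=1)
```

Failing at construction time, rather than when the first checkpoint would be written, surfaces a conflicting configuration before any training time is spent.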
Adrian Wälchli 62d4304ca4
remove obsolete todo in pl_examples (#6475) 2021-03-11 18:49:30 +01:00
Rohit Gupta c53edce1a1
Disable batch transfer in DP mode (#6098)
* add exceptions and test

* hook

* fix

* clean up

* clean up

* regex

* regex

* docs

* rev

* comment and docs

* chlog

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Apply suggestions from code review

Co-authored-by: chaton <thomas@grid.ai>

* Monkey-patch device count

* docs

* pep

* api_change

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2021-03-11 10:51:10 -05:00
Eric Cousineau e886d55ac1
argparse: Add use_argument_group=True (#6088)
* argparse: Add inplace option

Replicate in GAN model

* datamodule: Deduplicate logic w/ argparser utilities

* Update pl_examples/domain_templates/generative_adversarial_net.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Keep docstrings

* Correct name

* Whitespace

* Consistency

* fix weird type stuff

* try alt - use_argument_group

* fix syntax + lint

* fix ci errs

* fix ci

* change examples... still failing w/ "unrecognized arguments: --batch_size"

* address review

* mnist_datamodule: add some docstrings

* argparse: check cls or cls.__init__ for param

didn't capture issue, but meh

* fix lint

* fix no-doc edge case

* address review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-03-11 10:50:49 -05:00
Jirka Borovec afe0ededa3
cover subproc coverage (#6477) 2021-03-11 15:45:26 +00:00
Kaushik B 079fe9bc09
Hotfix for torchvision (#6476) 2021-03-11 16:49:48 +05:30
Max Frei 2ecda5df52
Allow user to disable the automatic formatting of checkpoint file names. (#6277)
* cleaning SWA (#6259)

* rename

* if

* test

* chlog

* Remove opt from manual_backward in docs (#6267)

* switch agents pool (#6270)

* Allow user to disable the automatic formatting of checkpoint file names.

* Added changelog entry.

* Made flake8 happy.

* Applied review suggestion: quotes for special characters in docstring

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Fixed example in docstring.

* Fixed syntax error in docstring.

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-11 16:40:23 +08:00
Elia Cereda f4cc7451a9
Add Trainer.validate(…) method to run one validation epoch (#4948)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-11 03:46:37 +01:00
Carlos Mocholí d1db604c61
Remove redundant test (#6466) 2021-03-10 20:16:09 +01:00
Sean Naren 1c013b43e0
[Fix] Ensure we set the default device before initializing deepspeed (#6460)
* Ensure we set the default device before initializing deepspeed

* Add CHANGELOG.md

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-03-10 16:29:37 +00:00
thomas chaton 7d4e74c745
[bug] All_gather support tensor on cpu (#6416)
* add test

* update changelog

* update

* rename function
2021-03-10 14:19:07 +00:00
Sean Naren c81b2a8189
Set find unused parameters to True by default to fix breaking compatibility (#6438)
* Set find unused parameters to True by default to fix breaking models, add suggestion to re-enable

* Add changelog
2021-03-10 10:40:24 +01:00
Kaushik B 74d79e7e0e
Raise an exception if check_val_every_n_epoch is not an integer (#6411)
* raise an exception if check_val_every_n_epoch is not an integer

* remove unused object

* add type hints

* add return type

* update exception message

* update exception message
2021-03-10 12:08:53 +05:30
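Note on #6411: the commit raises an exception when `check_val_every_n_epoch` is not an integer. The sketch below shows one way such a guard could look. The function name `check_val_interval`, the choice of `TypeError`, and the explicit rejection of `bool` are assumptions made for illustration; the commit message does not say which exception type is raised.

```python
def check_val_interval(check_val_every_n_epoch: object) -> int:
    """Hypothetical guard: return the value unchanged if it is a plain integer."""
    # bool is a subclass of int in Python, so it is excluded explicitly
    # (a choice made for this sketch, not taken from the commit).
    is_plain_int = isinstance(check_val_every_n_epoch, int) and not isinstance(
        check_val_every_n_epoch, bool
    )
    if not is_plain_int:
        raise TypeError(
            "check_val_every_n_epoch should be an integer, "
            f"got {check_val_every_n_epoch!r}"
        )
    return check_val_every_n_epoch


interval = check_val_interval(2)  # a valid integer is returned as-is
```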
Adrian Wälchli 615b2f7363
Improve DummyLogger (#6398)
* fix dummy logger

* docs

* update docs

* add changelog

* add none return annotation

* return empty string for name, version
2021-03-09 23:18:38 +00:00
thomas chaton 30d649b9a7
[changelog] Update Changelog on release v1.2.3 (#6444)
* update changelog

* legacy 1.2.3

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-09 15:17:36 -08:00
Jirka Borovec 55dd3a4c64
Typing for tests 1/n (#6313)
* typing

* yapf

* typing
2021-03-09 11:27:15 +00:00
Adrian Wälchli fc6d402733
fix logger creating directory structure too early in DDP (#6380)
* fix

* add simple test

* fix imports

* add changelog

* tighter test with on_fit_start hook closer to the dispatch call

* move class inside test function

* add a comment
2021-03-09 09:49:59 +00:00
Adrian Wälchli 75c6486ac7
update (#6403) 2021-03-09 09:47:51 +00:00