Commit Graph

357 Commits

Author SHA1 Message Date
Sean Naren e7134a9135
Sharded Plugin 2/n: Allow ddp plugin to modify optimizer state saving (#4675)
* Allow ddp plugin to modify optimizer state saving

* Rely on the accelerator for optimizer states

* Ensure we init the accelerator for the saving function

* Better comment for optim state dump

* Revert "Ensure we init the accelerator for the saving function"

This reverts commit af65effa

* Added accelerator check to initialize tuner before saving model checkpoint

* Simplify comment

* Revert "Added accelerator check to initialize tuner before saving model checkpoint"

This reverts commit f9929c0c

* Return single optimizer state to reduce duplication

* Fixed docstring

* Fixed typing

* Fixed comment

* Added CHANGELOG.md

Co-authored-by: chaton <thomas@grid.ai>
2020-11-18 16:38:35 +00:00
Jirka Borovec 5ea383332d
update chlog after 1.0.7 release (#4735) 2020-11-18 12:26:41 +00:00
Carlos Mocholí 396a46f55f
Add current_score to ModelCheckpoint.on_save_checkpoint (#4721)
* Add current_score to ModelCheckpoint.on_save_checkpoint

* Update CHANGELOG

[ci skip]

* fix

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* fix2

* Add test for NaN

* Fix failing tests

* Simplify line

* Add test docstrings

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-11-18 08:09:44 +00:00
Nicki Skafte 51097669b9
[metrics] change default behaviour of state dict (#4685)
* fix state dict

* Update docs/source/metrics.rst

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* changelog

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-11-16 12:33:45 +00:00
Jirka Borovec be60efb3cf
allow decorate model init with saving hparams (#4662)
* addd tests

* use boring model

* parsing init

* chlog

* double decorate

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* bug

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-11-16 11:02:26 +01:00
ananthsub d096a2ea6d
Fix setup callback hook to pass LightningModule through (#4608)
* Fix setup callback hook

* Update CHANGELOG.md

* Update test_trainer.py

* Update test_trainer.py

* Update test_trainer.py

* fix chlog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-13 19:34:46 -05:00
Jirka Borovec 396a18eb78
update changelog after 1.0.6 (#4624)
* update changelog after 1.0.6

* fix formatting
2020-11-12 09:21:57 +01:00
chaton 3d202f9ecc
[FEAT] Refactor logging 3/3 [v1] (#4552)
* wip

* wip check how many tests break

* wip

* resolve some bugs

* resolve more bugs

* resolve 2 bugs

* resolve

* temp fix

* update

* remove useless code

* remove result

* try to resolve bug

* update changelog

* formatting

* remove pl

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-11 17:05:24 +00:00
chaton 7e08b0d710
[bug-fix] DDP and automatic_optimization=False (#4485)
* resolve bug

* add self._running_manual_optim

* update

* update tests

* update lightning module

* resolve bug

* update tests

* update

* resolve pep8

* update

* replace by `ddp_spawn`

* temporary fix

* update

* update

* move update to training_loop

* make both ddp_spawn

* introduce `manual_optimizer_step`

* update changelog

* added changelog wrong place

* add force_optimizer_step

* update docstring for tests

* update optimizer_step

* update zero_grad

* resolve flake8

* move update into manual_optimizer_step

* add zero_grad

* remove zero_grad tests

* remove manual_backward in AMP, it doesn't help

* update

* loosen tests

* update

* update doc

* add TODO

* Removed unnecessary get model from native amp

* Remove try except with pytest raise

* Add seed, clean up imports, remove try catch to reproduce error

* update code

* update test

* revert back

* formatting

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-10 19:44:51 +00:00
maxjeblick 343d19fa86
Find parameters which are specified in the LightningDataModule, only (#4347)
* search for attribute in datamodule if not found elsewhere

* add test for datamodule

* add lightning_getattr test for datamodule

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-11-10 14:01:20 +01:00
Nicki Skafte 465ec752f8
Metric ddp bugfix (#4482)
* changes

* fix spelling

* small note

* trying to fix ddp test

* fix ddp

* fix for test

* suggestion

* CHANGELOG

* Update pytorch_lightning/metrics/metric.py

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Sean Naren <sean@grid.ai>
2020-11-10 09:16:31 +01:00
Nicki Skafte 4f3160ba2e
Skip tuner algorithms on fast dev (#3903)
* skip on fast dev

* fix error

* changelog

* fix recursive issue

* combine tests

* pep8

* move logic to base funcs

* fix mistake

* Update pytorch_lightning/tuner/lr_finder.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* pep

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-11-10 00:34:42 +01:00
tarepan 41c9bee4f0
Fix load disparity between normal and hpc (#4526)
* Add missing load functionality in hpc

* Add general file load for hpc

* Add mark in CHANGELOG

* Fix Typo Li**hg**tning

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Refactor line separation

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Fix entangled fixation commit

* Fix naming of restore_model_states

* Fix amp restore place

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-11-09 17:26:38 +00:00
Stef | ステフ 4a6721af25
Missing TorchScript trace's update (#4586)
Co-authored-by: stef-ubuntu <stef@webempath.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-09 15:01:13 +01:00
Travis Addair 51cc7a89ee
Horovod: fixed early stopping and added metrics aggregation (#3775)
* Fixed early stopping for Horovod

* Refactored to sync_dist_if_available

* Bump min Horovod version to support hvd.is_initialized

* Changelog

* Added back change for Horovod

* Removed redundant checks for initialization

* Implement metrics gathering for Horovod

* Added test for EvalResult

* Renamed ddp_sync_on_step -> dist_sync_on_step

* Added metric test for Horovod

* Added option pass callable allgather function to metric base class

* Added dist_sync_fn

* Fixed calls to private _sync_dist

* Fixed Horovod test

* Added sync_tensor to the distributed backend

* Skip Windows

* Insert test path

* Removed redundant import

* Updated drone

* Unset HOROVOD_GPU_ALLREDUCE

* Unset

* No cache dir

* No uninstall

* Unset variables

* Uninstall Horovod during initialization

* Replaced more references to ddp_sync_on_step

* Fixed imports

* Fixed attribute

* Added back default

* Lint

* Added back docstring

* Made gather_all_tensors default

* Added whitespace

* Update tests/models/test_horovod.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/metrics/metric.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update CHANGELOG.md

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-05 12:52:02 -05:00
Jirka Borovec 41c6a1307b
update changelog after 1.0.5 (#4505)
* update changelog

* update

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-04 21:56:47 +01:00
ananthsub 5d08559c03
Avoid torchscript export for Metric forward (#4428)
* Update metric.py

* add test

* Update CHANGELOG.md

* Update test_metric_lightning.py

* Update test_metric_lightning.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-03 23:02:02 +01:00
Rohit Gupta 360b3d8844
Disable training when limit_train_batches=0 (#4371)
* Disable training when limit_train_batches=0

* chlog

* pep

* limit_train_batches

* BoringModel

Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-11-03 12:10:35 +05:30
Rohit Gupta ad2556b669
Disable saving checkpoints if not trained (#4372)
* Disable saving checkpoints if not trained

* chlog

* update test

* fix

Co-authored-by: chaton <thomas@grid.ai>
2020-11-03 11:38:32 +05:30
Nicki Skafte 19187d38f9
[Metrics] Detach bugfix (#4313)
* detach on buffer

* doc update

* remove file

* changelog

* suggestions

* Update docs/source/metrics.rst

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>

* fix for 4266

* Update docs/source/metrics.rst

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update CHANGELOG.md

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-11-02 19:44:49 +00:00
chaton 102fa9ee7d
[BUGFIX] AMP + Precision unscale grad (#4441)
* move unscale within Native plugin

* remove gradient tracking from lightning backward

* forgot trainer.fit

* typo

* update

* cleanup

* set to 1.6

* typo

* skip if below 1.6 strict

* update changelog

* remove useless code

* Update tests/plugins/test_amp_plugin.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Update tests/plugins/test_amp_plugin.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* update changelog

* Update CHANGELOG.md

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-02 16:36:48 +00:00
Jirka Borovec ef03c39ab7
Add step index in checkpoint name (#3807)
* true final value of global step

* ch check

* tests

* save each validation interval

* wip

* add test

* add test

* wip

* fix tests, revert old edits, fix merge conflicts, update doctests

* test + bugfix

* sort files

* format test

* suggestion by ananth

* added changelog

* naming

* docs

* example

* suggestion

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix test

* pep

* pep

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-11-02 15:05:58 +01:00
Lezwon Castelino 839813eb7b
timeout for tpu check (#4340)
* timeout for tpu check

* added tests

* updated CHANGELOG.md

* fixed windows tests

* Update pytorch_lightning/utilities/xla_device_utils.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* requested changes

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-11-01 01:04:25 +01:00
Dusan Drevicky 38bb4e2da0
[Metrics] Add multiclass auroc (#4236)
* Add functional multiclass AUROC metric

* Add multiclass_auroc tests

* fixup! Add functional multiclass AUROC metric

* fixup! fixup! Add functional multiclass AUROC metric

* Add multiclass_auroc doc reference

* Update CHANGELOG

* formatting

* Shorter error message regex match in tests

* Set num classes as pytest parameter

* formatting

* Update CHANGELOG

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-30 19:56:13 +01:00
Jeff Yang 0f584faa6b
PyTorch 1.7 Stable support (#3821)
* prepare for 1.7 support [ci skip]

* tpu [ci skip]

* test run 1.7

* all 1.7, needs to fix tests

* couple with torchvision

* windows try

* remove windows

* 1.7 is here

* on purpose fail [ci skip]

* return [ci skip]

* 1.7 docker

* back to normal [ci skip]

* change to some_val [ci skip]

* add seed [ci skip]

* 4 places [ci skip]

* fail on purpose [ci skip]

* verbose=True [ci skip]

* use filename to track

* use filename to track

* monitor epoch + changelog

* Update tests/checkpointing/test_model_checkpoint.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-30 15:42:14 +00:00
Jeff Yang 48e0b33d56
[Changelog] 1.0.4 (#4440)
* changelog 1.0.4

* changelog 1.0.4
2020-10-30 13:34:29 +00:00
Nicki Skafte e0b856c105
[Metrics] Confusion matrix class interface (#4348)
* docs + precision + recall + f_beta + refactor

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>

* rebase

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>

* fixes

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>

* added missing file

* docs

* docs

* extra import

* add confusion matrix

* add to docs

* add test

* pep8 + isort

* update tests

* move util function

* unify functional and class

* add to init

* remove old implementation

* update tests

* pep8

* add duplicate

* fix doctest

* Update pytorch_lightning/metrics/classification/confusion_matrix.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* changelog

* bullet point args

* bullet docs

* bullet docs

Co-authored-by: ananyahjha93 <ananya@pytorchlightning.ai>
Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <55400948+s-rog@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-30 11:44:25 +01:00
Adrian Wälchli d1234c592d
deprecate passing ModelCheckpoint instance to Trainer(checkpoint_callback=...) (#4336)
* first attempt

* update tests

* support multiple

* test bugfix

* changelog

* pep

* pep

* import order

* import

* improve test for resuming

* test

* update test

* add references test

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* docstring suggestion deprecation

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>

* paramref

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-30 04:47:37 +01:00
Martin Hwang b459fd26ac
fix: `nb` is set total number of devices, when nb is -1. (#4209)
* fix: `nb` is set total number of devices, when nb is -1.

 Refs: #4207

* feat: add test code
     1. test combination `auto_select_gpus`, `gpus` options using
Trainer
     2. test `pick_multiple_gpus` function directly

Refs: #4207

* docs: modify contents in `Select GPU devices`

 Refs: #4207

* refactore: reflect the reuslt of review

 Refs: #4207

* refactore: reflect the reuslt of review

 Refs: #4207

* Update CHANGELOG.md

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <55400948+s-rog@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-29 10:50:37 +01:00
Boris Dayma ff41d80706
feat(wandb): log in sync with Trainer step (#4405)
* feat(wandb): log in sync with Trainer step

* docs: update CHANGELOG

* style(test_wandb): fix formatting

* parentheses

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-29 01:07:06 +05:30
Carlos Mocholí 00cc69aed7
Add "monitor" to saved ModelCheckpoints (#4383)
* Add key

* Remove unused variables

* Update CHANGELOG [skip ci]

* best_model_monitor -> monitor

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-28 15:21:08 +05:30
Dusan Drevicky c50c225f05
feature: Allow str arguments in Trainer.profiler (#3656)
* allow trainer's profiler param to have a str value

* add tests

* update docs

* update exception message

* Update CHANGELOG

* fix pep8 issues

* cleanup test code

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Add deprecation warning if using bool for profiler

* Add deprecation tests and move deprecated tests

* Remove bool option to profiler from docs

* Deprecate bool args to profiler in CHANGELOG

* fixup! Add deprecation warning if using bool for profiler

* fixup! Add deprecation tests and move deprecated tests

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Implement suggestions, remove whitespace

* fixup! Implement suggestions, remove whitespace

* Allow bool, str (case insensitive), BaseProfiler

* Add info about bool deprecation to trainer

* fixup! Add info about bool deprecation to trainer

* Move deprecate todo to test_deprecated

* Test wrong profiler type, improve error message

* fixup! Test wrong profiler type, improve error message

* Update pytorch_lightning/trainer/connectors/profiler_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Apply suggestions from code review

* Readd bool to profiler types, test cli profiler arg

* Remove extra whitespace in doc

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update deprecation versions

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-27 16:27:16 +05:30
Chenglu 8e3faa2da1
get help from docstring (#4344)
* Add geting help message from docstring

* Fix pep8 issue

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-10-26 23:38:58 +05:30
chaton f07ee33db6
BUG - Wandb: Sanitize callable. (#4320)
* add _sanitize_callable_params

* add call on _val if callable

* clean code formatter

* resolve pep8

* default return function name

* resolve pep8

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update CHANGELOG.md

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-26 11:57:03 +00:00
Adrian Wälchli 376268f01e
Implement finalize for WandbLogger (#4341)
* wandb finish

* experiment

* upload at end of run

* changelog

* comment

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-10-26 11:22:09 +00:00
ananthsub c8ccec7a02
Enable profilers to write to remote files with fsspec (#4162)
* Update profilers.py

Enable profilers to use write to remote files with fsspec

* Update profilers.py

* Update CHANGELOG.md

* Update pytorch_lightning/profiler/profilers.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* formatting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-25 18:21:42 +01:00
Dusan Drevicky 6ad299573f
[Metrics] Fix/4237 auc unstable reorder (#4281)
* =Add deprecation warning for auc reorder

* =Add test for deprecation warning for auc reorder

* Update CHANGELOG

* Add reorder deprecation warning to auc docstring

* Fix pep8 f-string error

* remove duplicate import

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-25 10:26:40 +01:00
Adrian Wälchli 28d45a26a3
Set correct device ids in DDP [wip] (#4297)
* repro


debug


c


d


dd


d


d


d


ads


d


d


d


f


rank


f


v


d


d


d


d


d


d


d


d


d


d


d


set 


drop PL_DDP_PID


clean up


keep set gpus


revert


Revert "drop PL_DDP_PID"

This reverts commit 7d88cae469541ef19128f9c20919fd3a6f863039.
d


pid


gpus


clean up


clean up 


misconfig?


misconfig


clean


clean

* fix pep

* changelog

* remove script

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-24 17:33:47 -04:00
Sean Naren 5641b266d5
Bug/4319 ddp checkpoint (#4323)
* Broadcast best model path to ensure we sync with main process + wait for main process to save

* Add barrier call to ensure all processes are in sync

* Added changelog commit

* Move sync of best model path/score to model checkpoint, keep barrier to ensure all processes complete

* Ensure we broadcast as tuple

* Add init check

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Removed model checkpoint code, added barrier to trainer to enforce we syncronize and wait for all processes to finish before completing training

* Add barrier within teardown call, removed horovod teardown to inherit from base accelerator

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2020-10-24 16:55:49 -04:00
ananthsub f6efb712ed
Skip replacing dataloader sampler if it's already a distributed sampler (#4273)
* Update data_loading.py

* Update data_loading.py

* add test + update flag description

* add to changelog

* Update test_dataloaders.py

* fix-pickle

* Update test_dataloaders.py

* Added missing reference calls

* Update tests/trainer/test_dataloaders.py

* Apply suggestions from code review

* Update data_loading.py

* Update test_dataloaders.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-23 17:34:07 +01:00
Rohit Gupta 4c7ebdc32b
Add dirpath and filename parameter in ModelCheckpoint (#4213)
* Add dirpath and filename parameter in ModelCheckpoint

* remove old function

* chlog

* codefactor

* update tests

* docs

* fix doctest and added tests

* pathlib dirpath

* dep version and docs

* try fix doctest

* pep

* suggestions
Co-authored-by: carmocca <carlossmocholi@gmail.com>

* suggestions

* fix test

* pep

* trigger tests

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* suggestions

* try fix windows test

* add and update some tests

* trigger tests

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-23 09:59:12 +05:30
William Falcon 753362d0a4
enable ddp as a plugin (#4285)
* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

Co-authored-by: chaton <thomas@grid.ai>
2020-10-22 05:15:51 -04:00
Mauricio Villegas 546476c704
Allow changing the logged step value in validation_step (#4130)
* Fix to bug identified in https://github.com/PyTorchLightning/pytorch-lightning/issues/4102

* update tests

* chlog

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-10-22 03:03:07 +05:30
Carlos Mocholí 2549ca40e6
Clean up optimizer code (#3587)
* Update optimizer code

* Update CHANGELOG

* Fix tuple of one list case

* Update docs

* Fix pep issue

* Minor typo [skip-ci]

* Use minimal match

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-21 21:12:48 +02:00
Carlos Mocholí e0f9799dbf
Add strict option to lr_scheduler dict (#3586)
* Add strict option to lr_scheduler dict

* Update docs

* Unnecessary "else" after "raise"

* Update CHANGELOG

* Fix rebase
2020-10-21 14:14:37 +02:00
Nicki Skafte 3a38294d6d
New section for changelog [ci skip] (#4279)
* changelog

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-10-21 11:54:02 +02:00
Jirka Borovec e0e402dbe6
Docs/changelog for 1.0.3 (#4267)
* formatting

* miss

* missing & ver++

* path
2020-10-21 00:53:10 +02:00
Yigit Ozen fc23c1be74
Fix 1.0.0 changelog (#4180) 2020-10-15 20:36:54 +02:00
Jirka Borovec 8f2bba10e1
update chlog (#4177) 2020-10-15 11:42:10 -04:00
Jirka Borovec 4204ef7b53
Bugfix/4156 filter hparams for yaml - fsspec (#4158)
* add test

* fix

* sleepy boy

* chlog

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2020-10-15 16:53:42 +02:00