Commit Graph

3242 Commits

Author SHA1 Message Date
Jirka Borovec faa357648f
return simple docs to methods (#3645)
* return simple docs to methods

* sorting

* imports

* miss
2020-09-30 08:34:19 -04:00
Adrian Wälchli c73032e39d
Make ModelCheckpoint(save_top_k=-1) track the best models (#3735)
* fix topk=-1 tracking best

* update test

* clean up

* add changelog

* enable loading best topk in trainer.test()

* make trivial

* return right away

* make windows test path happy
2020-09-30 08:34:02 -04:00
Jirka Borovec a0968e4bdf
fix PT version in CUDA docker images (#3739)
* upgrade PT version

* update docker

* docker

* try 1.5

* fix docker versions

* old

* badge
2020-09-30 08:33:22 -04:00
Jirka Borovec 31a36f04df
define distributed as a type (#3740)
* define type

* miss

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* miss

* warn

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-30 08:33:01 -04:00
William Falcon 00ba2b24b7
Drop all result docs. Make the separation between flow and logging clear (#3744)
* remove results docs. separate flow from log
2020-09-30 08:31:16 -04:00
Adrian Wälchli 9405c880af
log/save_interval based on global step (#3667)
* log interval based on global step

* test

* test

* test

* test

* pep

* pep

* added changelog

* pep

* merge

* remove unused arg
2020-09-30 12:26:27 +02:00
edenlightning 0009f29848
Update new-project.rst (#3734)
2020-09-29 23:44:27 -04:00
William Falcon b3be8022bd
tests for val step flow and logging (#3731)
* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end

* ref: test log dict

* ref: test log dict

* ref: test log dict

* ref: test log dict
2020-09-29 22:12:56 -04:00
ananthsub 3dcf7130c5
Support checkpoint hooks on data module (#3563)
* Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter

* Store a reference to the trainer on the datamodule

Fixes #3682

* Update data_connector.py

* Update data_connector.py

* Update test_datamodules.py

* Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter

* support checkpoint hooks for datamodule

refactor on_{save/load}_checkpoint to a separate hook class that both the lightning module and data module inherit
add spots in callback connector to call new datamodule hooks if available

* hooks formatting

* Update hooks.py

* Update checkpoint_connector.py

* Update lightning.py

* update based on upstream/master

checkout upstream/master

* Update checkpoint_connector.py

* add tests

* undo format revert

* Updated CHANGELOG.md

* add checkpoint hooks

* add Dict type

* import CheckpointHooks
2020-09-29 19:51:44 +02:00
William Falcon c14928a72a
ref: test val flow steps (#3723)
* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end
2020-09-29 11:42:38 -04:00
Maxim Grechkin 7bb139816a
Add a more direct test of multi-gpu training working (#2084)
* Add a more direct test of multi-gpu training working

* Update tests/base/develop_pipelines.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-29 15:38:09 +02:00
Carlos Mocholí 3b2efe5b2a
Fix ModelCheckpoint period (#3630)
* Fix ModelCheckpoint period

* Remove comma

* Minor changes

* skip check

* Revert "skip check"

Already pushed to master

This reverts commit 00d9e77b81.

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-29 15:36:45 +02:00
Akmalul Khairi a2b961c7bc
Update lr_finder.rst (#3714)
* Update lr_finder.rst

Misspelling correction

* Update docs/source/lr_finder.rst

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-29 08:28:14 -04:00
Jirka Borovec a16a41d871
formatting @Will (#3719)
2020-09-29 08:27:44 -04:00
William Falcon f42ea303c9
ref: enable self.log for eval loop metrics (#3715)
* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end
2020-09-29 02:00:28 -04:00
William Falcon c41ea86b35
ref: move backends back to individual files (1/5) (ddp_cpu) (#3712)
* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: test val epoch end

* ref: test val epoch end
2020-09-29 01:59:18 -04:00
Rohit Gupta 783750547d
disable optimizers setup during testing (#3059)
* disable configure_optimizers during testing

* minor changes

* hvd and ddp

* fix precision during testing

* fix ddp

* fix amp

* fix cpu

* update dp

* simplify optimizers

* add test

* codefactor

* ref optimizer setup

* chlog

* suggestions

* isort

* rebased with master
2020-09-29 01:09:04 +02:00
William Falcon 4d5c0fa1bc
ref: separate flow vs log tests (#3704)
2020-09-28 12:01:52 -04:00
William Falcon cdd7266cd8
ref: enable self.log from val step (#3701)
* .log in eval

* ref

* ref: enable self.log in val step
2020-09-28 10:49:07 -04:00
William Falcon 2ecaa2a8be
ref: (2/n) fix no log in epoch end (#3699)
2020-09-28 08:25:44 -04:00
ananthsub 859ec92da5
Make Trainer.__test_using_best_weights use cloud_io's load to support more storage backends (#3694)
* Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter

* Store a reference to the trainer on the datamodule

Fixes #3682

* Update data_connector.py

* Update data_connector.py

* Update test_datamodules.py

* Support more storage backends in trainer.test using best weights

Similar to #3692

* Update trainer.py

* Update trainer.py

use cloud_io load directly
2020-09-28 07:53:57 -04:00
William Falcon ddd11075bd
[WIP] ref: deprecated results obj, added support for simpler comms (1/n) (#3681)
* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix typing err

* fix str

* fix typing err
2020-09-27 23:19:46 -04:00
William Falcon ff2bab0996
ref: (results 1/n) enable tracking original metric when step and epoch are both true (#3685)
* enable tracking original metric when step and epoch are both true
2020-09-27 22:08:31 -04:00
William Falcon 931995b55b
remove flake 8 (#3687)
2020-09-27 20:40:02 -04:00
William Falcon a41704ee93
ref: add .log to lightning module (1/n) (#3686)
2020-09-27 20:26:16 -04:00
Adrian Wälchli f37e9e8a83
Fix global step increment on training_epoch_end (#3673)
* fix

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix global step err

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-09-27 20:19:51 -04:00
Adrian Wälchli d15fd751c7
change default save_top_k, save_last to None (#3680)
* topk default

* fix test that doesn't have best available

* remove print

* #3680 changes

* fix backward

* temp revert

te

* add warning by carmocca

* format docstring for test

* specify monitor in ES test with top k

* improve docstring for save_last

* remove commented lines

* revert passing model to test

* undo regex mistake

* changelog

* fix test covering case monitor=None and savetopk=-1

* docstring

* fix test for saving all checkpoints

* don't save checkpoints for save_top_k=0

* add test for savetopk=0

Co-authored-by @carmocca

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-09-27 20:05:02 -04:00
ananthsub 94c79bb3ba
Add a reference to the Trainer on the LightningDataModule (#3684)
* Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter

* Store a reference to the trainer on the datamodule

Fixes #3682

* Update data_connector.py

* Update data_connector.py

* Update test_datamodules.py
2020-09-27 19:48:01 -04:00
Pariente Manuel 3d76f604bd
Add ModelCheckpoint.to_yaml method (#3048)
* Add ModelCheckpoint.to_json()

* Add ModelCheckpoint.to_json() test

* Fix W292: Add new line at end of file

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Fixed tests

* Update pytorch_lightning/callbacks/model_checkpoint.py

* Apply suggestions from code review

* fix test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-27 14:39:40 +02:00
William Falcon d79bce1dff
enable None model checkpoint default (#3669)
* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default
2020-09-26 23:14:04 -04:00
Jirka Borovec a94728c99b
spec Horovod version (#3661)
* spec Horovod version

* MAKEFLAGS="-j2"

* tests

* CI

* docker

* CI

* docker
2020-09-26 19:30:25 +02:00
Jeff Yang 05e5f03fd7
Enable PyTorch 1.7 in conda CI (#3541)
* enable pt 1.7

* readme

* nightly diff version testing, will delete later

* nightly diff version testing, will delete later

* back to normal [ci skip]

* use __ignored_properties__

* define __ignored_properties__ in respective modules

* change log

* formatting

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-09-25 16:20:15 +02:00
Jirka Borovec 7fd8ac6671
update stale config (#3509)
* update stale conf

* labels
2020-09-25 16:00:51 +02:00
Adrian Wälchli 3ff5327e83
Mocking loggers (part 1, wandb) (#3596)
* mocking for wandb

* remove wandb import in amp test

* mock loggers in sphinx

* check tests

* Update extra.txt

* setup

* dev

* min

* revert

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-25 16:00:02 +02:00
Jirka Borovec 0784cf3ab4
dockers nightly (#3615)
* dockers nightly

* typo

* Apply suggestions from code review

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-09-25 15:58:01 +02:00
Carlos Mocholí e70aea7642
Allow ModelCheckpoint monitor to be None (#3633)
* Fix ModelCheckpoint period

* Test for less epochs
2020-09-25 15:54:04 +02:00
Carlos Mocholí ed12e422a4
Fix incorrect "Saving latest checkpoint" warning (#3588)
* Fix incorrect "Saving latest checkpoint" warning

* Replace warning with info. Run PyCharm's optimize imports

* Remove unused class variable. Refactor logic. Improve test

* Fix De Morgan's
2020-09-25 14:18:06 +02:00
Jirka Borovec a25cb300d8
fix building nightly (#3642) 2020-09-25 08:15:06 -04:00
Antoine Broyelle 17c8c95fbc
Wrap prepare_data and setup only once inside DataModule (#3654)
Fix #3652
2020-09-25 07:09:50 -04:00
Carlos Mocholí 908382f196
Split GPUStatsMonitor function (#3644)
* Split function

* Add docstrings

* Add typing annotations

* Minor refactor

* Make static to add a test
2020-09-25 07:30:30 +02:00
William Falcon b5f0af182d
Update __init__.py
2020-09-24 21:55:59 -04:00
Jeff Yang a2120130ed
Lightning docker image based on base-cuda (#3637)
* use lightning CI docker

* exclude py3.8 and torch1.3

* torch 1.7

* mergify

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-09-24 23:14:15 +02:00
JackCaster 618eb913da
Update docstring for early_stop_callback default Trainer argument (#3641)
2020-09-24 22:15:01 +02:00
Jirka Borovec aa52c930f4
test examples (#3643)
* test examples

* testing

* testing

* typo

* req

* exception

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-24 17:33:11 +02:00
William Falcon c94c0a2b1e
fix examples (#3631)
* fix examples

* fix examples
2020-09-23 17:58:03 -04:00
ananthsub c61e1e697d
Add stronger typing to gradient accumulation scheduler callback (#3558)
* Update gradient_accumulation_scheduler.py

add types for gradient accumulation scheduler callback

* Update gradient_accumulation_scheduler.py
2020-09-23 20:22:10 +02:00
Adrian Wälchli 3affa0e49a
use tmpdir in tests when writing predictions to disk (#3561)
* save to tmpdir

* path
2020-09-23 07:44:15 -04:00
William Falcon 2db86381c6
Update README.md
2020-09-23 07:39:46 -04:00
William Falcon c0e26b8766
fix examples (#3623)
* fix examples

* fix examples
2020-09-23 07:36:51 -04:00
Jamie Morton a2574d7dd2
Adding clarifying documentation on the usage of second_order_closure (#3551)
* Adding clarifying documentation on the usage of second_order_closure

* oops typo

* making functions more sane

* fixing spacing issues - I think

* Apply suggestions from code review

* suggestions

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-23 10:43:10 +02:00