Commit Graph

62 Commits

Author SHA1 Message Date
Burhanuddin Rangwala b576201a3d
Added doc strings to wandb logger (#9109) 2021-08-26 16:01:42 +01:00
ananthsub 930b81f96c
Remove unused rank_zero_deprecation in WandB logger (#9034)
* Remove unused imports in WandB logger and corresponding test
2021-08-22 12:58:48 +01:00
Adrian Wälchli ad3f183bc3
convert warning cache usage to rank_zero_only in WandbLogger (#8764) 2021-08-20 10:39:25 +00:00
Carlos Mocholí a1264a6850
Automatic string fixes (#8886) 2021-08-13 14:28:14 +00:00
Adrian Wälchli 3ef8cd654d
Add warning when `wandb.run` already exists (#8714)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-10 10:14:48 +02:00
Adrian Wälchli 87093a3339
remove deprecated sync step argument from WandbLogger (#8763)
* remove deprecated sync step

* update chlog
2021-08-09 09:45:25 +02:00
Thien Tran 052aefc342
WandbLogger to log model topology by default (#8662)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-08-04 10:36:57 +00:00
Carlos Mocholí e63968ab88
Add `pyupgrade` to `pre-commit` (#8557)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 14:38:12 +02:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Kaushik B f447839d16
Add `warning_cache.deprecation` and set warning stacklevel [1/2] (#8005)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-18 11:50:24 +00:00
Boris Dayma 9097347ea8
feat(wandb): log models as artifacts (#6231)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-27 20:15:02 +02:00
Boris Dayma 2a20102321
fix(wandb): allow custom init args (#6989)
* feat(wandb): allow custom init args

* style: pep8

* fix: get dict args

* refactor: simplify init args

* test: test init args

* style: pep8

* docs: update CHANGELOG

* test: check default resume value

* fix: default value of anonymous

* fix: respect order of parameters

* feat: use look-up table for anonymous

* yapf formatting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 09:45:36 +00:00
Tharindu Hasthika c502e47abf
Fixed setting of _save_dir when run initiated externally (#7106)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-23 01:14:46 +00:00
Boris Dayma 40d5a9d6df
fix(wandb): prevent WandbLogger from dropping values (#5931)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-02-27 01:52:23 +00:00
Kunal Mundada 4d96f19493
Document exceptions in loggers (#6171)
* Document exceptions in loggers

* minor formatting

* docstring changed in comet.py

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-02-25 21:08:32 +01:00
Eric Cousineau 4531b1c796
wandb: Fix example rendering for docs (#5905)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-02-16 20:14:01 +01:00
Justus Schock da6dbc8d1d
PoC: Accelerator refactor (#5743)
* restoring the result from subprocess

* fix queue.get() order for results

* add missing "block_backward_sync" context manager

* add missing "block_backward_sync" context manager

* fix sync_batchnorm

* fix supported gpu-ids for tuple

* fix clip gradients and inf recursion

* accelerator selection: added cluster_environment plugin

* fix torchelastic test

* fix reduce early stopping decision for DDP

* fix tests: callbacks, conversion to lightning optimizer

* fix lightning optimizer does not pickle

* fix setting benchmark and deterministic option

* fix slurm amp test

* fix prepare_data test and determine node_rank

* fix retrieving last path when testing

* remove obsolete plugin argument

* fix test: test_trainer_config

* fix torchscript tests

* fix trainer.model access

* move properties

* fix test_transfer_batch_hook

* fix auto_select_gpus

* fix omegaconf test

* fix test that needs to simulate slurm ddp

* add horovod plugin

* fix test with named arguments

* clean up whitespace

* fix datamodules test

* remove old accelerators

* fix naming

* move old plugins

* move to plugins

* create precision subpackage

* create training_type subpackage

* fix all new import errors

* fix wrong arguments order passed to test

* fix LR finder

* Added sharded training type and amp plugin

* Move clip grad to precision plugin

* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically

* Fix import issue, attempting to fix tests

* Fix initial test

* Reflect hook logic from master, should wrap model after move to device

* Optional state consolidation, since master has optimizers not wrapped

* change attribute for instance test

* reset optimizers

optimizers are not used in main process, so state would be wrong.

* legacy

* imports in accel

* legacy2

* trainer imports

* fix import errors after rebase

* move hook to new setup location

* provide unwrapping logic

* fix trainer callback system

* added ddp2 implementation

* fix imports .legacy

* move plugins

* restore legacy

* drop test.py from root

* add tpu accelerator and plugins

* fixes

* fix lightning optimizer merge

* reset bugreportmodel

* unwrapping

* step routing forward

* model access

* unwrap

* opt

* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* manual optimization

* update optimizer routing

* add rank to torchelastic

* fix memory mixed precision

* setstate on trainer for pickling in ddp spawn

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


fix merge


fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resovle a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x


x


x


x


x


x


x


x


x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* updat utils

* revert permission change on files

* suggestions from carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typer in accelerator.py

* typo

* Apply suggestions from code review

* formatting

* update on comments

* update typo

* Update pytorch_lightning/trainer/properties.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* suggestion from code review

* suggestion from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-12 15:48:56 -05:00
Jirka Borovec 79d42d83e7
formatting 3/n: PL modules (#5716)
* cb

* log

* prof

* tune

* flake8
2021-02-08 14:28:38 -05:00
Rohit Gupta 2abf4693bc Fix log_dir property (#5537)
* fix and update tests

* update with ModelCheckpoint

* chlog

* wip wandb fix

* all fixed

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-05 21:40:42 +01:00
Jirka Borovec 7b30133a82
flake8 & isort (#5647) 2021-01-25 14:31:38 -05:00
Boris Dayma f0fafa2be0
feat(wandb): add sync_step (#5351)
* docs(wandb): add details to args

* feat(wandb): no sync between trainer and W&B steps

* style: pep8

* tests(wandb): test sync_step

* docs(wandb): add references

* docs(wandb): fix typo

* feat(wandb): more explicit warning

* feat(wandb): order of args

* style: Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* style: long line

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2021-01-24 17:44:09 -05:00
Justus Schock ef7345dc4e
add possibility for nested loaders (#5404)
* add possibility for nested loaders

* pep8: newline
2021-01-24 07:32:02 -05:00
Arnaud Gelas 8629048659
Fix isort failures in loggers (#5527)
Remove from skipped module in pyproject.toml and fix failures on:
- pytorch_lightning/loggers/*.py
2021-01-15 22:53:56 +05:30
Jirka Borovec 9610ea817b refactor imports of logger dependencies (#4860)
* refactor imports of logger dependencies

* fix

* fix

* fix

* name

* fix

* mocks

* fix tests

* fix mlflow

* fix test tube

* fix wandb import check

* whitespace

* name

* name

* hack

* hack

* rev

* fix

* update mlflow import check

* try without installing conda dep

* .

* .

* .

* .

* .

* .

* .

* .

* .

Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

(cherry picked from commit ec0fb7a3ec)
2021-01-06 15:16:06 +01:00
Jirka Borovec 74d0652164 flake8 ++ 2021-01-05 09:58:37 +01:00
Boris Dayma dcd29aef06 feat(wandb): offset logging step when resuming (#5050)
* feat(wandb): offset logging step when resuming

* feat(wandb): output warnings

* fix(wandb): allow step to be None

* test(wandb): update tests

* feat(wandb): display warning only once

* style: fix PEP issues

* tests(wandb): fix tests

* tests(wandb): improve test

* style: fix whitespace

* feat: improve warning

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* feat(wandb): use variable from class instance

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* tests(wandb): check warnings

* feat(wandb): use WarningCache

* tests(wandb): fix tests

* style: fix formatting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-01-05 09:58:37 +01:00
Haswanth Aekula ac996fb008 Fixed docs for WandbLogger (#5128)
Fixed a small bug with the `WandbLogger` docs.

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2021-01-05 09:58:37 +01:00
Jirka Borovec 0f36525e8f
fix/enable - check F401 (#5201)
* refactor - check F401

* missed

* fix
2020-12-21 10:15:04 +01:00
Rohit Gupta ef762a0d2a
update logging docs and decorators (#4431)
* update logging docs

* experiment

* add decorators to base and csv logger methods

* fix

* doc fix

* update docs

* update docs

* Update pytorch_lightning/loggers/base.py

Co-authored-by: chaton <thomas@grid.ai>
2020-12-01 11:35:00 +05:30
Boris Dayma c586e5db77
feat(wandb): let wandb cli handle runs (#4648)
* feat(wandb): reinit handled by CLI

* fix: typo

* docs(wandb): improve formatting

* test(wandb): set wandb.run to None

* test(wandb): fix tests

* style: fix formatting

* docs(wandb): fix documentation

* Update code markup

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* docs(wandb): update CHANGELOG

* test(wandb): init called only when needed

* Update CHANGELOG.md

* try fix the test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-11-24 01:31:28 +05:30
Rohit Gupta 2d9d7e4daa
Add prefix argument in loggers (#4557)
* Add prefix parameter in loggers

* chlog

* pep

* patch test

* remove args, access via self

* try fix the test

* try fix the test

* try fix the test

* prefix test

* fix assert has calls


fix assert call

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-11-22 06:38:58 +01:00
Jeff Yang baa8558cc0
logger docs and api docs (#3950)
* logger and api docs

* remove gpu_usage_logger, lr_logger

* update docstring

* fix wandb example

* remove step result

* charts

* add some charts info

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-13 20:35:54 +05:30
Boris Dayma ff41d80706
feat(wandb): log in sync with Trainer step (#4405)
* feat(wandb): log in sync with Trainer step

* docs: update CHANGELOG

* style(test_wandb): fix formatting

* parentheses

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-29 01:07:06 +05:30
chaton f07ee33db6
BUG - Wandb: Sanitize callable. (#4320)
* add _sanitize_callable_params

* add call on _val if callable

* clean code formatter

* resolve pep8

* default return function name

* resolve pep8

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update CHANGELOG.md

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-26 11:57:03 +00:00
Adrian Wälchli 376268f01e
Implement finalize for WandbLogger (#4341)
* wandb finish

* experiment

* upload at end of run

* changelog

* comment

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-10-26 11:22:09 +00:00
Adrian Wälchli 3ff5327e83
Mocking loggers (part 1, wandb) (#3596)
* mocking for wandb

* remove wandb import in amp test

* mock loggers in sphinx

* check tests

* Update extra.txt

* setup

* dev

* min

* revert

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-25 16:00:02 +02:00
Rohit Gupta 07b857769a
Allow kwargs in Wandb & Neptune + kwargs docstring (#3475)
* Allow kwargs in WandbLogger

* isort

* kwargs docstring

* typo

* kwargs for other loggers

* pep and isort

* formatting

* fix failing test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-19 18:51:43 +02:00
William Falcon f43028f3ae
added copyright notices (#3062) 2020-08-19 22:03:22 -04:00
Adrian Wälchli f16b4cfc52
save_dir fix for MLflowLogger + save_dir tests for others (#2502)
* mlflow rework

* logger save_dir

* folder

* mlflow

* simplify

* fix test

* add a test for file dir contents

* new line

* changelog

* docs

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* test for comet logger

* improve mlflow checkpoint test

* prevent  commet logger error on pytest exit

* test tensorboard save dir structure

* wandb save dir test

* skip test on windows

* add mlflow to pickle tests

* wandb

* code factor

* remove unused imports

* remove unused setter

* wandb mock

* wip mock

* wip mock

* wandb tests with mocking

* clean up

* clean up

* comments

* include wandblogger in test

* clean up

* missing argument

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-07-09 07:15:41 -04:00
Anthony Bisulco 899cd74044
flatten Wandb hyperparameters dict (#2459)
* wandb logging fix

* Changelog fix

* change test
2020-07-08 07:45:25 +02:00
Adrian Wälchli 145670f893
fix logging on rank 0 only (#2425)
* fix and test for ddp block logging rank > 0

* rename

* use the dummy logger

* dummy logger test

* set the logger in  model

* decorator for rank zero experiment

* simplify check

* simplify

* fix problem with None in checkpoint path

* revert configure logger

* unused import

* offline

* try rank 0 decorator in checkpoint

* try fix test

* imgs

* add asserts to make sure log zero only saves checkpoints

* add asserts to make sure log zero only saves checkpoints

* add asserts to make sure log zero only saves checkpoints

* add asserts to make sure log zero only saves checkpoints

* add asserts to make sure log zero only saves checkpoints

* fix tpu tests

* fix tpu tests

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-30 18:09:16 -04:00
Adrian Wälchli 25ee51bc57
Continue Jeremy's early stopping PR #1504 (#2391)
* add state_dict for early stopping

* move best attr after monitor_op defined

* improve early stopping and model checkpoint callbacks

* fix formatting

* fix attr init order

* clean up setting of default_root_dir attr

* logger needs default root dir set first

* reorg trainer init

* remove direct references to checkpoint callback

* more fixes

* more bugfixes

* run callbacks at epoch end

* update tests to use on epoch end

* PR cleanup

* address failing tests

* refactor for homogeneity

* fix merge conflict

* separate tests

* tests for early stopping bug regressions

* small fixes

* revert model checkpoint change

* typo fix

* fix tests

* update train loop

* cannot pass an int as default_save_path

* refactor log message

* fix test case

* appease the linter

* fix some doctests

* move config to callback

* fixes from rebase

* fixes from rebase

* chlog

* docs

* reformat

* formatting

* fix

* fix

* fixes from rebase

* add new test for patience

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/callbacks/test_early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* fix formatting

* remove enable_early_stop attribute

* add state_dict for early stopping

* move best attr after monitor_op defined

* improve early stopping and model checkpoint callbacks

* fix formatting

* fix attr init order

* clean up setting of default_root_dir attr

* logger needs default root dir set first

* reorg trainer init

* remove direct references to checkpoint callback

* more fixes

* more bugfixes

* run callbacks at epoch end

* update tests to use on epoch end

* PR cleanup

* address failing tests

* refactor for homogeneity

* fix merge conflict

* separate tests

* tests for early stopping bug regressions

* small fixes

* revert model checkpoint change

* typo fix

* fix tests

* update train loop

* fix test case

* appease the linter

* fix some doctests

* move config to callback

* fixes from rebase

* fixes from rebase

* chlog

* docs

* reformat

* formatting

* fix

* fix

* fixes from rebase

* add new test for patience

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/callbacks/test_early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* fix formatting

* remove enable_early_stop attribute

* fix test with new epoch indexing

* fix progress bar totals

* fix off by one error (see #2289) epoch starts at 0 now

* added missing imports

* fix hpc_save folderpath

* fix formatting

* fix tests

* small fixes from a rebase

* fix

* tmpdir

* tmpdir

* tmpdir

* wandb

* fix merge conflict

* add back evaluation after training

* test_resume_early_stopping_from_checkpoint TODO

* undo the horovod check

* update changelog

* remove a duplicate test from merge error

* try fix dp_resume test

* add the logger fix from master

* try remove default_root_dir

* try mocking numpy

* try import numpy in docs test

* fix wandb test

* pep 8 fix

* skip if no amp

* dont mock when doctesting

* install extra

* fix the resume ES test

* undo conf.py changes

* revert remove comet pickle from test

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update weights_loading.rst

* Update weights_loading.rst

* Update weights_loading.rst

* renamed flag

* renamed flag

* revert the None check in logger experiment name/version

* add the old comments

* _experiment

* test chckpointing on DDP

* skip the ddp test on windows

* cloudpickle

* renamed flag

* renamed flag

* parentheses for clarity

* apply suggestion max epochs

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-28 21:36:46 -04:00
Boris Dayma 00f1ac11e6
fix(wandb): use same logger on multiple training loops (#2055)
* fix(wandb): use same logger on multiple training loops

New training loops reset step to 0 which would previously try to overwrite logs

fix #2015

* docs(changelog.md): add reference to PR 2055
2020-06-02 18:46:02 -04:00
Justus Schock 6456247287
Re-Enable Import Errors (#1938)
* update logger imports

* pep8 fixes

* pep8
2020-05-25 07:31:35 -04:00
Anthony Bisulco 76af84718a
Group argument wandb (#1760)
* group argument wandb

* formatting fix
2020-05-10 13:15:51 -04:00
Oliver Neumann 152a2eb30c
wandb logger 'global_step' affects other logger (#1492)
* Removed unnecessary 'global_step' from wandb logger.

* Fixed wrong step implementation in wandb and missing metric skipping in logger base.

* simplified metric check in base logger

* Added Fix Description in CHANGELOG.md

* Updated wandb logger tests.

* udpate test, step=3

* Moved Fix Description in CHANGELOG.md to unreleased.

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-05-02 08:50:47 -04:00
Jirka Borovec 58a467dd68
model checkpint on rank_zero_only & global rank state (#1408)
* try delete in async or DDP us0-ecase

* changelog

* add model chekpoint rank

* simple delete

* flake8

* use global rank

* chnagelog

* fix review

* fix import

* proposal

* proposal

* proposal

* improve proposal (fix problems with method call self)

* cleaning

Co-authored-by: Adrian Wälchli <adrian.waelchli@students.unibe.ch>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-04-24 17:21:00 -04:00
Boris Dayma f3d139e90f
fix(wandb): allow use of sweeps (#1512)
* fix(wandb): allow use of sweeps

overwrite run config parameters due to precision error

fix #1290

* docs(wandb): update changelog

* test(wandb): update config test

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-04-24 10:29:24 -04:00
Adrian Wälchli 6e1d72d98a
Improved docs for Loggers (#1484)
* improve __init__

* improve logger base

* improve comet logger docs

* improved docs for mlflow

* improved nepune logger docs

* fix matplotlib import issue

* improve tensorboard docs

* improve docs for test tube

* improved trains logger docs

* improve wandb logger docs

* improved docs in experiment_logging.rst

* added MLflow to the list of loggers

* fix too long lines

* fix trains doctest

* fix neptune doctest

* fix mlflow doctest

* Apply suggestions from code review

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Apply suggestions from code review

* fix whitespace

* try bypass mode for neptune (fix doctest api key error)

* try "test" as api key

* Revert "try "test" as api key"

This reverts commit fd77db26d551f08b4b4a12bb93cbd8f7a0814f29.

* try test as api key

* update neptune docs

* bump neptune minimal version

* revert unnecessary bypass code

* test if CI runs doctests in .rst files

* Revert "test if CI runs doctests in .rst files"

This reverts commit a45aeb460a8c4b7445a35dd7b49265f48d11c485.

* add doctest directive

* neptune demo links

* added tutorial link for W&B

* fix line too long

* fix merge error

* fix merge error

* add instructions how to install loggers

* add instructions how to install the loggers

* hide _abc_impl property from docs

* review Borda, 4 spaces

* indentation in example sections

* blank

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-04-16 12:04:12 -04:00
Jirka Borovec b3fe17ddeb
fix flushing loggers (#1459)
* flushing loggers

* flushing loggers

* flushing loggers

* flushing loggers

* changelog

* typo

* fix trains

* optimize imports

* add logger test all

* add logger test pickle

* flake8

* fix benchmark

* hanging loggers

* try

* del

* all

* cleaning
2020-04-14 20:32:33 -04:00