752 Commits

Author SHA1 Message Date
Jirka Borovec f72e354ee6 fixing TensorBoard (#687)
* flake8

* fix typo

* fix tensorboardlogger
drop test_tube dependence

* formatting

* fix tensorboard & tests

* upgrade Tensorboard

* test formatting separately

* try to fix JIT issue

* add tests for 1.4
2020-01-16 07:22:29 -05:00
William Falcon 88b750a018 default logger is now tensorboard (#609)
* refactor

* refactor

* refactor

* made tensorboard the default not test-tube
2020-01-14 14:40:41 -05:00
MartinPernus 3002bd3df5 log named parameters (#660) 2020-01-13 22:54:06 -05:00
Vadim Bereznyuk 756c70a4a0 Clearer disable validation logic (#650)
* Clearer disable validation logic

* fix for fast_dev_run

* flake8 fix

* Test check fix

* update error message
2020-01-13 22:31:15 -05:00
Boris Dayma ec7fc97857 Feature: wandb logger (#627)
* Basic wandb support

* refactor(wandb): remove unused variables and document logger

* docs(wandb): explain how to use WandbLogger

* test(wandb): add tests for WandbLogger

* feat(wandb): add save_dir

* fix(wandb): allow pickle of logger

* fix(wandb): save logs in custom directory

* test(wandb): test import

* docs(wandb): simplify docstring and use doctest

* test: increase number of epochs for satisfactory accuracy

* test(test_load_model_from_checkpoint): ensure we load last checkpoint

Co-authored-by: Chris Van Pelt <vanpelt@wandb.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-01-13 22:25:27 -05:00
Jirka Borovec f7db44e750 fix deprecated tng and abstract lightning (#644) 2020-01-13 22:20:38 -05:00
Jakub 8dc8a8bfd3 Neptune integration (#648)
* added neptune integration

* added tests for NeptuneLogger, added neptune to docs

* updated link to neptune support

* fixed docstrings, fixed try/except in tests, changed append_tags input

* fixed docstrings line length

* bumped epoch nr in model restore tests

* added tags support for single strings

* fixed passing neptune token to backend

* fixed project name in offline mode

* added save_top_k=-1 to checkpoint callback

* reformatted initialization of neptune in online mode

* bumped epoch nr to 4 in test_load_model_from_checkpoint

* bumped epoch nr to 5

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-01-13 22:20:01 -05:00
Ayberk Aydın 0ae3dd9ed4 Fix GAN training. (#603)
* fix dangling gradients

make sure only the gradients of the current optimizer's parameters are calculated in the training step.

* add note about multiple optimizer gradient update

* Update training_loop.py
2020-01-13 22:12:04 -05:00
Ayla Khan 1969c6cc2a Remove extraneous f character from f-string. (#679)
The extra character made tracking experiment names confusing, especially when using UUIDs.
2020-01-13 22:11:04 -05:00
Jirka Borovec db6b404748 CI pass (#671)
* fix pillow in test

* test acc

* update version in deprecated msg
2020-01-13 22:09:47 -05:00
Vadim Bereznyuk 12edc3099c Fix the number of training batches used in the training loop (#653)
* Fix the number of processed training batches

* Fix tests

* fix tests

* fix tests

* One more attempt

* Fix another test
2020-01-05 14:37:09 -05:00
Vadim Bereznyuk 7824b5c5f5 Fix percent_checks (#649)
* fix percent_checks

* Added _percent_range_check

* remove max
2020-01-05 14:36:06 -05:00
Hao Sheng ca73b70d15 fix of issue 600 (#625) 2019-12-14 20:24:46 -08:00
Jeremy Jordan 3dd0b8c186 fix metric name to work with default earlystopping (#628) 2019-12-14 20:23:44 -08:00
Jay Morgan d1633aac11 Fix #618 Change papi to api (#619)
* Change papi to api

* Added try catch for old/new api reference
2019-12-10 16:24:21 -08:00
Adrian Wälchli e2ee4ddbdb Fix early stopping off by 2 (min_epochs) (#617)
* fix early stopping off by 2

* add min_epochs example in docs
2019-12-09 10:32:49 -08:00
VSJMilewski d562172b4c Allow for multiple example inputs when creating summary (#543) 2019-12-09 04:42:07 -08:00
Elliot Waite b492e2b89e Change nb to num in ABCs, comments, and tqdm logging (#613)
* Change nb to num in ABCs, comments, and tqdm logging

* Fix warnings text

* Make warnings one line

* Change num to number in comments
2019-12-09 04:40:26 -08:00
Jirka Borovec 5d00e62047 Fix logger, tensorboard (#610)
* fix logger tests

* fix missing flush

* fix tensorboard

* fix namespace

* fix flush

* fix add_hparams
2019-12-08 07:59:25 -08:00
Nic Eggert 5329c72cb0 Implement TensorboardLogger (#607)
* Implement TensorboardLogger

* Pass default_save_path to trainers

* Update tensorboard.py
2019-12-07 23:25:37 -05:00
Nic Eggert 2baa80d626 Make sure train doesn't crash when called at max_epoch (#608) 2019-12-07 23:22:03 -05:00
Jirka Borovec 4970624f8b fix Logger tests for Win (#605)
* fix mlflow test

* fix mlflow test

* update logger / mlflow

* flake8

* fix appveyor
2019-12-07 19:25:12 -05:00
ctlaltdefeat 58cc6e13b9 Update logging.py (#602) 2019-12-07 10:12:33 -05:00
schwobr 2f01c03b38 Additional hooks (#598)
* Renamed `on_sanity_check_start` to `on_train_start` and added `on_train_end` to `ModelHooks`

* changed tests to use `on_train_start` instead of `on_sanity_check_start`
2019-12-07 08:52:06 -05:00
Elliot Waite 1051c189e1 Simplify variables: step, epoch, max_epochs, min_epochs (#589) 2019-12-07 08:50:21 -05:00
Adrian Wälchli f7e1040236 Docs and Tests for "gpus" Trainer Argument (#593)
* add table for gpus argument

* fix typo in error message

* tests for supported values

* tests for unsupported values

* fix typo

* add table for gpus argument

* fix typo in error message

* tests for supported values

* tests for unsupported values

* fix typo

* fix typo list->str

* fix travis warning "line too long"
2019-12-07 08:48:45 -05:00
YehCF cc65f39d97 Fix number of total steps shown in progress bar during sanity validation check when number of validation dataloaders >= 2 (#597)
* type: debug

Calculate the adequate number of steps to run during sanity_check.
This fixes the bug when there are two or more validation dataloaders.

- Before: total=self.num_sanity_val_steps
- After: total=self.num_sanity_val_steps*len(self.get_val_dataloaders())

* type: refactor

Put total=... in the next line

* type: refactor

run flake8
2019-12-07 08:47:59 -05:00
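The before/after change described in #597 can be sketched as plain arithmetic. This is a minimal illustration, not the project's code: `sanity_check_total` is a hypothetical helper, and the `range` objects only stand in for validation dataloaders.

```python
def sanity_check_total(num_sanity_val_steps: int, val_dataloaders: list) -> int:
    """Total progress-bar steps for the sanity check (hypothetical helper)."""
    # Each validation dataloader runs num_sanity_val_steps batches, so the
    # bar total scales with the number of dataloaders.
    return num_sanity_val_steps * len(val_dataloaders)


# Stand-ins for two validation dataloaders (assumption for illustration).
loaders = [range(10), range(20)]
total = sanity_check_total(5, loaders)
print(total)  # 10
```

With a single dataloader the result equals `num_sanity_val_steps`, which matches the "Before" value in the commit message.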
Nic Eggert 0489e31b02 Fix CometML tests (#585)
* monkeypatch atexit.register to fix problem with cometml logging

* Use experiment id for version in cometml
2019-12-07 00:24:59 -05:00
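The idea of patching `atexit.register` during a test, as mentioned in #585, might look roughly like the sketch below. It uses `unittest.mock` from the standard library rather than whatever fixture the project's tests use, and `fake_register` and `flush_logs` are hypothetical names introduced only for illustration.

```python
import atexit
from unittest import mock

registered = []


def fake_register(func, *args, **kwargs):
    """Record the callback instead of scheduling it for interpreter exit."""
    registered.append(func)
    return func


def flush_logs():
    """Stand-in for an exit hook that a logging client might install."""


# Inside the block, calls to atexit.register go to the fake, so nothing
# is actually scheduled to run at interpreter shutdown.
with mock.patch.object(atexit, "register", fake_register):
    atexit.register(flush_logs)

# After the block, the real atexit.register is restored.
```

The test can then inspect `registered` to confirm the hook was requested without letting it fire when the test process exits.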
Jirka Borovec 1d4b6be17b rename trainer modules, drop `_mixin` (#571)
* rename trainer modules, drop _mixin

* fix imports
2019-12-04 11:39:14 -05:00
Jirka Borovec e0dbc8ab46 Abstract Mixin classes (#572)
* make partial Trainer classes as abstract

* add empty attributes/methods

* flake8

* fix mixin order

* update abstract

* reorder
2019-12-04 10:57:32 -05:00
Adrian Wälchli 218f0a5b4a inspect training_step for opt_idx (#573) 2019-12-04 07:32:47 -05:00
Ir1dXD c316173e89 use print for INFO and lower levels summarize() (#580)
* use print for INFO and lower levels summarize()

* use logging.INFO instead of magic number

* bring logging.info back for other cases

* move logging config to __init__.py

* prepend the model summary with a newline
2019-12-04 07:05:34 -05:00
Ir1dXD d4571d1d6f filter param with no grad (#579) 2019-12-04 07:04:58 -05:00
Dang Nguyen Anh Khoa b5b77e44b1 fix logging error (#575)
* fix logging error

* no need for the '+' sign

* move space to beginning of next line
2019-12-04 07:04:14 -05:00
Jirka Borovec ab4fea0b55 fix deprecation warnings (#570)
* fix deprecation warnings

* flake8

* update deprecations
2019-12-04 06:59:19 -05:00
Jirka Borovec 3a58937d8b rename variables nb -> num (#567)
* rename nb -> num

* flake8

* batch_nb, epoch_nb, gpu_nb, split_nb

* add _num deprecations
2019-12-04 06:57:10 -05:00
Mary Trofimova a6d64ac013 Support torch.optim.lr_scheduler.ReduceLROnPlateau (#320)
* feat: add reducelronplateau callback

* feat: use reducelronplateau callback in trainer

* feat: only on unsupported lr schedulers

* feat: last but not the least merge of master

* feat: merge master

* feat: support only on scheduler in reduceLrOnPlateauScheduler

* refactor: code style

* Update pt_callbacks.py

* Update trainer.py

* Update train_loop_mixin.py

* Update trainer.py

* Update train_loop_mixin.py
2019-12-03 07:59:41 -05:00
Yongrae Jo 2b8475f590 Add resuming from specific checkpoint (#516)
* Add resume_from_checkpoint

* Fix variable name

* #515 Remove did_restore

* #515 Simplify code

* #515 Update doc for resume_from_checkpoint

* #515 Add on_gpu
2019-11-30 16:48:38 -05:00
Pariente Manuel df7b6d958e Correct behavior for argument gpus in Trainer (#561) 2019-11-30 14:50:50 -05:00
williamFalcon db0587f158 fixed tests 2019-11-28 16:02:36 -08:00
William Falcon 29122e4308 Dp default (#560)
* set auto dp if no backend

* fix imagenet example

* run flake8 first to fail build on syntax first
2019-11-28 18:14:08 -05:00
Jirka Borovec d71556e7a1 Sphinx generated documentation (#521)
* upgrade req.

* move MkDocs

* create Sphinx

* init Sphinx

* move md from MkDocs to Sphinx

* CI: build docs

* build Sphinx

formatting

move docs from MD to docstring in particular package/modules

formatting

add Sphinx ext.

rename root_module to core

drop implicit name "_logger"

drop duplicate name "overwrite"

fix imports

use pytorch theme

add sample link mapping

try fix RTD build

use forked template

fix some docs warnings

fix paths

add deprecation warnings

fix flake8

fix paths

revert refactor

revert MLFlowLogger

* revert example import

* update link

* Update lightning_module_template.py
2019-11-28 12:48:55 -05:00
Jirka Borovec 47659daa5f speed-up testing (#504)
* extend CI timeout

* add short MNIST

* lower dataset and stop thr

* refactor imports

* formatting

* early stop

* play params

* play params

* minor refactoring

# Conflicts:
#	pytorch_lightning/testing/__init__.py
#	pytorch_lightning/testing/lm_test_module.py
#	pytorch_lightning/testing/lm_test_module_base.py
#	pytorch_lightning/testing/lm_test_module_mixins.py
#	pytorch_lightning/testing/model.py
#	pytorch_lightning/testing/model_base.py
#	pytorch_lightning/testing/model_mixins.py
#	pytorch_lightning/testing/test_module.py
#	pytorch_lightning/testing/test_module_base.py
#	pytorch_lightning/testing/test_module_mixins.py

* typo

Co-Authored-By: Ir1dXD <sirius.caffrey@gmail.com>

* Revert "refactor imports"

This reverts commit b86aee92

* update imports
2019-11-28 12:06:05 -05:00
Jirka Borovec 9785a3e78e Refactor: name modules (#548)
* refactor: rename some modules

* add deprecation warnings

* fix paths
2019-11-26 22:39:18 -05:00
Anton Bakhtin fea7cc87f6 Move model to cuda before creating optimizer (#554) 2019-11-26 22:35:38 -05:00
Jirka Borovec f2191b0cdf fix for PyTorch 1.2 (#549)
* min pytorch 1.2

* fix IterableDataset

* upgrade torchvision

* fix msg
2019-11-26 10:58:50 -05:00
MikeScarp 55f3ffd7c7 fixing bug in testing for IterableDataset (#547) 2019-11-26 04:59:20 -05:00
Tullie Murrell 48b797fdb0 Copy batch for local forward (#532) 2019-11-23 04:04:40 -05:00
Tullie Murrell 55edf7c922 Remove unneeded filename print (#540) 2019-11-23 04:00:39 -05:00
Tanel Alumäe 539d7bcb44 Avoid race condition in creating checkpoint directories (#530)
* Avoid race condition in creating checkpoint directories

In multi-GPU training, several processes run the code that creates checkpoint dirs. This fix avoids a probably rare situation (but it happened to me) where another process created a dir between the `exists` check and the `makedirs` call.

* Remove the now unneeded check for dir existence
2019-11-21 13:27:39 -05:00
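The race described in #530 (another process creating the directory between the `exists` check and the `makedirs` call) can be avoided in general with `exist_ok=True`. The sketch below shows that pattern under stated assumptions: `ensure_checkpoint_dir` and the directory names are hypothetical, and this is not necessarily how the commit itself implemented the fix.

```python
import os
import tempfile


def ensure_checkpoint_dir(path: str) -> str:
    """Create `path` if needed; safe when several processes call it at once."""
    # exist_ok=True makes makedirs a no-op when the directory already exists,
    # so no separate os.path.exists() check (and no race window) is needed.
    os.makedirs(path, exist_ok=True)
    return path


# Calling twice mimics two processes reaching the same code path.
base = tempfile.mkdtemp()
ckpt_dir = os.path.join(base, "checkpoints", "version_0")
ensure_checkpoint_dir(ckpt_dir)
ensure_checkpoint_dir(ckpt_dir)  # second call does not raise
```

Dropping the explicit existence check, as the second bullet of the commit notes, removes the window in which a competing process could create the directory first.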