81 Commits

Author SHA1 Message Date
William Falcon 88b750a018 default logger is now tensorboard (#609)
* refactor

* made tensorboard the default not test-tube
2020-01-14 14:40:41 -05:00
Vadim Bereznyuk 756c70a4a0 Clearer disable validation logic (#650)
* Clearer disable validation logic

* fix for fast_dev_run

* flake8 fix

* Test check fix

* update error message
2020-01-13 22:31:15 -05:00
Jirka Borovec f7db44e750 fix deprecated tng and abstract lightning (#644) 2020-01-13 22:20:38 -05:00
Elliot Waite b492e2b89e Change nb to num in ABCs, comments, and tqdm logging (#613)
* Change nb to num in ABCs, comments, and tqdm logging

* Fix warnings text

* Make warnings one line

* Change num to number in comments
2019-12-09 04:40:26 -08:00
schwobr 2f01c03b38 Additional hooks (#598)
* Renamed `on_sanity_check_start` to `on_train_start` and added `on_train_end` to `ModelHooks`

* changed tests to use `on_train_start` instead of `on_sanity_check_start`
2019-12-07 08:52:06 -05:00
Elliot Waite 1051c189e1 Simplify variables: step, epoch, max_epochs, min_epochs (#589) 2019-12-07 08:50:21 -05:00
YehCF cc65f39d97 Fix number of total steps shown in progress bar during sanity validation check when number of validation dataloaders >= 2 (#597)
* type: debug

Calculate the adequate number of steps to run during sanity_check.
This fixes the bug when there are two or more validation dataloaders.

- Before: total=self.num_sanity_val_steps
- After: total=self.num_sanity_val_steps*len(self.get_val_dataloaders())

* type: refactor

Put total=... in the next line

* type: refactor

run flake8
2019-12-07 08:47:59 -05:00
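
The progress-bar fix described in #597 above (multiply the sanity-check step count by the number of validation dataloaders) can be sketched as follows. This is a minimal illustration, not the trainer's actual code: `sanity_check_total` is a hypothetical helper, and the `range` objects merely stand in for dataloaders.

```python
def sanity_check_total(num_sanity_val_steps: int, val_dataloaders: list) -> int:
    """Total number of progress-bar steps for the sanity validation check.

    Hypothetical helper: each validation dataloader runs
    `num_sanity_val_steps` batches, so the total scales with the
    number of dataloaders.
    """
    return num_sanity_val_steps * len(val_dataloaders)


# Stand-ins for two validation dataloaders (assumption for illustration).
loaders = [range(10), range(20)]

before = 5                              # old total: ignores the loader count
after = sanity_check_total(5, loaders)  # new total: 5 steps per loader
print(before, after)
```

With a single dataloader both formulas agree, which is why the bug only showed up with two or more validation dataloaders.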
Jirka Borovec 1d4b6be17b rename trainer modules, drop `_mixin` (#571)
* rename trainer modules, drop _mixin

* fix imports
2019-12-04 11:39:14 -05:00
Jirka Borovec e0dbc8ab46 Abstract Mixin classes (#572)
* make partial Trainer classes as abstract

* add empty attributes/methods

* flake8

* fix mixin order

* update abstract

* reorder
2019-12-04 10:57:32 -05:00
Ir1dXD c316173e89 use print for INFO and lower levels summarize() (#580)
* use print for INFO and lower levels summarize()

* use logging.INFO instead of magic number

* bring logging.info back for other cases

* move logging config to __init__.py

* prepend the model summary with a newline
2019-12-04 07:05:34 -05:00
Jirka Borovec ab4fea0b55 fix deprecation warnings (#570)
* fix deprecation warnings

* flake8

* update deprecations
2019-12-04 06:59:19 -05:00
Jirka Borovec 3a58937d8b rename variables nb -> num (#567)
* rename nb -> num

* flake8

* batch_nb, epoch_nb, gpu_nb, split_nb

* add _num deprecations
2019-12-04 06:57:10 -05:00
Mary Trofimova a6d64ac013 Support torch.optim.lr_scheduler.ReduceLROnPlateau (#320)
* feat: add reducelronplateau callback

* feat: use reducelronplateau callback in trainer

* feat: only on unsupported lr schedulers

* feat: last but not the least merge of master

* feat: merge master

* feat: support only on scheduler in reduceLrOnPlateauScheduler

* refactor: code style

* Update pt_callbacks.py

* Update trainer.py

* Update train_loop_mixin.py
2019-12-03 07:59:41 -05:00
Yongrae Jo 2b8475f590 Add resuming from specific checkpoint (#516)
* Add resume_from_checkpoint

* Fix variable name

* #515 Remove did_restore

* #515 Simplify code

* #515 Update doc for resume_from_checkpoint

* #515 Add on_gpu
2019-11-30 16:48:38 -05:00
Pariente Manuel df7b6d958e Correct behavior for argument gpus in Trainer (#561) 2019-11-30 14:50:50 -05:00
Jirka Borovec d71556e7a1 Sphinx generated documentation (#521)
* upgrade req.

* move MkDocs

* create Sphinx

* init Sphinx

* move md from MkDocs to Sphinx

* CI: build docs

* build Sphinx

formatting

move docs from MD to docstring in particular package/modules

formatting

add Sphinx ext.

rename root_module to core

drop implicit name "_logger"

drop duplicate name "overwrite"

fix imports

use pytorch theme

add sample link mapping

try fix RTD build

use forked template

fix some docs warnings

fix paths

add deprecation warnings

fix flake8

fix paths

revert refactor

revert MLFlowLogger

* revert example import

* update link

* Update lightning_module_template.py
2019-11-28 12:48:55 -05:00
Tullie Murrell c1ecca418e Write progress bar to stdout (#531)
* Default write progress bar to stdout

* Change validation progress too
2019-11-21 13:26:24 -05:00
Ir1dXD 5a9afb11cc change print to logging (#457)
* change print to logging

* always use logging.info

* use f-strings

* update code style

* set logging configs

* remove unused code
2019-11-05 08:43:21 -05:00
Vadim Bereznyuk 446a1b5d45 Split progress bar (#449)
* Split progress bars

* Iterable dataset total batches fix

* Use dynamic ncols and use batch as units

* Count epochs from 1 in progress bar

* Fix for disabled progress bar

* Code simplifications
2019-11-03 05:42:53 -05:00
Tullie Murrell 248495b1d1 Add tbptt (#429)
* Add truncated bptt

* Fix rebase error

* AutoPep8

* Address comments, incl default bptt_split impl

* Add tbptt test

* Add default split for lists/tuples

* Add tbptt docs

* Fix trainer spacing

* Update RequiredTrainerInterface.md
2019-10-31 06:45:28 -04:00
Vadim Bereznyuk f79bdf2327 Set total number of batches in progress bar while testing (#425) 2019-10-30 12:14:28 -04:00
William Falcon a4b43ce095 Loaders (#422)
* refactor dataloading
2019-10-24 06:43:35 -04:00
William Falcon c6244594a6 clear memory cache before train starts (#418)
* clear memory cache before train starts
2019-10-23 11:41:00 -04:00
Vismantas 2aba70e228 parse_gpu_ids fix (#382)
* Unit tests for num_gpu property as proxy for __parse_gpu_ids.

* Refactoring __parse_gpu_ids

* Moved the function outside the class, as it is
a utility function that does not depend on the class in any way.
* Added unit tests for it.

* Mocked torch.cuda.device_count function in tests.

This allows the tests to be run on machines that do not have gpus.

* Fixed the parse_gpu_ids function to handle -1 case.

Function now handles -1 the same way as it does for '-1'.

* Unit tests for root_gpu added.

Added backend as a parameter because, depending on whether a backend
is set, the code currently fails with an exception in certain
circumstances before giving a wrong answer.

* Moved __set_root_gpu function out of the class.

This function does not depend on the class and can be tested
more easily this way.
Also added unit tests for this function. They simply reuse
data for the root_gpu property.

* determine_root_gpu_device passes unit tests.

* num_gpus passes unit tests.

Also added a None test for this function.

* parse_gpu_ids tests changed to reflect desired state after refactoring.

Planning to refactor parse_gpu_ids to always return a list of ints.
This will simplify code that uses the output of this function.

* parse_gpu_ids always returns lists
* parse_gpu_ids checks given ids against available ids
* parse_gpu_ids raises exception for nonexistent ids
* parse_gpu_ids returns None when no gpus are available
* cleaned up determine_root_gpu_device
* cleaned up num_gpus property
* Updated unit tests to reflect changes in the functions

* Flake8 fixes

* Moved fixture code up before where it is used.

* Updated documentation.

* Changed tests to match the API:
* gpus=-1 or gpus='-1' should use all available gpu devices
* gpus=N
    * N=0: no gpus should be used.
    * N>0: N gpus should be used
* gpus=list of ints or a comma separated string of numbers:
    Use the gpus indicated by the list or the string.

* Fixed code to pass all the changed tests for parsing gpus param.

* Refactoring parse_gpu_ids function.

* flake8 fixes.

* Updating documentation.

* flake8 fixes.

* flake8 fixes.

* flake8 fixes

* Update trainer.py

* Update dp_mixin.py

* Make reduce_distributed_output a stand alone function.
Fix imports.
Fix flake8.

* Add comet_ml dependency to tests requirements.txt

* Revert "Make reduce_distributed_output a stand alone function. Fix imports. Fix flake8."

This reverts commit eac0338

* Merge with master.
2019-10-23 05:05:09 -04:00
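
The `gpus` argument rules listed in the commit above (-1 or '-1' for all devices, N for a count, a list or comma-separated string for explicit ids) could be sketched roughly as below. This is an illustration under stated assumptions, not the library's `parse_gpu_ids`: the name `parse_gpus_sketch`, the explicit `available` argument, and the choices that `gpus=N` means the first N device ids and that `gpus=0` returns `None` are all hypothetical.

```python
from typing import List, Optional, Union


def parse_gpus_sketch(
    gpus: Union[int, str, List[int], None],
    available: List[int],
) -> Optional[List[int]]:
    """Illustrative only: normalise a `gpus` argument to a list of device ids."""
    # No request, or no devices on this machine: nothing to use.
    if gpus is None or not available:
        return None

    if isinstance(gpus, str):
        gpus = gpus.strip()
        if gpus == "-1":                 # '-1' behaves the same as -1
            return list(available)
        # Comma-separated string of ids, e.g. "0, 2".
        ids = [int(part) for part in gpus.split(",") if part.strip()]
    elif isinstance(gpus, int):
        if gpus == -1:                   # all available devices
            return list(available)
        if gpus == 0:                    # assumption: 0 means "no gpus"
            return None
        ids = list(range(gpus))          # assumption: first N device ids
    else:
        ids = list(gpus)                 # explicit list of ids

    if not ids:
        return None

    # Check the requested ids against the available ones.
    missing = [i for i in ids if i not in available]
    if missing:
        raise ValueError(f"requested gpus {missing} are not available: {available}")
    return ids


print(parse_gpus_sketch(-1, [0, 1, 2]))
print(parse_gpus_sketch("0, 2", [0, 1, 2]))
```

Passing the available ids in explicitly keeps the sketch testable on machines without GPUs, in the same spirit as the mocked `torch.cuda.device_count` mentioned in the commit.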
Jirka Borovec f18aee30a5 Minor imports cleaning (#402)
* code cleaning

* drop unused imports

* optimize imports
2019-10-22 11:32:40 +03:00
William Falcon 792ad00ff9 Fixed val interval (#405)
* added fixed frequency val batch check

* Finished IterableDataset support

* flake8
2019-10-22 05:10:00 +03:00
William Falcon 1424157731 Refactor (#407)
* moved dp, ddp outside of trainer

* added main mixins

* finished major mixin refactor

* flake8
2019-10-22 04:16:51 +03:00
tamyiuchau 4103a5ca73 Provide backward compatibility for #124 (#400)
* Provide backward compatibility for e681253

* typo fix
2019-10-21 08:16:55 +02:00
William Falcon 6111edaf82 Test fx (#390)
* changes to test fx
2019-10-19 00:39:30 +02:00
William Falcon 699bd2cb50 removed mlflow and custom logger tests (#389)
* changes to seed for tests
2019-10-18 23:03:28 +02:00
William Falcon 3dfcef6994 Loss keys (#387)
* any key in logs or progress bar is a candidate for callback metric
2019-10-18 15:28:13 +02:00
Hiroyuki Vincent Yamazaki 0fac2d64cf Fix off-by-one epoch length (#377) 2019-10-18 10:18:05 +02:00
William Falcon e5050700ce docs 2019-10-18 00:17:27 +02:00
William Falcon 2044126821 fixing tests (#372)
* fixing tests

* fixed tests
2019-10-16 07:28:47 -04:00
William Falcon e2cabb03ba fix val logging (#362)
* fix test

* no warnings always
2019-10-15 12:44:20 -04:00
Nic Eggert 19c2b8fc9e Allow disabling default logger, checkpoint_callback, and early_stop_callback (#360)
* Allow disabling logger, early stopping, and checkpoints

* Typo

* Get tests passing

* Update trainer.py
2019-10-12 06:00:24 -04:00
Yasser Souri 792ba59b78 Pad experiment version with zero for easier listing (#355) 2019-10-10 19:39:26 -04:00
William Falcon 426bb19846 Update trainer.py 2019-10-10 18:17:26 -04:00
William Falcon 46322b906b fixed ckpt tests (#352)
* fixed ckpt tests
2019-10-10 15:16:19 -04:00
William Falcon 96c2a2de50 fixes Flake8 2019-10-09 17:49:29 -04:00
William Falcon 453568179b Logger default (#351)
* weights go into default logger folder

* ckpt callback in pretrain routine so exp already has version
2019-10-09 17:46:27 -04:00
William Falcon d95e693598 Logger default (#350)
* weights go into default logger folder
2019-10-09 16:25:04 -04:00
William Falcon 6e0a562ecb fixed callback metrics ddp bug 2019-10-09 12:53:33 -04:00
William Falcon 5f1f3f6acc removed pdb 2019-10-09 10:45:06 -04:00
William Falcon 608a90a490 fixes non python type callback metrics and fast_dev_run (#345)
* fixes non python type callback metrics

* fixed fast dev run
2019-10-09 10:23:08 -04:00
Nic Eggert 8088052825 Finalize logger (#337)
* Ensure logger.finalize is called

* Call logger.finalize

* Update mlflow_logger.py

* Update test_logging.py

* Update trainer.py
2019-10-08 17:33:33 -04:00
William Falcon 49e04de5ac Ports (#338)
* remove os.exit from early stopping

* fixed weight summary
2019-10-08 17:11:47 -04:00
William Falcon dcaba55251 Early stopping (#332)
* callbacks use all other keys in return dict

* remove os.exit from early stopping
2019-10-08 16:21:00 -04:00
Adrian Wälchli 6e3e740a7f Param printing (#336)
* print thousands as K, M, B, T, ...

* add option to print top-level modules only

* added doc string and added spacing

* do not print summary if neither "full" nor "top"

* updated docs showing summary print options

* fix line length for travis
2019-10-08 15:30:06 -04:00
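
The "print thousands as K, M, B, T" change in #336 above amounts to scaling a parameter count by powers of 1000 and appending a suffix. A minimal sketch, assuming one decimal place and a space before the suffix (both formatting choices, and the name `human_count`, are hypothetical rather than taken from the commit):

```python
def human_count(n: int) -> str:
    """Abbreviate a count with K, M, B, T suffixes (illustrative only)."""
    suffixes = ["", "K", "M", "B", "T"]
    value = float(n)
    idx = 0
    # Divide by 1000 until the value fits, or we run out of suffixes.
    while abs(value) >= 1000 and idx < len(suffixes) - 1:
        value /= 1000
        idx += 1
    if idx == 0:
        return str(n)            # below one thousand: print as-is
    return f"{value:.1f} {suffixes[idx]}"


for count in (999, 1500, 2_000_000):
    print(count, "->", human_count(count))
```

Counts beyond the trillions simply keep the "T" suffix with a larger number in front, since the loop stops at the last suffix.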
William Falcon ff2a21a08a default to O1 (#334) 2019-10-08 09:09:57 -04:00