lightning

Commit Graph

Author	SHA1	Message	Date
William Falcon	88b750a018	default logger is now tensorboard (#609 ) * refactor * refactor * refactor * made tensorboard the default not test-tube	2020-01-14 14:40:41 -05:00
Vadim Bereznyuk	756c70a4a0	Clearer disable validation logic (#650 ) * Clearer disable validation logic * fix for fast_dev_run * flake8 fix * Test check fix * update error message	2020-01-13 22:31:15 -05:00
Jirka Borovec	f7db44e750	fix deprecated tng and abstract ligntning (#644 )	2020-01-13 22:20:38 -05:00
Elliot Waite	b492e2b89e	Change nb to num in ABCs, comments, and tqdm logging (#613 ) * Change nb to num in ABCs, comments, and tqdm logging * Fix warnings text * Make warnings one line * Change num to number in comments	2019-12-09 04:40:26 -08:00
schwobr	2f01c03b38	Additional hooks (#598 ) * Renamed `on_sanity_check_start` to `on_train_start` and added `on_train_end` to `ModelHooks` * changed tests to use `on_train_start` instead of `on_sanity_check_start`	2019-12-07 08:52:06 -05:00
Elliot Waite	1051c189e1	Simplify variables: step, epoch, max_epochs, min_epochs (#589 )	2019-12-07 08:50:21 -05:00
YehCF	cc65f39d97	Fix number of total steps shown in progress bar during sanity validation check when number of validation dataloaders >= 2 (#597 ) * type: debug Calculate the adequate number of steps to run during sanity_check. This fixes the bug when there are two or more validation dataloaders. - Before: `total=self.num_sanity_val_steps` - After: `total=self.num_sanity_val_steps * len(self.get_val_dataloaders())` type: refactor Put total=... in the next line * type: refactor run flake8	2019-12-07 08:47:59 -05:00
Jirka Borovec	1d4b6be17b	rename trainer modules, drop `_mixin` (#571 ) * rename trainer modules, drop _mixin * fix imports	2019-12-04 11:39:14 -05:00
Jirka Borovec	e0dbc8ab46	Abstract Mixin classes (#572 ) * make partial Trainer classes as abstract * add empty attributes/methods * flake8 * fix mixin order * update abstact * reorder	2019-12-04 10:57:32 -05:00
Ir1dXD	c316173e89	use print for INFO and lower levels summarize() (#580 ) * use print for INFO and lower levels summarize() * use logging.INFO instead of magic number * bring logging.info back for other cases * move logging config to __init__.py * prepend the model summary with a newline	2019-12-04 07:05:34 -05:00
Jirka Borovec	ab4fea0b55	fix defecation warnings (#570 ) * fix defecation warnings * flake8 * update deprecations	2019-12-04 06:59:19 -05:00
Jirka Borovec	3a58937d8b	rename variables nb -> num (#567 ) * rename nb -> num * flake8 * batch_nb, epoch_nb, gpu_nb, split_nb * add _num deprecations	2019-12-04 06:57:10 -05:00
Mary Trofimova	a6d64ac013	Support torch.optim.lr_scheduler.ReduceLROnPlateau (#320 ) * feat: add reducelronplateau callback * feat: use reducelronplateau callback in trainer * feat: only on unsupported lr schedulers * feat: last but not the least merge of master * feat: merge master * feat: support only on scheduler in reduceLrOnPlateauScheduler * refactor: code style * Update pt_callbacks.py * Update trainer.py * Update train_loop_mixin.py * Update trainer.py * Update train_loop_mixin.py	2019-12-03 07:59:41 -05:00
Yongrae Jo	2b8475f590	Add resuming from specific checkpoint (#516 ) * Add resume_from_checkpoint * Fix variable name * #515 Remove did_restore * #515 Simplify code * #515 Update doc for resume_from_checkpoint * #515 Add on_gpu	2019-11-30 16:48:38 -05:00
Pariente Manuel	df7b6d958e	Correct behavior for argument gpus in Trainer (#561 )	2019-11-30 14:50:50 -05:00
Jirka Borovec	d71556e7a1	Sphinx generated documentation (#521 ) * upgrade req. * move MkDocs * create Sphinx * init Sphinx * move md from MkDocs to Sphinx * CI: build docs * build Sphinx formatting move docs from MD to docstring in particular package/modules formatting add Sphinx ext. rename root_module to core drop implicit name "_logger" drop duplicate name "overwrite" fix imports use pytorch theme add sample link mapping try fix RTD build use forked template fix some docs warnings fix paths add deprecation warnings fix flake8 fix paths revert refactor revert MLFlowLogger * revert example import * update link * Update lightning_module_template.py	2019-11-28 12:48:55 -05:00
Tullie Murrell	c1ecca418e	Write progress bar to stdout (#531 ) * Default write progress bar to stdout * Change validation progress too	2019-11-21 13:26:24 -05:00
Ir1dXD	5a9afb11cc	change print to logging (#457 ) * change print to logging * always use logging.info * use f-strings * update code style * set logging configs * remove unused code	2019-11-05 08:43:21 -05:00
Vadim Bereznyuk	446a1b5d45	Split progress bar (#449 ) * Splitted progress bars * Iterable dataset total batches fix * Use dynamic ncols and use batch as units * Count epochs from 1 in progress bar * Fix for disabled progress bar * Code simplifications	2019-11-03 05:42:53 -05:00
Tullie Murrell	248495b1d1	Add tbptt (#429 ) * Add truncated bptt * Fix rebase error * AutoPep8 * Address comments, incl default bptt_split impl * Add tbptt test * Add default split for lists/tuples * Add tbptt docs * Fix trainer spacing * Update RequiredTrainerInterface.md	2019-10-31 06:45:28 -04:00
Vadim Bereznyuk	f79bdf2327	Set total number of batches in progress bar while testing (#425 )	2019-10-30 12:14:28 -04:00
William Falcon	a4b43ce095	Loaders (#422 ) * refactor dataloading	2019-10-24 06:43:35 -04:00
William Falcon	c6244594a6	clear memory cache before train starts (#418 ) * clear memory cache before train starts	2019-10-23 11:41:00 -04:00
Vismantas	2aba70e228	parse_gpu_ids fix (#382 ) * Unit tests for num_gpu property as proxy for __parse_gpu_ids. * Refactoring __parse_gpu_ids * Moved the function outside the class as it is an utility function and did not depend on class in any way. * Added unit tests for it. * Mocked torch.cuda.device_count function in tests. This allows the tests to be run on machines that do not have gpus. * Fixed the parse_gpu_ids function to handle -1 case. Function now handles -1 the same way as it does for '-1'. * Unit tests for root_gpu added. Added backend as a parameter as currently depending on backend set or not, code fails with exception in certain circumstances, before giving a wrong answer. * Moved __set_root_gpu function out of the class. This function does not depend on the class and can be tested more easily this way. Also added unit tests for this function. They simply reuse data for the root_gpu property. * determine_root_gpu_device passes unit tests. * num_gpus passes unit tests. Also added a None test for this function. * parse_gpu_ids tests changed to reflect desired state after refactoring. Planning to refactor parse_gpu_ids to always return list of ints. This will simplify code that use output of this function. * * parse_gpu_ids always returns lists * parse_gpu_ids checks given ids against available ids * parse_gpu_ids raises exception for non existant ids * parse_gpu_ids returns None when no gpus are available * cleaned up determine_root_gpu_device * cleaned up num_gpus property * Updated unit tests to reflect changes in the functions * Flake8 fixes * Moved fixture code up before where it is used. * Updated documentation. * Changed tests to match the API: * gpus=-1 or gpus='-1' should use all available gpu devices * gpus=N * N=0: no gpus should be used. * N>0: N gpus should be used * gpus=list of ints or a comma separated string of numbers: Use the gpus indicated by the list or the string. * Fixed code to pass all the changed tests for parsing gpus param. * Refactoring parse_gpu_ids function. * flake8 fixes. * Updating documentation. * flake8 fixes. * flake8 fixes. * flake8 fixes * Update trainer.py * Update dp_mixin.py * Make reduce_distributed_output a stand alone function. Fix imports. Fix flake8. * Add comet_ml dependency to tests requirements.txt * Revert "Make reduce_distributed_output a stand alone function. Fix imports. Fix flake8." This reverts commit `eac0338` * Merge with master.	2019-10-23 05:05:09 -04:00
Jirka Borovec	f18aee30a5	Minor imports cleaning (#402 ) * code cleaning * drop unused imports * optimize imports	2019-10-22 11:32:40 +03:00
William Falcon	792ad00ff9	Fixed val interval (#405 ) * added fixed frequency val batch check * Finished IterableDataset support * flake8	2019-10-22 05:10:00 +03:00
William Falcon	1424157731	Refactor (#407 ) * moved dp, ddp outside of trainer * added main mixins * finished major mixin refactor * flake8	2019-10-22 04:16:51 +03:00
tamyiuchau	4103a5ca73	Provide backward compatibility for #124 (#400 ) * Provide backward compatibility for `e681253` * typo fix	2019-10-21 08:16:55 +02:00
William Falcon	6111edaf82	Test fx (#390 ) * changes to test fx	2019-10-19 00:39:30 +02:00
William Falcon	699bd2cb50	removed mlflow and custom logger tests (#389 ) * changes to seed for tests	2019-10-18 23:03:28 +02:00
William Falcon	3dfcef6994	Loss keys (#387 ) * any key in logs or progress bar is a candidate for callback metric	2019-10-18 15:28:13 +02:00
Hiroyuki Vincent Yamazaki	0fac2d64cf	Fix off-by-one epoch length (#377 )	2019-10-18 10:18:05 +02:00
William Falcon	e5050700ce	docs	2019-10-18 00:17:27 +02:00
William Falcon	2044126821	fixing tests (#372 ) * fixing tests * fixed tests	2019-10-16 07:28:47 -04:00
William Falcon	e2cabb03ba	fix val logging (#362 ) * fix test * no warnings always	2019-10-15 12:44:20 -04:00
Nic Eggert	19c2b8fc9e	Allow disabling default logger, checkpoint_callback, and early_stop_callback (#360 ) * Allow disabling logger, early stopping, and checkpoints * Typo * Get tests passing * Update trainer.py	2019-10-12 06:00:24 -04:00
Yasser Souri	792ba59b78	Pad experiment version with zero for easier listing (#355 )	2019-10-10 19:39:26 -04:00
William Falcon	426bb19846	Update trainer.py	2019-10-10 18:17:26 -04:00
William Falcon	46322b906b	fixed ckpt tests (#352 ) * fixed ckpt tests	2019-10-10 15:16:19 -04:00
William Falcon	96c2a2de50	fixes Flake8	2019-10-09 17:49:29 -04:00
William Falcon	453568179b	Logger default (#351 ) * weights go into default logger folder * ckpt callback in pretrain routine so exp already has version	2019-10-09 17:46:27 -04:00
William Falcon	d95e693598	Logger default (#350 ) * weights go into default logger folder	2019-10-09 16:25:04 -04:00
William Falcon	6e0a562ecb	fixed callback metrics ddp bug	2019-10-09 12:53:33 -04:00
William Falcon	5f1f3f6acc	removed pdb	2019-10-09 10:45:06 -04:00
William Falcon	608a90a490	fixes non python type callback metrics and fast_dev_run (#345 ) * fixes non python type callback metrics * fixed fast dev run	2019-10-09 10:23:08 -04:00
Nic Eggert	8088052825	Finalize logger (#337 ) * Ensure logger.finalize is called * Call logger.finalize * Update mlflow_logger.py * Update test_logging.py * Update trainer.py	2019-10-08 17:33:33 -04:00
William Falcon	49e04de5ac	Ports (#338 ) * remove os.exit from early stopping * fixed weight summary	2019-10-08 17:11:47 -04:00
William Falcon	dcaba55251	Early stopping (#332 ) * callbacks use all other keys in return dict * remove os.exit from early stopping	2019-10-08 16:21:00 -04:00
Adrian Wälchli	6e3e740a7f	Param printing (#336 ) * print thousands as K, M, B, T, ... * add option to print top-level modules only * added doc string and added spacing * do not print summary if neither "full" nor "top" * updated docs showing summary print options * fix line length for travis	2019-10-08 15:30:06 -04:00
William Falcon	ff2a21a08a	default to O1 (#334 )	2019-10-08 09:09:57 -04:00


81 Commits
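
Illustration for commit 2aba70e228 (parse_gpu_ids fix). The commit message spells out how the `gpus` argument should be interpreted: -1 or '-1' selects every available device, 0 selects none, a positive N selects N devices, and a list or comma-separated string names specific device ids. The sketch below restates those rules in plain Python. It is not the repository's code: the name `normalize_gpu_ids`, the explicit `available` count (passed in instead of calling torch.cuda.device_count), the choice of ids 0..N-1 for a positive N, and the use of ValueError are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Union

GpuSpec = Union[int, str, Sequence[int], None]


def normalize_gpu_ids(gpus: GpuSpec, available: int) -> Optional[List[int]]:
    """Hypothetical helper: turn a `gpus` value into a list of device ids.

    `available` is the number of visible devices, supplied by the caller
    (an assumption, so the sketch runs on machines without GPUs).
    Returns None when no device ends up selected.
    """
    all_ids = list(range(available))

    if gpus is None:
        return None

    if isinstance(gpus, str):
        text = gpus.strip()
        if text == "-1":
            requested = all_ids  # '-1' is treated the same as -1
        else:
            # Comma-separated string of device ids, e.g. "0, 2".
            requested = [int(part) for part in text.split(",") if part.strip()]
    elif isinstance(gpus, int):
        if gpus == -1:
            requested = all_ids  # all available devices
        elif gpus >= 0:
            # Assumption: "N gpus" means the first N ids, 0..N-1.
            requested = list(range(gpus))
        else:
            raise ValueError(f"unsupported gpus value: {gpus}")
    else:
        requested = [int(g) for g in gpus]  # explicit list of ids

    if not requested:
        return None  # gpus=0, or nothing available

    # Check the requested ids against the ids that actually exist.
    missing = [g for g in requested if g not in all_ids]
    if missing:
        raise ValueError(f"requested GPU ids not available: {missing}")
    return requested
```

With four devices visible, `normalize_gpu_ids(-1, 4)` yields all four ids, `normalize_gpu_ids("1, 3", 4)` yields ids 1 and 3, and `normalize_gpu_ids([5], 4)` raises ValueError because id 5 does not exist.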