Commit Graph

769 Commits

Author SHA1 Message Date
William Falcon bf2067a609
enabled manual returns (#4089)
2020-10-12 10:06:17 -04:00
William Falcon 1dbc6ffbc1
added templates (#4077)
* docs

* docs
2020-10-11 09:35:51 -04:00
William Falcon 7ffe05a3d1
ref: accelerator names (#4066)
* ref: accelerator names

* docs
2020-10-11 01:05:14 -04:00
William Falcon 0281b077d8
ref: decouple apex second attemp part 10/n (#4064)
* ref: decouple apex second attemp part 9/n

* ref: decouple apex second attemp part 9/n

* ref: decouple apex second attemp part 9/n
2020-10-10 20:05:05 -04:00
William Falcon dbfe2b6129
ref: decouple apex second attemp part 9/n (#4063)
* ref: decouple apex second attemp part 9/n

* ref: decouple apex second attemp part 9/n
2020-10-10 18:44:24 -04:00
William Falcon 5ce9fc6bb3
ref: decouple apex second attemp part 7/n (#4061)
* ref: decouple apex second attemp part 7/n

* ref: decouple apex second attemp part 7/n

* ref: decouple apex second attemp part 7/n
2020-10-10 16:44:15 -04:00
William Falcon d1bbb449a3
ref: decouple apex second attemp part 5/n (#4058)
2020-10-10 14:35:25 -04:00
Rohit Gupta bdbf846029
Fix to print scaler value in progress bar (#4053)
* Fix to print scaler value in progress bar

* chlog

* Fix to print scaler value in progress bar

* Fix to print scaler value in progress bar
2020-10-10 12:20:11 -04:00
William Falcon ce2edf1192
ref: decouple apex second attemp part 4/n (#4056)
* ref: decouple apex second attemp part 4/n

* ref: decouple apex second attemp part 4/n

* Update lightning.py

* ref: decouple apex second attemp part 4/n
2020-10-10 12:19:22 -04:00
William Falcon 7285613974
ref: decouple apex second attemp part 2/n (#4054)
* ref: decouple apex second attemp part 2/n

* ref: decouple apex second attemp part 2/n
2020-10-10 10:24:20 -04:00
William Falcon 5b261a230e
enable passing in custom accelerators (#4050)
* enable custom accelerators

* ref: finish decoupling apex, LM and backward

* ref: finish decoupling apex, LM and backward

* ref: finish decoupling apex, LM and backward
2020-10-10 09:21:08 -04:00
William Falcon 2b255a3df4
ref: enable custom clusters (1/n) (#4048)
* enable cluster plugins

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices

* enable cluster plugins + test backend choices
2020-10-10 08:09:29 -04:00
William Falcon 0c42aa03fd
enables plugins (#4041)
* plugin hardware

* plugin hardware

* plugin hardware
2020-10-09 22:03:46 -04:00
William Falcon 05e0b4e5a1
Revert "Remove limitation of batch scaler (#4006)" (#4040)
This reverts commit 7e756ca11f.
2020-10-09 21:03:23 -04:00
Jirka Borovec baf4f35027
add parsing OS env vars (#4022)
* add parsing OS env vars

* fix env

* Apply suggestions from code review

* overwrite init

* Apply suggestions from code review
2020-10-09 19:34:09 -04:00
edenlightning 627cae9483
[Docs] checkpoints (#4034)
* Update __init__.py

* Update weights_loading.rst

* docs for checkpoints
2020-10-09 19:10:25 -04:00
Nicki Skafte 7e756ca11f
Remove limitation of batch scaler (#4006)
* working code

* add tests

* fix scaling

* move patch dataloader to utils

* renaming

* fix tests

* add changelog

* update docs

* pep8
2020-10-09 14:53:01 -04:00
William Falcon bfdea3ea28
Multi opts tests and clarification (#4016)
* ref: clean up opts docs

* ref: clean up opts docs
2020-10-08 22:55:59 -04:00
William Falcon e68b949772
docs (#4003)
2020-10-08 15:54:52 -04:00
Nrupatunga fcfa587492
Bugfix/update trainer properties (#3975)
* make current_epoch and global_step to be same as trainer, after model restore.

* remove assignment here

* test

* minor modification

* merge with parent's master

* [bug-fix]: update trainer properties

* minor comment fix

* minor comment fix

* reset train loader in `on_train_epoch_start` hook

* makes sure the changes work

* minor chane

* update changelog

* adding unit test for reload_dataloaders_every_epoch arg

* modified changelog, to add PR number

* revert imports

* changes to unit test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-10-08 10:20:55 -04:00
Jirka Borovec 8873750cf0
remove deprecated early_stop_callback (#3982)
2020-10-08 06:30:33 -04:00
edenlightning 6dfa748ce3
Add video tutorials to docs (#3977)
* videos in trainer api

* videos in docs

* videos in docs

* videos in trainer api

* videos in docs

* videos in docs

* videos in docs

* videos in docs

* Update new-project.rst

* docs

* Update new-project.rst
2020-10-08 05:49:56 -04:00
William Falcon 1d3c7dc8d6
removed deprecated trainer flags (#3969)
* removed deprecated flags

* removed es callback flag
2020-10-07 23:46:21 -04:00
William Falcon 048a816be3
added tests for the training epoch end (#3967)
2020-10-07 22:27:36 -04:00
William Falcon 4c0d063c86
outputs in __batch_end hooks (#3966)
* train_batch_end outputs

* added tests for the output hooks
2020-10-07 21:48:38 -04:00
William Falcon 65b6a6a497
0.10.0 (#3965)
2020-10-07 20:41:56 -04:00
William Falcon 6044cf9003
Fixes #3945 (#3947)
2020-10-07 13:46:27 -04:00
edenlightning 27f536b2ce
[CI SKIP] Fix early stop docs (#3940)
* Update early_stopping.rst

* Update __init__.py

* Update new-project.rst

* Update early_stopping.rst

* Update __init__.py

* Update early_stopping.rst

* Update __init__.py

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-07 13:01:50 -04:00
William Falcon b922409624
clean and organize fit (#3938)
* clean and organize fit

* clean and organize fit

* clean and organize fit

* clean and organize fit

* clean and organize fit
2020-10-07 11:04:10 -04:00
William Falcon 575e01be82
tests for multiple optimizers and dataloader combinations (#3937)
* added tests for multiple optimizers and dataloaders

* added tests for multiple optimizers and dataloaders

* added tests for multiple optimizers and dataloaders
2020-10-07 10:13:57 -04:00
ananthsub d3f40d6a9e
Update to_disk to use fsspec for remote file support (#3930)
* Update supporters.py

* Update CHANGELOG.md

* Update supporters.py

* Update supporters.py

* Update supporters.py

* Update supporters.py

* Update supporters.py

* Update supporters.py

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-10-07 07:28:23 -04:00
edenlightning 335bb75356
update docs on logging (#3916)
* Update loggers.rst

* Update loggers.rst

* Update index.rst

* Create logging.rst

* Delete experiment_reporting.rst

* Delete experiment_logging.rst

* Update __init__.py
2020-10-06 18:53:39 -04:00
Jirka Borovec 064ae53d63
nb steps in early stop (#3909)
* nb steps

* if

* skip

* rev

* seed

* seed
2020-10-06 15:20:08 -04:00
Lezwon Castelino 69833dad5b
Added check to verify xla device is TPU (#3274)
* tpu device check

* replaced with xmp spawn

* Revert "replaced with xmp spawn"

This reverts commit 6835380f

* replaced all instances of XLA_AVAILABLE

* moved inner_f to global scope

* made refactors

* added changelog

* added TPU_AVAILABLE variable

* fix codefactor issues

* removed form trainer and early stopping

* add TORCHXLA_AVAILABLE check

* added tests

* refactoring

* Update pytorch_lightning/utilities/xla_device_utils.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* updated function names

* fixed bug

* updated CHANGELOG.md

* added todo

* added type hints

* isort and black

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-06 19:54:37 +02:00
William Falcon 2cf17a3718
Adds tests to make sure logging doesn't happen multiple times (#3899)
* Makes sure logging doesn't ever happen from non-root zero

* Makes sure logging doesn't ever happen from non-root zero

* Makes sure logging doesn't ever happen from non-root zero

* added bug report model

* fix local model

* fix local model

* fix local model

* fix local model
2020-10-06 12:43:51 -04:00
Teddy Koker 9600926619
Rename log_save_interval, row_log_interval (#3748)
* Rename row_log_interval -> log_every_n_steps
log_save_interval -> flush_logs_every_n_steps

* Changelog

* fixed title underline length

* typo

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* pep8 + deprecation test

* 'todo: remove in 1.1 comment'

* 1.1 -> 0.11

* log

* docs

* depr API

* add depr tests

* note

* miss

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-10-06 10:27:06 -04:00
edenlightning 2119184801
Fix docs for auto_lr_find (#3883)
* Fix docs for auto_lr_find

* change testcode to codeblock

we are not showing a complete example here
2020-10-05 22:28:38 -04:00
William Falcon b34c7add23
Fixes #3668, #3887 as a bonus (#3888)
* Fixes #3668, #3887 as a bonus

* Fixes #3668, #3887 as a bonus
2020-10-05 21:30:41 -04:00
Nrupatunga 7d47ed178b
[Bug-Fix]:properties `current_epoch` and `global_step` between model and trainer same always (#3785)
* make current_epoch and global_step to be same as trainer, after model restore.

* remove assignment here

* test

* minor modification

* Update pytorch_lightning/core/lightning.py

type check, better clarity

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Update pytorch_lightning/core/lightning.py

type check, better clarity

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* comments for current_epoch and global_step properties

* Update tests/models/test_restore.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update comments according to the changes made

* Update tests/models/test_restore.py

* add current_epoch, global_step to jit ignore list

* Add comments to CHANGELOG

* Update CHANGELOG.md

* Update tests/models/test_restore.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-10-05 11:10:40 -04:00
William Falcon b014223f72
Fixes #2678 - enables training_step to return None (#3862)
* Fixes #2678 - enables training_step to return None

* Fixes #2678 - enables training_step to return None
2020-10-05 07:33:46 -04:00
William Falcon d787208e76
Fixes #2792 (#3857)
2020-10-04 23:25:02 -04:00
William Falcon f58c760409
Fixes #2551 (#3858)
2020-10-04 23:02:35 -04:00
William Falcon d9656d166c
fixed model checkpoint frequency (#3852)
* fixed model checkpoint frequency

* fixed model checkpoint frequency

* fixed model checkpoint frequency

* fixed model checkpoint frequency

* merged
2020-10-04 21:49:20 -04:00
William Falcon c6df63a588
Fixes #2479 (#3856)
2020-10-04 21:30:33 -04:00
William Falcon 00f0d19a61
fixes #3798 (#3849)
* fix #3798

* added tbptt test for logging
2020-10-04 19:36:51 -04:00
Harshal Mittal 6723b924f8
docs/fix_typo (#3847)
2020-10-04 17:10:49 -04:00
William Falcon 70e792344a
test selecting the correct backend. temp backends while slurm and TE are decoupled (#3848)
* test selecting the correct backend. tem backends while slurm and TE are decoupled

* test selecting the correct backend. tem backends while slurm and TE are decoupled
2020-10-04 15:44:50 -04:00
William Falcon 1aa9d39506
Eval epoch can now log independently (#3843)
* ref: routed epoch outputs to logger

* ref: routed epoch outputs to logger

* ref: routed epoch outputs to logger

* ref: routed epoch outputs to logger
2020-10-04 13:36:35 -04:00
Adrian Wälchli 1906867fd4
deprecation warning (#3844)
2020-10-04 13:17:09 -04:00
William Falcon 2c21f7d7e2
ref: adding compute environments (2/n) (#3842)
* ref: adding compute environments (2/n)

* ref: adding compute environments (2/n)

* ref: adding compute environments (2/n)

* ref: adding compute environments (2/n)
2020-10-04 08:48:46 -04:00