William Falcon
1aa9d39506
Eval epoch can now log independently ( #3843 )
...
* ref: routed epoch outputs to logger
2020-10-04 13:36:35 -04:00
Jeff Yang
b76fc5bae5
use docker for conda CI ( #3841 )
...
* use docker in conda CI
* update env if needed
* update with pip
* remove setting pytorch
2020-10-04 13:18:20 -04:00
Adrian Wälchli
1906867fd4
deprecation warning ( #3844 )
2020-10-04 13:17:09 -04:00
William Falcon
2c21f7d7e2
ref: adding compute environments (2/n) ( #3842 )
...
* ref: adding compute environments (2/n)
2020-10-04 08:48:46 -04:00
Rohit Gupta
a628d181ee
Fix val_progress_bar total with num_sanity_val_steps ( #3751 )
...
* Fix val_progress_bar total with num_sanity_val_steps
* chlog
* Fix val_progress_bar total with num_sanity_val_steps
* move test
* replaced with sanity flag and suggestions
2020-10-04 08:32:18 -04:00
Lezwon Castelino
4da240ea1b
added broadcast option to tpu ( #3814 )
...
* added broadcast option to tpu
* add device
* moved tpu broadcast to tpu_backend
* removed Lightning dist
* decode bytes
* pep8 fix
* fix bug
* test for broadcast
* updated changelog
2020-10-04 07:47:33 -04:00
William Falcon
093535d433
ref: adding compute environments (1/n) ( #3837 )
...
* ref: adding compute environments (1/n)
2020-10-04 07:31:19 -04:00
Daniel Li
a3503ce3fd
Explicitly point out where we should set the random seed ( #3839 )
...
* Explicitly point out where we should set the random seed
* Update docs/source/multi_gpu.rst
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
Co-authored-by: Qinru Li <q4li@eng.ucsd.edu>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-04 07:30:45 -04:00
William Falcon
1f8ff7c48c
ref: callback system and init ddp (1/n) ( #3836 )
...
* refactored callback system and init ddp
2020-10-03 23:39:17 -04:00
ananthsub
b8a6408a11
Update trainer.py ( #3834 )
2020-10-03 22:18:05 -04:00
William Falcon
66aef10239
verified epoch logging ( #3830 )
...
* ref: fix epoch logging
* verified epoch logging
2020-10-03 21:17:24 -04:00
William Falcon
35d1111994
[WIP] ref: decoupled ddp, ddp spawn (finish 3733) ( #3819 )
...
* ref: finish #3733
* remove deprecated test
* Update pytorch_lightning/accelerators/ddp_backend.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* remove deprecated test
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2020-10-03 14:05:31 -04:00
William Falcon
3903cf63c6
ref: training flag tests (val_check_interval) ( #3825 )
...
* added test_val_check_interval tests
2020-10-03 14:05:01 -04:00
ananthsub
8dd37e7c4a
Use fsspec in load to resolve more paths/URLs from storage backends ( #3692 )
...
* special case http for torch hub load
* Update CHANGELOG.md
* Update test.txt
2020-10-03 13:29:03 -04:00
William Falcon
0fb8c54fda
remove deprecated test ( #3820 )
2020-10-03 13:21:10 -04:00
William Falcon
d9bc95f83e
ref: bug fix with logging val epoch end + monitor ( #3812 )
...
* ref: fix metric err
* ref: merge
* ref: decoupled ddp2
* ref: clean up ddp before final fix
2020-10-03 12:33:29 -04:00
William Falcon
ed1450a293
ref: clean up ddp before final fix ( #3817 )
...
* ref: clean up ddp before final fix
2020-10-03 12:01:02 -04:00
William Falcon
0838c6bfce
ref: decoupled ddp2 ( #3816 )
2020-10-03 09:02:35 -04:00
Jeff Yang
62320632d4
Some docs update ( #3794 )
...
* docs update
* suggestions
* Update docs/source/introduction_guide.rst
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-03 08:15:07 -04:00
William Falcon
a677833f84
ref: separate slurm from ddp ( #3809 )
...
* ref: separate slurm from ddp
* ref: separate te from ddp
* ref: merge
2020-10-02 23:08:34 -04:00
Brendan Fahy
b14c4d4c70
handle fsspec inconsistency in PyArrowHDFS ( #3805 )
2020-10-02 22:35:42 -04:00
William Falcon
74484edecd
ref: separate te from ddp ( #3810 )
...
* ref: separate te from ddp
2020-10-02 21:00:51 -04:00
William Falcon
a28528cc8b
ref: remove weight loading hack for ddp_cpu ( #3808 )
2020-10-02 19:28:50 -04:00
William Falcon
afa43837a4
ref: part 8 of #3733 ( #3806 )
2020-10-02 18:46:18 -04:00
Jeff Yang
9942f3ebdf
Fix `on_train_batch_start` hook to end epoch early ( #3700 )
...
* init
* add test
* changelog and docs
* fix test
* Apply suggestion from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-10-02 21:46:46 +02:00
ananthsub
3ab730e316
Swap torch.load for fsspec load in ddp spawn backend ( #3787 )
...
* Update ddp_spawn_backend.py
* Update ddp_cpu_spawn_backend.py
* log
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-10-02 21:00:01 +02:00
ananthsub
192fc018f3
Update model_checkpoint.py ( #3801 )
2020-10-02 14:49:46 -04:00
William Falcon
7c6ed1fa28
ref: part 7 of #3733 ( #3802 )
...
* ref: part 7 of #3733
2020-10-02 14:23:27 -04:00
Jirka Borovec
22efce8f40
fix warning ( #3800 )
2020-10-02 13:51:02 -04:00
zcain117
0c12065efd
[TPU CI] Use timestamp+pythonVersion to form the docker image tag. ( #3779 )
...
* Use timestamp+pythonVersion to form the docker image tag.
* Remove temporary step to check new env var.
2020-10-02 16:22:47 +02:00
ananthsub
88ad4513c1
Use fsspec with OmegaConf saving in saving.py ( #3782 )
2020-10-02 15:37:37 +02:00
Nathan Raw
698f90164c
remove torch<1.3.0 warning from tb logger ( #3784 )
2020-10-02 15:36:55 +02:00
Jirka Borovec
62eabdd535
revert backend types ( #3788 )
...
* revert backend types
* todo
2020-10-02 06:18:44 -04:00
edenlightning
ab7d9bd1a5
Add link to PL forum in GH questions template ( #3708 )
...
* Update how-to-question.md
* Apply suggestions from code review
* typo
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-02 12:05:46 +02:00
Jirka Borovec
1160270882
fix path in CI for release & python version in all dockers & duplicated badges ( #3765 )
...
* typo
* path
* check
* trigger
* fix conda
* pip ver
* fix cuda
* fix XLA
* fix xla
* ci
* docker
* BIULD
* unBIULD
* update
* py 3.8
* apex
2020-10-02 05:26:21 -04:00
Akihiro Nitta
ebc1b23fa3
Use `raise .. from ..` to explicitly chain exceptions ( #3750 )
...
* Fix exception chaining
* names
* Change exception names for consistency
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
* Change exception names for consistency
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-10-01 21:45:44 +02:00
William Falcon
e17712e5c3
part 5 of #3733 ( #3774 )
...
* ref: part 4 of #3733
2020-10-01 12:34:12 -04:00
William Falcon
622c5c3982
ref: part 4 of #3733 ( #3773 )
...
* ref: part 4 of #3733
2020-10-01 11:26:58 -04:00
Jeff Yang
128f9ee931
Fix for PyTorch 1.7 CI ( #3768 )
...
* changed to __jit_unused_properties__
2020-10-01 16:37:00 +02:00
Nicki Skafte
fe290280be
Metric aggregation testing ( #3517 )
...
* aggregation testing
* add more tests
* mse
* more tests
* fix tests
* fix doctest
* fix codefactor
* fix import error
* fix doctest
* revert docfix
* test for model integration
* fix integration test
* added test cases
* fix rmsle
* aggregation testing
* add more tests
* mse
* more tests
* fix tests
* fix doctest
* fix codefactor
* fix import error
* fix doctest
* revert docfix
* test for model integration
* fix integration test
* fix psnr
* add warning/valueerror to embedding similarity
* fixed f scores
* disable some test
* fix tests
* fixing codefactor
* fix pep8
* changelog
* fix doctest
* cleaning test
* fix pickle error
* pickle fix
* fix pickle error
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* code cleanup + changes based on suggestions
* update based on suggestion
* update based on suggestions
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-10-01 15:37:51 +02:00
William Falcon
ac2b0f0f06
ref: continue #3733 ( #3767 )
...
* ref: #3733 part 2
2020-10-01 09:25:33 -04:00
William Falcon
440f837f6d
ref: part a of #3733 ( #3766 )
...
* ref: part a of #3733
2020-10-01 08:15:23 -04:00
Nicki Skafte
9a7d1a1876
[metrics] Accuracy num_classes error fix ( #3764 )
...
* change accuracy error to warning
* changelog
2020-10-01 13:00:42 +02:00
Lezwon Castelino
8be002ccc7
skip best_model_path if checkpoint_callback is None ( #2962 )
...
* skip best_model_path if checkpoint_callback is None
* removed test
2020-10-01 06:57:26 -04:00
GimmickNG
e4e60e9b82
Add datamodule parameter to lr_find() ( #3425 )
...
* Add datamodule parameter to lr_find()
* Fixed missing import
* Move datamodule parameter to end
* Add datamodule parameter test with auto_lr_find
* Change test for datamodule parameter
* Apply suggestions from code review
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
* Fix lr_find documentation
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* formatting
* Add description to datamodule param in lr_find
* pep8: remove trailing whitespace on line 105
* added changelog
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-10-01 10:33:12 +02:00
William Falcon
7c61fc7c27
ref: fixes logging for eval steps ( #3763 )
...
* fixes logging for eval steps
2020-10-01 02:31:11 -04:00
Teddy Koker
5ec00ccd28
Added gradient clip test for native AMP ( #3754 )
...
* added gradient clip test for fp16
* pep8
2020-10-01 01:36:34 -04:00
William Falcon
a38d108a68
add dist lib to enable syncing anything across devices ( #3762 )
...
* add dist lib to enable syncing anything across devices
2020-10-01 01:21:38 -04:00
William Falcon
cf182e80fc
Finish Allow on_save_checkpoint... ( #3688 )
...
* Finish #3562
* Apply suggestions from code review
* fix tests
* Finish #3562
* Apply suggestions from code review
* fix tests
* fix structure
* make save_last test pass
* unnecessary global rank check
* fix test
* update test
* test
* run save on all
* remove assert
* tracking saves
* check if fails
* test
* clean up
* adjust horovod test
* clean up
* remove unnecessary makedirs
* change
* undo
* debug
* mock
* undo debug code
* add extra assertions
* test
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-30 16:15:29 -04:00
ananthsub
1eb1d17e25
Add trainer attribute to datamodule ( #3749 )
...
* Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter
* Store a reference to the trainer on the datamodule
Fixes #3682
* Update data_connector.py
* Update test_datamodules.py
* Add attribute to datamodule for trainer
2020-10-01 00:41:19 +05:30