Jirka Borovec
|
17f58d2e11
|
add rank warning (#1428)
* add rank warning
* changelog
* use rank_zero_warn
* user trainer_init
* replace warnings
* fix test
* flake8
* docs
* changelog
* bug lol
|
2020-04-09 14:05:46 -04:00 |
Jirka Borovec
|
ff1f8ef400
|
Test deprecated API for 0.8.0 and 0.9.0 (#1071)
* till 0.8
* refactor
* fix tests
* fix tests
* deprx till 0.9
* Update trainer.py
* Apply suggestions from code review
Co-authored-by: William Falcon <waf2107@columbia.edu>
|
2020-03-06 21:36:44 +01:00 |
Jirka Borovec
|
9785a3e78e
|
Refactor: name modules (#548)
* refactor: rename some modules
* add deprecation warnings
* fix paths
|
2019-11-26 22:39:18 -05:00 |
William Falcon
|
7099f8dbfb
|
split trainer mixins (#209)
* split trainer mixins
* Update multi_node_cluster_template.py
* Update single_cpu_template.py
* Update single_gpu_node_16bit_template.py
* Update single_gpu_node_ddp_template.py
* Update single_gpu_node_dp_template.py
* Update trainer_cpu_template.py
* Update trainer_io.py
* split trainer mixins
* Update multi_node_cluster_template.py
* deconflicted
* deconflicted
* deconflicted
|
2019-09-06 14:11:07 -04:00 |
William Falcon
|
60633eaa32
|
Moves hpc auto-resubmit to trainer from test-tube (#207)
* added slurm signal handler
* added restore weight functions
* set slurm signal handling inside process
* added resubmit docs
* added resubmit docs
* fixed missing param
* Update trainer.py
* fixed missing param
* fixed missing param
* debugging tests
* debugging tests
* debugging tests
* debugging tests
* debugging tests
* debugging tests
* debugging tests
|
2019-09-06 11:54:51 -04:00 |
William Falcon
|
10e4b18452
|
made imports absolute
|
2019-08-07 10:14:59 -04:00 |
William Falcon
|
35f23bbc82
|
Merge pull request #55 from williamFalcon/continue
add training restore
|
2019-08-07 09:02:16 -04:00 |
William Falcon
|
5c398d7a4e
|
removed bad hook call
|
2019-08-07 07:39:41 -04:00 |
William Falcon
|
a931ded310
|
removed bad hook call
|
2019-08-07 07:35:02 -04:00 |
William Falcon
|
95ec072d1e
|
removed bad hook call
|
2019-08-07 07:30:02 -04:00 |
William Falcon
|
d3f19c8321
|
added auto restore
|
2019-08-07 06:55:05 -04:00 |
Jiri BOROVEC
|
d9bfe964f9
|
update by flake8
|
2019-08-06 22:45:46 +02:00 |
Jiri BOROVEC
|
469941a528
|
pkg relative imports
* split requirements.txt
* pytest verbose
|
2019-08-05 10:52:09 +02:00 |
William Falcon
|
019b4d16d0
|
formatting
|
2019-08-04 13:08:14 -05:00 |
William Falcon
|
ef6d5a412c
|
proc 0 only for save hpc. all procs for hpc load
|
2019-08-01 16:19:04 -04:00 |
williamFalcon
|
27660b8a96
|
running tests
|
2019-07-28 05:57:37 -07:00 |
William Falcon
|
f5a01edfb8
|
added clean slurm save load test
|
2019-07-26 22:32:34 -04:00 |
William Falcon
|
f1de62671d
|
added clean slurm save load test
|
2019-07-26 22:32:27 -04:00 |
William Falcon
|
57edb08bd8
|
added clean slurm save load test
|
2019-07-26 22:28:09 -04:00 |
William Falcon
|
ffa7a0dbab
|
added clean slurm save load test
|
2019-07-26 22:26:55 -04:00 |
William Falcon
|
348223a702
|
fixed hpc save, load. cleaned api
|
2019-07-26 22:09:35 -04:00 |
William Falcon
|
265411572f
|
fixed hpc save, load. cleaned api
|
2019-07-26 22:04:27 -04:00 |
William Falcon
|
4148c36abd
|
added model save load test
|
2019-07-26 21:55:01 -04:00 |
William Falcon
|
aacf1947ea
|
auto state-dict and remove the way the model is loaded during hpc
|
2019-07-26 21:38:06 -04:00 |
William Falcon
|
e2c7fa44b7
|
auto state-dict and remove the way the model is loaded during hpc
|
2019-07-26 21:37:06 -04:00 |
William Falcon
|
1a835969a6
|
added saving tests to cpu
|
2019-07-26 12:14:58 -04:00 |
William Falcon
|
7e728d97e7
|
removed save model logging
|
2019-07-25 14:36:22 -04:00 |
William Falcon
|
10c3266ed4
|
pt dpp some ignores
|
2019-07-24 19:30:27 -04:00 |
William Falcon
|
d7be0aae1c
|
fixed correct module on hpc save
|
2019-07-24 18:16:02 -04:00 |
William Falcon
|
a63f74281a
|
fixed correct module on hpc save
|
2019-07-24 18:03:19 -04:00 |
William Falcon
|
423bc5c6c9
|
testing hpc save load
|
2019-07-24 18:01:33 -04:00 |
William Falcon
|
2408aa886d
|
testing hpc save load
|
2019-07-24 18:00:15 -04:00 |
William Falcon
|
b7ca857434
|
removed dead code in model save
|
2019-07-24 15:44:04 -04:00 |
William Falcon
|
b836e6f321
|
removed dead code in model save
|
2019-07-24 15:43:10 -04:00 |
William Falcon
|
98c112598e
|
added safeguards for callbacks in loading saving
|
2019-07-24 11:31:13 -04:00 |
William Falcon
|
8a3abec83a
|
added safeguards for callbacks in loading saving
|
2019-07-24 11:30:14 -04:00 |
William Falcon
|
8fd7a6001b
|
added safeguards for callbacks in loading saving
|
2019-07-24 11:14:19 -04:00 |
William Falcon
|
6d1d5ef68e
|
set dp as default backend
|
2019-07-18 11:45:55 -04:00 |
William Falcon
|
75e32daad4
|
clean up dead code
|
2019-07-03 17:09:39 -04:00 |
William Falcon
|
30e2fc6c4b
|
added on_hpc_load and on_hpc_save hooks
|
2019-07-02 09:36:48 -04:00 |
William Falcon
|
62e091f48d
|
added on_hpc_load and on_hpc_save hooks
|
2019-07-02 09:36:33 -04:00 |
William Falcon
|
f257c080c0
|
added on_hpc_load and on_hpc_save hooks
|
2019-07-02 09:35:15 -04:00 |
William Falcon
|
f2134a4ddd
|
integrated tensorboardx test-tube
|
2019-06-29 15:58:47 -04:00 |
William Falcon
|
bf0f5a5cbb
|
removed self.model refs
|
2019-06-26 18:12:33 -04:00 |
William Falcon
|
9bf3fcd45e
|
adding support for interrupt signals
|
2019-06-14 09:59:28 -04:00 |
William Falcon
|
88ff860c90
|
adding support for interrupt signals
|
2019-06-14 09:46:41 -04:00 |
William Falcon
|
edf03063a1
|
adding support for interrupt signals
|
2019-06-14 09:44:19 -04:00 |
William Falcon
|
8cca02d652
|
adding support for interrupt signals
|
2019-06-14 09:25:46 -04:00 |
William Falcon
|
69274d304d
|
adding support for interrupt signals
|
2019-06-14 09:24:51 -04:00 |
William Falcon
|
12352f1949
|
fixed epoch continuation from checkpoint
|
2019-05-05 12:15:04 -04:00 |