Commit Graph

53 Commits

Author SHA1 Message Date
Jirka Borovec 17f58d2e11 add rank warning (#1428)
* add rank warning

* changelog

* use rank_zero_warn

* user trainer_init

* replace warnings

* fix test

* flake8

* docs

* changelog

* bug lol
2020-04-09 14:05:46 -04:00
Jirka Borovec ff1f8ef400 Test deprecated API for 0.8.0 and 0.9.0 (#1071)
* till 0.8

* refactor

* fix tests

* fix tests

* deprx till 0.9

* Update trainer.py

* Apply suggestions from code review

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-03-06 21:36:44 +01:00
Jirka Borovec 9785a3e78e Refactor: name modules (#548)
* refactor: rename some modules

* add deprecation warnings

* fix paths
2019-11-26 22:39:18 -05:00
William Falcon 7099f8dbfb split trainer mixins (#209)
* split trainer mixins

* Update multi_node_cluster_template.py

* Update single_cpu_template.py

* Update single_gpu_node_16bit_template.py

* Update single_gpu_node_ddp_template.py

* Update single_gpu_node_dp_template.py

* Update trainer_cpu_template.py

* Update trainer_io.py

* split trainer mixins

* Update multi_node_cluster_template.py

* deconflicted

* deconflicted

* deconflicted
2019-09-06 14:11:07 -04:00
William Falcon 60633eaa32 Moves hpc auto-resubmit to trainer from test-tube (#207)
* added slurm signal handler

* added restore weight functions

* set slurm signal handling inside process

* added resubmit docs

* added resubmit docs

* fixed missing param

* Update trainer.py

* fixed missing param

* fixed missing param

* debugging tests

* debugging tests

* debugging tests

* debugging tests

* debugging tests

* debugging tests

* debugging tests
2019-09-06 11:54:51 -04:00
William Falcon 10e4b18452 made imports absolute 2019-08-07 10:14:59 -04:00
William Falcon 35f23bbc82 Merge pull request #55 from williamFalcon/continue
add training restore
2019-08-07 09:02:16 -04:00
William Falcon 5c398d7a4e removed bad hook call 2019-08-07 07:39:41 -04:00
William Falcon a931ded310 removed bad hook call 2019-08-07 07:35:02 -04:00
William Falcon 95ec072d1e removed bad hook call 2019-08-07 07:30:02 -04:00
William Falcon d3f19c8321 added auto restore 2019-08-07 06:55:05 -04:00
Jiri BOROVEC d9bfe964f9 update by flake8 2019-08-06 22:45:46 +02:00
Jiri BOROVEC 469941a528 pkg relative imports
* split requirements.txt
* pytest verbose
2019-08-05 10:52:09 +02:00
William Falcon 019b4d16d0 formatting 2019-08-04 13:08:14 -05:00
William Falcon ef6d5a412c proc 0 only for save hpc. all procs for hpc load 2019-08-01 16:19:04 -04:00
williamFalcon 27660b8a96 running tests 2019-07-28 05:57:37 -07:00
William Falcon f5a01edfb8 added clean slurm save load test 2019-07-26 22:32:34 -04:00
William Falcon f1de62671d added clean slurm save load test 2019-07-26 22:32:27 -04:00
William Falcon 57edb08bd8 added clean slurm save load test 2019-07-26 22:28:09 -04:00
William Falcon ffa7a0dbab added clean slurm save load test 2019-07-26 22:26:55 -04:00
William Falcon 348223a702 fixed hpc save, load. cleaned apu 2019-07-26 22:09:35 -04:00
William Falcon 265411572f fixed hpc save, load. cleaned apu 2019-07-26 22:04:27 -04:00
William Falcon 4148c36abd added model save load test 2019-07-26 21:55:01 -04:00
William Falcon aacf1947ea auto state-dict and remove the way the model is loaded during hpc 2019-07-26 21:38:06 -04:00
William Falcon e2c7fa44b7 auto state-dict and remove the way the model is loaded during hpc 2019-07-26 21:37:06 -04:00
William Falcon 1a835969a6 added saving tests to cpu 2019-07-26 12:14:58 -04:00
William Falcon 7e728d97e7 removed save model logging 2019-07-25 14:36:22 -04:00
William Falcon 10c3266ed4 pt dpp some ignores 2019-07-24 19:30:27 -04:00
William Falcon d7be0aae1c fixed correct module on hpc save 2019-07-24 18:16:02 -04:00
William Falcon a63f74281a fixed correct module on hpc save 2019-07-24 18:03:19 -04:00
William Falcon 423bc5c6c9 testing hpc save load 2019-07-24 18:01:33 -04:00
William Falcon 2408aa886d testing hpc save load 2019-07-24 18:00:15 -04:00
William Falcon b7ca857434 removed dead code in model save 2019-07-24 15:44:04 -04:00
William Falcon b836e6f321 removed dead code in model save 2019-07-24 15:43:10 -04:00
William Falcon 98c112598e added safeguards for callbacks in loading saving 2019-07-24 11:31:13 -04:00
William Falcon 8a3abec83a added safeguards for callbacks in loading saving 2019-07-24 11:30:14 -04:00
William Falcon 8fd7a6001b added safeguards for callbacks in loading saving 2019-07-24 11:14:19 -04:00
William Falcon 6d1d5ef68e set dp as default backend 2019-07-18 11:45:55 -04:00
William Falcon 75e32daad4 clean up dead code 2019-07-03 17:09:39 -04:00
William Falcon 30e2fc6c4b added on_hpc_load and on_hpc_save hooks 2019-07-02 09:36:48 -04:00
William Falcon 62e091f48d added on_hpc_load and on_hpc_save hooks 2019-07-02 09:36:33 -04:00
William Falcon f257c080c0 added on_hpc_load and on_hpc_save hooks 2019-07-02 09:35:15 -04:00
William Falcon f2134a4ddd integrated tensorboardx test-tube 2019-06-29 15:58:47 -04:00
William Falcon bf0f5a5cbb removed self.model refs 2019-06-26 18:12:33 -04:00
William Falcon 9bf3fcd45e adding support for interrupt signals 2019-06-14 09:59:28 -04:00
William Falcon 88ff860c90 adding support for interrupt signals 2019-06-14 09:46:41 -04:00
William Falcon edf03063a1 adding support for interrupt signals 2019-06-14 09:44:19 -04:00
William Falcon 8cca02d652 adding support for interrupt signals 2019-06-14 09:25:46 -04:00
William Falcon 69274d304d adding support for interrupt signals 2019-06-14 09:24:51 -04:00
William Falcon 12352f1949 fixed epoch continuation from checkpoint 2019-05-05 12:15:04 -04:00