lightning/tests/checkpointing/test_torch_saving.py

# Copyright The PyTorch Lightning team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import platform

import pytest
import torch

from pytorch_lightning import Trainer
from tests.base import BoringModel


def test_model_torch_save(tmpdir):
    """Test to ensure torch save does not fail for model and trainer."""
    model = BoringModel()
    num_epochs = 1
    trainer = Trainer(
        default_root_dir=tmpdir,
        max_epochs=num_epochs,
    )
    temp_path = os.path.join(tmpdir, 'temp.pt')
    trainer.fit(model)

    # Saving the model and the trainer must not raise, and the saved trainer must load back
    torch.save(trainer.model, temp_path)
    torch.save(trainer, temp_path)
    trainer = torch.load(temp_path)


@pytest.mark.skipif(platform.system() == "Windows", reason="Distributed training is not supported on Windows")
def test_model_torch_save_ddp_cpu(tmpdir):
    """Test to ensure torch save does not fail for model and trainer using cpu ddp."""
    model = BoringModel()
    num_epochs = 1
    trainer = Trainer(
        default_root_dir=tmpdir,
        max_epochs=num_epochs,
        accelerator="ddp_cpu",
        num_processes=2,
    )
    temp_path = os.path.join(tmpdir, 'temp.pt')
    trainer.fit(model)

    # Ensure these do not fail
    torch.save(trainer.model, temp_path)
    torch.save(trainer, temp_path)


@pytest.mark.skipif(torch.cuda.device_count() < 2, reason="test requires multi-GPU machine")
def test_model_torch_save_ddp_cuda(tmpdir):
    """Test to ensure torch save does not fail for model and trainer using gpu ddp."""
    model = BoringModel()
    num_epochs = 1
    trainer = Trainer(
        default_root_dir=tmpdir,
        max_epochs=num_epochs,
        accelerator="ddp_spawn",
        gpus=2,
    )
    temp_path = os.path.join(tmpdir, 'temp.pt')
    trainer.fit(model)

    # Ensure these do not fail
    torch.save(trainer.model, temp_path)
    torch.save(trainer, temp_path)