from textwrap import dedent

from peru import plugin

import shared


def assert_parallel(n):
    # The plugin module keeps a global counter of all the jobs that run in
    # parallel, so that we can write these tests.
    if plugin.DEBUG_PARALLEL_MAX != n:
        raise AssertionError('Expected {} parallel {}. Counted {}.'.format(
            n, 'job' if n == 1 else 'jobs', plugin.DEBUG_PARALLEL_MAX))
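
# For context, here is a minimal sketch (not peru's actual implementation) of
# how a high-water-mark counter like DEBUG_PARALLEL_MAX can be maintained. The
# helper names below are hypothetical:
#
#     import contextlib
#
#     _jobs_running = 0
#     DEBUG_PARALLEL_MAX = 0
#
#     @contextlib.contextmanager
#     def _counting_job():
#         global _jobs_running, DEBUG_PARALLEL_MAX
#         _jobs_running += 1
#         DEBUG_PARALLEL_MAX = max(DEBUG_PARALLEL_MAX, _jobs_running)
#         try:
#             yield
#         finally:
#             _jobs_running -= 1
#
# Each plugin fetch would run inside `with _counting_job():`, so tests can
# assert on the maximum afterwards, and debug_assert_clean_parallel_count()
# can check that the live count has dropped back to zero.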


class ParallelismTest(shared.PeruTest):

    def setUp(self):
        # Make sure nothing is fishy with the jobs counter, and reset the max.
        plugin.debug_assert_clean_parallel_count()
        plugin.DEBUG_PARALLEL_MAX = 0

    def tearDown(self):
        # Make sure nothing is fishy with the jobs counter. No sense in
        # resetting the max here, because the rest of our tests don't know to
        # reset it anyway.
        plugin.debug_assert_clean_parallel_count()

    def test_two_jobs_in_parallel(self):
        # This just checks that two different modules can actually be fetched
        # in parallel.
        foo = shared.create_dir()
        bar = shared.create_dir()
        peru_yaml = dedent('''\
            imports:
                foo: ./
                bar: ./

            cp module foo:
                path: {}

            cp module bar:
                path: {}
            '''.format(foo, bar))
        test_dir = shared.create_dir({'peru.yaml': peru_yaml})
        shared.run_peru_command(['sync'], test_dir)
        assert_parallel(2)

    def test_jobs_flag(self):
        # This checks that the --jobs flag is respected, even when two modules
        # could have been fetched in parallel.
        foo = shared.create_dir()
        bar = shared.create_dir()
        peru_yaml = dedent('''\
            imports:
                foo: ./
                bar: ./

            cp module foo:
                path: {}

            cp module bar:
                path: {}
            '''.format(foo, bar))
        test_dir = shared.create_dir({'peru.yaml': peru_yaml})
        shared.run_peru_command(['sync', '-j1'], test_dir)
        assert_parallel(1)

    def test_identical_fields(self):
        # This checks that modules with identical fields are not fetched in
        # parallel. This is the same logic that protects us from fetching a
        # given module twice, like when it's imported with two different named
        # rules.
        foo = shared.create_dir()
        peru_yaml = dedent('''\
            imports:
                foo1: ./
                foo2: ./

            cp module foo1:
                path: {}

            cp module foo2:
                path: {}
            '''.format(foo, foo))
        test_dir = shared.create_dir({'peru.yaml': peru_yaml})
        shared.run_peru_command(['sync'], test_dir)
        assert_parallel(1)
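
    # Illustrative only (not peru's actual code): a common way to make
    # identical fetches share a single job is to key in-flight work by its
    # cache key and reuse the pending future, e.g.:
    #
    #     _in_flight = {}
    #
    #     async def fetch_once(key, do_fetch):
    #         if key not in _in_flight:
    #             _in_flight[key] = asyncio.ensure_future(do_fetch())
    #         return await _in_flight[key]
    #
    # That is why two modules with identical fields count as one parallel job
    # in the test above.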

    def test_identical_plugin_cache_fields(self):
        # Plugins that use caching also need to avoid running in parallel, if
        # their cache directories are the same. The noop_cache plugin (created
        # for this test) uses the path field (but not the nonce field) in its
        # plugin cache key. Check that these two modules are not fetched in
        # parallel, even though their module fields aren't exactly the same.
        foo = shared.create_dir()
        peru_yaml = dedent('''\
            imports:
                foo1: ./
                foo2: ./

            noop_cache module foo1:
                path: {}
                # nonce is ignored, but it makes foo1 different from foo2 as
                # far as the module cache is concerned
                nonce: '1'

            noop_cache module foo2:
                path: {}
                nonce: '2'
            '''.format(foo, foo))
        test_dir = shared.create_dir({'peru.yaml': peru_yaml})
        shared.run_peru_command(['sync'], test_dir)
        assert_parallel(1)
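
    # For reference, peru plugins are defined by a plugin.yaml that declares,
    # among other things, which module fields feed the plugin cache key. The
    # snippet below is an assumption about what the test-only noop_cache
    # plugin's definition roughly looks like, not a copy of it:
    #
    #     required fields:
    #         - path
    #     optional fields:
    #         - nonce
    #     cache fields:
    #         - path
    #
    # With only `path` as a cache field, foo1 and foo2 share a plugin cache
    # directory and therefore must not be fetched in parallel.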