spaCy/spacy/tests/parser/test_parse_navigate.py

# coding: utf-8
from __future__ import unicode_literals

import pytest

from ..util import get_doc


@pytest.fixture
def text():
    return """
It was a bright cold day in April, and the clocks were striking thirteen.
Winston Smith, his chin nuzzled into his breast in an effort to escape the
vile wind, slipped quickly through the glass doors of Victory Mansions,
though not quickly enough to prevent a swirl of gritty dust from entering
along with him.

The hallway smelt of boiled cabbage and old rag mats. At one end of it a
coloured poster, too large for indoor display, had been tacked to the wall.
It depicted simply an enormous face, more than a metre wide: the face of a
man of about forty-five, with a heavy black moustache and ruggedly handsome
features. Winston made for the stairs. It was no use trying the lift. Even at
the best of times it was seldom working, and at present the electric current
was cut off during daylight hours. It was part of the economy drive in
preparation for Hate Week. The flat was seven flights up, and Winston, who
was thirty-nine and had a varicose ulcer above his right ankle, went slowly,
resting several times on the way. On each landing, opposite the lift-shaft,
the poster with the enormous face gazed from the wall. It was one of those
pictures which are so contrived that the eyes follow you about when you move.
BIG BROTHER IS WATCHING YOU, the caption beneath it ran.
"""


@pytest.fixture
def heads():
    # Head offsets relative to each token's own index; 0 marks a root token.
    return [1, 1, 0, 3, 2, 1, -4, -1, -1, -7, -8, 1, -10, 2, 1, -3, -1, -15,
            -1, 1, 4, -1, 1, -3, 0, -1, 1, -2, -4, 1, -2, 1, -2, 3, -1, 1,
            -4, -13, -14, -1, -2, 2, 1, -3, -1, 1, -2, -9, -1, 3, 1, 1, -14,
            1, -2, 1, -2, -1, 1, -2, -6, -1, -1, -2, -1, -1, -42, -1, 2, 1,
            0, -1, 1, -2, -1, 2, 1, -4, -8, 0, 1, -2, -1, -1, 3, -1, 1, -6,
            9, 1, 7, -1, 1, -2, 3, 2, 1, -10, -1, 1, -2, -22, -1, 1, 0, -1,
            2, 1, -4, -1, -2, -1, 1, -2, -6, -7, 1, -9, -1, 2, -1, -3, -1,
            3, 2, 1, -4, -19, -24, 3, 2, 1, -4, -1, 1, 2, -1, -5, -34, 1, 0,
            -1, 1, -2, -4, 1, 0, 1, -2, -1, 1, -2, -6, 1, 9, -1, 1, -3, -1,
            -1, 3, 2, 1, 0, -1, -2, 7, -1, 5, 1, 3, -1, 1, -10, -1, -2, 1,
            -2, -15, 1, 0, -1, -1, 2, 1, -3, -1, -1, -2, -1, 1, -2, -12, 1,
            1, 0, 1, -2, -1, -2, -3, 9, -1, 2, -1, -4, 2, 1, -3, -4, -15, 2,
            1, -3, -1, 2, 1, -3, -8, -9, -1, -2, -1, -4, 1, -2, -3, 1, -2,
            -19, 17, 1, -2, 14, 13, 3, 2, 1, -4, 8, -1, 1, 5, -1, 2, 1, -3,
            0, -1, 1, -2, -4, 1, 0, -1, -1, 2, -1, -3, 1, -2, 1, -2, 3, 1,
            1, -4, -1, -2, 2, 1, -5, -19, -1, 1, 1, 0, 1, 6, -1, 1, -3, -1,
            -1, -8, -9, -1]
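
The zeros and negative values in the list above suggest that each entry is an offset from the token's own position rather than an absolute index. A minimal sketch of that reading, in plain Python with no spaCy dependency, is shown here; `absolute_heads` is a hypothetical helper introduced only for illustration and is not part of the test utilities.

```python
def absolute_heads(relative_heads):
    """Convert per-token relative offsets into absolute head indices."""
    return [i + offset for i, offset in enumerate(relative_heads)]


# First ten offsets copied from the heads fixture.
offsets = [1, 1, 0, 3, 2, 1, -4, -1, -1, -7]
heads_abs = absolute_heads(offsets)

# A token whose head is itself (offset 0) is a root.
roots = [i for i, head in enumerate(heads_abs) if head == i]

print(heads_abs)  # [1, 2, 2, 6, 6, 6, 2, 6, 7, 2]
print(roots)      # [2]
```

Under this reading, the token at index 2 is the only root among the first ten tokens, and every other token in that prefix attaches to a token inside the prefix.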


def test_parser_parse_navigate_consistency(en_tokenizer, text, heads):
    """Every token yielded by head.lefts or head.rights must name that head."""
    tokens = en_tokenizer(text)
    doc = get_doc(tokens.vocab, words=[t.text for t in tokens], heads=heads)
    for head in doc:
        for child in head.lefts:
            assert child.head == head
        for child in head.rights:
            assert child.head == head


def test_parser_parse_navigate_child_consistency(en_tokenizer, text, heads):
    """lefts/rights and n_lefts/n_rights must agree with each token's head."""
    tokens = en_tokenizer(text)
    doc = get_doc(tokens.vocab, words=[t.text for t in tokens], heads=heads)

    lefts = {}
    rights = {}
    for head in doc:
        assert head.i not in lefts
        lefts[head.i] = set()
        for left in head.lefts:
            lefts[head.i].add(left.i)
        assert head.i not in rights
        rights[head.i] = set()
        for right in head.rights:
            rights[head.i].add(right.i)
    for head in doc:
        assert head.n_rights == len(rights[head.i])
        assert head.n_lefts == len(lefts[head.i])
    for child in doc:
        if child.i < child.head.i:
            assert child.i in lefts[child.head.i]
            assert child.i not in rights[child.head.i]
            lefts[child.head.i].remove(child.i)
        elif child.i > child.head.i:
            assert child.i in rights[child.head.i]
            assert child.i not in lefts[child.head.i]
            rights[child.head.i].remove(child.i)
    # Every recorded child must have been matched and removed above.
    for children in lefts.values():
        assert not children
    for children in rights.values():
        assert not children


def test_parser_parse_navigate_edges(en_tokenizer, text, heads):
    """left_edge and right_edge must be the first and last subtree tokens."""
    tokens = en_tokenizer(text)
    doc = get_doc(tokens.vocab, words=[t.text for t in tokens], heads=heads)
    for token in doc:
        subtree = list(token.subtree)
        debug = '\t'.join((token.text, token.left_edge.text, subtree[0].text))
        assert token.left_edge == subtree[0], debug
        debug = '\t'.join((token.text, token.right_edge.text,
                           subtree[-1].text, token.right_edge.head.text))
        assert token.right_edge == subtree[-1], debug
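
The invariants these tests check can be illustrated without spaCy. The sketch below is not how `Token.lefts`, `Token.rights`, or `Token.subtree` are implemented; it is a plain-Python model under two assumptions: heads are given as absolute indices, and the tree is projective, so the in-order traversal comes out in document order. The names `children` and `subtree` are hypothetical helpers, and `heads_abs` is hand-derived from the first ten offsets of the `heads` fixture.

```python
def children(heads_abs):
    """Map each head index to its left and right child indices."""
    lefts = {i: [] for i in range(len(heads_abs))}
    rights = {i: [] for i in range(len(heads_abs))}
    for child, head in enumerate(heads_abs):
        if child < head:
            lefts[head].append(child)
        elif child > head:
            rights[head].append(child)
        # child == head marks a root, which is nobody's child.
    return lefts, rights


def subtree(i, lefts, rights):
    """Yield token i and its descendants: left children, self, right children."""
    for left in lefts[i]:
        yield from subtree(left, lefts, rights)
    yield i
    for right in rights[i]:
        yield from subtree(right, lefts, rights)


# Absolute heads for the first ten tokens (assumed: fixture offsets are relative).
heads_abs = [1, 2, 2, 6, 6, 6, 2, 6, 7, 2]
lefts, rights = children(heads_abs)

# Every child listed under a head points back to that head.
for head in range(len(heads_abs)):
    for child in lefts[head] + rights[head]:
        assert heads_abs[child] == head

# The edges of a subtree are its first and last members.
span = list(subtree(6, lefts, rights))
left_edge, right_edge = span[0], span[-1]
print(span, left_edge, right_edge)  # [3, 4, 5, 6, 7, 8] 3 8
```

For a non-projective tree the traversal would no longer be sorted by index, so this model only mirrors the edge test for projective parses.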