Commit Graph

4 Commits

Author SHA1 Message Date
kadarakos 1bb87f35bc
Detect cycle during projectivize (#10877)
* detect cycle during projectivize

* not complete test to detect cycle in projectivize

* boolean to int type to propagate error

* use unordered_set instead of set

* moved error message to errors

* removed cycle from test case

* use find instead of count

* cycle check: only perform one lookup

* Return bool again from _has_head_as_ancestor

Communicate presence of cycles through an output argument.

* Switch to returning std::pair to encode presence of a cycle

The has_cycle pointer is too easy to misuse. Ideally, we would have a
sum type like Rust's `Result` here, but C++ is not there yet.

* _is_non_proj_arc: clarify what we are returning

* _has_head_as_ancestor: remove count

We are now explicitly checking for cycles, so the algorithm must always
terminate. Either we encounter the head, we find a root, or a cycle.

* _is_nonproj_arc: simplify condition

* Another refactor using C++ exceptions

* Remove unused error code

* Print graph with cycle on exception

* Include .hh files in source package

* Add FIXME comment

* cycle detection test

* find cycle when starting from problematic vertex

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2022-06-08 19:34:11 +02:00
Daniël de Kok 78a8bec4d0
Make core projectivization functions cdef nogil (#10241)
* Make core projectivization methods cdef nogil

While profiling the parser, I noticed that relatively a lot of time is
spent in projectivization. This change rewrites the functions in the
core loops as cdef nogil for efficiency.

In C++-land, we use vector in place of Python lists and absent heads
are represented as -1 in place of None.

* _heads_to_c: add assertion

Validation should be performed by the caller, but this assertion ensures that
we are not reading/writing out of bounds with incorrect input.
2022-02-21 15:02:21 +01:00
Adriane Boyd a119667a36
Clean up spacy.tokens (#6046)
* Clean up spacy.tokens

* Update `set_children_from_heads`:
  * Don't check `dep` when setting lr_* or sentence starts
  * Set all non-sentence starts to `False`

* Use `set_children_from_heads` in `Token.head` setter
  * Reduce similar/duplicate code (admittedly adds a bit of overhead)
  * Update sentence starts consistently

* Remove unused `Doc.set_parse`

* Minor changes:
  * Declare cython variables (to avoid cython warnings)
  * Clean up imports

* Modify set_children_from_heads to set token range

Modify `set_children_from_heads` so that it adjust tokens within a
specified range rather then the whole document.

Modify the `Token.head` setter to adjust only the tokens affected by the
new head assignment.
2020-09-16 20:32:38 +02:00
Sofie Van Landeghem ca491722ad
The Parser is now a Pipe (2) (#5844)
* moving syntax folder to _parser_internals

* moving nn_parser and transition_system

* move nn_parser and transition_system out of internals folder

* moving nn_parser code into transition_system file

* rename transition_system to transition_parser

* moving parser_model and _state to ml

* move _state back to internals

* The Parser now inherits from Pipe!

* small code fixes

* removing unnecessary imports

* remove link_vectors_to_models

* transition_system to internals folder

* little bit more cleanup

* newlines
2020-07-30 23:30:54 +02:00