Speed up the StateC::L feature function (#10019)

* Speed up the StateC::L feature function

This function gets the n-th most-recent left-arc with a particular head.
Before this change, StateC::L would construct a vector of all left-arcs
with the given head and then pick the n-th most recent from that vector.
Since the number of left-arcs strongly correlates with the doc length
and the feature is constructed for every transition, this can make
transition-parsing quadratic.

With this change StateC::L:

- Searches left-arcs backwards.
- Stops early when the n-th matching transition is found.
- Does not construct a vector (reducing memory pressure).

This change doesn't avoid the linear search when the transition that is
queried does not occur in the left-arcs. Regardless, performance is
improved quite a bit with very long docs:

Before:

   N  Time

 400   3.3
 800   5.4
1600  11.6
3200  30.7

After:

   N  Time

 400   3.2
 800   5.0
1600   9.5
3200  23.2

We can probably do better with more tailored data structures, but I
first wanted to make a low-impact PR.

Found while investigating #9858.

* StateC::L: simplify loop
This commit is contained in:
Daniël de Kok 2022-01-13 09:03:55 +01:00 committed by Daniël de Kok
parent 176a90edee
commit 677c1a3507
1 changed files with 14 additions and 9 deletions

View File

@ -1,3 +1,4 @@
from cython.operator cimport dereference as deref, preincrement as incr
from libc.string cimport memcpy, memset
from libc.stdlib cimport calloc, free
from libc.stdint cimport uint32_t, uint64_t
@ -184,16 +185,20 @@ cdef cppclass StateC:
int L(int head, int idx) nogil const:
if idx < 1 or this._left_arcs.size() == 0:
return -1
cdef vector[int] lefts
for i in range(this._left_arcs.size()):
arc = this._left_arcs.at(i)
# Work backwards through left-arcs to find the arc at the
# requested index more quickly.
cdef size_t child_index = 0
it = this._left_arcs.const_rbegin()
while it != this._left_arcs.rend():
arc = deref(it)
if arc.head == head and arc.child != -1 and arc.child < head:
lefts.push_back(arc.child)
idx = (<int>lefts.size()) - idx
if idx < 0:
child_index += 1
if child_index == idx:
return arc.child
incr(it)
return -1
else:
return lefts.at(idx)
int R(int head, int idx) nogil const:
if idx < 1 or this._right_arcs.size() == 0: