What a beautiful micro-optimization! What's less beautiful, though, is
that the segment-3 version is only used by a single function, where it
could easily have been inlined.
Part of P0192, funded by [Anonymous], nrook, and -Tom-.
Featuring a stupid variant of Turbo C++'s __memcpy__() intrinsic that
does the exact same thing, just with reordered instructions. What a
waste of time.
Part of P0189, funded by Arandui and Lmocinemod.