OK, this is the big one. We still keep using `#include` guards only
where we absolutely need to, but with each header now being valid in
isolation, this can now actually help *minimize* the length of each
translation unit's `#include` list. Turns out that after removing all
the duplicates, we only *actually* need to guard 29 headers across all
5 games.
Part of P0285, funded by [Anonymous] and iruleatgames.
Reason: Saving SI and DI on the stack way too late. Just because ZUN
absolutely *had* to move the clipping condition before these two PUSH
instructions… Was it really necessary to save a total of 4 instructions
for an unlikely worst case in a function that's maybe called like 10-20
times per frame *at worst*?
Part of P0192, funded by [Anonymous], nrook, and -Tom-.