The fact that every sprite format comes with its own blitter is one of
the major sources of bloat in PC-98 Touhou, and of TH01 in particular.
So how about writing a single decently optimized blitter, and calling
into that from the entire game?
Especially because generating distinct blitting functions for every
width is a much better use of all that memory: It eliminates horizontal
loops, and ensures that we use the optimal MOV variant for each sprite
size. Removing any checks for empty bytes (which will turn out to never
have been a good idea for any PC-98 model ever) and unrolling the main
blitting loop using Duff's Device already gets us something that,
depending on the PC-98 model, is easily 2-4× faster than the typical
naive C implementation you'd find in TH01. With master.lib not being
that much faster…
Making more use of C++ templates would have been fancy, but horizontal
sprite clipping can change the blit width depending on runtime values.
So, we're back to X macro code generation after all.
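Below is a minimal sketch of the idea. It assumes a single 80-byte-wide bitplane held in ordinary memory, and all names (`DEFINE_BLITTER`, `BLIT_WIDTHS`, `blit`) are hypothetical. The X macro stamps out one function per byte width, and each of those functions unrolls the row loop with Duff's Device. A table indexed by the runtime (possibly clipped) width then picks the right function. `memcpy()` with a constant size stands in for choosing the optimal MOV variant, since compilers typically lower it to the widest moves that fit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: one PC-98 bitplane in regular memory, 640 px / 8 = 80 bytes per row.
const int ROW_SIZE = 80;

// Copies one row of `w` bytes, then advances both pointers. A constant `w`
// lets the compiler emit fixed-size moves instead of a horizontal byte loop.
#define BLIT_ROW(w) \
    std::memcpy(dst, src, (w)); \
    dst += ROW_SIZE; \
    src += src_stride;

// Defines blit_<w>(), whose row loop is unrolled 4x via Duff's Device.
// Every byte is written unconditionally; there are no empty-byte checks.
#define DEFINE_BLITTER(w) \
    void blit_##w(uint8_t *dst, const uint8_t *src, int src_stride, int h) { \
        if (h <= 0) { \
            return; \
        } \
        int n = (h + 3) / 4; \
        switch (h % 4) { \
        case 0: do { BLIT_ROW(w) \
        case 3:      BLIT_ROW(w) \
        case 2:      BLIT_ROW(w) \
        case 1:      BLIT_ROW(w) \
                } while (--n > 0); \
        } \
    }

// X macro listing every supported width in bytes (hypothetical range).
#define BLIT_WIDTHS(X) X(1) X(2) X(3) X(4)

BLIT_WIDTHS(DEFINE_BLITTER)

typedef void (*blit_func_t)(uint8_t *, const uint8_t *, int, int);

#define BLIT_TABLE_ENTRY(w) blit_##w,
static const blit_func_t BLITTERS[] = { nullptr, BLIT_WIDTHS(BLIT_TABLE_ENTRY) };

// `w` is the runtime width after horizontal clipping. `src_stride` remains the
// sprite's full width, so clipped columns are simply skipped.
void blit(uint8_t *dst, const uint8_t *src, int w, int src_stride, int h) {
    if (w < 1 || w > 4) {
        return;
    }
    BLITTERS[w](dst, src, src_stride, h);
}
```

The runtime dispatch through `BLITTERS` is the part that a purely compile-time template instantiation would not cover on its own, because clipping only determines `w` while the game is running.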
Part of P0233, funded by [Anonymous].
Yup, unaligned! The prefilling case is quite broken on T98-Next, but
given that this emulator hasn't seen any development since 2010 and
every other emulator gets it right, we can reasonably assume that to be
a bug in that emulator.
Completes P0232, funded by [Anonymous].
Choosing C++ RAII wrappers because there's at least one case where ZUN
misplaced a manual grcg_off(). This implementation combines safety with
the optimal instructions for both dynamic and static use cases.
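The following minimal sketch shows why RAII rules out a misplaced `grcg_off()`: the destructor runs on every path out of the scope. It assumes the commonly documented PC-98 GRCG ports (mode register at 0x7C, tile register at 0x7E) and uses a hypothetical `port_out()` that only logs writes, so the example can run anywhere. It makes no attempt to reproduce the static/dynamic instruction selection mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for real port output (Turbo C++ code would use
// outportb()). This sketch only records every write.
struct PortWrite {
    uint16_t port;
    uint8_t value;
};
std::vector<PortWrite> port_log;

void port_out(uint16_t port, uint8_t value) {
    port_log.push_back(PortWrite{ port, value });
}

// Assumed GRCG I/O ports and mode values.
const uint16_t GRCG_MODE = 0x7C;
const uint16_t GRCG_TILE = 0x7E;
const uint8_t GRCG_RMW = 0xC0;
const uint8_t GRCG_OFF = 0x00;

// Turns the GRCG on for the lifetime of the object and off in the destructor.
class GRCGScope {
public:
    GRCGScope(uint8_t mode, uint8_t color) {
        port_out(GRCG_MODE, mode);
        // One tile write per plane (assumed order B, R, G, E): all bits set
        // if the color has that plane's bit.
        for (int plane = 0; plane < 4; ++plane) {
            port_out(GRCG_TILE, (color & (1 << plane)) ? 0xFF : 0x00);
        }
    }
    ~GRCGScope() {
        port_out(GRCG_MODE, GRCG_OFF);
    }
    GRCGScope(const GRCGScope &) = delete;
    GRCGScope &operator=(const GRCGScope &) = delete;
};

// Hypothetical caller: the early return can no longer skip turning the GRCG off.
void fill_if_visible(bool visible, uint8_t color) {
    GRCGScope grcg(GRCG_RMW, color);
    if (!visible) {
        return;
    }
    // ... VRAM writes would go here ...
}
```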
Part of P0232, funded by [Anonymous].
Starting with the simple, refresh-rate-oblivious code from TH01, until
we've figured out what the rest of the master.lib code is doing and
have valid reasons to include it. Also extending the second counter to
32-bit because we *might* be measuring some processes that could take
longer than 19:35 minutes…
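To see why the counter's width matters, consider this small sketch with hypothetical helper names. Measuring elapsed ticks by subtracting two snapshots of a free-running counter survives a single wraparound thanks to unsigned arithmetic, but a 16-bit counter cannot represent any duration of 65536 ticks or more. How long that takes in wall-clock time depends on the tick rate.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed ticks between two snapshots of a free-running counter. Unsigned
// subtraction is modular, so one wraparound between the snapshots is fine.
uint16_t elapsed16(uint16_t start, uint16_t now) {
    return uint16_t(now - start);
}

uint32_t elapsed32(uint32_t start, uint32_t now) {
    return now - start;
}

// Simulates a counter of type T advanced by `ticks` interrupts, truncating
// to the counter's width like the real variable would.
template <typename T> T advance(T counter, uint32_t ticks) {
    return T(counter + ticks);
}
```

With a start value of 65000, both widths report 1000 elapsed ticks after 1000 ticks. After 70000 ticks, however, the 16-bit version reports only 4464, because the full 65536-tick cycle was lost.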
Part of P0232, funded by [Anonymous].
Moving the code from TH01 to a new platform layer, and deciding against
the `pc98_` prefix, which is sort of implied by the directory of the
header file it came from. Namespaces would be ideal, but Turbo C++ 4.0J
sadly doesn't support them.
Part of P0232, funded by [Anonymous].
We'd like to use this optimization in the platform layer as well.
Turning it into an inline function via __emit__() also allows us to
turn a bunch of other macros into proper inline functions.
Part of P0232, funded by [Anonymous].
Over on the `debloated` branch, we're going to use them in our own
platform-specific code, which obviously is not decompiled from
anything.
Part of P0232, funded by [Anonymous].
And dissolve the "vars" units, which would be too annoying to merge on
the `debloated` branch otherwise. This interpretation of these globals
further highlights the type differences between REIIDEN.EXE and
FUUIN.EXE.
Placing them directly in `resident.hpp`; on the `debloated` branch,
we could neatly define each of these fields in a matching .cpp file,
but that file would need to be compiled twice due to the aforementioned
differences between binaries. Better to keep those in `op_01.cpp`,
`main_01.cpp`, or `fuuin_01.cpp`, respectively.
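Here is a minimal single-file sketch of that split, using hypothetical names (`BINARY_FUUIN`, `example_stage_t`, `example_stage_num`). The header declares the global with a type that depends on which binary is being built, and exactly one per-binary .cpp file defines it. In a real build, the two halves would live in separate files.

```cpp
#include <cassert>
#include <cstdint>

// ---- resident.hpp (hypothetical excerpt) ----
// The same global is declared with a binary-specific type, selected by a
// macro that each binary's build would define.
#if defined(BINARY_FUUIN)
typedef uint8_t example_stage_t;
#else
typedef int16_t example_stage_t;
#endif
extern example_stage_t example_stage_num;

// ---- main_01.cpp (hypothetical excerpt) ----
// Each binary defines the global exactly once, in its own translation unit,
// so no shared .cpp file has to be compiled twice with different types.
example_stage_t example_stage_num = 0;
```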
Part of P0229, funded by Ember2528.
Allows us to use them as switch cases in the `debloated` branch, in
exchange for turning some inline functions into macros.
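The reason behind this trade-off can be shown in a short sketch with a hypothetical helper. A `case` label must be an integral constant expression, and without `constexpr` (which Turbo C++ 4.0J predates) a call to an inline function never qualifies. A macro that expands to the same arithmetic does.

```cpp
#include <cassert>

// Hypothetical helper, written once as an inline function and once as a macro.
inline int pack_pos(int x, int y) {
    return ((y << 8) | x);
}
#define PACK_POS(x, y) (((y) << 8) | (x))

enum Spot { SPOT_ORIGIN, SPOT_SPAWN, SPOT_OTHER };

Spot classify(int pos) {
    switch (pos) {
    case PACK_POS(0, 0): return SPOT_ORIGIN;
    case PACK_POS(1, 2): return SPOT_SPAWN;
    // `case pack_pos(1, 2):` would not compile, since the call is not a
    // constant expression.
    default: return SPOT_OTHER;
    }
}
```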
Part of P0229, funded by Ember2528.
This variable makes more sense if we name it after its one actual and
consistent usage. All other uses make more sense when interpreted as a bug
or bloat.
Part of P0229, funded by Ember2528.
Same rationale here – this naming scheme clarifies how these variables
are just redundant copies out of the resident structure.
Part of P0229, funded by Ember2528.
The `credit_` prefix may seem redundant within the REIIDEN.CFG
structure, but consistency seems more important here. Makes it much
easier to follow how these fields are copied around.
Part of P0229, funded by Ember2528.
In all places visited during the next 6 pushes: The resident structure
and copies of its values, the packfile implementation, boss entities,
rendering font ROM glyphs to VRAM, and overall inconsistent code
between the three binaries.
Part of P0229, funded by Ember2528.
Neatly dissolves two of the three game-specific preprocessor branches
in `th01/hardware/grppsafx.h`. Moving at least one function into a
matching .cpp file will also simplify the corresponding debloating
commit on the respective branch.
Part of P0229, funded by Ember2528.
In which the shrink types """conveniently""" use a signed comparison
that effectively limits their width to 127 pixels, which forces a
shrink/nonshrink distinction upon the entire rest of the code. 🙄
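One way such a limit can arise is sketched below with hypothetical functions: a width of up to 255 stored in a byte, but compared as a signed 8-bit value, much like x86 `JL`/`JG` after a `CMP`. Any width of 128 or more turns into a negative number, so the loop never runs at all. An unsigned comparison handles the full range.

```cpp
#include <cassert>
#include <cstdint>

// Signed comparison: widths >= 128 wrap to negative values and draw nothing.
int pixels_drawn_signed(uint8_t width) {
    int drawn = 0;
    for (int8_t x = 0; x < int8_t(width); ++x) {
        ++drawn;
    }
    return drawn;
}

// Unsigned comparison: every width from 0 to 255 works.
int pixels_drawn_unsigned(uint8_t width) {
    int drawn = 0;
    for (unsigned x = 0; x < width; ++x) {
        ++drawn;
    }
    return drawn;
}
```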
Part of P0228, funded by [Anonymous] and nrook.