4 Commits

Author SHA1 Message Date
nmlgc b61e612fdf [Maintenance] #include each header's dependencies within the header itself
OK, this is the big one. We still keep using `#include` guards only
where we absolutely need to, but with each header now being valid in
isolation, this can now actually help *minimize* the length of each
translation unit's `#include` list. Turns out that after removing all
the duplicates, we only *actually* need to guard 29 headers across all
5 games.
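
As a rough single-file sketch of that rule, and not the repository's actual headers: every name below (`planar.h`, `sprite.h`, `blit.h`, `Planar8`, `Sprite8`, `planar_or`) is hypothetical, and each `#include` line is pasted in as text so that the example compiles on its own. The assumption is that a guard is only required where a header with definitions can be reached through more than one include path.

```cpp
#include <stdint.h>

// ---- hypothetical "planar.h" ------------------------------------------
// Defines a type and is pulled in by both headers below. A second copy of
// the struct would be a redefinition error, so this one needs a guard.
#ifndef PLANAR_H
#define PLANAR_H
struct Planar8 {
    uint8_t B, R, G, E;
};
#endif

// ---- hypothetical "sprite.h" ------------------------------------------
// Valid in isolation because it starts with its own dependency. The next
// five lines stand in for `#include "planar.h"`; the guard skips them.
#ifndef PLANAR_H
#define PLANAR_H
struct Planar8 {
    uint8_t B, R, G, E;
};
#endif
struct Sprite8 {
    Planar8 row[8];
};

// ---- hypothetical "blit.h" --------------------------------------------
// Same dependency, same pasted include. No guard of its own: nothing else
// includes it, so a translation unit only ever sees it once.
#ifndef PLANAR_H
#define PLANAR_H
struct Planar8 {
    uint8_t B, R, G, E;
};
#endif
inline uint8_t planar_or(const Planar8& p)
{
    return static_cast<uint8_t>(p.B | p.R | p.G | p.E);
}
```

A translation unit that needs `Sprite8` and `planar_or` would then list just `sprite.h` and `blit.h`, without having to know about `planar.h`.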

Part of P0285, funded by [Anonymous] and iruleatgames.
2024-07-09 08:46:42 +02:00
nmlgc 45c9e71533 [Maintenance] Fix another bunch of accumulated typos and dead code
Part of P0239, funded by Ember2528.
2023-04-28 22:21:21 +02:00
nmlgc 4837ec7ec9 [Research] Benchmark various sprite blitting approaches
Running this on various PC-98 models confirms that unchecked blitting
(i.e., what you would intuitively consider to be the best method) is in
fact faster than checking either byte of a 16-pixel-wide sprite
beforehand, and has been throughout the PC-98's lifespan. For optimal
performance on the 286 and 386, we might want to use MOVS instead of
MOV, but even that difference is way too small to truly matter.
Also, nice to see that our blitter outperforms a naive pure C
implementation by 2-4×, depending on the model. And master.lib is not
*that* much faster…
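
For illustration, the two approaches being compared could look roughly like the portable sketch below. This is not the benchmark code: `blit16_unchecked`, `blit16_checked`, and `ROW_SIZE` are hypothetical names, a plain byte array stands in for one VRAM bitplane, and `|=` is assumed as a stand-in for a hardware write mode in which zero bits leave VRAM untouched.

```cpp
#include <stddef.h>
#include <stdint.h>

// One bitplane of a 640-pixel-wide screen is 80 bytes per row.
static const size_t ROW_SIZE = 80;

// Unchecked: always write both bytes of every 16-pixel-wide row.
void blit16_unchecked(
    uint8_t *vram, size_t offset, const uint8_t *sprite, size_t h
)
{
    for(size_t y = 0; y < h; y++) {
        vram[offset + 0] |= sprite[(y * 2) + 0];
        vram[offset + 1] |= sprite[(y * 2) + 1];
        offset += ROW_SIZE;
    }
}

// Checked: test each byte first and skip the write if it is empty. This
// saves a VRAM access per empty byte, but pays for a compare and a branch
// on every byte.
void blit16_checked(
    uint8_t *vram, size_t offset, const uint8_t *sprite, size_t h
)
{
    for(size_t y = 0; y < h; y++) {
        if(sprite[(y * 2) + 0]) {
            vram[offset + 0] |= sprite[(y * 2) + 0];
        }
        if(sprite[(y * 2) + 1]) {
            vram[offset + 1] |= sprite[(y * 2) + 1];
        }
        offset += ROW_SIZE;
    }
}
```

Both functions leave the buffer in the same state, so the only thing a benchmark measures is whether the saved writes outweigh the added checks.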

The gaiji in `Research/blitperf.bmp` were taken from the Unifont
version 15.0.01 glyphs for:

	• U+2022 BULLET •
	• U+23F1 STOPWATCH ⏱
	• U+1F40C SNAIL 🐌

Part of P0233, funded by [Anonymous].
2023-03-04 19:40:55 +01:00
nmlgc aa0aad8141 [Platform] [PC-98] Generic byte-aligned sprite blitter
The fact that every sprite format comes with its own blitter is one of
the major sources of bloat in PC-98 Touhou, and of TH01 in particular.
So how about writing a single decently optimized blitter, and calling
into that from the entire game?

Especially because generating distinct blitting functions for every
width is a much better use of all that memory: It eliminates horizontal
loops, and ensures that we use the optimal MOV variant for each sprite
size. Removing any checks for empty bytes (which will turn out to never
have been a good idea for any PC-98 model ever) and unrolling the main
blitting loop using Duff's Device already gets us something that,
depending on the PC-98 model, is easily 2-4× faster than the typical
naive C implementation you'd find in TH01. With master.lib being not
that much faster…
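
A minimal sketch of what one such width-specific function might look like, assuming a 16-pixel (2-byte) wide sprite and a 4× unroll. This is not the blitter from this commit: `blit_w16_duff`, `ROW_SIZE`, and the `ROW()` helper macro are hypothetical, and `|=` on a byte array stands in for the actual VRAM writes.

```cpp
#include <stdint.h>

// One bitplane of a 640-pixel-wide screen is 80 bytes per row.
static const unsigned int ROW_SIZE = 80;

// Blits a 2-byte-wide sprite of height `h`. The width is fixed, so there is
// no horizontal loop, and no check for empty bytes. The vertical loop is
// unrolled 4× using Duff's Device: the `switch` jumps into the middle of
// the loop body to handle the `h % 4` leftover rows first.
void blit_w16_duff(uint8_t *vram, const uint8_t *sprite, unsigned int h)
{
    if(h == 0) {
        return;
    }
    unsigned int n = ((h + 3) / 4);

    #define ROW() \
        vram[0] |= sprite[0]; \
        vram[1] |= sprite[1]; \
        vram += ROW_SIZE; \
        sprite += 2;

    switch(h % 4) {
    case 0: do { ROW()
    case 3:      ROW()
    case 2:      ROW()
    case 1:      ROW()
            } while(--n > 0);
    }
    #undef ROW
}
```

For `h = 5`, the `switch` enters at `case 1`, blits a single row, and then runs one full pass of four rows.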

Making more use of C++ templates would have been fancy, but horizontal
sprite clipping can change the blit width depending on runtime values.
So, we're back to X macro code generation after all.
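
The general X macro pattern could be sketched as follows, under the assumption of four supported byte widths. None of these names (`BLIT_WIDTHS`, `blit_row_w1` to `blit_row_w4`, `BLIT_ROW_FUNCS`, `blit_row`) come from the actual code, and the constant-count inner loop is only a placeholder for the hand-picked writes a real per-width function would contain.

```cpp
#include <stdint.h>

typedef void (*blit_row_func_t)(uint8_t *dst, const uint8_t *src);

// The single list of supported widths, in bytes. Everything below is
// generated from it.
#define BLIT_WIDTHS(X) X(1) X(2) X(3) X(4)

// Generates one function per width.
#define X_DEFINE(w) \
    static void blit_row_w##w(uint8_t *dst, const uint8_t *src) \
    { \
        for(int i = 0; i < w; i++) { \
            dst[i] |= src[i]; \
        } \
    }
BLIT_WIDTHS(X_DEFINE)
#undef X_DEFINE

// Generates a lookup table from the same list. Index 0 is unused.
#define X_ENTRY(w) blit_row_w##w,
static const blit_row_func_t BLIT_ROW_FUNCS[] = { 0, BLIT_WIDTHS(X_ENTRY) };
#undef X_ENTRY

// Clipping only determines the width at runtime, so the matching function
// is picked through the table rather than a compile-time parameter.
void blit_row(uint8_t *dst, const uint8_t *src, unsigned int w)
{
    if((w >= 1) && (w <= 4)) {
        BLIT_ROW_FUNCS[w](dst, src);
    }
}
```

A 4-byte-wide sprite that is clipped down to 3 visible bytes would then call `blit_row(dst, src, 3)` and end up in `blit_row_w3()`.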

Part of P0233, funded by [Anonymous].
2023-03-04 19:40:55 +01:00