ReC98

Commit Graph

Author	SHA1	Message	Date
nmlgc	b61e612fdf	[Maintenance] #include each header's dependencies within the header itself OK, this is the big one. We still keep using `#include` guards only where we absolutely need to, but with each header now being valid in isolation, this can now actually help minimize the length of each translation unit's `#include` list. Turns out that after removing all the duplicates, we only actually need to guard 29 headers across all 5 games. Part of P0285, funded by [Anonymous] and iruleatgames.	2024-07-09 08:46:42 +02:00
nmlgc	45c9e71533	[Maintenance] Fix another bunch of accumulated typos and dead code Part of P0239, funded by Ember2528.	2023-04-28 22:21:21 +02:00
nmlgc	4837ec7ec9	[Research] Benchmark various sprite blitting approaches Running this on various PC-98 models confirms that unchecked blitting (i.e., what you would intuitively consider to be the best method) is in fact faster than checking either byte of a 16-pixel-wide sprite beforehand, and has been throughout the PC-98's lifespan. For optimal performance on the 286 and 386, we might want to use MOVS instead of MOV, but even that difference is way too small to truly matter. Also, nice to see that our blitter outperforms a naive pure C implementation by 2-4×, depending on the model. And master.lib is not that much faster… The gaiji in `Research/blitperf.bmp` were taken from the Unifont version 15.0.01 glyphs for U+2022 BULLET (•), U+23F1 STOPWATCH (⏱), and U+1F40C SNAIL (🐌). Part of P0233, funded by [Anonymous].	2023-03-04 19:40:55 +01:00
nmlgc	aa0aad8141	[Platform] [PC-98] Generic byte-aligned sprite blitter The fact that every sprite format comes with its own blitter is one of the major sources of bloat in PC-98 Touhou, and of TH01 in particular. So how about writing a single decently optimized blitter, and calling into that from the entire game? Especially because generating distinct blitting functions for every width is a much better use of all that memory: It eliminates horizontal loops, and ensures that we use the optimal MOV variant for each sprite size. Removing any checks for empty bytes (which will turn out to never have been a good idea for any PC-98 model ever) and unrolling the main blitting loop using Duff's Device already gets us something that, depending on the PC-98 model, is easily 2-4× faster than the typical naive C implementation you'd find in TH01. With master.lib not being that much faster… Making more use of C++ templates would have been fancy, but horizontal sprite clipping can change the blit width depending on runtime values. So, we're back to X macro code generation after all. Part of P0233, funded by [Anonymous].	2023-03-04 19:40:55 +01:00

4 Commits