Commit Graph

14 Commits

Author SHA1 Message Date
nmlgc 4837ec7ec9 [Research] Benchmark various sprite blitting approaches
Running this on various PC-98 models confirms that unchecked blitting
(i.e., what you would intuitively consider to be the best method) is in
fact faster than checking either byte of a 16-pixel-wide sprite
beforehand, and has been throughout the PC-98's lifespan. For optimal
performance on the 286 and 386, we might want to use MOVS instead of
MOV, but even that difference is way too small to truly matter.
Also, it's nice to see that our blitter outperforms a naive pure C
implementation by 2-4×, depending on the model. And master.lib is not
*that* much faster…

The gaiji in `Research/blitperf.bmp` were taken from the Unifont
version 15.0.01 glyphs for:

	• U+2022 BULLET •
	• U+23F1 STOPWATCH ⏱
	• U+1F40C SNAIL 🐌

Part of P0233, funded by [Anonymous].
2023-03-04 19:40:55 +01:00
nmlgc aa0aad8141 [Platform] [PC-98] Generic byte-aligned sprite blitter
The fact that every sprite format comes with its own blitter is one of
the major sources of bloat in PC-98 Touhou, and of TH01 in particular.
So how about writing a single decently optimized blitter, and calling
into that from the entire game?

Especially because generating distinct blitting functions for every
width is a much better use of all that memory: It eliminates horizontal
loops, and ensures that we use the optimal MOV variant for each sprite
size. Removing any checks for empty bytes (which will turn out to never
have been a good idea for any PC-98 model ever) and unrolling the main
blitting loop using Duff's Device already gets us something that,
depending on the PC-98 model, is easily 2-4× faster than the typical
naive C implementation you'd find in TH01. And master.lib is not *that*
much faster…
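
To illustrate the idea of unchecked blitting with a Duff's Device-unrolled
row loop, here is a minimal portable sketch. It is not the code from this
commit: `blit_16_unchecked()`, `BLIT_ROW`, and the 80-byte `ROW_SIZE` are
hypothetical names and assumptions, and a plain memory buffer stands in for
a VRAM bitplane.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumption: one row of a 640-pixel-wide bitplane takes up 80 bytes.
static const size_t ROW_SIZE = 80;

// Copies one 2-byte sprite row without inspecting its contents, then
// advances both pointers to the next row.
#define BLIT_ROW \
	dst[0] = src[0]; \
	dst[1] = src[1]; \
	dst += ROW_SIZE; \
	src += 2;

// Hypothetical unchecked blitter for a 16-pixel-wide, byte-aligned sprite.
void blit_16_unchecked(uint8_t *dst, const uint8_t *src, unsigned int h)
{
	assert(dst && src);
	if(h == 0) {
		return;
	}
	// Duff's Device: jump into the middle of the unrolled loop to handle
	// the (h % 8) leftover rows first, then run full passes of 8 rows.
	unsigned int passes = ((h + 7) / 8);
	switch(h % 8) {
	case 0: do { BLIT_ROW
	case 7:      BLIT_ROW
	case 6:      BLIT_ROW
	case 5:      BLIT_ROW
	case 4:      BLIT_ROW
	case 3:      BLIT_ROW
	case 2:      BLIT_ROW
	case 1:      BLIT_ROW
	        } while(--passes > 0);
	}
}
```

For an 11-row sprite, the `switch` enters at `case 3`, copies 3 rows, and
then runs one full pass of 8.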

Making more use of C++ templates would have been fancy, but horizontal
sprite clipping can change the blit width depending on runtime values.
So, we're back to X macro code generation after all.
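
As a rough sketch of what X macro code generation with runtime width
selection could look like: every name below (`FOREACH_WIDTH`,
`DEFINE_BLITTER`, `blit_clipped()`, the 1-4 byte width list, the 80-byte
`ROW_SIZE`) is made up for illustration and not taken from this commit.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumption: one row of a 640-pixel-wide bitplane takes up 80 bytes.
static const size_t ROW_SIZE = 80;

// Hypothetical list of supported blit widths, in bytes.
#define FOREACH_WIDTH(X) X(1) X(2) X(3) X(4)

// Generates one function per width. W is a compile-time constant inside
// each function, so the compiler is free to unroll the horizontal copy.
#define DEFINE_BLITTER(W) \
	static void blit_w##W( \
		uint8_t *dst, const uint8_t *src, size_t src_stride, unsigned int h \
	) { \
		for(unsigned int y = 0; y < h; y++) { \
			for(size_t x = 0; x < W; x++) { \
				dst[x] = src[x]; \
			} \
			dst += ROW_SIZE; \
			src += src_stride; \
		} \
	}
FOREACH_WIDTH(DEFINE_BLITTER)

typedef void blit_func_t(uint8_t *, const uint8_t *, size_t, unsigned int);

// The same list builds the dispatch table, indexed by (width - 1).
#define TABLE_ENTRY(W) blit_w##W,
static blit_func_t *const BLITTERS[] = { FOREACH_WIDTH(TABLE_ENTRY) };
static const size_t BLITTER_COUNT = (sizeof(BLITTERS) / sizeof(BLITTERS[0]));

// Clipping only decides the visible width at runtime, which a template
// parameter could not express. The source stride stays the full width.
void blit_clipped(
	uint8_t *dst,
	const uint8_t *src,
	size_t sprite_w,
	size_t visible_w,
	unsigned int h
)
{
	assert(visible_w <= sprite_w);
	if((visible_w == 0) || (visible_w > BLITTER_COUNT)) {
		return;
	}
	BLITTERS[visible_w - 1](dst, src, sprite_w, h);
}
```

Adding another width then only means extending `FOREACH_WIDTH`, which
updates both the generated functions and the table.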

Part of P0233, funded by [Anonymous].
2023-03-04 19:40:55 +01:00
nmlgc abeaf851a4 [Platform] [PC-98] EGC rectangle copies
Yup, unaligned! The prefilling case is quite broken on T98-Next, but
given that this emulator hasn't seen any development since 2010 and
every other emulator gets it right, we can reasonably assume that to be
a bug in that emulator.

Completes P0232, funded by [Anonymous].
2023-03-04 19:40:55 +01:00
nmlgc afa6253683 [Platform] [PC-98] GRCG tile and color wrappers
Choosing C++ RAII wrappers because there's at least one case where ZUN
misplaced a manual grcg_off(). This implementation combines safety with
the optimal instructions for both dynamic and static use cases.
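
A minimal sketch of the RAII idea, assuming the commonly documented GRCG
ports (mode at 0x7C, tile at 0x7E) and mode values (0xC0 for RMW, 0x00 for
off). `GRCGColor`, `port_out()`, and `port_log` are hypothetical names, and
`port_out()` merely records writes so that the sketch runs on any machine.
The actual wrappers in this commit may look quite different.

```cpp
#include <stdint.h>
#include <utility>
#include <vector>

// Hypothetical stand-in for a real port write: logs (port, value) pairs.
static std::vector<std::pair<uint16_t, uint8_t> > port_log;

static void port_out(uint16_t port, uint8_t value)
{
	port_log.push_back(std::make_pair(port, value));
}

// Assumed GRCG port numbers and mode values.
static const uint16_t PORT_GRCG_MODE = 0x7C;
static const uint16_t PORT_GRCG_TILE = 0x7E;
static const uint8_t GRCG_MODE_RMW = 0xC0;
static const uint8_t GRCG_MODE_OFF = 0x00;

// Enables the GRCG with a solid color for as long as the object lives.
class GRCGColor {
public:
	explicit GRCGColor(uint8_t color)
	{
		port_out(PORT_GRCG_MODE, GRCG_MODE_RMW);
		// One tile register write per bitplane, lowest color bit first.
		for(int plane = 0; plane < 4; plane++) {
			port_out(PORT_GRCG_TILE, (((color >> plane) & 1) ? 0xFF : 0x00));
		}
	}

	// Runs on every scope exit, so the GRCG can't be left on by accident.
	~GRCGColor()
	{
		port_out(PORT_GRCG_MODE, GRCG_MODE_OFF);
	}

private:
	// Non-copyable: a copy would turn the GRCG off a second time.
	GRCGColor(const GRCGColor &);
	GRCGColor &operator=(const GRCGColor &);
};
```

Declaring `GRCGColor grcg(5);` at the top of a block is then enough; the
matching "off" write happens when the block ends, with no manual call that
could be misplaced.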

Part of P0232, funded by [Anonymous].
2023-02-28 08:08:17 +01:00
nmlgc d15bb0c4fa [Platform] [PC-98] Graphics GDC initialization
I've copy-pasted this snippet so many times, it's time it gets a proper
home.

Part of P0232, funded by [Anonymous].
2023-02-28 08:08:17 +01:00
nmlgc 03519d2af8 [Platform] [PC-98] Font ROM glyph retrieval
Lol @ getting the glyph header field order wrong in 2021…

Part of P0232, funded by [Anonymous].
2023-02-28 08:08:17 +01:00
nmlgc e5d7c9489c [Platform] [PC-98] Gaiji upload
Will come in handy for various research programs… 👀

Part of P0232, funded by [Anonymous].
2023-02-28 08:08:17 +01:00
nmlgc d22c1e6db3 [Platform] [PC-98] Hardware palette setters
Optimally, these are called *at most* once per frame. No need to
micro-optimize here.

Part of P0232, funded by [Anonymous].
2023-02-28 08:08:17 +01:00
nmlgc f1108b5548 [Platform] [PC-98] VSync: Retrigger the VSync interrupt after INT 18h
Well, that didn't take long. Unlike *debugging* this issue after you
encounter it on real hardware…

Part of P0232, funded by [Anonymous].
2023-02-28 08:08:17 +01:00
nmlgc df0672762b [Platform] [PC-98] VSync interrupt handler
Starting with the simple refresh rate-oblivious code from TH01, until
we've figured out what the rest of the master.lib code is doing and
have valid reasons to include it. Also extending the second counter to
32-bit because we *might* be measuring some processes that could take
longer than 19:35 minutes…

Part of P0232, funded by [Anonymous].
2023-02-28 08:08:17 +01:00
nmlgc 82f27f3771 [Platform] [PC-98] Page flipping
Inline functions wouldn't generate optimal code in some cases.

Part of P0232, funded by [Anonymous].
2023-02-28 08:08:17 +01:00
nmlgc 4548b874d6 [Platform] [PC-98] Font ROM glyph types
Moving the code from TH01 to a new platform layer, and deciding against
the `pc98_` prefix, which is sort of implied by the directory of the
header file it came from. Namespaces would be ideal, but Turbo C++ 4.0J
sadly doesn't support them.

Part of P0232, funded by [Anonymous].
2023-02-28 08:08:17 +01:00
nmlgc 055cc20f79 [Platform] [x86 Real Mode] CPU flag macros
Over on the `debloated` branch, we're going to use them in our own
platform-specific code, which obviously is not decompiled from
anything.

Part of P0232, funded by [Anonymous].
2023-02-28 08:08:16 +01:00
nmlgc 6b5102d2e7 [Platform] [x86 Real Mode] Implement Turbo C++ 4.0J exception handler removal
Full version with support for `operator new`.

Part of P0230, funded by [Anonymous].
2023-02-28 08:07:53 +01:00