Let's see if anyone ever tries to compile this codebase with a pre-C89
compiler that enforces the # at the beginning of the line.
Part of P0203, funded by [Anonymous] and GhostRiderCog.
Reason: Too much micro-optimization using 32-bit registers, which
aren't supported by Turbo C++'s inline assembler. It's also just
another variation on a common function we've decompiled time and time
again.
Part of P0192, funded by [Anonymous], nrook, and -Tom-.
It's not a kaja_func_t if it's shifted left by 8 bits. Why is it
shifted left by 8 bits to begin with, though? Why not just pass a
kaja_func_t, and assign it to AH? Arrrrgh.
Part of P0139, funded by [Anonymous].
Hardcoding these *might* have been acceptable if the numbers actually
matched the sizes defined in GAME.BAT, but they don't. With PMD's
AH=22h function, there's really no excuse though.
About time I looked into this, and expressed that constant as an inline
function that can easily be replaced with a proper implementation.
Part of P0139, funded by [Anonymous].
I'll leave a comprehensive, fully documented enum to interested
contributors, since that would involve research into basically the
entire history of the PC-9800 series, and even the clearly out-of-scope
PC-88VA. After all, PC-98 Touhou only needs to distinguish "OPN- /
PC-9801-26K-compatible sound sources handled by PMD.COM" from
"everything else", since all other PMD varieties are OPNA- /
PC-9801-86-compatible.
Part of P0139, funded by [Anonymous].
Whoops, turns out that the build has been broken on TASM32 version 5.3
(the one in the DevKit) ever since 7897bf1. In contrast to version 5.0
(which I use for my development), 5.3 actually defines 32-bit segments
if you specify a .386 CPU before using .MODEL.
That might have been the reason for the .286 workaround all along?
Turns out there's the USE16 modifier, which makes this much more
explicit than switching CPUs.
Reason: Pascal calling convention with function parameters but no stack
frame. Theoretically we can __emit__() everything inside this function,
but there's no way we can get a `RETN 8` this way. Oh, and it also
accesses SI and DI without backing them up to the stack.
And thanks to TLINK apparently not reporting fixup overflows when
segments are small enough (?), it took quite a while to get that CALL
correct and not weirdly offset by 32 bytes. 😕
Part of P0134, funded by [Anonymous].
> assigning to the DI register immediately before a CALL
Yeah, no amount of comma operator trickery can get *that* out of this
compiler. Also, these TH05 .PI functions are the only place in PC-98
Touhou with a `IMUL DI, imm8` instruction, which is impossible to get
out of Turbo C++'s built-in assembler.
Well, at least the `if` branches decompile somewhat nicely.
Part of P0134, funded by [Anonymous].
Well, it *would* have been decompilable, but that ridiculous placement
of the nullptr assignment would have forced the entire function call to
be spelled out in inline ASM, verbatim. No amount of comma operator
trickery would have generated the same instructions either. And for a
function this small and obvious in what its decompilation *should* be,
it really defeated the purpose of adding a separate translation unit…
Part of P0134, funded by [Anonymous].
My repo will never include all of master.lib anyway, as explained in
https://rec98.nmlgc.net/blog/2020-11-03.
That gets me a median build time of 27 seconds for the entirety of
ReC98 on my system, just as estimated in that same blog post. And we're
far from done with #include optimizations…
Part of P0133, funded by [Anonymous].
Yeah, why *were* we assembling them in the 16-bit part before?!
Possible reasons:
• In a time before Tup, it made no actual difference whether these
little files were assembled in the 32-bit or 16-bit part. Now it sort
of does, since we've temporarily given up on minimal rebuilds in the
16-bit part.
• Emphasizing the temporary nature of the 32-bit part by deliberately
moving everything to the 16-bit part as early as possible?
• It all started with the ZUN.COM ASM code, which doesn't include any
other files, and can therefore be perfectly tracked by a Makefile.
Which *was* superior than the exclusive dumb batch file we had in the
past. And then I've simply cargo-culted all new .ASM translation
units into the 16-bit part well.
Oh, and another positive side effect of temporarily not using 16-bit
TASM: The build process now also runs on Windows 95.
Part of P0113, funded by Lmocinemod.
5 enums where code generation wants an `int`, vs. 11 cases where using
the minimum size is exactly the right default. So it's way more
idiomatic to force those 5 to 16 bits via a dummy element… except that
we can't give it a single, consistent name, because you can't redeclare
the same element in a different enum later.
Oh well, let's have this ugly naming convention instead, which makes it
totally clear that the force element not, in fact, a valid value of
that enum.
Part of P0085, funded by -Tom-.
All the weird double returns in FUUIN.EXE just magically appear with
-O-! 😮
And yeah, it's a bowl of global state spaghetti once again. 🍝 Named
the functions in a way that would make sense to a user of the API, who
should be aware of typical side effects, like, y'know, a changed
hardware palette… That's how you end up with the supposed "main"
function getting a "_palette_show" suffix.
Completes P0082, funded by Ember2528.
This cautionary tale of wasted time will now be told at
https://github.com/nmlgc/Slices
with additional context for anyone else thinking about starting their
own ReC project.
Completes P0077, funded by Splashman and -Tom-.
Which finally allows us to use the PLANE_SIZE macro in ASM land. Yeah,
(ROW_SIZE * RES_Y) has finally got old.
Part of P0073, funded by [Anonymous] and -Tom-.
Yes, decompilation, of something that was so obviously originally
written in ASM. We're still left with two un-decompilable instructions
here, but I'm amazed at how nicely I was able to abstract away all of
the gory register details, leading to pretty clear, readable, and dare
I say *portable* code?! Turbo C++ was once again pretty helpful here:
• `static_cast<char>(_BX) = _AL` actually compiles into `MOV BL, AL`,
as you would have intended,
• and no-op assignments like _DI = _DI are optimized away, allowing
us to leave them in for clarity, so that we can have all parameter
assignments for the SPRITE16 display call in a single place.
I love this compiler.
Part of P0060, funded by Touhou Patch Center.
Yup, function parameters that can clearly be identified as coordinates
are by far the fastest way to raise the calculated position
independence percentage. Kinda makes it sound like useless work, which
I'm only doing because it's dictated by some counting algorithm on a
website, but decompilation will want to un-hex all of these values
anyway. We're merely doing that right now, across all games.
Part of P0058, funded by -Tom-.
Aha! TH03's in-game graphics run in line-doubled 640×200 simply because
that's what this SPRITE16.COM version was written for, making use of
the PC-98 EGC for optimized blitting.
Doesn't seem all *too* optimized though, given that it chooses to
effectively draw every sprite twice, just in case it might overlap with
something that's already in VRAM. It first clears the previous VRAM
content at the drawing position according to the sprite's alpha mask,
then ORs in the actual sprite data. The EGC can do monochrome alpha-
tested blitting just as well as the GRCG, but once the sprite data
covers all bitplanes, ORing is apparently the best it can do by itself?
More technical details on the raster operations in the next push!
Completes P0056, funded by rosenrose and [Anonymous].
Which was actually found out by @m1yur1 last year, who thankfully
tweeted this in reference to ReC98:
https://twitter.com/m1yur1/status/1018855232371998720
Thanks a lot! But what makes this more than a piece of trivia is the
fact that the StormySpace release actually *was* bundled with
documentation. Shoutout to the Neo Kobe PC-98 collection for preserving
the original release! This should now greatly simplify any RE efforts
related to TH03's INT 42h calls. (Not *trivialize*, because there's
still all this EGC hardware to be understood…)
And sure, you *were* allowed to use this driver in your own game, but
replacing the copyright with your own isn't exactly the nicest thing to
do… That now makes three library programmers that ZUN didn't credit.
Makes me wonder what makes M. Kajihara so special. Probably the fact
that Touhou has always been about the music for ZUN, first and
foremost.
Part of P0056, funded by rosenrose and [Anonymous].
The TH02 version is a piece of cake…
… but TH04 starts turning it into this un-decompilable piece of
unnecessarily micro-optimized ZUN code. Couldn't have chosen anything
better for the first separate ASM translation unit.
Aside from now having to convert names of exported *variables* to
uppercase for visibility in ASM translation units, the most notable
lesson in this was the one about avoiding fixup overflows. From the
Borland C++ Version 4.0 User's Guide:
"In an assembly language program, a fixup overflow frequently
occurs if you have declared an external variable within a
segment definition, but this variable actually exists in a
different segment."
Can't be restated often enough.
Completes P0032, funded by zorg.
In which ZUN actively refuses help from master.lib and rips out
everything that isn't immediately related to reading the one joystick
port of the PC-9801-86 sound board.
Maybe there actually was a good reason for that?
Funded by -Tom-.
TASM crashes if you try to define a structure member with the same name
as an existing numeric equate.
And yes, we should have solved this by finally librarifying master.lib,
but that'll be a rather involved subtask as well.