The TH05 difference: A ridiculous out-of-bounds structure field access.
Could have done safely by just passing a shot_t*, but ZUN's too cool
for that?
Also, about time we started putting the function prototypes directly
into the C headers, and the C headers *only*, upon reverse-engineering…
Part of P0036, funded by zorg.
It seems that the main th0?/ directories should only contain actual
translation units (of which there are more than previously assumed)
as well as other not really further classifiable slices?
Part of P0036, funded by zorg.
And once again, the TH05 version is un-decompilable. :/ It was pretty
close this time, though, as the entire block between PUSH DI and POP DI
kind of resembles a separate inlined function, in accordance with Turbo
C++'s automatic backup of the DI register, as researched in 7f971a0.
Except that it contains a loop, and Turbo C++ refuses to inline any
function with `do`, `while`, `for`, or `goto`. If it didn't, it would
have totally worked.
Also, yes, C++ class methods are treated identically in this regard.
Oh well. Shot type control functions next, finally!
Completes P0035, funded by zorg.
Including the confirmation that both games have an 8-frame deathbomb
window.
The placement of the variables is all over the place though, what the
hell?
Part of P0034, funded by zorg.
They are supposed to lag behind the player's movement by one frame, and
therefore have to be tracked separately.
I would have also included TH02, if it weren't for the weird sprite
indirection system (not storing X and Y positions directly in the
structure, but taking them from a buffer, with their addresses
always changing, WTF?!) that absolutely needs separate attention.
Part of P0034, funded by zorg.
So it's *_put(), inherited from master.lib, for everything just writing
to text RAM, and *_render() for everything more involved? But what
about master.lib's own graphics RAM functions like super_put()? Need to
fix that inconsistency some day.
Once again no decompilation, because…
Part of P0033, funded by zorg.
The TH02 version is a piece of cake…
… but TH04 starts turning it into this un-decompilable piece of
unnecessarily micro-optimized ZUN code. Couldn't have chosen anything
better for the first separate ASM translation unit.
Aside from now having to convert names of exported *variables* to
uppercase for visibility in ASM translation units, the most notable
lesson in this was the one about avoiding fixup overflows. From the
Borland C++ Version 4.0 User's Guide:
"In an assembly language program, a fixup overflow frequently
occurs if you have declared an external variable within a
segment definition, but this variable actually exists in a
different segment."
Can't be restated often enough.
Completes P0032, funded by zorg.
Rule of thumb going forward: Everything that emits data is .asm,
everything that doesn't is .inc.
(Let's hope that th01_reiiden_2.inc won't exist for that much longer!)
Part of P0032, funded by zorg.
So yes, we *can* technically decompile from anywhere, by splitting the
segment after the function we want, then .SEQuentially GROUPing the two
segments back together into one virtual segment matching the original
one. This gives us one more point where we can slot in new compilation
units that emit their code into the same segment, in the order given on
the link command line.
*But* since all ASM in ReC98 heavily relies on being assembled in MASM
mode, we then start to suffer from MASM's group addressing quirk,
described in the "Accessing data in a segment belonging to a group"
section in the Turbo Assembler Version 5 User's Guide.
Which then forces us to manually prefix every single function call
• from inside the group
• to anywhere else within the newly created segment
with the group name. It's stupidly boring busywork, because of all the
function calls you *mustn't* prefix. Special tooling might make this
easier, but I don't have it, and I'm not getting crowdfunded for it.
And while this is faster than porting the entire codebase to Ideal
mode, I'll only do this on rare occasions.
Like the upcoming, particularly awful piece of reverse-engineering.
Completes P0031, funded by zorg.
"Yeah, let's do this real quick, how can this possibly be hard, it's
just MOVs and a few function calls"…
…except that these MOVs access quite a lot of data, which we now all
have to declare in the C world, hooray.
Once it came to midbosses and bosses, I just turned them into C structs
after all. Despite what I said in 260edd8… after all, the ASM world
doesn't care about the representation in the C world, so they don't
necessarily have to be the same.
Since these structs can't contain everything related to midbosses and
bosses (really, why did all those variables have to be spread out like
this, ZUN?), it also made for a nice occasion to continue the "stuff"
naming scheme, describing "an obviously incomplete collection of
variables related to a thing", first seen in 160d4eb.
Also, PROCDESC apparently is the only syntactically correct option to
declare an extern near proc?
Also, that `boss_phase_timed_out` variable only needs to be here
already because TCC enforces word alignment for the .data segment…
yeah, it's technically not related to this commit, but why waste time
working around it if we can just include that one variable.
Completes P0030, funded by zorg.
I've had the idea to hide this implementation detail and improve code
readability for some time now, but it obviously must still all inline,
to be indistinguishable from a direct assignment of the correct value…
… which, amazingly, it does! Even the static_cast from float to int.
The latter allows us to exclusively implement this for float, since we
do have to express the occasional value smaller than 16.
Who needs macros anyway. Yay, C++ in TH04 and TH05 after all!
Part of P0030, funded by zorg.
No leading underscore for functions with Pascal calling convention, but
we do have one for all variables, because it's not worth it to put
keywords in front of everything for no reason.
Seemed to have forgotten this rule in 2017?
Part of P0030, funded by zorg.
These are used from quite a few places, so it seems best to just name
them after the rect on the playfield they leave out, which is then
typically where the background picture goes.
…*except* that in doing this, we quickly run up against the symbol
length limit of 32 characters. TASM can expand it via the /mv option,
but TCC only lets you *reduce* it to even less. (Why?)
So, my initial idea of `playfield_fill_around_(x)_(y)_(w)_(h)` wouldn't
have worked. But those coordinates are kinda important, I'd say…
Well then, let's just go with `fillm` instead of `fill_around` then.
"Fill with mask at the given coordinates"… yeah, that would work.
Part of P0029, funded by zorg.
… yeah, I don't really like these ambiguous "mode" and "mode change"
variable names either, but what's the alternative? Something something
"sub-phase", to distinguish them from regular phases? Feels way too
early to decide on something more specific. And pretty much nothing I
could come up with right now would have made their inconsistent use any
clearer.
But I need to decide on *something* before moving on, so… eh, let's
just go with what uth05win chose.
Also, yeah, dealing with those 0xFE and 0xFD boss_phase constants some
other time 😛
Also, today in "Weird TASM crashes": Trailing commas at the end of
`public` lines…
Completes P0028, funded by zorg.
Turning these into a struct will be very painful with all the
collisions. Not going to do that until we decompiled every single
reference to those.
Part of P0028, funded by zorg.
As in, "HiScore Entry!!", "Extend!!", "Full PowerUp!!", etc.
Class CFloatingText in uth05win… but I decided against naming it like
that because of some stage/BGM title-related variables in the middle.
Funded by -Tom-.
Many thanks to http://bytepointer.com/tasm/index.htm for providing a
better searchable resource for TASM's default `LEA imm16` → `MOV imm16`
optimization, which we initially had to hack around here.
Funded by -Tom-.
In which ZUN uses little-endian BCD as the exclusive internal storage
for both the current and the high score. Which are then updated using,
once again, ridiculously micro-optimized ASM code that uses the
venerable x86 BCD instructions.
Funded by -Tom-.
So apparently, this way of distorting a circle into an ellipse (?) by
adding a value to the angle for one of the two coordinates isn't
actually widely known in math and doesn't have a name. Fair enough.
Funded by -Tom-.
Yup, nothing interesting in here.
Except for maybe the confirmation that LS00.BB is used for the curved
bullets used by Shinki and EX-Alice?
Funded by -Tom-.
… because TH04's version of this takes the ASCII stage ID directly from
the resident structure.
So boss battles are simply triggered by setting the background tile
scroll speed to 0? That means…
Funded by zorg.
With TH05 definitely being the Galaxy Brain version of this function.
You'll see once I get to push the C decompilation for that one…
Anyway, that covers all shared input functions of TH02-TH05!
Funded by zorg.
Seemingly included in every other larger structure describing anything
remotely sprite-like. Couldn't find this in the earlier games,
unfortunately…
Funded by zorg.
TH02 just seems to use a sequence of branches in sub_D6CA.
I thought about keeping variable and function names consistent with
uth05win (on which most upcoming reverse-engineering commits will be
based). But as it will turn out, it ignored a certain very important
substructure, so I don't think it's worth introducing this naming style
clash.
Funded by zorg.
Yes, you're reading that correctly. If the cursor is at 255, reading a
16-bit value will fill the upper 8 bits with the neighboring cursor
value, which always is 0xFF.
Funded by -Tom-.
Look at that TH05 vector2_at_opt function. What the hell, the caller is
supposed to set up the stack frame for the function? How do you even get
a compiler to do this (and no, I haven't found a compiler switch)? No
way around writing a separate "optimizer" as part of the compilation
pipeline, it seems.
Oh, right, these functions can have parameters. So, let's turn snd_kaja_func()
into a macro that combines the function number and the parameter into the AX
value for the driver.
And renaming them all to the short filenames they will be decompiled to for
consistency. These functions aren't really immediately hardware-related, as
we've established earlier in the decompilation.
Only one code segment left in both OP and FUUIN! its-happening.gif
Yeah, that commit is way larger than I'm comfortable with, but none of these
functions is particularly large or difficult to decompile (with the exception
of graph_putsa_fx(), which I actually did weeks ago), and OP and MAIN have
their own unique functions in between the shared ones, so…
Yes, all of it. Including the bouncing polygons, of course. And since it's
placed at the end of ZUN's code inside the executable, the code's already
position-independent and fully hackable.
Turns out we're not quite done with reduction yet, as there still are a bunch
of macros in master.h that #define PC-98-specific hardware constants and I/O
ports.
With TH03 changing the calling convention for most of the code from __cdecl to
__pascal, I've been getting more and more confused about this myself. So,
let's settle on the following consistent syntax for function calls:
* C where the calling convention is actually __cdecl and where TASM's emitted
__cdecl code matches the original binary
* PASCAL where the calling convention is actually __pascal
* STDCALL where the calling convention is actually __cdecl, but where
the caller either defers stack cleanup (summing up the stack size of
multiple functions, then cleaning it all in a single "add sp" instruction)
or where the stack is cleared in a different way (e.g. "pop cx").
Unfortunately though, when using the ARG directive to automatically generate
an appropriate RET instruction for the given calling convention, TASM always
emits ENTER and LEAVE instructions even when no local variables are declared,
which greatly limits the number of functions where we can use that syntax. -.-
Note how it's only one *mode* in TH02/TH03, but two *modes* in TH04/TH05,
since you can't select between FM and Beep sound effect modes in TH02/TH03 (or
even disable sounds altogether). Might be a bit confusing, but it seemed
appropriate enough to distinguish the two functions.
Well, the naming.
Even though only TH02 actually uses MIDI (and thus, the MMD driver), every
game since then contains interrupt instructions for both functions. We could
just name it "pmd", since it seems like that's what came first - the AH
numbers of the 6 functions that make up MMD's interrupt API are identical to
those of the equivalent functions in PMD, even including gaps in the numbering
for PMD functions that don't have an equivalent in MIDI. However, except for
the FM sound effect handling and the key display in TH05's Music Room, these 6
functions are all the games actually use. Also, we already distinguish between
PMD and MMD in the driver check functions, and it might be confusing to only
imply PMD from now on?
So, "kaja" it is, collectively referring to the shared aspects of both
drivers.
Yup, packfiles finally proved that we really have a different set of changes
to master.lib in every game. Also, there are bound to be more of these game-
specific small changes to otherwise identical code in ZUN's own code.
And hey, no need to define that value in the build scripts anymore.
(I've also considered just copying modified versions into the individual game
subdirectories, but it's not too nice to expect people to diff them in order
to actually understand why these copies exist and where the changes actually
are.)