Whoops, turns out that the build has been broken on TASM32 version 5.3
(the one in the DevKit) ever since 7897bf1. In contrast to version 5.0
(which I use for my development), 5.3 actually defines 32-bit segments
if you specify a .386 CPU before using .MODEL.
That might have been the reason for the .286 workaround all along?
Turns out there's the USE16 modifier, which makes this much more
explicit than switching CPUs.
How was this game even *built*, originally, if it uses *both* a common
shared set of library functions *and* obviously copy-pasted and
separately compiled versions of some of these functions?
Part of P0132, funded by [Anonymous].
There's the better name, in ALLCAPS for improved grepping. TH01 is also
going to need a pseudo-binary to bundle translation units that appear
in more than one .EXE, and since "segment 2" would be wrong for that
game, it makes more sense to have one consistent name for these
pseudo-binaries in all games.
(Maintenance mode commit)
… with either the port or the argument in registers. This is one of the
few actually dumb concessions to the PI counter on the website.
Then again, binary *is* a better representation for the contents of the
GRCG tile register than hex…
Part of P0117, funded by [Anonymous].
About time I finally developed this piece of tech. Towards TH05, this
segment got more and more undecompilable ASM functions mixed inbetween
C ones. Which means that pretty much all of the current ASM land
`#include`s in that segment will have to become translation units. And
we *really* don't want an additional layer of numbered, per-binary
translation units that just `#include` maybe one or two functions.
Also yeah, no _TEXT suffix, to drive home the point that this is a
"library" segment, and not really "owned" by any one file.
Part of P0113, funded by Lmocinemod.
And with all possible .COM executables decompiled, this set of changes
reaches an acceptable scope, allowing us to *finally*…
Part of P0077, funded by Splashman and -Tom-.
The TH04/TH05 BGM/SE mode setup is a good example for code where
different structure field offsets will vanish completely upon reverse-
engineering. If we continued to use the per-game ID string as the
variable name, we'd only have another game-specific "difference" there.
Part of P0065, funded by Touhou Patch Center.
So many things named `score_*`, so many things named `hiscore_*`…
Let's go with `scoredat_*`, which clearly indicates that this stuff is
saved into a file, while still being only 8 characters.
Part of P0063, funded by -Tom-.
Not applying this leak to TH03 since it would have more than one
`key_det` variable, resulting in names that are as much fanfiction as
the current ones…
Yup, function parameters that can clearly be identified as coordinates
are by far the fastest way to raise the calculated position
independence percentage. Kinda makes it sound like useless work, which
I'm only doing because it's dictated by some counting algorithm on a
website, but decompilation will want to un-hex all of these values
anyway. We're merely doing that right now, across all games.
Part of P0058, funded by -Tom-.
Rule of thumb going forward: Everything that emits data is .asm,
everything that doesn't is .inc.
(Let's hope that th01_reiiden_2.inc won't exist for that much longer!)
Part of P0032, funded by zorg.
And already, the script begins to crumble, reminding me of what a
terrible idea it actually was. Like, if you did it for real, you'd get
so many false positives that the script stops being useful, since
every raw number above 0x90 (the size of the _DATA segment of the
Borland C++ DOS startup code) can potentially be a memory reference.
I do think that the script now covers the sweet spot between full-blown
emulation and shallow parsing though, so going to do at least a few
more files.
Which is the last step on the way to completely position-independent
code, with no random hex numbers that should have been data pointers,
but weren't automatically turned into data pointers by IDA because
they're only ever addressed in the indirect fashion of
mov bx, [bp-array_index]
mov ax, [bx+0D00h] ; 0D00h is obviously an array of some sort
Removing all of these makes it practicable to add or delete code without
breaking the game in the process. Basic "modding", so to speak.
Automatically catching all possible cases where this happens actually
amounts to emulating the entire game, and *even then*, we're not
guaranteed that the *size* of the array just falls out as a byproduct
of this emulation and the tons of heuristics I would have thrown on top
of that. ZUN hates proper bounds checking and the correct size of each
array may simply never be implied anywhere.
So, rather than going through all that trouble of that (and hell, I
haven't even finished *parsing* this nasty MASM assembly format), and
since nothing really has happened in this project for almost two years,
I chose to just turn this into a text manipulation issue and figure out
the rest manually. Yeah, quick and dirty, and it probably won't scale if
I ever end up doing the same for PC-98 Policenauts, but it'd better work
at least for the rest of PC-98 Touhou.
Trying to do one of those per day from now on. Probably won't make it
due to the reverse-engineering effort required for the big main
executables of each game, but it'd sure be cool if I did.
Oh, right, these functions can have parameters. So, let's turn snd_kaja_func()
into a macro that combines the function number and the parameter into the AX
value for the driver.
And renaming them all to the short filenames they will be decompiled to for
consistency. These functions aren't really immediately hardware-related, as
we've established earlier in the decompilation.
Only one code segment left in both OP and FUUIN! its-happening.gif
Yeah, that commit is way larger than I'm comfortable with, but none of these
functions is particularly large or difficult to decompile (with the exception
of graph_putsa_fx(), which I actually did weeks ago), and OP and MAIN have
their own unique functions in between the shared ones, so…
MAIN.EXE shares most of the code in this segment, but I can't remove it from
there right now due to the weird ordering of the data segments in that
executable…
And yes, once again, those three seemingly random type casts in here are
*necessary* to build a bit-perfect binary.
Small detour into MAINE.EXE because it has all the juicy algorithms that will
explain the remaining unknown members of the highscore data structure, and
there's this one code segment here we need to get out of the way first.
Don't really understand the other games yet because they start introducing
joystick support and TH03 has multiplayer and then there are these master.lib
modifications that don't really make any sense to me, especially when you add
that TH04 seemingly does not read js_stat *at all*, yet still works just fine
with a gamepad and... urgh.
Well, duh, of course, we *can* do this in order to allow decompilation to be
started at the end (not the beginning) of any segment. In fact, if we hadn't
done this, we would have had to start by moving _TEXT out to libraries....
This took long enough, so we're not covering the COM files right now. Like, I
can't even tell how you're supposed to work around the forced word alignment
for the _TEXT segment. Guess we'll just have to decompile all of these in one
go, just like we did with ZUNSOFT.COM.
Also, it really seems as if we're merely trading one ugly workaround for
another in our quest for identical binaries.
I've looked at every openly available piece of PC-98 documentation, and there
don't seem to be any official names for the individual planes. The closest
thing I could find was the description at
http://island.geocities.jp/cklouch/column/pc98bas/pc98disphw2.htm
explaining that they represent the blue, red, green, and brightness component
when using the default PC-98 palette. However, these planes correspond to
nothing else but the 4 individual bits of the final index into the color
palette, and you can assign any color to every single palette slot. Therefore,
it's merely a convention that your own palettes don't have to follow (and in
Touhou, they don't).
Nevertheless, there doesn't seem to be an alternative, and the Neko Project II
source code uses the same B/R/G/E convention, so I'll go with that as well.
Turns out we're not quite done with reduction yet, as there still are a bunch
of macros in master.h that #define PC-98-specific hardware constants and I/O
ports.
Also covering the two variations for blitting only every second row or
blitting only a 320x200 quarter, as seen in the endings.
So yeah, there's indeed nothing wrong with piread.cpp. TH03 just uses that
separate function that only blits every second row of an image, and indeed
always loads the entire image as it would appear in a PNG conversion. Here's
what happens if you display these images using the non-interlacing function:
https://www.dropbox.com/s/885krj09d9l0890/th03%20PI%20no%20interlace.png