Also great news for those people who want to remove any and all C++ in
their mods, because this forces us to spell out subpixel literals as
actual floats, every time. And with that, you're back to being able to
simply search-replace for all the instances you'll have to change.
Part of P0099, funded by Ember2528.
And immediately, we discover another two hardcoded sprites, with, of
course, another set of functions for blitting and unblitting them…
Part of P0099, funded by Ember2528.
1 wrong byte of ASM that slipped through due to a off-by-one error in
an experimental mzdiff branch. Having MOV rather than LES here was
ultimately a harmless mistake though. In context, it's even covered by
the "identical instruction encoding" exception 🙂
A future sprite converter (documented in #8) could then convert these
to C or ASM arrays.
(Except for the piano sprites for TH05's Music Room, which are stored
and used in such a compressed way that it defeats the purpose of
storing them as bitmaps.
Which works in both Borland C++, Open Watcom, and Visual C++.
Not that we're about to port any of the games to these compilers, just
something I noticed while evaluating 32-bit compilers for ReC98's own
32-bit pipeline tools. Modders might want to look into that though,
since 100% position independence also makes it easier to change
compilers.
And that's exactly where TH01's sprite flickering comes from. You
aren't supposed to unblit all entities of one type immediately before
rendering them, since sprites of different entity types are free to
overlap each other. And you might have already blitted one of those
before!
Not to mention that ZUN *still* uses the sloppy unblitting method here.
🙄
Part of P0098, funded by Yanga.
Why the sloppyness of unblitting a whole 16×16 rectangle *if you have
a dedicated function to precisely unblit a .PTN sprite using its alpha
mask*???
Oh well, it's not the regular function called in the main loop, so who
cares…?
Part of P0098, funded by Yanga.
"Physics". Not only did ZUN restrict the X velocity to the 5 discrete
states of -8, -4, 0, 4, and 8 (because hey, unaligned blitting is slow
anyway?), but gravity is also only applied every 5 frames.
We're still missing quite a bit of usage code, but these are the core
functions. One of which turned out to be undecompilable, due to… a
rigorously defined instruction order when performing arithmetic between
`double`s and `float`s?! Still, spelling out all this stuff in ASM
seems much better than somehow splitting the data segment, just so that
we can immediately use literals there.
Part of P0097, funded by Ember2528.
… Wow, the orb is *actually* only ever displayed at byte-aligned X
coordinates, divisible by 8. It's only thanks to the constant spinning
that its movement appears at least *somewhat* smooth.
(And yeah, this is purely a rendering issue; internally, its position
*is* tracked at pixel precision.)
Part of P0097, funded by Ember2528.
So even TH01 wasn't 100% C++ after all. Turns out that this function
was the only instance in all of REIIDEN.EXE where ReC98 previously had
different encodings for identical x86 instructions.
Part of P0096, funded by Ember2528.
Surprise, it's the (terribly suboptimal) setgsta() example function
from the PC-9801 Programmers' Bible! 100% identical, so ZUN must have
read that book as well.
Used for screen shaking effects, as well as the scrolling backgrounds
at the start of the Final Boss stages.
Completes P0095, funded by Yanga.
No idea why ZUN split the 6 slots into 4 with a regular alpha plane,
and 2 with a negated alpha plane. The latter are only used for Sariel's
animations.
Part of P0095, funded by Yanga.
> "Sure, this is a PI push, but this second-to-last set of PI false
positives so small, let's immediately decompile it ^.^"
> Time taken: 22 minutes
Nice when this works out!
Part of P0095, funded by Yanga.
Too bad we still have to bother with renaming the function via #define
in the few cases where an executable contains two or more copies of
this function. We can't just make it `static`, since TH02 MAINE.EXE
actually does far-call it from a different segment (and thus,
translation unit).
Part of P0094, funded by Yanga.
The REIIDEN.EXE version (which is only shown when game-overing) has a
completely invisible timeout that force-enters a high score name after
1000... *keyboard inputs*? Not frames? Why. Like, how do even you
realistically to such a number.
(Best guess: It's a hidden easter egg to amuse players who place
drinking glasses on cursor keys. Or beer bottles.)
And hey, that initialization of the name variable with ASCII spaces.
The only actually meaningful byte-wise access… except that it's not,
because ZUN could have just used the Shift-JIS ideographic space for
the exact same effect.
Completes P0093, funded by Ember2528.
Now with POSIX file I/O in both executables. And the confirmation that
the name array is indeed exclusively accessed per-byte.
Part of P0093, funded by Ember2528.
It's exactly as terrible as you would have expected after hearing
"alphabet cursor actually stored as on-screen position". Nothing gained
in reducing redundancy any further here, any meaningful change would
pretty much have to rewrite the entire thing.
Part of P0093, funded by Ember2528.
In which inline functions are apparently the only way to trick Turbo
C++ into not actually optimizing away the useless `AND AL, 0FFh`.
But come on. Just accessing the high score name characters in chunks of
16-bit Shift-JIS codepoints (which is what they are!) instead of
breaking them down to bytes *everywhere* would have been both more
readable (no swapping of Shift-JIS literals, no bitshifts), more
efficient (this isn't running on an 8-bit CPU after all), and less of a
waste of my time…
Completes P0092, funded by Yanga.
Lol, the internal alphabet cursor position is actually stored as the
on-screen top-left position of the selected character, which this
function then maps back to a character index. At least that gets rid of
quite a bunch of PI false positives now.
Part of P0092, funded by Yanga.
With ternary operator expressions straight out of Jigoku.
(TL note: Jigoku means hell.)
Nice to see that Turbo C++ apparently has no nesting limit on function
call inlining in general, though!
Part of P0092, funded by Yanga.
The only TH01 executable not supporting the numpad?
And that's the point where you just rewrite the distinct input_prev_*
values as a single array, because they're local to input_sense()
anyway.
Part of P0091, funded by Ember2528.
Yes, TH01's memory info screen will recurse into itself for every 3
frames the PgUp key is held, requiring one additional PgDown press per
recursion to actually get out of it.
You can, of course, also crash the system via a stack overflow this
way, if that's your thing.
Part of P0091, funded by Ember2528.
Of which we can express precisely *nothing* as an inline function,
because Turbo C++ always emits a useless `JMP SHORT $+2` at the end of
such an inlined function if it contains nested `if` statements. This is
also what forced some of the functions in 90252cc to be expressed as
macros. By now, this is clear enough to be documented separately.
And to warrant this separate commit.
Completes P0090, funded by Yanga.
Starting with the odd one out, the one that doesn't use master.lib and
has two input sense functions: one for the main menu, and one for the
option window.
Both of which also immediately perform the ring arithmetic on the menu
cursor variable… because there's nothing else to be done with these
inputs in OP.EXE? Separating input sensing from processing apparently
wasn't all too obvious of a thought, and it's only truly done in TH02
and later.
Part of P0090, funded by Yanga.
That's… pretty specific. The only thing on the main menu with this
color is the "1996 ZUN" text at the bottom… probably part of an
effect that we never got to see. Every other idea would be baseless
speculation, given that the snapped buffer isn't used anywhere else.
Part of P0090, funded by Yanga.
Adding op/, main/, and end/ directories does nicely cover a great
majority of the "not really further classifiable slices" implied in
d56bd45.
Part of P0086, funded by [Anonymous] and Blue Bolt.
Again, 11 necessary workarounds, vs. forcing byte aligment in at least
18 places, and that number would have significantly grown in the
future.
Part of P0085, funded by -Tom-.
5 enums where code generation wants an `int`, vs. 11 cases where using
the minimum size is exactly the right default. So it's way more
idiomatic to force those 5 to 16 bits via a dummy element… except that
we can't give it a single, consistent name, because you can't redeclare
the same element in a different enum later.
Oh well, let's have this ugly naming convention instead, which makes it
totally clear that the force element not, in fact, a valid value of
that enum.
Part of P0085, funded by -Tom-.
Which repurpose the .PTN image slots to store the background of
frequently updated VRAM sections, like all the numbers in the HUD.
Future games would simply use the text RAM and gaiji for numbers. Which
would have worked just fine for TH01 as well (especially since all the
functions we've seen so far are aligned to the 8-pixel byte grid), but
it looks as if ZUN simply wasn't aware of gaiji during the development
of TH01.
Part of P0083, funded by Yanga.
Mangled C++ function names would *not* have been a mistake if I hadn't
made the other mistake of restricting parts of the code to C…
Part of P0083, funded by Yanga.
Which actually does inline… in C++, because Turbo C++ doesn't support
the `inline` keyword in C mode. So much for the superiority of that
language, even in 1994…
Part of P0083, funded by Yanga.
All the weird double returns in FUUIN.EXE just magically appear with
-O-! 😮
And yeah, it's a bowl of global state spaghetti once again. 🍝 Named
the functions in a way that would make sense to a user of the API, who
should be aware of typical side effects, like, y'know, a changed
hardware palette… That's how you end up with the supposed "main"
function getting a "_palette_show" suffix.
Completes P0082, funded by Ember2528.
They do make everything so much simpler, after all. Especially now that
th01/formats/grz.cpp should compile as a freestanding translation unit,
as part of the .GRZ viewer.
Part of P0082, funded by Ember2528.
Since we not only have the .PTN sub-image count array in the middle of
all those .GRP flags, but the .PTN loading code also reusing the
palette set flag…
Part of P0082, funded by Ember2528.
Turns out that the .GRP file functions are, of course, also present in
FUUIN.EXE… but use different palette fade functions there? This is the
first function in the way.
Part of P0082, funded by Ember2528.
Yet another run-length encoded graphics format, this one being
exclusively used to wastefully store Konngara's sword slash and kuji-in
kill "animation".
But for once, the terrible code generated by inline functions with
non-literal parameters perfectly matches what ZUN wrote here.
Part of P0081, funded by Ember2528.
I tried `brge` for the latter, but that had *the* most horrible
ergonomics, and I misspelled it as `bgre` 100% of the times I typed it
manually. Turns out that `dots` is also consistent with master.lib's
naming scheme, leaving `planar` to *actually* refer to types storing
multiple planes worth of pixels. These types are showing up more and
more, and deserve something better than their previous long-winded and
misleading name.
Part of P0081, funded by Ember2528.
Just like with the z_text_*() functions, master.lib doesn't already
have graph_init() and graph_exit() either, and once again, ZUN's code
here doesn't fully correspond to any master.lib function. Unlike the
z_text_*() functions though, those names aren't really the best
descriptions for these rather random combinations of BIOS calls and I/O
port writes…
Anyway, that's the entire segment!
Part of P0080, funded by Ember2528 and Splashman.
No macros for the port numbers here! Anyone who will try to read and
understand this code will probably want to look those up in PC-98
hardware documentation, and macros would just be an annoying layer of
indirection then.
Part of P0080, funded by Ember2528 and Splashman.
Page… 2? On a system with only page 0 and 1? Had to get out my real
PC-98 to double-check that I wasn't missing anything here, since
every emulator only looks at the bottom bit of the page number. But
real hardware seems to do the same, and there really is nothing special
to it semantically, being equivalent to page 0. 🤷
Part of P0080, funded by Ember2528 and Splashman.
Allowing us to then retrieve it using a function call with no run-time
cost, although we do have to be careful with the types here.
Also, is that another solution to decompilation puzzles that involve
types of number literals?
Part of P0080, funded by Ember2528 and Splashman.
And with all possible .COM executables decompiled, this set of changes
reaches an acceptable scope, allowing us to *finally*…
Part of P0077, funded by Splashman and -Tom-.
In which our typedefs mercilessly reveal ZUN's original sloppiness, and
the unncessary sign extension taking place here. Also, ❎ unused…
Completes P0069, funded by [Anonymous] and Yanga.
Yes, when clipping the start and end points to the screen area, ZUN
uses an integer division to calculate the line slopes, rather than a
floating-point one. Doesn't seem like it actually causes any incorrect
lines to be drawn, though; that case is only hit in the Mima boss
fight, which draws a few lines with a bottom coordinate of 400 rather
than 399. It *might* also restore the wrong pixels at parts of the
YuugenMagan fight, causing weird flickering, but seriously, that's an
issue everywhere you look in this game.
Part of P0069, funded by [Anonymous] and Yanga.
Templates would have been nicer, but as soon as you add just one
non-immediate parameter, Turbo C++ generates a useless store to a new
local variable, ruining the generated code.
Part of P0069, funded by [Anonymous] and Yanga.
TH01's (original) version also replicates the PC-98 text RAM's reverse
and underline attributes. Which was removed in later games,
interestingly and inconsistently enough.
Part of P0068, funded by Yanga.
I did consider not doing this, because "well, can't anyone who's
*actually* interested just diff the TH01 and TH02 implementations to
figure out the differences themselves", but that duplication ended up
feeling too filthy after all.
And hey, it's a nice excuse to update TH02's version to current naming
standards! 😛
Part of P0068, funded by Yanga.
Including the longest function present in more than one game among all
of PC-98 Touhou, and #23 on the list of longest functions overall,
which draws a 1-pixel line between two arbitrary pixels.
Completes P0067, funded by Splashman.
Semi-unused, that is, the one use of this function doesn't actually
move the rectangle to a different position. Ironically, the non-moving
back-to-front function immediately above *is* unused…
Also, too bad that stack order is the only reason we can't use structs
to combine all plane variables into a single object.
Part of P0067, funded by Splashman.
Previously sloppily mis-RE'd as "some page variable, idk", back in
2015…
Now also with a page number typedef. And yeah, restricting bool to C++
has now proven to be stupid after all.
Part of P0067, funded by Splashman.
Which store colors as GRB, as suggested by the structure's ID string.
Even master.lib's own functions add an additional XCHG AH, AL
instruction to get colors into and out of this format. MASTER.MAN
suggests that it's some sort of standard on PC-98. It does match the
order of ths hardware's palette register ports, after all.
(0AAh = green, 0ACh = red, 0AEh = blue)
Now we also know why __seg* wasn't used more commonly, as lamented in
c8e8e98. Turbo C++ simply doesn't support a lot of arithmetic on
segment pointers.
And then that undecompilable far call to a function within the same
segment, but inside a different translation unit…
Also, thanks again to Egor for the SCOPY@ hack that debuted in 0460072.
Would have probably struggled with this a lot more without that.
And *then* you realize that TH01 effectively doesn't even use the
resident palette. 😐
And yes, we're procrastinating the whole issue of potentially using
a single translation unit for all three binaries by using a common
segment name, because it *really* isn't that easy.
Completes P0066, funded by Keyblade Wiedling Neko and Splashman.
The TH04/TH05 BGM/SE mode setup is a good example for code where
different structure field offsets will vanish completely upon reverse-
engineering. If we continued to use the per-game ID string as the
variable name, we'd only have another game-specific "difference" there.
Part of P0065, funded by Touhou Patch Center.
At least wherever Turbo C++ and master.lib want us to use a
non-pointer, since both use uint16_t for segment values throughout
their APIs instead of the more sensible void __seg*. Maybe, integer
arithmetic on segment values was widely considered more important than
dereferencing?
*Finally*. We already used `(unsigned) int` in quite a few places where
we actually want a 16-bit value, which was bound to annoy future port
developers.
I thought this would be a good target for my first attempt at
decompilation. Should have known something was up, because it's the only
undecompiled proc in the translation unit.
Oh well, SCOPY stands for "structure copy" anyway. /s
Rule of thumb going forward: Everything that emits data is .asm,
everything that doesn't is .inc.
(Let's hope that th01_reiiden_2.inc won't exist for that much longer!)
Part of P0032, funded by zorg.
Only one code segment left in both OP and FUUIN! its-happening.gif
Yeah, that commit is way larger than I'm comfortable with, but none of these
functions is particularly large or difficult to decompile (with the exception
of graph_putsa_fx(), which I actually did weeks ago), and OP and MAIN have
their own unique functions in between the shared ones, so…
Which, for some reason, is also found in the MAIN.EXE of every later game
in between completely unrelated hardware and file format functions.
Separate commit because it has its own segment in REIIDEN.EXE, and because
coming up with the nice function names took pretty long, since I haven't done
anything involving trigonometry in the past 5 years...
Yet another set of questionable C reimplementations of master.lib functions to
waste my time. And half of them, including z_text_(v)putsa, aren't even called
anywhere.
So apparently, TH01 isn't double-buffered in the usual sense, and instead uses
the second hardware framebuffer (page 1) exclusively to keep the background
image and any non-animated sprites, including the cards. Then, in order to
limit flickering when animating the bullet, character and boss sprites on top
of that (or just to the limit number of VRAM accesses, who knows), ZUN goes to
great lengths and tries to make sure to only copy back the pixels that were
modified on plane 0 in the last frame.
(Which doesn't work that well though. When you play the game, you still notice
tons of flickering whenever sprites overlap.)
And by "great lengths", I mean "having a separate counterpart function for
each shape and sprite animated which recalculates and copies back the same
pixels from plane 1 to plane 0", because that's what the new functions here
lead me to believe. Both of them are only called at one place: the wave
function on the second half of Elis' entrance animation, and the horizontal
masked line function for Reimu's X attack animations.
This function raises one of those essential questions about the eventual ports
we'd like to do. I'll explain everything more thoroughly here, since people
who might complain about the ports not being faithful enough need to
understand this.
----
The original plan was aim for "100% frame-perfect" ports and advertise them as
such. However, the PC-98 is not a console with fixed specs. As the name
implies, it's a computer architecture, and a plethora of different, more and
more powerful PC-98 models were released during its lifespan. Even if we only
consider the subset of products that fulfills the minimum requirements to run
the PC-98 Touhou games, that's still a sizable number of systems.
Therefore, the only true definition of a *frame* can be "everything that is
drawn between two Vsync wait calls". Such a *frame* may contain certain
expensive function calls, and certain systems may run these functions slower
than the developer expected, thus effectively leading to more *frames* than
the developer explicitly specified.
This is one of those functions.
Here, we have a scaling function that appears to be written deliberately to
run very slow, which ends up creating the rolling effect you see in the route
selection and the high score and continue screens of TH01. However, that
doesn't change the fact that the function is still CPU-bound, and neither
waits for Vsync nor is iteratively called by something that does. The faster
your CPU, the faster the rolling effect gets… until ultimately, it's faster
than one frame and therefore vanishes altogether. Mind you, this is true on
both emulators and real hardware. The final PC-98 model, the Ra43, had a CPU
clocked at 433 Mhz, and it may have even been instant there.
If you use more optimized algorithm, it also runs faster on the same CPU (I
tried this, and it worked beautifully)… you get the idea.
Still, it may very well be that this algorithm was not a deliberate choice and
simply resulted from a lack of experience, especially since this was ZUN's
first game.
That leaves us with two approaches to porting functions like these:
1) Look at the recommended system requirements ZUN specified, configure the
PC-98 emulator accordingly, measure how much of the work is done in each
frame, then rewrite the function to be bound to that specific frame rate…
2) …or just continue using a CPU-bound algorithm, which will pretty much
complete instantly on any modern system.
I'd argue that 2) is actually the more "faithful" approach. It will run faster
than the typical clock speeds people emulate the games at, and maybe draw a
bit of criticism because of that, but it seems a lot more rational than the
approximation provided by 1). Not to mention that it's undeniably easier to
implement, and hey, a faster game feels a lot better than a slower one, right?
… Oh well, maybe we'll still encounter some kind of CPU-bound animation that
is so essential to the experience that we do want to lock it to a certain
frame rate…
Fuck TH02 and above and their bizarre assembly code that indeed appears to be,
uh, playfully "optimized" in the most inadequate of places, far away from the
innermost loop. It's ALWAYS just these one or two instructions I just can't
fucking get out of the C compiler, which lead to the conclusion that these
functions must have either been first compiled to assembly, then "fine-tuned"
and then linked into the executable…
… or I'm really just missing some obscure compiler setting.
At least with TH01, you can tell that the source language must have undeniably
been C++, and the decompilation is a breeze.
Small detour into MAINE.EXE because it has all the juicy algorithms that will
explain the remaining unknown members of the highscore data structure, and
there's this one code segment here we need to get out of the way first.
The same function appears unused in TH02's MAINE.EXE. Separate commit because
this was painful enough and we can link the C version into FUUIN.EXE right
now.
I've looked at every openly available piece of PC-98 documentation, and there
don't seem to be any official names for the individual planes. The closest
thing I could find was the description at
http://island.geocities.jp/cklouch/column/pc98bas/pc98disphw2.htm
explaining that they represent the blue, red, green, and brightness component
when using the default PC-98 palette. However, these planes correspond to
nothing else but the 4 individual bits of the final index into the color
palette, and you can assign any color to every single palette slot. Therefore,
it's merely a convention that your own palettes don't have to follow (and in
Touhou, they don't).
Nevertheless, there doesn't seem to be an alternative, and the Neko Project II
source code uses the same B/R/G/E convention, so I'll go with that as well.
Yup, packfiles finally proved that we really have a different set of changes
to master.lib in every game. Also, there are bound to be more of these game-
specific small changes to otherwise identical code in ZUN's own code.
And hey, no need to define that value in the build scripts anymore.
(I've also considered just copying modified versions into the individual game
subdirectories, but it's not too nice to expect people to diff them in order
to actually understand why these copies exist and where the changes actually
are.)