Lovely. Turns out that all it needed to motivate the previous research
was one function that is simply too precious to be kept in ASM…
Part of P0137, funded by [Anonymous].
Turns out that this is one of the effects of the -WX option ("Create
DPMI application")… along with generally messing up code generation.
Nothing we can't work around though, luckily! Finally getting to cross
that off the list of reasons that prevent decompilation.
Part of P0137, funded by [Anonymous].
Might look uglier, but has the advantage of not generating an empty
segment with the default name… *and* the default padding, which will
really come in handy with the following breakthrough.
Part of P0137, funded by [Anonymous].
Whoops, turns out that the build has been broken on TASM32 version 5.3
(the one in the DevKit) ever since 7897bf1. In contrast to version 5.0
(which I use for my development), 5.3 actually defines 32-bit segments
if you specify a .386 CPU before using .MODEL.
That might have been the reason for the .286 workaround all along?
Turns out there's the USE16 modifier, which makes this much more
explicit than switching CPUs.
Finishing this push with some semi-maintenance… and yet another `inline`
function replacing a `#define` ASM macro with fully idiomatic C++ code.
Completes P0136, funded by [Anonymous].
Reason: Self-modifying. -.-
Also, why no GRCG? Would have allowed blitting via REP MOVSD… Might as
well optimize all the way if you're going the ASM route to begin with.
Part of P0136, funded by [Anonymous].
Eh, REP MOVSD is used too inconsistently across the games to justify
replacing these macros with an `inline` function. Still can use a
custom one here to make the register usage a bit more explicit, though.
Part of P0136, funded by [Anonymous].
Reason: Self-modifying. -.-
The TH05 version *might* be decompilable into a mess. Don't have time
for that right now, though.
Part of P0136, funded by [Anonymous].
eeb4e7e changed the final C translation unit that used this header to
C++, and we got some more helpful inline functions upcoming.
Part of P0136, funded by [Anonymous].
Almost undecompilable, until you remember that you can inhibit the
optimization of keeping a function parameter in a register by having an
`inline` function take a reference. Yet another function that wouldn't
have decompiled that nicely if we had restricted us to C…
Part of P0136, funded by [Anonymous].
If anything, these decompilations of barely decompilable functions can
at least indicate that I tried. 🤷 Nice __fastcall abuse though, which
allows us to least formalize *some* of the implicit state passed
between those functions.
Completes P0135, funded by [Anonymous].
Reasons:
• piano_fm_part_put_raw(): SI register referenced and not saved on
the stack
• piano_current_note_from(): Would be decompilable… into a mess.
Not worth adding a separate translation unit just for it.
• piano_part_keys_put_raw(): DI register saved before the SI register
• piano_pressed_key_put(): DI register referenced and not immediately
saved on the stack
• piano_label_put_raw(): SI and DI registered referenced and not saved
on the stack
• grcg_setcolor_direct_seg1_raw(): Let's procrastinate this one until
we have to reference all of these instances in C land.
And we could have even emitted that PIANO_KEY_PRESSED_TOP pixel data
into the code segment, by using `#pragma option -z` to give identical
names to both the code and the data segment. At least we can decompile
the first two functions here.
Part of P0135, funded by [Anonymous].
Not that it really fits there either, but I've been trying to keep the
th0?/ directories free from any actual code. They should only contain
the distinct translation units within the original three .EXE binaries,
`#include`ing files from subdirectories, along with maybe game-specific
`#pragma`s, but contain no code on their own. Port authors would simply
ignore those, and link everything from the subdirectories into one
binary. That approach has seemed to make the most sense for all of this
so far.
Part of P0135, funded by [Anonymous].
Allowing us to consistently mirror the declaration in pc98.inc
without adding a planar.inc file. 😛 And points us to two more
dots8_t* arrays that should have used the Planar<> template.
Part of P0135, funded by [Anonymous].
A decompilation of ZUN-written ASM that was almost worth it, for once!
Too bad that those aren't the <string.h> intrinsics that the
Wolfenstein 3D disassembly hinted at, though.
Part of P0135, funded by [Anonymous].
Reason: Pascal calling convention with function parameters but no stack
frame. Theoretically we can __emit__() everything inside this function,
but there's no way we can get a `RETN 8` this way. Oh, and it also
accesses SI and DI without backing them up to the stack.
And thanks to TLINK apparently not reporting fixup overflows when
segments are small enough (?), it took quite a while to get that CALL
correct and not weirdly offset by 32 bytes. 😕
Part of P0134, funded by [Anonymous].
> try to debug weird linker error
> find performance secret sauce
Removes roughly 1 second from the build time of the full repo!
Part of P0134, funded by [Anonymous].
That assembly is *worse* than what you would have gotten out of your
1994 C++ compiler with the 386 code generation switch!
Part of P0134, funded by [Anonymous].
> assigning to the DI register immediately before a CALL
Yeah, no amount of comma operator trickery can get *that* out of this
compiler. Also, these TH05 .PI functions are the only place in PC-98
Touhou with a `IMUL DI, imm8` instruction, which is impossible to get
out of Turbo C++'s built-in assembler.
Well, at least the `if` branches decompile somewhat nicely.
Part of P0134, funded by [Anonymous].
… especially because this one *is* actually undecompilable. Reason:
Base pointer assignment to BX, before saving the SI register on the
stack.
Part of P0134, funded by [Anonymous].
Well, it *would* have been decompilable, but that ridiculous placement
of the nullptr assignment would have forced the entire function call to
be spelled out in inline ASM, verbatim. No amount of comma operator
trickery would have generated the same instructions either. And for a
function this small and obvious in what its decompilation *should* be,
it really defeated the purpose of adding a separate translation unit…
Part of P0134, funded by [Anonymous].
Turns out that ARG RETURNS is only really necessary in DEFCONV
functions, which are explicitly declared to use either the C or PASCAL
calling convention. In functions without such a declaration, ARG by
itself works just fine, and won't emit any instructions on its own.
The parameter lists for PASCAL functions still have to be reversed in
that case, though… oh well, let's just comment these cases to hopefully
reduce the confusion.
Part of P0134, funded by [Anonymous].
`cPtrSize` is simply the wrong constant for calculating parameter
offsets on the stack, because it corresponds to the memory model's
default distance, not the function's distance. Luckily, ARG has a
RETURNS clause, and if you declare all parameters in there, ARG won't
emit that pesky and unnecessary `ENTER 0, 0` instruction. Big discovery
right there!
Sadly, ARG is unusable for ZUN's silly functions that keep the base
pointer in BX. TASM declares the resulting equates as `[BP+offset]`,
and it's apparently impossible to only get `offset` out of such an
equate later.
So, rather than staying with numbers, let's reimplement ARG for these
functions instead. This way, we can even abstract away the stack clear
size for the `RET` instructions.
It's a bit rough around the edges though, forcing you to explicitly
specify the function distance, and to pass the parameters in reverse
order compared to the C declaration (thankfully, all of these use the
PASCAL calling convention). It also doesn't work with more complex
types yet. But certainly better than numbers.
Part of P0134, funded by [Anonymous].
DOS is not the same thing as the underlying CPU, after all. A separate
file not only indicates to future port authors which parts of the code
are x86-specific, but it also speeds up build times…
… in theory, because removing 677 lines from 49 files each doesn't seem
to speed up the build as much as I had hoped? But apparently my whole
system mysteriously got faster in the meantime, and I was getting 22-23
seconds for the entire repo even before this commit. Good enough.
Part of P0134, funded by [Anonymous].
Turbo C++ 4.0J considers them different types internally, even though
they have the same size and signedness. Using `int` allows us to
more correctly redeclare a bunch of functions that originally use
`(unsigned) int` parameters as `(u)int16_t`.
Part of P0134, funded by [Anonymous].
It's not necessary for assigning `__seg` pointers to `far` ones, which
might even remove the <dos.h> dependency in some translation units.
Part of P0134, funded by [Anonymous].
Getting us completely macro-free there… even though it did require a
separate version of those functions if the ID is a pointer.
Part of P0134, funded by [Anonymous].
Reason: Wants to be word-aligned, and the previous version in OP.EXE,
game_exit(), is not, despite having an even length :(
Oh well, at least I'm confident enough about it by now to document it.
And out of all decompilations to be thrown away, this is a pretty
dispensable one.
Part of P0133, funded by [Anonymous].