Well, duh, of course, we *can* do this in order to allow decompilation to be
started at the end (not the beginning) of any segment. In fact, if we hadn't
done this, we would have had to start by moving _TEXT out to libraries....
Well, that became unbearable pretty quickly. Not sure whether I'm doing all
this Makefile business right, but this looks pretty nice.
It doesn't really help much at this point though because the 32-bit part is
still entirely separate and forces everything to rebuild all the time, but at
least it aborts on C compiler errors.
After spending a few hours on correctly decompiling ZUN's bulky custom text
renderer used in TH02 and TH03, it unfortunately turned out that TLINK doesn't
actually give us the fine-grained control over segment ordering we'd like to
have in a project like this, and that we can't slot code from one object file
in between segments from another object file. This means that yes, we really
have to decompile the functions in the order they appear in the executables,
starting on either end.
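A minimal sketch of the constraint described above, assuming a segment is just an ordered list of function names and that the still-undecompiled assembly has to stay in one contiguous block. The function and parameter names are made up for illustration:

```python
def is_valid_split(order, in_c):
    """Check whether the functions in `in_c` can be moved to C object files.

    `order` is the order of functions in the original segment, `in_c` the
    set of functions we'd like to decompile. This only works if those
    functions sit at the start and/or the end of the segment.
    """
    flags = [name in in_c for name in order]

    # Skip the decompiled functions at the beginning of the segment...
    start = 0
    while start < len(flags) and flags[start]:
        start += 1

    # ... and the ones at the end.
    end = len(flags)
    while end > start and flags[end - 1]:
        end -= 1

    # Whatever remains in the middle must still be assembly.
    return not any(flags[start:end])
```

For a segment with the functions a, b, c, d, decompiling a, d, or both passes this check, while decompiling only b does not.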
So, have a boring janitorial commit instead.
This took long enough, so we're not covering the COM files right now. Like, I
can't even tell how you're supposed to work around the forced word alignment
for the _TEXT segment. Guess we'll just have to decompile all of these in one
go, just like we did with ZUNSOFT.COM.
Also, it really seems as if we're merely trading one ugly workaround for
another in our quest for identical binaries.
Yup, we'll be linking against the original binary blob for the time being.
Don't worry though, we will (and in fact, have to) recompile the libraries
from source, separately for each game, as part of the build process in the
future, but we'll get to that once we've decompiled some of the non-TH01 code.
So yeah, that'll be our build environment - just plain batch files calling the
Borland command-line assembler, linker, and eventually C compiler. These are
the exact tools that ZUN used as well. There certainly are other assemblers,
compilers and linkers that could compile this code into 16-bit DOS
executables; Open Watcom is the only free one I know, and the master.lib
manual also mentions C compilers by Microsoft and Symantec. However, I favor
having one clear build path for a single toolchain that will, with the correct
command-line switches for each game, create builds that are bit-perfect to
ZUN's original ones over the possibility of cross-platform builds and the
maintenance nightmare they add.
So, Borland-only it is.
(Also, no Makefile, due to our messy build setup. I think I still prefer this
solution though, as we can have these really nice error messages that double
as build instructions without any dependencies on installed software.)
I kinda wanted to wait with this until I've brought REIIDEN.EXE down to below
65,536 lines as well, but that's not going to happen anytime soon, and this
split has annoyed me enough by now...
I've looked at every openly available piece of PC-98 documentation, and there
don't seem to be any official names for the individual planes. The closest
thing I could find was the description at
http://island.geocities.jp/cklouch/column/pc98bas/pc98disphw2.htm
explaining that they represent the blue, red, green, and brightness components
when using the default PC-98 palette. However, these planes correspond to
nothing else but the 4 individual bits of the final index into the color
palette, and you can assign any color to every single palette slot. Therefore,
it's merely a convention that your own palettes don't have to follow (and in
Touhou, they don't).
Nevertheless, there doesn't seem to be an alternative, and the Neko Project II
source code uses the same B/R/G/E convention, so I'll go with that as well.
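To illustrate the "4 individual bits of the final index" part, here's a small sketch. It assumes that B, R, G and E map to bits 0, 1, 2 and 3 of the palette index, and that bit 7 of each plane byte is the leftmost pixel. The function names are made up:

```python
def palette_index(b, r, g, e):
    """Combine one bit from each plane into a 4-bit palette index.

    Assumed bit order: B = bit 0, R = bit 1, G = bit 2, E = bit 3.
    """
    return (b & 1) | ((r & 1) << 1) | ((g & 1) << 2) | ((e & 1) << 3)


def decode_planar_byte(b, r, g, e):
    """Turn one byte from each plane into 8 palette indices.

    Assumes that the most significant bit is the leftmost pixel.
    """
    return [
        palette_index(b >> shift, r >> shift, g >> shift, e >> shift)
        for shift in range(7, -1, -1)
    ]
```

Which color ends up on screen for each index then depends entirely on the palette, which is why the B/R/G/E names are only a convention.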
Yup, the code for the first ZUN Soft logo is now completely position-
independent and ready to be decompiled.
(Also, TIL that the PC-98 has hardware support for double-buffering through
page flipping. Heh, at least one feature that makes it a viable system for
games...)
Turns out we're not quite done with reduction yet, as there still are a bunch
of macros in master.h that #define PC-98-specific hardware constants and I/O
ports.
Also covering the two variations for blitting only every second row or
blitting only a 320x200 quarter, as seen in the endings.
So yeah, there's indeed nothing wrong with piread.cpp. TH03 just uses that
separate function that only blits every second row of an image, and indeed
always loads the entire image as it would appear in a PNG conversion. Here's
what happens if you display these images using the non-interlacing function:
https://www.dropbox.com/s/885krj09d9l0890/th03%20PI%20no%20interlace.png
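A rough sketch of the two blitting variations, treating an image as a plain list of pixel rows. This says nothing about how the actual functions are implemented. The names are made up, and the quarter numbering (0 = top left, 3 = bottom right) is an assumption:

```python
def blit_every_second_row(image):
    """Return only rows 0, 2, 4, ... of the image."""
    return image[::2]


def blit_quarter(image, quarter, width=320, height=200):
    """Return one 320x200 quarter of a 640x400 image.

    Assumed numbering: 0 = top left, 1 = top right,
    2 = bottom left, 3 = bottom right.
    """
    left = (quarter % 2) * width
    top = (quarter // 2) * height
    return [row[left:left + width] for row in image[top:top + height]]
```

Skipping every second row turns a 400-row image into 200 rows, so displaying such an image with the regular function instead yields something twice as tall as intended.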
With TH03 changing the calling convention for most of the code from __cdecl to
__pascal, I've been getting more and more confused about this myself. So,
let's settle on the following consistent syntax for function calls:
* C where the calling convention is actually __cdecl and where TASM's emitted
  __cdecl code matches the original binary
* PASCAL where the calling convention is actually __pascal
* STDCALL where the calling convention is actually __cdecl, but where
  the caller either defers stack cleanup (summing up the stack size of
  multiple functions, then cleaning it all in a single "add sp" instruction)
  or where the stack is cleared in a different way (e.g. "pop cx").
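As a simplified model of what these three spellings mean at the call site: the sketch below assumes word-sized (2-byte) arguments only and produces pseudo-instructions as strings. It is not a reproduction of TASM's actual output, and the function name is made up:

```python
def call_site(func, convention, args):
    """Model the instructions at a call site for word-sized arguments.

    C:       push right to left, caller cleans up right after the call
    PASCAL:  push left to right, callee cleans up (nothing to emit here)
    STDCALL: push right to left, no cleanup emitted here; the surrounding
             code takes care of the stack in whatever way the original did
    """
    if convention not in ("C", "PASCAL", "STDCALL"):
        raise ValueError(f"unknown convention: {convention}")

    order = list(args) if convention == "PASCAL" else list(reversed(args))
    code = [f"push {arg}" for arg in order]
    code.append(f"call {func}")
    if convention == "C" and args:
        code.append(f"add sp, {2 * len(args)}")
    return code
```

With two arguments a and b, C yields push b, push a, call, add sp, 4, while STDCALL yields the same pushes and call without the trailing cleanup.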
Unfortunately though, when using the ARG directive to automatically generate
an appropriate RET instruction for the given calling convention, TASM always
emits ENTER and LEAVE instructions even when no local variables are declared,
which greatly limits the number of functions where we can use that syntax. -.-
AKA "the source of the infamous STOP message".
This is pretty much irreducible assembly code, so it may very well be that we
don't even touch this file ever again, but at least it completes our build.
This executable is embedded into all 4 versions of ZUN.COM. It was written by
KAJA, not ZUN, so we don't care about anything in there - not that it would
matter for porting anyway. We only need that binary to be able to create
bit-perfect rebuilds of ZUN.COM in the future.
Once again, TH05 demonstrates that it's not a mere copy of TH04 by introducing
another set of code changes. This time, the configuration structure is
initialized with the default values in this executable, not in OP.EXE.
The code doesn't give away the original filename in this game, so I'll follow
the pattern of naming these after the ID of the game's resident configuration
structure.
TH03 doesn't prepare the initial high score list (instead leaving that to
MAINL.EXE), and the config file creation is identical to the one in TH02.
2 functions, surrounded by 88.8% of library code. Way to go.
From what I can tell, this program does exactly three things:
• preparing the initial high score list
• writing default settings to HUUMA.CFG
• and allocating the game's resident configuration structure and writing its
  segment address to bytes 6-7 of HUUMA.CFG
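The last step could be sketched like this, assuming the segment address is stored as a 16-bit little-endian value (the usual x86 byte order) and that the file is at least 8 bytes long. The meaning of every other byte is left alone here, and the function names are made up:

```python
import struct

SEGMENT_OFFSET = 6  # bytes 6-7 of HUUMA.CFG


def write_resident_segment(cfg, segment):
    """Store a 16-bit segment address at bytes 6-7 of the config data.

    `cfg` is a bytearray holding the file contents. Assumes little-endian
    byte order.
    """
    struct.pack_into("<H", cfg, SEGMENT_OFFSET, segment)


def read_resident_segment(cfg):
    """Read the 16-bit segment address back from bytes 6-7."""
    return struct.unpack_from("<H", cfg, SEGMENT_OFFSET)[0]
```

Another executable can then read these two bytes to find the structure in memory.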
All that results in a COM file of 6.84 KiB, 83% of which is library code.
That's why C was once seen as a bloated high-level language as well.
Yep, we'll be needing some of those smaller executables embedded into ZUN.COM
after all in order to fully understand what's going on with things like that
persistent configuration structure used in each game, for example.
For now, I'll be keeping every one of these executables separate, for a
number of reasons:
• I can't get IDA to segment the code in a way that would reconstruct the
  layout of the individual executables, since it unfortunately requires
  segments to be aligned on paragraph boundaries...
• This, in turn, means that IDA can't apply FLIRT signatures, making
  identification of the Borland C++ functions a bit harder. Probably not that
  big of a deal at this point anymore, but still.
• There are bound to be multiple copies of Borland C++ and master.lib
  functions in these. We are still using the "slice model", meaning that *all*
  functions in an executable are part of the same namespace. Creating copies of
  some source files just to allow a second instance of that function is not
  too pretty.
• Lastly, we don't actually need to reproduce all executables. For example,
  TH02's version of ZUNSOFT.COM is bit-identical to TH01's.
Hence, adding a separate build step to wrap these smaller executables back
into a bit-perfect version of ZUN.COM at a later point is a much better
option. (And it would be even better if we could track down the program used
to wrap those in the first place!)
Note how it's only one *mode* in TH02/TH03, but two *modes* in TH04/TH05,
since you can't select between FM and Beep sound effect modes in TH02/TH03 (or
even disable sounds altogether). Might be a bit confusing, but it seemed
appropriate enough to distinguish the two functions.
Well, the naming.
Even though only TH02 actually uses MIDI (and thus, the MMD driver), every
game since then contains interrupt instructions for both drivers. We could
just name it "pmd", since it seems like that's what came first - the AH
numbers of the 6 functions that make up MMD's interrupt API are identical to
those of the equivalent functions in PMD, even including gaps in the numbering
for PMD functions that don't have an equivalent in MIDI. However, except for
the FM sound effect handling and the key display in TH05's Music Room, these 6
functions are all the games actually use. Also, we already distinguish between
PMD and MMD in the driver check functions, and it might be confusing to only
imply PMD from now on?
So, "kaja" it is, collectively referring to the shared aspects of both
drivers.
Thanks to the LOCALS directive, we do need to break compatibility with TASM at
one point after all. These are the remaining changes we can reasonably make to
get at least through JWasm's first pass without errors while maintaining
compatibility with TASM.
Includes:
* the OPTION syntax to switch in and out of floating-point emulation mode
* REP CMPSB → REPE CMPSB
* Hacks for two 80-byte short jumps
* lack of support for floating-point stupidity ♥
as well as other issues that I covered in previous commits and overlooked in
some files.
From the TASM manual:
"NEAR labels defined with the colon directive (:) are considered block-scoped
if they are located inside a procedure, and you've selected a language
interfacing convention with the MODEL statement. However, these symbols are
not truly block-scoped; they can't be defined as anything other than a near
label elsewhere in the program."
MASM's own local label syntax - declaring labels using @@ and then jumping to
the next and previous @@ using @F and @B - is obviously too limiting for any
longer function, and is not even supported by TASM unless we switch it to MASM
mode completely.
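To show why that gets limiting, here's a toy resolver for @@ labels. It assumes that @F always means the next @@ after the current line and @B the closest @@ before it, and it only does naive string matching on a list of source lines. Nothing here reflects how an actual assembler parses its input:

```python
def resolve_anonymous(lines):
    """Map the index of each line using @F or @B to its target @@ line."""
    labels = [i for i, line in enumerate(lines) if line.startswith("@@:")]
    targets = {}
    for i, line in enumerate(lines):
        if "@F" in line:
            # Next @@ label after this line
            targets[i] = next((l for l in labels if l > i), None)
        elif "@B" in line:
            # Closest @@ label before this line
            targets[i] = next((l for l in reversed(labels) if l < i), None)
    return targets


loop = [
    "@@:",
    "    dec cx",
    "    jz @F",
    "    jmp @B",
    "@@:",
    "    jmp @B",
]
```

In this example, the last jump can only reach the second @@. Once another @@ is in between, the first one is out of reach, which rules out this syntax for anything with more than a handful of labels.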
While this is indeed ugly, it only affected 16 files, which is way less than
what we would get in a TASM build without LOCALS. In comparison to having a
modern, cross-platform assembler, that really is a small price to pay.
Really, Borland? You considered it necessary to add directives for object-
oriented programming (in Assembly!) and convenience features like bitfield
records or PUSHSTATE/POPSTATE, yet you never came up with the actually
*helpful* idea of just adding a simple basic pointer data type that depends
on the current memory model's data size?
Like, something like DP... oh wait, that's already taken, as an alias for
DF, the 48-bit 80386 far pointer type.
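What I'm asking for boils down to a lookup like the following sketch. It relies on the usual 16-bit memory model rules (near data pointers in tiny, small and medium, far data pointers in compact, large and huge), and the names are made up:

```python
# Size of a data pointer in bytes for each 16-bit memory model
DATA_POINTER_BYTES = {
    "tiny": 2,
    "small": 2,
    "medium": 2,
    "compact": 4,
    "large": 4,
    "huge": 4,
}


def data_pointer_directive(model):
    """Return the data definition directive that fits a data pointer.

    DW defines a 16-bit word (near pointer), DD a 32-bit doubleword
    (segment:offset far pointer).
    """
    return "DW" if DATA_POINTER_BYTES[model.lower()] == 2 else "DD"
```

Without such a type, every pointer declaration has to be spelled as DW or DD by hand, or hidden behind a custom macro.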
And this, exactly, is the problem with assemblers. The language itself is
undefined beyond the instructions themselves, but it's obviously very
uncomfortable to program anything with just that, so your assembler needs to
add custom directives on top of that, and of course everyone has different
ideas of the features and use cases that should (and should not) be covered by
syntax. (I'm looking especially at you, NASM.)
And then one of those developers sells their compiler division to a different
company, which then subsequently discontinues all products without ever
releasing the source code, trapping their nice extensions in a single
executable for a single platform that is not even legally available anymore.
tl;dr: http://xkcd.com/927/
For 32-bit immediate values, PUSH by itself is enough. For everything else,
PUSHD works in both TASM and JWasm.
Also, could it be...? Could we actually move to JWasm without breaking the
build in TASM at all?