## Welcome!

If we've seen you doing any kind of reverse-engineering or modding work on the
PC-98 Touhou games before, you might have already been [invited as a
collaborator][2]. In that case, feel free to create separate branches for your
work directly in this repository – this will immediately inform anyone who
watches this repo or has subscribed to a webhook. If you prefer, you can still
use your own fork, though.

### What can I do on these separate branches?

Anything – reverse-engineering and decompilation of original ZUN code (which
could then be merged back into `master` after review) or your own custom mods,
no matter how large or small.

For starters, simply naming functions or global variables to reflect their
actual intent will already be helpful. *Any* name is better than
`sub_<something>`, and can always be fixed or improved later.

# Contribution guidelines

## Rule #1

**`master` must never introduce code changes that change the decompressed
program image, or the unordered set of relocations, of any original game
binary, as compared using [mzdiff].** The only allowed exceptions are:
1) different encodings of identical x86 instructions within code segments
2) padding with `00` bytes at the end of the file.

These cases should gradually be removed as development goes along, though.

## Taste issues

* Use tabs for indentation.

* Spaces for alignment are allowed, especially if they end up giving the code
  a nice visual structure, e.g. with multiple calls to the same function with
  varying pixel coordinates.

* Don't indent `extern "C"` blocks that span the entire file.

* Always use `{ brackets }`, even around single-statement conditional
  branches.
* Add spaces around binary operators, e.g. `for(i = 0; i < 12; i++)`.

* Variables should be *signed* in the absence of any ASM instruction
  (conditional jump, arithmetic, etc.) or further context (e.g. parameters
  with a common source) that defines their signedness. If a variable is used
  in both signed and unsigned contexts, declare it as the more common one.

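As a rough illustration of the taste rules above, here is a minimal sketch. All function and macro names in it are hypothetical and do not exist in ReC98, and the playfield dimensions are only assumed values for the sake of the example.

```cpp
// Hypothetical names and values, for illustration only.
#define PLAYFIELD_W 384
#define PLAYFIELD_H 368

extern "C" {

// Not indented, because the `extern "C"` block spans the entire file.
int playfield_clamp_x(int x)
{
	// Brackets even around single-statement conditional branches.
	if(x < 0) {
		return 0;
	}
	if(x > (PLAYFIELD_W - 1)) {
		return (PLAYFIELD_W - 1);
	}
	return x;
}

int playfield_contains(int x, int y)
{
	return ((x >= 0) && (x < PLAYFIELD_W) && (y >= 0) && (y < PLAYFIELD_H));
}

int playfield_corners_inside(void)
{
	// Signed by default: nothing here defines any other signedness.
	int count = 0;

	// Tabs indent the statements, spaces align the varying coordinates.
	count += playfield_contains(              0,               0);
	count += playfield_contains(PLAYFIELD_W - 1,               0);
	count += playfield_contains(              0, PLAYFIELD_H - 1);
	count += playfield_contains(PLAYFIELD_W - 1, PLAYFIELD_H - 1);
	return count;
}

int sum_below(int limit)
{
	int sum = 0;
	int i;

	// Spaces around binary operators.
	for(i = 0; i < limit; i++) {
		sum += i;
	}
	return sum;
}

}
```

The aligned call block is the kind of case where spaces for alignment pay off: the four corner coordinates can be compared column by column.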
## Compatibility

* Use `__asm` as the keyword for inline assembly. This form works in Borland
  C++, Open Watcom, and Visual C++, which will ease future third-party ports.

## Code organization

* Try to avoid repeating numeric constants – after all, easy moddability
  should be one of the goals of this project. For local arrays, use `sizeof()`
  if the size can be expressed in terms of another array or type. Otherwise,
  `#define` a macro if there is a clear intent behind a number.
  (Counterexample: Small, insignificant amounts of pixels in e.g. entity
  movement code.)

* Try rewriting padding instructions in ASM land into TASM directives:

  * `db 0` / `NOP` → `even` / `align 2`
  * `db ?` → `evendata`

  This makes mzdiffs a bit shorter in common cases where a single byte was
  erroneously added somewhere, by providing a chance for the code to catch up
  to its original byte positions.

* Documentation comments for functions go exclusively into C/C++ header
  files, right above the corresponding function prototype, *not* into ASM
  slices.

* Newly named symbols in ASM land (functions, global variables, `struc`ts, and
  "sequence of numeric equate" enums) should immediately be reflected in C/C++
  land, with the correct types and calling conventions. Typically, these
  definitions would go into header files, but they can stay in .c/.cpp files
  if they aren't part of a public interface, i.e., not used by unrelated
  functions.

* Compress calls to *known* functions in ASM land to use TASM's one-line,
  interfaced call syntax, whenever all parameters are passed via consecutive
  `PUSH` instructions:

  * `pascal`:
    <table>
      <tr>
        <td>
          <code>push param1</code><br />
          <code>push param2</code><br />
          <code>call foo</code>
        </td>
        <td>→</td>
        <td>
          <code>call foo pascal, param1, param2</code>
        </td>
      </tr>
    </table>

  * `__cdecl`, single call, single parameter:
    <table>
      <tr>
        <td>
          <code>push param1</code><br />
          <code>call foo</code><br />
          <code>pop cx</code>
        </td>
        <td>→</td>
        <td>
          <code>call foo stdcall, param1</code><br />
          <code>pop cx</code>
        </td>
      </tr>
    </table>

  * `__cdecl`, single call, multiple parameters:
    <table>
      <tr>
        <td>
          <code>push param2</code><br />
          <code>push param1</code><br />
          <code>call foo</code><br />
          <code>add sp, 4</code>
        </td>
        <td>→</td>
        <td>
          <code>call foo c, param1, param2</code>
        </td>
      </tr>
    </table>

  * `__cdecl`, single call, 32-bit parameters (Note that you have to use
    `large` whenever a parameter happens to be 32-bit, even if the disassembly
    didn't need it):
    <table>
      <tr>
        <td>
          <code>push 012345678h</code><br />
          <code>pushd param1</code><br />
          <code>call foo</code><br />
          <code>add sp, 8</code>
        </td>
        <td>→</td>
        <td>
          <code>call foo c, large param1, large 012345678h</code>
        </td>
      </tr>
    </table>

  * `__cdecl`, multiple calls with a single `add sp` instruction for their
    combined parameter size at the end:
    <table>
      <tr>
        <td>
          <code>push param2</code><br />
          <code>push param1</code><br />
          <code>call foo</code><br />
          <code>[…]</code><br />
          <code>push param2</code><br />
          <code>pushd param1</code><br />
          <code>call bar</code><br />
          <code>add sp, 0Ah</code>
        </td>
        <td>→</td>
        <td>
          <code>call foo stdcall, param1, param2</code><br />
          <code>[…]</code><br />
          <code>call bar stdcall, large param1, param2</code><br />
          <code>add sp, 10</code>
        </td>
      </tr>
    </table>

* Try moving repeated sections of code into a separate `inline` function
  before grabbing the `#define` hammer. Turbo C++ will generally inline
  everything declared as `inline` that doesn't contain `do`, `for`, `while`,
  `goto`, `switch`, `break`, `continue`, or `case`.

  * These inlining rules also apply to C++ class methods, so feel free to
    declare classes if you keep thinking "overloaded operators would be nice
    here" or "this code would read really nicely if this functionality was
    encapsulated in a method". (Sometimes, you will have little choice, in
    fact!) Despite Turbo C++'s notoriously outdated C++ implementation, [there
    are quite a lot of possibilities for abstractions that inline perfectly][1].
    Subpixels, as seen in 9d121c7, are the prime example here. Don't overdo it,
    though – use classes where they meaningfully enhance the original procedural
    code, not to replace it with an overly nested, "enterprise-y" class
    hierarchy.

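The C/C++ side of the code organization rules above could look roughly like the following sketch. Every name in it is hypothetical, and the 4 fractional bits of the subpixel class are an assumption made for this example rather than a statement about the class from 9d121c7. Whether Turbo C++ inlines these exact functions has not been verified here; they merely avoid the keywords listed above.

```cpp
#include <string.h>

// Hypothetical names, for illustration only.
#define NAME_LEN 8
#define SUBPIXEL_BITS 4 // assumed number of fractional bits

struct hiscore_entry_t {
	char name[NAME_LEN];
	long points;
};

// A repeated calculation as an `inline` function instead of a macro.
// It contains no loops, `switch`, or `goto`.
inline long points_doubled(const hiscore_entry_t &entry)
{
	return (entry.points * 2);
}

// A small class whose methods follow the same inlining rules.
class ExampleSubpixel {
public:
	int v;

	void set(int pixels)
	{
		v = (pixels << SUBPIXEL_BITS);
	}

	int to_pixel() const
	{
		return (v >> SUBPIXEL_BITS);
	}

	void operator +=(int raw)
	{
		v += raw;
	}
};

long entry_roundtrip(const hiscore_entry_t &entry)
{
	// Sized in terms of the type instead of repeating a byte count.
	char backup[sizeof(hiscore_entry_t)];
	hiscore_entry_t restored;

	memcpy(backup, &entry, sizeof(backup));
	memcpy(&restored, backup, sizeof(restored));
	return points_doubled(restored);
}
```

Changing `NAME_LEN` or adding a field to `hiscore_entry_t` then requires no further edits in `entry_roundtrip()`, which is the moddability benefit the first rule is after.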
## Decompilation

* Don't try to decompile self-modifying code. Yes, it may be *possible* by
  calculating addresses relative to the start of the function, but as soon as
  someone starts modding or porting that function, things *will* crash at
  runtime. Inline ASM in C/C++ source files is fine, since that will trip up
  future port developers at compile time. Self-modifying code can only do the
  same if it's kept in separate ASM files.

* Don't use TCC's `-a` command-line option to force a particular code or data
  alignment. Instead, directly spell out the alignment by adding padding
  members to structures, and additional global variables. It's simply not
  worth requiring every structure to work around it. For functions with
  `switch` tables that originally were word-aligned, put a single
  `#pragma option -a2` at the top of the translation unit, after all header
  inclusions.

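Spelling out alignment with a padding member, as the second rule above suggests, might look like this minimal sketch. The structure and its field names are hypothetical. `short` stands in for a 16-bit integer so that the example also behaves as described on modern compilers.

```cpp
#include <stddef.h>

// Hypothetical structure, for illustration only.
struct example_enemy_t {
	unsigned char flag;
	// Explicit padding byte, so that `hp` lands on a word boundary even
	// when the compiler itself packs structures to byte alignment.
	unsigned char unused_padding;
	short hp;
};

size_t example_enemy_hp_offset(void)
{
	return offsetof(example_enemy_t, hp);
}

size_t example_enemy_size(void)
{
	return sizeof(example_enemy_t);
}
```

With the padding member in place, the layout no longer depends on a compiler-wide alignment switch, so other structures in the same translation unit are unaffected.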
## Naming conventions

* ASM file extensions: `.asm` if they emit code, `.inc` if they don't
* Macros defining the number of instances of an entity: `<ENTITY>_COUNT`
* Macros defining the number of distinct sprites in an animation: `*_CELS`
* Frame variables counting from a frame count to 0: `*_time`
* Frame variables and other counters starting from 0: `*_frames`
* Functionally identical reimplementations or micro-optimizations of
  master.lib functions: `z_<master.lib function name>`

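Applied to a made-up entity, the macro and variable suffixes above could be used as in the following sketch. The `bullet` entity, its fields, and all values are hypothetical and only serve to show the suffixes in context.

```cpp
// Hypothetical entity, for illustration only.
#define BULLET_COUNT 4 // number of instances of the entity
#define BULLET_CELS 3  // number of distinct sprites in its animation

struct bullet_t {
	int spawn_time; // counts down from a frame count to 0
	int age_frames; // counts up, starting from 0
};

bullet_t bullets[BULLET_COUNT];

void bullets_init(int spawn_delay)
{
	int i;
	for(i = 0; i < BULLET_COUNT; i++) {
		bullets[i].spawn_time = spawn_delay;
		bullets[i].age_frames = 0;
	}
}

void bullets_update(void)
{
	int i;
	for(i = 0; i < BULLET_COUNT; i++) {
		if(bullets[i].spawn_time > 0) {
			bullets[i].spawn_time--;
		} else {
			bullets[i].age_frames++;
		}
	}
}

// Animation cel to display for the given bullet.
int bullet_cel(const bullet_t &bullet)
{
	return (bullet.age_frames % BULLET_CELS);
}
```

The suffix alone then tells a reader whether a variable is expected to reach 0 (`_time`) or to keep growing (`_frames`).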
## Identifiers from ZUN's original code

On some occasions, ZUN leaked pieces of the actual PC-98 Touhou source code
during interviews. From these, we can derive ZUN's original names for certain
variables, functions, or macros. To mark such an identifier and protect it
from being renamed, put a `/* ZUN symbol [reference] */` comment next to the
declaration of the identifier in question.

Currently, we know about the following [references]:

* `[Strings]`: The symbol name is mentioned in error or debug messages. Can be
  easily verified by grepping over the ReC98 source tree.
* `[MAGNet2010]`: Interview with ZUN for the NHK BS2 TV program MAG・ネット
  (MAG.Net), originally broadcast 2010-05-02. At 09m36s, ZUN's monitor briefly
  displays a piece of TH04's `MAIN.EXE`, handling demo recording and the setup
  of the game's EMS area.

[mzdiff]: https://github.com/nmlgc/mzdiff
[1]: Research/Borland%20C++%20decompilation.md#c
[2]: https://github.com/nmlgc/ReC98/invitations
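The `/* ZUN symbol [reference] */` comment described in the section above might be placed as in this sketch. The identifiers are placeholders made up for this example and are *not* actual names from ZUN's code; only the comment format and the two reference tags come from the text above.

```cpp
// Placeholder identifiers, for illustration only.
int example_debug_flag = 0; /* ZUN symbol [Strings] */

void example_demo_record(void) /* ZUN symbol [MAGNet2010] */
{
	example_debug_flag = 1;
}

// No comment here: this name was chosen by a contributor and can still be
// renamed later.
int example_debug_flag_get(void)
{
	return example_debug_flag;
}
```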