The Touhou PC-98 Restoration Project
nmlgc 9c99bc0eb1 [Contributing] Don't enforce preservation of symbol names from leaked ZUN code
We can do much better ourselves.

Part of P0168, funded by Blue Bolt and rosenrose.
2021-11-28 21:51:28 +01:00
.github [Readme] Add crowdfunding / progress badges and a GitHub funding button 2019-10-17 20:58:50 +02:00
Pipeline [Maintenance] Remove inconsistent newlines before `extern "C"` braces 2021-11-28 18:53:09 +01:00
Research [Decompilation] [th01] Player: Main control and rendering function 2021-10-20 10:59:36 +02:00
bin Add MASTERS.LIB and MASTER.H from the original distribution 2015-02-16 23:10:47 +01:00
libs [Maintenance] Use a dedicated enum for snd_load()'s function parameter 2021-05-12 14:31:02 +02:00
th01 [Maintenance] Consistently use singular for entity structure and file names 2021-11-28 19:14:02 +01:00
th02 [Maintenance] Consistently use singular for entity structure and file names 2021-11-28 19:14:02 +01:00
th03 [Maintenance] Consistently use singular for entity structure and file names 2021-11-28 19:14:02 +01:00
th04 [Maintenance] [th04/th05] Split segment #1 before the EMS functions 2021-11-28 19:15:22 +01:00
th05 [Maintenance] [th04/th05] Split segment #1 before the EMS functions 2021-11-28 19:15:22 +01:00
zuncom [zuncom] Get rid of moveup.asm 2018-04-15 20:22:41 +03:00
.gitattributes [Maintenance] Remove merge=union from .gitattributes 2019-09-21 13:03:17 +02:00
.gitignore [Build] 32-bit: Use Tup as a proper build system for the 32-bit part 2020-09-03 19:04:17 +02:00
CONTRIBUTING.md [Contributing] Don't enforce preservation of symbol names from leaked ZUN code 2021-11-28 21:51:28 +01:00
Makefile.mak [Maintenance] [th04/th05] Split segment #1 before the EMS functions 2021-11-28 19:15:22 +01:00
README.md [Readme] Add a workaround for `Error: Unable to execute command 'tlink.exe'` 2021-08-02 22:59:03 +02:00
ReC98.h [Maintenance] [th01] Start a new header for clamping macros 2021-09-28 18:05:24 +02:00
ReC98.inc [Decompilation] Find out how to bypass TCC's optimization of 0 immediates 2021-06-09 23:12:04 +02:00
Tupfile [Build] Don't compile any ZUN code directly to `bin/` 2021-10-20 00:06:16 +02:00
Tupfile.bat [Build] Don't compile any ZUN code directly to `bin/` 2021-10-20 00:06:16 +02:00
Tupfile.ini [Build] 32-bit: Use Tup as a proper build system for the 32-bit part 2020-09-03 19:04:17 +02:00
build.bat [Maintenance] Mark all batch and binary files as executable 2016-03-02 08:26:13 +01:00
build16b.bat [Build] 16-bit: Unconditionally rebuild everything by default 2020-09-03 19:04:15 +02:00
build32b.bat [Build] Assemble all .ASM files in the 32-bit build part 2020-09-07 17:25:56 +02:00
decomp.hpp [Maintenance] Add a template for stupid bytewise access to logical structures 2021-09-12 18:35:08 +02:00
defconv.h [Maintenance] `#pragma once` has no effect? 2019-12-22 15:31:24 +01:00
master.hpp [Maintenance] master.hpp transition: EMS functions 2021-11-28 19:15:22 +01:00
pc98.h [Decompilation] [th01] Boss defeat sequence: SinGyoku version + route selection 2021-11-07 23:32:21 +01:00
pc98.inc [Separate translation units] [th05] Music Room piano (undecompilable functions) 2021-03-19 23:23:06 +01:00
pc98kbd.h [Maintenance] Add address constants for the PC-98 BIOS key press bitmap 2021-01-30 18:30:57 +01:00
pc98kbd.inc [Separate translation units] [th04/th05] Low-level input (undecompilable) 2021-01-30 19:11:01 +01:00
planar.h [Decompilation] [th01] Add width and height getters for DotRect 2021-08-22 14:52:00 +02:00
platform.h [Maintenance] Define `(u)int16_t` as `int` rather than `short` 2021-02-20 15:49:20 +01:00
set_errorlevel_to_1.bat [Build] 32-bit: Fix tool checks and remove stderr redirection for Windows 9x 2020-09-03 19:04:18 +02:00
th01_fuuin.asm [Naming] [th01] Remove "back/front" terminology from inter-page copy functions 2021-11-07 23:27:30 +01:00
th01_op.asm [Naming] [th01] Remove "back/front" terminology from inter-page copy functions 2021-11-07 23:27:30 +01:00
th01_reiiden.asm [Maintenance] Consistently use singular for entity structure and file names 2021-11-28 19:14:02 +01:00
th02_main.asm [Maintenance] Remove all unused externs in ASM land 2021-09-28 18:05:24 +02:00
th02_maine.asm [Maintenance] Remove all unused externs in ASM land 2021-09-28 18:05:24 +02:00
th02_op.asm [Regression] Explicitly request 16-bit default segments when using .MODEL 2021-03-29 22:39:11 +02:00
th02_zuninit.asm Finally use standard segment names everywhere 2015-02-18 14:04:43 +01:00
th03_main.asm [Maintenance] Consistently use singular for entity structure and file names 2021-11-28 19:14:02 +01:00
th03_mainl.asm [Maintenance] Remove all unused externs in ASM land 2021-09-28 18:05:24 +02:00
th03_op.asm [Maintenance] Remove all unused externs in ASM land 2021-09-28 18:05:24 +02:00
th04_main.asm [Maintenance] [th04/th05] Split segment #1 before the EMS functions 2021-11-28 19:15:22 +01:00
th04_maine.asm [Maintenance] Remove all unused externs in ASM land 2021-09-28 18:05:24 +02:00
th04_memchk.asm [Position independence] False positives: 0xFF / 255 / -1 2020-09-16 22:30:57 +02:00
th04_op.asm [Maintenance] Remove all unused externs in ASM land 2021-09-28 18:05:24 +02:00
th04_zuninit.asm [th04/zuninit] Initial state 2020-09-16 22:16:49 +02:00
th05_gjinit.asm [Build] [th05] gjinit: Include the gaiji compiled from th05/sprites/gaiji.bmp 2020-09-16 22:30:50 +02:00
th05_main.asm [Maintenance] [th04/th05] Split segment #1 before the EMS functions 2021-11-28 19:15:22 +01:00
th05_maine.asm [Decompilation] [th03/th04/th05] cfg_load_resident_ptr() 2021-07-21 00:34:59 +02:00
th05_memchk.asm [Position independence] False positives: 0xFF / 255 / -1 2020-09-16 22:30:57 +02:00
th05_op.asm [Maintenance] Remove all unused externs in ASM land 2021-09-28 18:05:24 +02:00
th05_zuninit.asm [Regression] Explicitly request 16-bit default segments when using .MODEL 2021-03-29 22:39:11 +02:00
twobyte.h [Maintenance] Move twobyte_t into its own header file 2021-01-30 18:31:54 +01:00
twobyte.inc [Maintenance] Move twobyte_t into its own header file 2021-01-30 18:31:54 +01:00
x86real.h [Maintenance] Adopt the peek() and poke() inline functions from <dos.h> 2021-09-12 17:50:41 +02:00

README.md

The Touhou PC-98 Restoration Project ("ReC98")


Check the homepage for more detailed progress numbers and information about the crowdfunding!


Overview

This project aims to perfectly reconstruct the source code of the first five Touhou Project games by ZUN Soft (now Team Shanghai Alice), which were originally released exclusively for the NEC PC-9801 system.

The original games in question are:

  • TH01: 東方靈異伝 ~ The Highly Responsive to Prayers (1997)
  • TH02: 東方封魔録 ~ the Story of Eastern Wonderland (1997)
  • TH03: 東方夢時空 ~ Phantasmagoria of Dim.Dream (1997)
  • TH04: 東方幻想郷 ~ Lotus Land Story (1998)
  • TH05: 東方怪綺談 ~ Mystic Square (1998)

Since we only have the binaries, we obviously can't know how ZUN named any variables and functions, or which comments surrounded the original code. "Perfect" therefore means that the binaries compiled from the code in the ReC98 repository are indistinguishable from ZUN's original builds, making it impossible to disprove that the original code could have looked like this. This property is maintained for every Git commit along the way.
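
As a rough illustration of what "indistinguishable" means in practice, the following sketch compares two binaries byte by byte and reports where they first diverge. The function name is hypothetical and not part of ReC98's tooling; this is just one way such a check could be written.

```python
from typing import Optional


def first_difference(original: bytes, rebuilt: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            return offset
    # One input is a prefix of the other: they differ at the shorter length.
    if len(original) != len(rebuilt):
        return min(len(original), len(rebuilt))
    return None
```

Reading an original executable and its rebuilt counterpart with `open(..., "rb").read()` and passing both to `first_difference()` should yield `None` for a bit-perfect build.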

Aside from the preservation angle and the resulting deep insight into the games' mechanics, the code can then serve as the foundation for any type of mod, or any port to non-PC-98 platforms, developed by the community. This is also why ReC98 values readable and understandable code over a pure decompilation.

Why?

There are a number of reasons why achieving moddability via full decompilation seems to be more worthwhile for the PC-98 games, in contrast to a PyTouhou-style black-box reimplementation:

  • While stage enemies and their bullet patterns are controlled by bytecode in TH04's and TH05's .STD files that could just be interpreted by an alternate VM, midboss and boss battles are entirely hardcoded into the executables.
  • Even though complete decompilation will take a long time, partial reverse-engineering results will be very useful to modders who just want to work on the original PC-98 versions of the games.
  • PC-98 emulation is messy and overly complicated. It has been getting better as of 2018 thanks to DOSBox-X adding support for the platform, but even at its best, it will always consume way more system resources than what would be appropriate for those games.
  • thcrap-style multilingual translation on PC-98 would be painful for languages with non-ASCII scripts. The obvious method of modifying the font ROM specifically for each language is ugly and won't work on real hardware, so a custom renderer would be needed. That by itself requires a lot of reverse-engineering and, preferably, compilable source code to avoid the limits of hex-editing. Or, even better, the prospect of doing this entirely on a more modern system.
  • These games stopped being sold in 2002, ZUN has confirmed on multiple occasions that he has lost all the data of the "earlier games" [citation needed], and PC-98 hardware is long obsolete. In short, these games are as abandoned as they can possibly be, and are unlikely to ever turn a profit again.

Is this even viable?

Definitely. During the development of the static English patches for these games, we identified two main libraries used across all 5 games, and even found their source code. These are:

  • master.lib, a 16-bit x86 assembly library providing an abstraction layer for all components of a PC-98 DOS system
  • as well as the Borland C/C++ runtime library, version 4.0.
  • Additionally, TH01 includes the Pi loader library by 電脳科学研究所/BERO,
  • and TH03's ZUNSP.COM (accessible via ZUN.COM -4) is a rebranded version of Promisence Soft's SPRITE16.COM, a 16-color PC-98 EGC display driver, version 0.04, which was bundled with the sample game StormySpace.

master.lib and the C/C++ runtime alone make up a sizable amount of the code in all the executables. In TH05, for example, they amount to 74% of all code in OP.EXE, and 40% of all code in MAIN.EXE. That's already quite a lot of code we do not have to deal with. Identifying the rest of the code shared across the games will further reduce the workload to a more acceptable amount.

With DOSBox-X and the Debug edition of Neko Project II, we now also have two open-source PC-9821 emulators capable of running the games. This will greatly help in understanding all hardware-specific code.

And while this project has made decent progress so far, completing the decompilation of even just a single game will still take a long time. Any help will be appreciated! If you are interested, check CONTRIBUTING.md for the general contribution guidelines.

Dumped executables

  • TH01: zunsoft.com, op.exe, reiiden.exe, fuuin.exe
  • TH02: zun.com (ongchk.com, zuninit.com, zun_res.com, zunsoft.com), op.exe, main.exe, maine.exe
  • TH03: zun.com (ongchk.com [-1], zuninit.com [-2], zunsoft.com [-3], zunsp.com [-4], res_yume.com [-5]), op.exe, main.exe, mainl.exe
  • TH04: zun.com (ongchk.com [-O], zuninit.com [-I], res_huma.com [-S], memchk.com [-M]), op.exe, main.exe, maine.exe
  • TH05: zun.com (ongchk.com [-O], zuninit.com [-I], res_kso.com [-S], gjinit.com [-G], memchk.com [-M]), op.exe, main.exe, maine.exe

Crossed-out files are identical to their version in the previous game. ONGCHK.COM is part of the PMD sound driver by KAJA, and therefore doesn't need to be disassembled either; we only need to keep the binary to allow bit-perfect rebuilds of ZUN.COM.

Building

Required tools

  • Borland Turbo C++ 4.0J

    This was the compiler ZUN originally used, so it's the only one that can deterministically compile this code to executables that are bit-perfect to ZUN's original ones.


  • Borland Turbo Assembler (TASM), version 5.0 or later, for 32-bit Windows (TASM32.EXE)

    Borland never made a cross compiler targeting 16-bit DOS that runs on 32-bit Windows, so the C++ parts have to be compiled using a 16-bit DOS program. The not yet decompiled ASM parts of the code, however, can be assembled using a 32-bit Windows tool. This not only way outperforms any 16-bit solution that would have to be emulated on modern 64-bit systems, making build times, well, tolerable, but also removes any potential EMS or XMS issues we might have had with TASMX.EXE on these emulators.

    These advantages were particularly relevant in the early days of ReC98, when the ASM files were pretty huge. That's also when I decided to freely use long file names that don't need to conform to the 8.3 convention… As a result, the build process still starts with a separate 32-bit part (build32b.bat), which must be run in Windows (or Wine).

    In the end though, we'd definitely like to have a single-step 16-bit build process that requires no 32-bit tools. This will probably happen some time after reaching 100% position independence over all games.


  • Borland C++ 5.5, for 32-bit Windows (BCC32.EXE)

    Released as freeware, and as of July 2020, still sort of officially downloadable from

    http://altd.embarcadero.com/download/bcppbuilder/freecommandLinetools.exe

    (SHA-256 433b44741f07f2ad673eb936511d498c5a6b7f260f98c4d9a6da70c41a56d855)

    Needed to fulfill the role of being "just any native C++ compiler" for our own tools that either don't necessarily have to run on 16-bit DOS, or are required by the 32-bit build step, as long as that one still exists (see above).

    Currently, this category of tools only includes the converter for hardcoded sprites. Since that one is written to be as platform-independent as possible, it could easily be compiled with any other native C compiler you happen to have already installed. (Which also means that future port developers hopefully have one less thing to worry about.) So, if you dislike additional dependencies, feel free to edit the Tupfile so that bmp2arr is compiled with any other C compiler of your choice.

    However, choosing Borland C++ 5.5 as a default for everyone else fits ReC98 very well for several reasons:

    • It still happens to be the most hassle- and bloat-free way to get any sort of 32-bit Windows C++ compiler to people, clearly beating Open Watcom, and the required registration for Borland/Embarcadero's own C++ 7.30. Depending on anything bigger would be way out of proportion, considering how little we use it for.
    • We already rely on a 32-bit Windows tool.
    • Turbo C++ 4.0J defines the lower bound for our allowed level of C++ features anyway, making Borland C++ 5.5's old age and lacking C++ standard compliance a non-issue.
    • Unlike 7.30, 5.5 still works on Windows 9x, which is what typically runs on the real PC-98 hardware that some people might want to compile ReC98 on.
    • Other tiny C compilers have no C++ support. While the sprite converter is written in C, future tools might not be, and I'm too old to restrict people to C for no good reason.

  • Tup, for Windows (optional, but recommended)

    A sane, parallel build system, used to ensure minimal rebuilds during the 32-bit build part. Provides perfect tracking of dependencies via code injection and hooking a compiler's file opening syscalls, allowing it to automatically add all #included files to the build dependency graph. This makes it way superior to most make implementations, which lack this vital feature, and are therefore inherently unsuited for pretty much any programming language imaginable. With no abstractions for specific compilers, Tup also fits perfectly with the ancient Borland tools required for this project.

    As of September 2020, the Windows version of Tup requires Vista or higher. In case Tup can't run or isn't installed, the build process falls back on a dumb batch file, which always fully rebuilds the entire 32-bit part.


  • DOSBox (if you're running a 64-bit version of Windows, or a non-Windows operating system)

    For the most part, it shouldn't matter whether you use the original DOSBox or your favorite fork. A DOSBox with dynamic recompilation is highly recommended for faster compilation, though. Make sure to enable that feature by setting the following options in its configuration file (dosbox.conf for the original version):

    [cpu]
    core=dynamic
    cycles=max
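
If you want to double-check that a given DOSBox configuration actually contains the two `[cpu]` settings shown above, a small script along these lines could do it. This is only a sketch: the function names are made up for illustration, and it assumes the plain `key=value` layout with `#` comments used by the original DOSBox's dosbox.conf.

```python
def cpu_settings(conf_text: str) -> dict:
    """Collect the key=value pairs of the [cpu] section of a DOSBox config."""
    settings = {}
    section = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
        elif section == "cpu" and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip().lower()] = value.strip().lower()
    return settings


def uses_dynamic_core(conf_text: str) -> bool:
    """True if the config requests core=dynamic and cycles=max."""
    cpu = cpu_settings(conf_text)
    return cpu.get("core") == "dynamic" and cpu.get("cycles") == "max"
```

Passing the contents of your configuration file to `uses_dynamic_core()` returns `True` only if both options are set as recommended.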
    

The most performant OS for building ReC98 is therefore a 32-bit Windows ≥Vista, where both the 32-bit and 16-bit build parts can run natively from a single shell. The build process was tested and should work reliably on pretty much every system though, from modern 64-bit Windows and Linux down to Windows 95, which you might use on actual PC-98 hardware.
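
Since the Borland C++ 5.5 download listed above comes with a SHA-256 hash, it makes sense to verify the file before running it. The following is a minimal sketch of such a check; the function names are hypothetical, and the only value taken from this README is the hash itself.

```python
import hashlib

# SHA-256 of freecommandLinetools.exe, as stated in the tool list above.
EXPECTED_SHA256 = "433b44741f07f2ad673eb936511d498c5a6b7f260f98c4d9a6da70c41a56d855"


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks, so that large downloads don't need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's hash against the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected()` with the path of the downloaded installer should return `True` for an unmodified file.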

How to build

  • Make sure you've created the bin/bcc32.cfg and bin/ilink32.cfg files for Borland C++ 5.5, as pointed out in its readme.txt file. This fixes errors like

    Error E2209 Pipeline/bmp2arrl.c 12: Unable to open file 'io.h'
    

    that you will encounter otherwise.

  • Running on a 64-bit OS? Run build32b.bat in a Windows shell, followed by build16b.bat in your DOSBox of choice.

  • Running on 32-bit Windows? Run just build.bat.

All batch files will abort with an error if any of the necessary tools can't be found in the PATH.
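
To find out ahead of time which tools the batch files would complain about, the same kind of lookup can be approximated on the host system. This sketch is not how the batch files themselves are implemented; the function name is hypothetical, and which executable names to pass in (for example `tasm32`, `bcc32`, `tcc`, or `tlink`) is an assumption based on the tools named in this README.

```python
import shutil
from typing import Iterable, List, Optional


def missing_tools(names: Iterable[str], path: Optional[str] = None) -> List[str]:
    """Return the tools that cannot be located via PATH (or the given path)."""
    return [name for name in names if shutil.which(name, path=path) is None]
```

An empty result from `missing_tools(["tasm32", "bcc32"])` would suggest that the 32-bit tools are reachable from the current shell. The 16-bit tools would have to be checked against the PATH configured inside DOSBox instead.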

The final executables will be put into bin\th0?, using the same names as the originals.

Troubleshooting

  • TCC compiles, but fails to link, with Error: Unable to execute command 'tlink.exe'

    Cause: To locate TLINK, TCC needlessly copies the PATH environment variable into a statically allocated 128-byte buffer. It then constructs absolute tlink.exe filenames for each of the semicolon- or \0-terminated paths, writing these into a buffer that immediately follows the 128-byte PATH buffer in memory. The search ends as soon as TCC finds an existing file, which gives precedence to earlier paths in the PATH. If the search hasn't succeeded by the time it reaches a potential "final" path that runs past the 128 bytes, the final attempted filename will consist of the part that still fit into the buffer, followed by the previously attempted path.

    Workaround: Make sure that the BIN\ path to Turbo C++ 4.0J is fully contained within the first 127 bytes of the PATH inside your DOS system. (The 128th byte must either be a separating ; or the terminating \0 of the PATH string.)

  • TLINK fails with Loader error (0000): Unrecognized Error on 32-bit Windows ≥Vista

    This can be fixed by configuring the NTVDM DPMI driver to be loaded into conventional memory rather than upper memory, by editing %WINDIR%\System32\autoexec.nt:

    REM Install DPMI support
    -LH %SystemRoot%\system32\dosx
    +%SystemRoot%\system32\dosx
    

    A reboot is required for that edit to take effect.

    (Source)
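
The 127-byte rule from the tlink.exe workaround above is easy to get wrong by eye, so here is a small sketch that checks a PATH string against it. The function name is hypothetical, and the code assumes a PATH made up of single-byte characters, so that string offsets equal byte offsets.

```python
def fits_tcc_path_buffer(path_value: str, tcc_bin_dir: str, limit: int = 127) -> bool:
    """True if tcc_bin_dir is a PATH entry that ends within the first `limit` bytes."""
    wanted = tcc_bin_dir.rstrip("\\").upper()
    offset = 0
    for entry in path_value.split(";"):
        end = offset + len(entry)  # index of the ';' or terminator after this entry
        if entry.rstrip("\\").upper() == wanted:
            return end <= limit
        offset = end + 1  # skip the separating ';'
    return False  # the directory is not part of PATH at all
```

Feeding it the PATH value as set inside your DOS environment, together with the BIN\ directory of your Turbo C++ 4.0J installation, tells you whether that directory would need to be moved further to the front.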