The Touhou PC-98 Restoration Project
nmlgc 8f824c4297 [Reverse-engineering] [th02/th04/th05] Point numeral sprites
Part of P0087, funded by -Tom-.
2020-04-15 21:34:21 +02:00
.github [Readme] Add crowdfunding / progress badges and a GitHub funding button 2019-10-17 20:58:50 +02:00
Pipeline [Asset pipeline] Add a .GRZ viewer 2020-03-07 21:43:08 +01:00
Research [Decompilation] [th01] VRAM text typing 2020-03-13 19:09:12 +01:00
bin Add MASTERS.LIB and MASTER.H from the original distribution 2015-02-16 23:10:47 +01:00
libs [Build] Use the minimum possible size for enums by default 2020-04-03 17:33:58 +02:00
th01 [Maintenance] Move all features exclusive to MAIN.EXE to a main/ subdirectory 2020-04-15 20:58:01 +02:00
th02 [Reverse-engineering] [th02/th04/th05] Point numeral sprites 2020-04-15 21:34:21 +02:00
th03 [Maintenance] Move all features exclusive to MAIN.EXE to a main/ subdirectory 2020-04-15 20:58:01 +02:00
th04 [Reverse-engineering] [th02/th04/th05] Point numeral sprites 2020-04-15 21:34:21 +02:00
th05 [Reverse-engineering] [th05] Curve bullet rendering 2020-04-15 21:34:20 +02:00
zuncom [zuncom] Get rid of moveup.asm 2018-04-15 20:22:41 +03:00
.gitattributes [Maintenance] Remove merge=union from .gitattributes 2019-09-21 13:03:17 +02:00
.gitignore [Maintenance] gitignore everything in bin 2019-11-02 19:22:05 +02:00
CONTRIBUTING.md [Build] Don't word-align everything by default 2020-04-03 17:35:57 +02:00
Makefile.mak [Build] Don't word-align everything by default 2020-04-03 17:35:57 +02:00
README.md [Readme] Remove the porting plans 2019-11-12 20:42:21 +01:00
ReC98.h [Maintenance] Decide how to handle pre-shifted sprites in C land 2020-04-03 17:32:50 +02:00
ReC98.inc [Position independence] Remaining references to _ctype 2020-04-15 21:34:20 +02:00
build.bat [Maintenance] Mark all batch and binary files as executable 2016-03-02 08:26:13 +01:00
build16b.bat [Build] Use Borland's real-mode MAKER.EXE 2018-03-19 23:52:42 +01:00
build32b.bat [Decompilation] [th05] RES_KSO.COM 2020-02-23 17:53:17 +01:00
defconv.h [Maintenance] `#pragma once` has no effect? 2019-12-22 15:31:24 +01:00
defconv.inc [Maintenance] Use a single DEFCONV definition file to cover all games 2019-12-17 23:27:01 +01:00
pc98.h [Build] Don't word-align everything by default 2020-04-03 17:35:57 +02:00
pc98.inc [Maintenance] Use @@locals for self-modifying code in bfnt_entry_pat() 2020-02-16 21:35:16 +01:00
platform.h [Reverse-engineering] [th01] Current back page 2020-01-14 21:48:40 +01:00
th01_fuuin.asm [Position independence] Remaining references to _ctype 2020-04-15 21:34:20 +02:00
th01_op.asm [Position independence] Remaining references to _ctype 2020-04-15 21:34:20 +02:00
th01_reiiden.asm [Position independence] Remaining references to _ctype 2020-04-15 21:34:20 +02:00
th02_main.asm [Reverse-engineering] [th02/th04/th05] Point numeral sprites 2020-04-15 21:34:21 +02:00
th02_maine.asm [Maintenance] Remove all dependencies on Borland C++ run-time source headers 2020-02-23 17:53:18 +01:00
th02_op.asm [Maintenance] Remove all dependencies on Borland C++ run-time source headers 2020-02-23 17:53:18 +01:00
th02_zuninit.asm Finally use standard segment names everywhere 2015-02-18 14:04:43 +01:00
th03_main.asm [Maintenance] Move all features exclusive to MAIN.EXE to a main/ subdirectory 2020-04-15 20:58:01 +02:00
th03_mainl.asm [Position independence] Remaining references to _ctype 2020-04-15 21:34:20 +02:00
th03_op.asm [Maintenance] Compress unknown BSS regions using byte arrays 2020-03-22 10:16:09 +01:00
th04_main.asm [Reverse-engineering] [th02/th04/th05] Point numeral sprites 2020-04-15 21:34:21 +02:00
th04_maine.asm [Position independence] Remaining references to _ctype 2020-04-15 21:34:20 +02:00
th04_op.asm [Maintenance] Compress unknown BSS regions using byte arrays 2020-03-22 10:16:09 +01:00
th05_main.asm [Reverse-engineering] [th02/th04/th05] Point numeral sprites 2020-04-15 21:34:21 +02:00
th05_maine.asm [Position independence] Remaining references to _ctype 2020-04-15 21:34:20 +02:00
th05_op.asm [Maintenance] Compress unknown BSS regions using byte arrays 2020-03-22 10:16:09 +01:00

README.md

The Touhou PC-98 Restoration Project ("ReC98")

Check the homepage for more detailed progress numbers and information about the crowdfunding!


Overview

This project aims to perfectly reconstruct the source code of the first five Touhou Project games by ZUN Soft (now Team Shanghai Alice), which were originally released exclusively for the NEC PC-9801 system.

The original games in question are:

  • TH01: 東方靈異伝 ~ The Highly Responsive to Prayers (1997)
  • TH02: 東方封魔録 ~ the Story of Eastern Wonderland (1997)
  • TH03: 東方夢時空 ~ Phantasmagoria of Dim.Dream (1997)
  • TH04: 東方幻想郷 ~ Lotus Land Story (1998)
  • TH05: 東方怪綺談 ~ Mystic Square (1998)

Since we only have the binaries, we obviously can't know how ZUN named any variables and functions, or which comments the original code contained. "Perfect" therefore means that the binaries compiled from the code in the ReC98 repository are indistinguishable from ZUN's original builds, making it impossible to prove that the original code couldn't have looked like this. This property is maintained for every Git commit along the way.
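
As a rough illustration of what "indistinguishable" means in practice, the following Python sketch compares a rebuilt executable against an original one byte by byte and reports the first mismatch. It is not part of the ReC98 build process; the function names are hypothetical and only serve this example.

```python
from pathlib import Path
from typing import Optional


def first_difference(original: bytes, rebuilt: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if both
    byte strings are identical."""
    for offset, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            return offset
    if len(original) != len(rebuilt):
        # One input is a strict prefix of the other, so they start to
        # differ where the shorter one ends.
        return min(len(original), len(rebuilt))
    return None


def is_bit_perfect(original_path: Path, rebuilt_path: Path) -> bool:
    """True if both files have exactly the same contents."""
    return first_difference(
        original_path.read_bytes(), rebuilt_path.read_bytes()
    ) is None
```

A call like `is_bit_perfect(Path("original/MAIN.EXE"), Path("bin/th05/main.exe"))` would then check one rebuilt file; the location of the original file is an assumption made for this example.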

Aside from the preservation angle and the resulting deep insight into the games' mechanics, the code can then serve as the foundation for any type of mod, or any port to non-PC-98 platforms, developed by the community. This is also why ReC98 values readable and understandable code over a pure decompilation.

Why?

There are a number of reasons why achieving moddability via full decompilation seems to be more worthwhile for the PC-98 games, in contrast to a PyTouhou-style black-box reimplementation:

  • While stage enemies and their bullet patterns are controlled by bytecode in TH04's and TH05's .STD files that could just be interpreted by an alternate VM, midboss and boss battles are entirely hardcoded into the executables.
  • Even though complete decompilation will take a long time, partial reverse-engineering results will be very useful to modders who just want to work on the original PC-98 versions of the games.
  • PC-98 emulation is messy and overly complicated. It has been getting better as of 2018 thanks to DOSBox-X adding support for the platform, but even at its best, it will always consume way more system resources than what would be appropriate for those games.
  • thcrap-style multilingual translation on PC-98 would be painful for languages with non-ASCII scripts. The obvious method of modifying the font ROM specifically for each language is ugly and won't work on real hardware, so a custom renderer would be needed. That by itself requires a lot of reverse-engineering and, preferably, compilable source code to avoid the limits of hex-editing. Or, even better, the prospect of doing this entirely on a more modern system.
  • These games stopped being sold in 2002, ZUN has confirmed on multiple occasions that he has lost all the data of the "earlier games" [citation needed], and PC-98 hardware is long obsolete. In short, these games are as abandoned as they can possibly be, and are unlikely to ever turn a profit again.

Is this even viable?

Definitely. During the development of the static English patches for these games, we identified two main libraries used across all 5 games, and even found their source code. These are:

  • master.lib, a 16-bit x86 assembly library providing an abstraction layer for all components of a PC-98 DOS system
  • as well as the Borland C/C++ runtime library, version 4.0.

Additionally,

  • TH01 includes the Pi loader library by 電脳科学研究所/BERO,
  • and TH03's ZUNSP.COM (accessible via ZUN.COM -4) is a rebranded version of Promisence Soft's SPRITE16.COM, a 16-color PC-98 EGC display driver, version 0.04, which was bundled with the sample game StormySpace.

master.lib and the C/C++ runtime alone make up a sizable amount of the code in all the executables. In TH05, for example, they amount to 74% of all code in OP.EXE, and 40% of all code in MAIN.EXE. That's already quite a lot of code we do not have to deal with. Identifying the rest of the code shared across the games will further reduce the workload to a more acceptable amount.

With DOSBox-X and the Debug edition of Neko Project II, we now also have two open-source PC-9821 emulators capable of running the games. This will greatly help in understanding all hardware-specific code.

And while this project has made decent progress so far, completing the decompilation of even just a single game will still take a long time. Any help will be appreciated! If you are interested, check CONTRIBUTING.md for the general contribution guidelines.

Dumped executables

  • TH01: zunsoft.com, op.exe, reiiden.exe, fuuin.exe
  • TH02: zun.com (ongchk.com, zuninit.com, zun_res.com, zunsoft.com), op.exe, main.exe, maine.exe
  • TH03: zun.com (ongchk.com [-1], zuninit.com [-2], zunsoft.com [-3], zunsp.com [-4], res_yume.com [-5]), op.exe, main.exe, mainl.exe
  • TH04: zun.com (ongchk.com [-O], zuninit.com [-I], res_huma.com [-S], memchk.com [-M]), op.exe, main.exe, maine.exe
  • TH05: zun.com (ongchk.com [-O], zuninit.com [-I], res_kso.com [-S], gjinit.com [-G], memchk.com [-M]), op.exe, main.exe, maine.exe

Crossed-out files are identical to their version in the previous game. ONGCHK.COM is part of the PMD sound driver by KAJA, and therefore doesn't need to be disassembled either; we only need to keep the binary to allow bit-perfect rebuilds of ZUN.COM.

Building

You will need:

  • Borland Turbo C++ 4.0J

  • Borland Turbo Assembler (TASM), in both its 32-bit Windows version (TASM32.EXE), version 5.0 or later, and its 16-bit DOS version (TASM.EXE), version 4.1 or later

  • DOSBox if you're running a 64-bit version of Windows, or a non-Windows operating system

    To speed up compilation, make sure to set the following in dosbox.conf:

    [cpu]
    core=dynamic
    cycles=max
    

The Borland tools are the only ones that will, with the correct command-line switches for each game, deterministically compile this source code to executables that are bit-perfect to ZUN's original ones. Hence, they are the only supported build tools during all of the reconstruction phase.

To build, simply run build.bat and follow the instructions.

Since I, unfortunately, decided earlier in development to freely use long file names that don't need to conform to the 8.3 convention, the first part of the building process (build32b.bat) must be run in Windows (or Wine). This will be fixed as development goes along.
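
For reference, the 8.3 convention limits a file name to a base of at most 8 characters and an extension of at most 3. The Python sketch below shows a simplified check for this; it deliberately ignores DOS's restrictions on the character set and on reserved device names, and the function name is hypothetical.

```python
def conforms_to_8_3(filename: str) -> bool:
    """Simplified 8.3 check: a base name of 1 to 8 characters, an
    optional extension of at most 3 characters, at most one dot, and
    no spaces. Character set and reserved names are not checked."""
    if "." in filename:
        base, ext = filename.rsplit(".", 1)
    else:
        base, ext = filename, ""
    if "." in base or " " in filename:
        return False
    return 1 <= len(base) <= 8 and len(ext) <= 3
```

By this check, `build32b.bat` and `ReC98.inc` conform, while `th02_zuninit.asm` and `CONTRIBUTING.md` do not, because their base names are longer than 8 characters.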

The final executables will be put into bin\th0?, using the same names as the originals.