From 0f75a77dee6c777d1740cceb260e3a44c39c49c5 Mon Sep 17 00:00:00 2001
From: nmlgc
Date: Mon, 22 Mar 2021 18:16:38 +0100
Subject: [PATCH] [Research] Discover a workaround to word-align code segments from Turbo C++
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Turns out that this is one of the effects of the -WX option ("Create
DPMI application")… along with generally messing up code generation.
Nothing we can't work around though, luckily! Finally getting to cross
that off the list of reasons that prevent decompilation.

Part of P0137, funded by [Anonymous].
---
 CONTRIBUTING.md                       |  7 ++++++-
 Research/Borland C++ decompilation.md | 26 +++++++++++++++++---------
 th03/hfliplut.asm                     |  4 +---
 3 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 07215eea..e74d56ab 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -244,7 +244,11 @@ C++, Open Watcom, and Visual C++, which will ease future third-party ports.
 
 * Use `#pragma option -zC` and `#pragma option -zP` to rename code segments
   and their groups, not `#pragma codeseg`. Might look uglier, but has the
-  advantage of not generating an empty segment with the default name.
+  advantage of not generating an empty segment with the default name and the
+  default padding. This is particularly relevant [if the `-WX` option is used
+  to enforce word-aligned code segments][3]: That empty default segment would
+  otherwise also (unnecessarily) enforce word alignment for the segment that
+  ends up following the empty default one.
   * These options can only be used "at the beginning" of a translation unit –
     before the first non-preprocessor and non-comment C language token. Any
@@ -323,3 +327,4 @@ Currently, we know about the following [references]:
 [mzdiff]: https://github.com/nmlgc/mzdiff
 [1]: Research/Borland%20C++%20decompilation.md#c
 [2]: https://github.com/nmlgc/ReC98/invitations
+[3]: Research/Borland%20C++%20decompilation.md#padding-bytes-in-code-segments
diff --git a/Research/Borland C++ decompilation.md b/Research/Borland C++ decompilation.md
index 9125b7a0..8e8d2af3 100644
--- a/Research/Borland C++ decompilation.md
+++ b/Research/Borland C++ decompilation.md
@@ -368,18 +368,26 @@ static_storage db 2, 0, 6
 
 @_STCON_$qv endp
 ```
-
 ## Padding bytes in code segments
 
-- `0x00` is only emitted to word-align `switch` jump tables with `-a2`.
-  Anywhere else, it indicates the start or end of a word-aligned `SEGMENT`
-  compiled from assembly. Borland C++ never adds padding between functions or
-  segments.
+* Usually, padding `0x00` bytes are only emitted to word-align `switch` jump
+  tables with `-a2`. Anywhere else, it typically indicates the start or end of
+  a word-aligned `SEGMENT` compiled from assembly. There are two potential
+  workarounds though:
 
-  **Certainty**: Would love to find a proper compiler or linker setting for
-  this, but it doesn't seem to exist. The `#pragma codestring \x00` workaround
-  doesn't respect different alignment requirements of surrounding translation
-  units, after all.
+  * The `-WX` option (Create DPMI application) *will* enforce word alignment
+    for the code segment, at the cost of slightly different code generation in
+    certain places. Since it also adds an additional `INC BP` instruction
+    before `PUSH BP`, and an additional `DEC BP` instruction after `POP BP`,
+    this option can only really be used in translation units with disabled
+    stack frames (`-k-`).
+
+  * `#pragma codestring \x00` unconditionally emits a `0x00` byte. However,
+    this won't respect different alignment requirements of surrounding
+    translation units.
+
+  **Certainty**: Reverse-engineering `TCC.EXE` confirmed that these are the
+  only ways.
 
 ## C++
 
diff --git a/th03/hfliplut.asm b/th03/hfliplut.asm
index 3a823536..2fc27ae5 100644
--- a/th03/hfliplut.asm
+++ b/th03/hfliplut.asm
@@ -1,8 +1,6 @@
 extern _hflip_lut:byte:256
 
-; The original function layouts unfortunately require this one to be placed at
-; a word-aligned address, which can't be achieved from Turbo C++ alone :( Oh
-; well, its decompilation would have been a mess anyway.
+; Would have been decompilable into a mess.
 SHARED segment word public 'CODE' use16
 	assume cs:SHARED
 	include th03/formats/hfliplut.asm