[Research] Discover a workaround to word-align code segments from Turbo C++

Turns out that this is one of the effects of the -WX option ("Create
DPMI application")… along with generally messing up code generation.
Nothing we can't work around though, luckily! Finally getting to cross
that off the list of reasons that prevent decompilation.

Part of P0137, funded by [Anonymous].
nmlgc 2021-03-22 18:16:38 +01:00
parent 1244bd74e7
commit 0f75a77dee
3 changed files with 24 additions and 13 deletions

@@ -244,7 +244,11 @@ C++, Open Watcom, and Visual C++, which will ease future third-party ports.
* Use `#pragma option -zC` and `#pragma option -zP` to rename code segments
and their groups, not `#pragma codeseg`. Might look uglier, but has the
-advantage of not generating an empty segment with the default name.
+advantage of not generating an empty segment with the default name and the
+default padding. This is particularly relevant [if the `-WX` option is used
+to enforce word-aligned code segments][3]: That empty default segment would
+otherwise also (unnecessarily) enforce word alignment for the segment that
+ends up following the empty default one.
* These options can only be used "at the beginning" of a translation unit
before the first non-preprocessor and non-comment C language token. Any
@@ -323,3 +327,4 @@ Currently, we know about the following [references]:
[mzdiff]: https://github.com/nmlgc/mzdiff
[1]: Research/Borland%20C++%20decompilation.md#c
[2]: https://github.com/nmlgc/ReC98/invitations
+[3]: Research/Borland%20C++%20decompilation.md#padding-bytes-in-code-segments

@@ -368,18 +368,26 @@ static_storage db 2, 0, 6
@_STCON_$qv endp
```
## Padding bytes in code segments
-- `0x00` is only emitted to word-align `switch` jump tables with `-a2`.
-Anywhere else, it indicates the start or end of a word-aligned `SEGMENT`
-compiled from assembly. Borland C++ never adds padding between functions or
-segments.
+* Usually, padding `0x00` bytes are only emitted to word-align `switch` jump
+tables with `-a2`. Anywhere else, it typically indicates the start or end of
+a word-aligned `SEGMENT` compiled from assembly. There are two potential
+workarounds though:
-**Certainty**: Would love to find a proper compiler or linker setting for
-this, but it doesn't seem to exist. The `#pragma codestring \x00` workaround
-doesn't respect different alignment requirements of surrounding translation
-units, after all.
+* The `-WX` option (Create DPMI application) *will* enforce word alignment
+for the code segment, at the cost of slightly different code generation in
+certain places. Since it also adds an additional `INC BP` instruction
+before `PUSH BP`, and an additional `DEC BP` instruction after `POP BP`,
+this option can only really be used in translation units with disabled
+stack frames (`-k-`).
+* `#pragma codestring \x00` unconditionally emits a `0x00` byte. However,
+this won't respect different alignment requirements of surrounding
+translation units.
+**Certainty**: Reverse-engineering `TCC.EXE` confirmed that these are the
+only ways.
## C++
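For illustration, the arithmetic behind the padding bytes discussed under "Padding bytes in code segments" can be sketched as follows. This is a minimal sketch and not code from the repository: the helper name `padding_for` and the example offsets are hypothetical, and it assumes that an aligned segment is simply moved up to the next multiple of its alignment, with the gap filled by `0x00` bytes.

```cpp
#include <cstddef>

// Hypothetical helper, not part of the repository: number of 0x00 padding
// bytes needed in front of a segment that has to start at a multiple of
// `alignment` bytes, given the offset at which the preceding segment ended.
constexpr std::size_t padding_for(std::size_t offset, std::size_t alignment = 2)
{
	return (alignment - (offset % alignment)) % alignment;
}

// Word alignment (alignment = 2): only odd offsets need a padding byte.
static_assert(padding_for(0x1234) == 0, "even offsets are already word-aligned");
static_assert(padding_for(0x1235) == 1, "odd offsets need one padding byte");
```

Under this assumption, a word-aligned segment never needs more than a single padding byte. That would also explain why an unconditional `#pragma codestring \x00` is only correct when the preceding translation unit happens to end at an odd offset.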

@@ -1,8 +1,6 @@
extern _hflip_lut:byte:256
-; The original function layouts unfortunately require this one to be placed at
-; a word-aligned address, which can't be achieved from Turbo C++ alone :( Oh
-; well, its decompilation would have been a mess anyway.
+; Would have been decompilable into a mess.
SHARED segment word public 'CODE' use16
assume cs:SHARED
include th03/formats/hfliplut.asm