mirror of https://github.com/nmlgc/ReC98.git
## Local variables

| | |
|-|-|
| `DX` | First 8-bit variable declared *if no other function is called*<br />Second 16-bit variable declared *if no other function is called* |
| `[bp-1]` | First 8-bit variable declared *otherwise* |
| `SI` | First 16-bit variable declared |
| `DI` | Second 16-bit variable declared *if other functions are called* |

Example:

| ASM | Declaration sequence in C |
|----------|---------------------------|
| `SI` | `int near *var_1;` |
| `[bp-1]` | `char var_2;` |
| `[bp-2]` | `char var_3;` |

* Local `enum` variables with underlying 1-byte types are always word-aligned,
  regardless of the value of `-a`.

### Grouping

Any structures or classes that contain more than a single scalar-type member
are grouped according to their declaration order, and placed *after* (that is,
further away from BP than) all scalar-type variables. This means that it's not
possible to bundle a set of variables with the same meaning into a structure
(e.g. pointers to all 4 VRAM planes) if a scalar-type variable is placed
in between two of these structure instances on the stack: Those structure
instances would be grouped and always placed next to each other, no matter
where the scalar-type variable is declared in relation to them.

## Signedness

| | |
|-|-|
| `MOV al, var`<br />`MOV ah, 0`| `var` is *unsigned char* |
| `MOV al, var`<br />`CBW` | `var` is *char*, `AX` is *int* |

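The two widening patterns in the signedness table can be modeled in portable C++. This is only a sketch for a modern compiler (Turbo C++ has no `<cstdint>`), and the function names `widen_unsigned()` and `widen_signed()` are hypothetical:

```c++
#include <cassert>
#include <cstdint>

// Models `MOV AL, var` / `MOV AH, 0`: the upper byte is cleared, so the
// 8-bit value is zero-extended. This matches an *unsigned char* source.
inline std::uint16_t widen_unsigned(std::uint8_t var) {
    return var;
}

// Models `MOV AL, var` / `CBW`: bit 7 of AL is copied into all of AH, so the
// 8-bit value is sign-extended. This matches a (signed) *char* source.
inline std::int16_t widen_signed(std::int8_t var) {
    return var;
}
```

For the byte `0x80`, the first variant yields `0x0080` and the second one `0xFF80`, which is why the instruction after `MOV AL, var` is enough to tell the two types apart.
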
## Integer arithmetic

| | |
|-|-|
| `ADD r8, imm8` | For non-`AL` registers, `imm8` must be `static_cast` to `uint8_t`. Otherwise, the addition is done on `AL` and then `MOV`ed to `r8`. |
| `ADD [m8], imm8` | Only achievable through a C++ method operating on a member? |
| `MOV AX, [m16]`<br />`ADD AL, [m8]` | Same – `[m16]` must be returned from an inlined function to avoid the optimization of it being directly shortened to 8 bits. |
| `MOV AL, [m8]`<br />`ADD AL, imm8`<br />`MOV [m8], AL` | Opposite; *not* an inlined function |
| `CWD`<br />`SUB AX, DX`<br />`SAR AX, 1` | `AX / 2`, `AX` is *int* |
| `MOV [new_var], AX`<br />`CWD`<br />`XOR AX, DX`<br />`SUB AX, DX` | `abs(AX)`, defined in `<stdlib.h>`. `AX` is *int* |

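The last two rows of the table rely on `CWD` filling `DX` with the sign of `AX`. A minimal sketch of why these sequences compute `AX / 2` and `abs(AX)`, written in portable C++ for a modern compiler; the helper names `cwd()`, `sar1()`, `div2()` and `abs16()` are made up for illustration:

```c++
#include <cassert>
#include <cstdint>

// CWD: DX becomes 0xFFFF (-1) if AX is negative, and 0 otherwise.
inline std::int16_t cwd(std::int16_t ax) {
    return (ax < 0) ? -1 : 0;
}

// SAR AX, 1: arithmetic right shift, i.e. a division by 2 that rounds
// towards negative infinity. Spelled out without shifting negative values.
inline std::int16_t sar1(std::int16_t x) {
    return static_cast<std::int16_t>((x >= 0) ? (x >> 1) : ~((~x) >> 1));
}

// CWD / SUB AX, DX / SAR AX, 1: subtracting DX adds 1 to negative values,
// which turns the rounding of SAR into C's rounding towards zero.
inline std::int16_t div2(std::int16_t ax) {
    const std::int16_t dx = cwd(ax);
    return sar1(static_cast<std::int16_t>(ax - dx));
}

// CWD / XOR AX, DX / SUB AX, DX: for negative values, this flips all bits
// and adds 1 (two's complement negation). Non-negative values pass through.
inline std::int16_t abs16(std::int16_t ax) {
    const std::int16_t dx = cwd(ax);
    return static_cast<std::int16_t>((ax ^ dx) - dx);
}
```

`div2(-7)` therefore gives `-3`, just like `-7 / 2` in C, whereas a plain `SAR` would give `-4`.
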
* When bit-testing a variable with a 16-bit mask via `&` in a conditional
  expression, the `TEST` is optimized to cover just the high or low byte, if
  possible:

  ```c
  long v = 0xFFFFFFFF; // Works regardless of size or signedness
  char b00 = (v & 0x00000001) != 0;       // TEST  BYTE PTR [v + 0], 1
  char b08 = (v & 0x00000100) != 0;       // TEST  BYTE PTR [v + 1], 1
  char b16 = (v & 0x00010000) != 0;       // TEST DWORD PTR [v + 0], 0x00010000
  char b24 = (v & 0x01000000) != 0;       // TEST DWORD PTR [v + 0], 0x01000000
  char b00_to_15 = (v & 0x0000FFFF) != 0; // TEST  WORD PTR [v + 0], 0xFFFF
  char b16_to_31 = (v & 0xFFFF0000) != 0; // TEST DWORD PTR [v + 0], 0xFFFF0000
  char b08_to_23 = (v & 0x00FFFF00) != 0; // TEST DWORD PTR [v + 0], 0x00FFFF00
  ```

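The seven observations in the bit-testing example above fit a simple rule: the operand only shrinks if the mask lies entirely within the low 16 bits. The following portable C++ sketch models just those observations and is not derived from the compiler itself; `TestOperand` and `narrowed_test()` are hypothetical names:

```c++
#include <cassert>
#include <cstdint>

// Describes the memory operand and immediate of the emitted TEST.
struct TestOperand {
    unsigned offset;    // Byte offset added to the variable's address
    unsigned width;     // Operand size in bytes (1, 2, or 4)
    std::uint32_t imm;  // Immediate, shifted down to match the offset
};

inline TestOperand narrowed_test(std::uint32_t mask) {
    if((mask & 0xFFFFFF00u) == 0) {
        return { 0, 1, mask };      // Only bits 0-7: low byte
    }
    if((mask & 0xFFFF00FFu) == 0) {
        return { 1, 1, mask >> 8 }; // Only bits 8-15: high byte of low word
    }
    if((mask & 0xFFFF0000u) == 0) {
        return { 0, 2, mask };      // Bits 0-15 across both bytes: low word
    }
    return { 0, 4, mask };          // Anything touching bits 16-31: no change
}
```

Whether masks confined to a single byte of the *upper* word are ever narrowed for other variable types is not covered by the example, so the sketch keeps them at full width.
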
### Arithmetic on a register *after* assigning it to a variable?

Assignment is part of the C expression. If it's a comparison, that comparison
must be spelled out to silence the `Possibly incorrect assignment` warning.

| | |
|-|-|
| `CALL somefunc`<br />`MOV ??, AX`<br />`OR AX, AX`<br />`JNZ ↑` | `while(( ?? = somefunc() ) != NULL)` |

### `SUB ??, imm` vs. `ADD ??, -imm`

`SUB` means that `??` is unsigned. Might require suffixing `imm` with `u` in
case it's part of an arithmetic expression that was promoted to `int`.

### Comparisons

* Any comparison of a register with a literal 0 is optimized to `OR reg, reg`
  followed by a conditional jump, no matter how many calculations and inlined
  functions are involved. Any `CMP reg, 0` instructions must have either come
  from assembly, or referred to a *pointer* at address 0:

  ```c++
  extern void near *address_0; // Public symbol at near address 0
  register int i;

  if(i != reinterpret_cast<int>(address_0)) {
      // ↑ Will emit `CMP reg, 0`
  }
  ```

* `CMP` instructions not followed by jumps correspond to empty `if` statements:

  ```c++
  if(foo > 100) { // CMP foo, 100
  }
  bar = 8;        // MOV bar, 8
  ```

## Pointer arithmetic

* Using parentheses or subscripts in an offset calculation implies a certain
  order of operations, which can greatly impact the generated code:

  ```c++
  char far *plane = reinterpret_cast<char __seg *>(0xA800);
  int y = (17 * (640 / 8)); // MOV DX, 1360
  int x = 4;                // MOV CX, 4

  // LES BX, [plane]
  // ADD BX, DX
  // ADD BX, CX
  // MOV AL, ES:[BX]
  _AL = *(plane + y + x);
  _AL = *(y + plane + x);

  // LES BX, [plane]
  // ADD BX, CX ; CX and DX swapped, compared to the one above
  // ADD BX, DX
  // MOV AL, ES:[BX]
  _AL = *(y + (plane + x));

  // MOV BX, DX
  // ADD BX, CX
  // MOV ES, WORD PTR [plane + 2]
  // ADD BX, WORD PTR [plane]
  // MOV AL, ES:[BX]
  _AL = *(plane + (y + x));
  _AL = plane[y + x];
  ```

## Floating-point arithmetic

* Since the x87 FPU can only load from memory, all temporary results of
  arithmetic are spilled to one single compiler-generated variable (`fpu_tmp`)
  on the stack, which is reused across all of the function:

  | | |
  |-|-|
  | `MOV AX, myint`<br />`INC AX`<br />`MOV fpu_tmp, ax`<br />`FILD fpu_tmp`<br />`FSTP ret` | `float ret = (myint + 1)` |

* The same `fpu_tmp` variable is also used as the destination for `FNSTSW`,
  used in comparisons.

* On the stack, `fpu_tmp` is placed after all variables declared at the
  beginning of the function.

* Performing arithmetic or comparisons between `float` and `double` variables
  *always* `FLD`s the `float` first, before emitting the corresponding FPU
  instruction for the `double`, regardless of how the variables are placed in
  the expression. The instruction order only matches the expression order for
  literals:

  ```c++
  char ret;
  float f;
  double d;

  ret = (f > d);     // FLD f,     FCOMP d
  ret = (d > f);     // FLD f,     FCOMP d

  ret = (d > 3.14f); // FLD d,     FCOMP 3.14f
  ret = (3.14f > d); // FLD 3.14f, FCOMP d
  ret = (f > 3.14);  // FLD f,     FCOMP 3.14 + 4
  ret = (3.14 > f);  // FLD 3.14,  FCOMP f + 4
  ```

## Assignments

| | |
|-|-|
| `MOV ???, [SI+????]` | Only achievable through pointer arithmetic? |

* When assigning to an array element at a variable or non-0 index, the array
  element address is typically evaluated before the expression to be assigned.
  But when assigning
  * the result of any arithmetic expression of a *16-bit type*
  * to an element of a `far` array of a *16-bit type*,

  the expression will be evaluated first, if its signedness differs from that
  of the array:

  ```c
  int far *s;
  unsigned int far *u;
  int s1, s2;
  unsigned int u1, u2;

  s[1] = (s1 | s2); // LES BX, [s]; MOV AX, s1; OR AX, s2; MOV ES:[BX+2], AX
  s[1] = (s1 | u2); // MOV AX, s1; OR AX, u2; LES BX, [s]; MOV ES:[BX+2], AX
  s[1] = (u1 | u2); // MOV AX, u1; OR AX, u2; LES BX, [s]; MOV ES:[BX+2], AX

  u[1] = (s1 | s2); // MOV AX, s1; OR AX, s2; LES BX, [u]; MOV ES:[BX+2], AX
  u[1] = (s1 | u2); // LES BX, [u]; MOV AX, s1; OR AX, u2; MOV ES:[BX+2], AX
  u[1] = (u1 | u2); // LES BX, [u]; MOV AX, u1; OR AX, u2; MOV ES:[BX+2], AX
  ```

* Assigning `AX` to multiple variables in a row also indicates multiple
  assignment in C:

  ```c
  // Applies to all storage durations
  int a, b, c;

  a = 0; // MOV [a], 0
  b = 0; // MOV [b], 0
  c = 0; // MOV [c], 0

  a = b = c = 0; // XOR AX, AX; MOV [c], AX; MOV [b], AX; MOV [a], AX;
                 // Note the opposite order of variables!
  ```

* For trivially copyable structures, copy assignments are optimized to an
  equivalent of `memcpy()`:

  | Structure size | (no flags) | -G |
  |----------------|-------------|----------------------|
  | 1 | via `AL` | via `AL` |
  | 2 | via `AX` | via `AX` |
  | 3 | `SCOPY@` | via `AX` and `AL` |
  | 4 | via `DX:AX` | via `DX:AX` |
  | 5, 7, 9 | `SCOPY@` | via `AX` and `AL` |
  | 6, 8 | `SCOPY@` | via `AX` |
  | 10, 12, 14, … | `SCOPY@` | `REP MOVSW` |
  | 11, 13, 15, … | `SCOPY@` | `REP MOVSW`, `MOVSB` |

  (With the `-3` flag, `EAX` is used instead of `DX:AX` in the 4-byte case,
  but everything else stays the same.)

  Breaking triviality by overloading `operator =` in any of the structure
  members also breaks this optimization. In some cases, it might be possible
  to recreate it, by simulating triviality in an overloaded copy assignment
  operator inside the class in question:

  ```c++
  struct Nontrivial {
      nontrivial_char_t e[100];
      // Functions containing local classes aren't expanded inline, so...
      struct Trivial {
          char e[100];
      };

      void operator =(const Nontrivial &other) {
          reinterpret_cast<Trivial &>(*this) = (
              reinterpret_cast<const Trivial &>(other)
          );
      }
  };
  ```

  However, this only generates identical code to the original optimization if
  passing the `other` parameter can be inlined, which isn't always the case.

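When matching a structure copy against the size table above, it can help to have the table in executable form. This is merely a restatement of that table in portable C++, not something the compiler exposes; `CopyMethod`, `copy_method()` and `needs_trailing_movsb()` are hypothetical names:

```c++
#include <cassert>
#include <cstddef>

enum CopyMethod {
    VIA_REGISTERS, // AL, AX, DX:AX, or a sequence of AX (and AL) moves
    SCOPY_CALL,    // Call to the `SCOPY@` runtime helper
    REP_MOVSW      // Inline `REP MOVSW`, plus `MOVSB` for odd sizes
};

// `flag_g` corresponds to compiling with `-G`.
inline CopyMethod copy_method(std::size_t size, bool flag_g) {
    assert(size >= 1);
    if((size == 1) || (size == 2) || (size == 4)) {
        return VIA_REGISTERS; // Identical with and without `-G`
    }
    if(!flag_g) {
        return SCOPY_CALL;
    }
    return (size <= 9) ? VIA_REGISTERS : REP_MOVSW;
}

// Only the `-G` string copy needs an extra byte move for odd sizes.
inline bool needs_trailing_movsb(std::size_t size, bool flag_g) {
    return ((copy_method(size, flag_g) == REP_MOVSW) && ((size % 2) == 1));
}
```

Read in reverse, a 3-byte structure copied via `AX` and `AL` suggests that the translation unit was compiled with `-G`, while a `SCOPY@` call for the same structure suggests that it wasn't.
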
## Function pointers

Type syntax (cf. [platform.h](../platform.h)):

| | … near function | … far function |
|------------------|---------------------------|--------------------------|
| Near pointer to… | `int (near *near nn_t)()` | `int (far *near fn_t)()` |
| Far pointer to… | `int (near *far nf_t)()` | `int (far *far ff_t)()` |

Calling conventions can be added before the `*`.

## `switch` statements

* Sequence of the individual cases is identical in both C and ASM
* Multiple cases with the same offset in the table, to code that doesn't
  return? Code was compiled with `-O`
* Having no more than 3 `case`s (4 with an additional `default`) generates
  comparison/branching code instead of a jump table. The comparisons will be
  sorted in ascending order of the `case` values, while the individual branch
  bodies still match their order given in the code:

  ```c
  switch(foo) {                // MOV AX, foo
  default: foo = 0; break;     // CMP AX, 10; JZ @@case_10
  case 30: foo = 3; break;     // CMP AX, 20; JZ @@case_20
  case 10: foo = 1; break;     // CMP AX, 30; JZ @@case_30
  case 20: foo = 2; break;     // MOV foo, 0
  }                            // JMP @@after_switch
                               // @@case_30: MOV foo, 3; JMP @@after_switch
                               // @@case_10: MOV foo, 1; JMP @@after_switch
                               // @@case_20: MOV foo, 2;
                               // @@after_switch:
  ```

* With the `-G` (Generate for speed) option, complicated `switch` statements
  that require both value and jump tables are compiled to a binary search with
  regular conditional branches:

  ```c
  switch(foo) {
  case 0x4B: /* […] */ break;
  case 0x4D: /* […] */ break;
  case 0x11: /* […] */ break;
  case 0x1F: /* […] */ break;
  case 0x20: /* […] */ break;
  case 0x17: /* […] */ break;
  case 0x26: /* […] */ break;
  case 0x19: /* […] */ break;
  case 0x01: /* […] */ break;
  case 0x1C: /* […] */ break;
  }
  ```

  Resulting ASM:

  ```asm
  @@switch:
      MOV AX, foo
      CMP AX, 1Fh
      JZ @@case_1Fh
      JG @@GT_1Fh
      CMP AX, 17h
      JZ @@case_17h
      JG @@GT_17h_LT_1Fh
      CMP AX, 01h
      JZ @@case_01h
      CMP AX, 11h
      JZ @@case_11h
      JMP @@no_case_found

  @@GT_17h_LT_1Fh:
      CMP AX, 1Ch
      JZ @@case_1Ch
      JMP @@no_case_found

  @@GT_1Fh:
      CMP AX, 4Bh
      JZ @@case_4Bh
      JG @@GT_4Bh
      CMP AX, 20h
      JZ @@case_
      CMP AX, 26h
      JZ @@case_26h
      JMP @@no_case_found

  @@GT_4Bh:
      CMP AX, 4Dh
      JZ @@case_4Dh
      JMP @@no_case_found
  ```

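The branch structure of the `-G` listing above is easiest to recognize when compared against a plain binary search over the sorted `case` values. The following portable C++ sketch illustrates that idea for the ten values of the example; it does not reproduce the exact comparison tree Turbo C++ picks, and `CASE_VALUES` and `find_case()` are hypothetical names:

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>

// The `case` values from the example, sorted in ascending order. The order
// in which they appear in the C source is irrelevant for the search.
static const int CASE_VALUES[] = {
    0x01, 0x11, 0x17, 0x19, 0x1C, 0x1F, 0x20, 0x26, 0x4B, 0x4D
};
static const std::size_t CASE_COUNT = (
    sizeof(CASE_VALUES) / sizeof(CASE_VALUES[0])
);

// Returns the index of [foo] within the sorted values, or -1 if no `case`
// matches. Every step corresponds to one `CMP AX, imm` that splits the
// remaining values into an "equal", a "greater", and a "less" branch.
inline int find_case(int foo) {
    const int *end = (CASE_VALUES + CASE_COUNT);
    const int *it = std::lower_bound(CASE_VALUES, end, foo);
    if((it != end) && (*it == foo)) {
        return static_cast<int>(it - CASE_VALUES);
    }
    return -1;
}
```

A value such as `0x18` ends up in the `-1` branch, which plays the role of `@@no_case_found` in the listing.
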
## Function calls

### `NOP` insertion

Happens for every `far` call to outside of the current translation unit, even
if both the caller and callee end up being linked into the same code segment.

**Certainty:** Seems like there *might* be a way around that, apart from
temporarily spelling out these calls in ASM until both functions are compiled
as part of the same translation unit. Found nothing so far, though.

### Pushing byte arguments to functions

Borland C++ just pushes the entire word. Will cause IDA to mis-identify
certain local variables as `word`s when they aren't.

### Pushing pointers

Passing `far` pointers to subscripted array elements requires code to calculate
the offset. Turbo C++ emits this calculation (and not the `PUSH` itself) either
before or after the segment is `PUSH`ed. If either

1. the pointer is `near`, or
2. the parameter is a near or far `const *`,

the segment argument is always pushed immediately, before evaluating the
offset:

```c++
#pragma option -ml

struct s100 {
    char c[100];
};

extern s100 structs[5];

void __cdecl process_mut(s100 *element);
void __cdecl process_const(const s100 *element);

void foo(int i) {
    process_mut((s100 near *)(&structs[i]));   // PUSH DS; (AX = offset); PUSH AX;
    process_mut((s100 far *)(&structs[i]));    // (AX = offset); PUSH DS; PUSH AX;
    process_const((s100 near *)(&structs[i])); // PUSH DS; (AX = offset); PUSH AX;
    process_const((s100 far *)(&structs[i]));  // PUSH DS; (AX = offset); PUSH AX;
}
```

## Flags

### `-G` (Generate for speed)

* Replaces

  ```asm
  ENTER <stack size>, 0
  ```

  with

  ```asm
  PUSH BP
  MOV BP, SP
  SUB SP, <stack size>
  ```

### `-Z` (Suppress register reloads)

* The tracked contents of `ES` are reset after a conditional statement. If the
  original code had more `LES` instructions than necessary, this indicates a
  specific layout of conditional branches:

  ```c++
  struct foo {
      char a, b;

      char es_not_reset();
      char es_reset();
  };

  char foo::es_not_reset() {
      return (
          a    // LES BX, [bp+this]
          && b // `this` still remembered in ES, not reloaded
      );
  }

  char foo::es_reset() {
      if(a) return 1; // LES BX, [bp+this]
      // Tracked contents of ES are reset
      if(b) return 1; // LES BX, [bp+this]
      return 0;
  }
  ```

  This also applies to divisors stored in `BX`.

### `-3` (80386 Instructions)

* 32-bit function return values are stored in `DX:AX` even with this option
  enabled. Assigning such a returned value generates different instructions
  if the signedness of the return type differs from the signedness of the
  target variable:

  ```c
  /*    */ long ret_signed(void)   { return 0x12345678; }
  unsigned long ret_unsigned(void) { return 0x12345678; }

  void foo(void) {
      long s;
      unsigned long u;

      s = ret_signed();   // MOV WORD PTR [s+2], DX; MOV WORD PTR [s+0], AX;
      s = ret_unsigned(); // PUSH DX; PUSH AX; POP EAX; MOV DWORD PTR [s], EAX;
      u = ret_signed();   // PUSH DX; PUSH AX; POP EAX; MOV DWORD PTR [u], EAX;
      u = ret_unsigned(); // MOV WORD PTR [u+2], DX; MOV WORD PTR [u+0], AX;
  }
  ```

  Without `-3`, the two-`MOV WORD PTR` variant is generated in all four cases.

### `-3` (80386 Instructions) + `-Z` (Suppress register reloads)

Bundles two consecutive 16-bit function parameters into a single 32-bit one,
passed via a single 32-bit `PUSH`. Currently confirmed to happen for literals
and structure members whose memory layout matches the parameter list and
calling convention. Signedness doesn't matter.

Won't happen for two consecutive 8-bit parameters, and can be circumvented by
casting a near pointer to a 16-bit integer and back.

```c
// Works for all storage durations
struct { int x, y; } p;
struct { unsigned int x, y; } q;

void __cdecl foo_c(char x, char y);
void __cdecl foo_s(int x, int y);
void __cdecl foo_u(unsigned int x, unsigned int y);

foo_s(640, 400); // PUSH LARGE 1900280h
foo_u(640, 400); // PUSH LARGE 1900280h
foo_s(p.x, p.y); // PUSH LARGE [p]
foo_u(p.x, p.y); // PUSH LARGE [p]
foo_s(q.x, q.y); // PUSH LARGE [q]
foo_u(q.x, q.y); // PUSH LARGE [q]
foo_c(100, 200); // PUSH 200; PUSH 100

// PUSH [p.x]; PUSH [p.y];
foo_u(*reinterpret_cast<int near *>(reinterpret_cast<unsigned int>(&p.x)), p.y);
foo_s(*reinterpret_cast<int near *>(reinterpret_cast<unsigned int>(&p.x)), p.y);
```

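The `1900280h` immediate in the listing above follows directly from the `__cdecl` stack layout: the first parameter has to end up at the lower address, so it forms the low word of the bundled 32-bit value. A small portable C++ sketch of that packing, with `bundle_params()` being a hypothetical name:

```c++
#include <cassert>
#include <cstdint>

// Packs two consecutive 16-bit `__cdecl` parameters the way a single 32-bit
// `PUSH` would place them on the little-endian stack: [first] in the low
// word (lower address), [second] in the high word (higher address).
inline std::uint32_t bundle_params(std::uint16_t first, std::uint16_t second) {
    return ((static_cast<std::uint32_t>(second) << 16) | first);
}
```

With 640 = `280h` and 400 = `190h`, `bundle_params(640, 400)` evaluates to `01900280h`. This is also why structure members only qualify if their memory order matches the parameter order.
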
### `-O` (Optimize jumps)

Inhibited by:

* identical variable declarations within more than one scope – the
  optimizer will only merge the code *after* the last ASM reference to that
  declared variable. Yes, even though the emitted ASM would be identical:

  ```c
  if(a) {
      int v = set_v();
      do_something_else();
      use(v);
  } else if(b) {
      // Second declaration of [v]. Even though it's assigned to the same stack
      // offset, the second `PUSH c` call will still be emitted separately.
      // Thus, jump optimization only reuses the `CALL use` instruction.
      // Move the `int v;` declaration to the beginning of the function to
      // avoid this.
      int v = set_v();
      use(v);
  }
  ```

* distinct instances of assignments of local variables in registers to
  themselves

* inlined calls to empty functions

`-O` also merges the `ADD SP, imm8` or `POP CX` stack-clearing instructions
after successive `__cdecl` function calls into a single one with their combined
parameter size after the final function call in such a series. Declaring a
local variable after a function call, with or without assigning a value, will
interrupt such a series and force a stack-clearing instruction after the final
function call before the declaration.

* **[Bug:]** Any emitted call to `SCOPY@` will disable this feature of `-O` for
  all generated code in a translation unit that follows the `SCOPY@` call.

  This can explain why a function might seem impossible to decompile with the
  wrong translation unit layout. If it

  * *doesn't* contain the stack-clearing optimization,
  * but *does* definitely contain optimized jumps,
  * which couldn't be reproduced with the slight jump optimization provided by
    `-O- -y`,

  the translation unit is simply missing a `SCOPY@` before the function in
  question.

### `-y` (Produce line number info)

Provides its own kind of slight jump optimization if combined with `-O-`. Yes,
seriously. Might be required to decompile code that seems to contain both some
of the jump optimizations from `-O` and the stack-clearing instructions after
every function call from `-O-`.

## Initialization

Any initialization of a variable with static storage duration (even a `const`
one) that involves function calls (even those that would regularly inline)
will emit a `#pragma startup` function to perform that initialization at
runtime.
This extends to C++ constructors, making macros the only way to initialize
such variables with arithmetic expressions at compile time.

```c
#define FOO(x) (x << 1)

inline char foo(const char x) {
    return FOO(x);
}

const char static_storage[3] = { FOO(1), foo(2), FOO(3) };
```
Resulting ASM (abbreviated):
```asm
    .data
static_storage db 2, 0, 6

    .code
@_STCON_$qv proc near
    push bp
    mov bp, sp
    mov static_storage[1], 4
    pop bp
    ret
@_STCON_$qv endp
```

## Padding bytes in code segments

* Usually, padding `0x00` bytes are only emitted to word-align `switch` jump
  tables with `-a2`. Anywhere else, it typically indicates the start or end of
  a word-aligned `SEGMENT` compiled from assembly. There are two potential
  workarounds though:

  * The `-WX` option (Create DPMI application) *will* enforce word alignment
    for the code segment, at the cost of slightly different code generation in
    certain places. Since it also adds an additional `INC BP` instruction
    before `PUSH BP`, and an additional `DEC BP` instruction after `POP BP`,
    this option can only really be used in translation units with disabled
    stack frames (`-k-`).

  * `#pragma codestring \x00` unconditionally emits a `0x00` byte. However,
    this won't respect different alignment requirements of surrounding
    translation units.

  **Certainty**: Reverse-engineering `TCC.EXE` confirmed that these are the
  only ways.

## Memory segmentation

The segment and group a function will be emitted into can be controlled via
`#pragma option -zC` / `#pragma option -zP` and `#pragma codeseg`. These
mechanisms apply equally to function declarations and definitions. The active
segment/group during a function's first reference will take precedence over any
later segment/group the function shows up in:

```c++
#pragma option -zCfoo_TEXT -zPfoo

void bar(void);

#pragma codeseg baz_TEXT baz

// Despite the segment change in the line above, this function will still be
// put into `foo_TEXT`, the active segment during the first appearance of the
// function name.
void bar(void) {
}

// This function hasn't been declared yet, so it will go into `baz_TEXT` as
// expected.
void baz(void) {
}
```

When fixing up near references, the linker takes the actual flat/linear address
and subtracts the base address of the reference's declared segment from it,
assuming that the respective segment register is set to that specific segment
at runtime. Therefore, incorrect segment declarations lead to incorrectly
calculated offsets, and the linker can't realistically warn about such cases.
There *is* the `Fixup overflow` error, but the linker only throws that one if
the calculated distance exceeds 64 KiB and thus couldn't be expressed in a near
reference to begin with.

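The arithmetic behind such a near fixup can be sketched in a few lines of portable C++. This is a simplified model that assumes real-mode addressing (linear address = segment × 16 + offset) and ignores groups; `linear_address()` and `fixup_near_offset()` are hypothetical names, not linker functions:

```c++
#include <cassert>
#include <cstdint>

// Real-mode translation of a segment:offset pair to a linear address.
inline std::uint32_t linear_address(std::uint16_t seg, std::uint16_t off) {
    return ((static_cast<std::uint32_t>(seg) << 4) + off);
}

// Calculates the 16-bit offset for a near reference to [target], relative to
// the segment the reference was *declared* in. Returns `false` in the case
// that corresponds to `Fixup overflow`: The distance is negative or doesn't
// fit into 16 bits.
inline bool fixup_near_offset(
    std::uint32_t target, std::uint16_t declared_seg, std::uint16_t &offset_out
) {
    const std::uint32_t base = (static_cast<std::uint32_t>(declared_seg) << 4);
    if((target < base) || ((target - base) > 0xFFFFu)) {
        return false;
    }
    offset_out = static_cast<std::uint16_t>(target - base);
    return true;
}
```

For a symbol that actually lives at `2000h:0010h`, a declaration within segment `2000h` yields the expected offset of `0010h`. A wrong declaration within segment `1FF0h` silently yields `0110h` instead, and only a declaration as far away as segment `1000h` would be rejected.
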
Generally, it's better to just fix wrong segments at declaration time:

```c++
// Set the correct segment and group
#pragma codeseg bar_TEXT bar_correct_group

void near bar(void);

// Return to the default code segment
#pragma codeseg
```

However, there is a workaround if the intended near offset should simply be
relative to the actual segment of the symbol: Declaring the identifier as `far`
rather than near, and casting its implicit segment away for the near context.
In the case of [function pointers]:

```c++
void far bar();
static nn_t qux = reinterpret_cast<nn_t>(bar);
```

This works because a `far` symbol always includes the segment it was emitted
into. The cast simply reduces such a reference to its offset part within that
segment.\
This wrong declaration of `bar()` must, of course, not be `#include`d into the
translation unit that actually defines `bar()` as a `near` function, as it was
intended. It also can't be local to an inlined function that's part of a public
header, since those declarations seem to escape to the global scope there.

## C++

In C++ mode, the value of a `const` scalar-type variable declared at global
scope is always inlined, and not emitted into the data segment. Also, no
externally visible symbol for the variable is emitted into the .OBJ file, even
if the variable was not declared `static`. This makes such variables largely
equivalent to `#define` macros.

### Methods

Note the distinction between *`struct`/`class` distance* and *method distance*:

* Declaring the *type* as `near` or `far` controls whether `this` is passed as
  a near or far pointer.
* Declaring a *method* as `near` or `far` controls whether a method call
  generates a `CALL near ptr` or `CALL far ptr` instruction.

These can be freely combined, and one does not imply the other.

#### Inlining

Support for inlined functions is exclusive to C++ mode, with both top-level
`inline` and class methods defined inside class declarations (obviously) not
being supported in C mode. The compiler will inline every function defined in
one of these ways, unless it contains one of these language constructs:

* Loops (`do`, `for`, `while`, `break`, `continue`)
* `goto`
* `switch` and `case`
* `throw`

If it doesn't, inlining is always worth a try to get rid of a potential macro,
especially if all parameters are compile-time constants. There are a few
further constructs that typically don't inline optimally though:

* Assigning lvalues to value parameters (spills the value into a new
  compiler-generated local variable)
* Assigning *either* rvalues *or* lvalues stored in registers to reference
  parameters (same)
* Nested `if` statements – inlining will always generate a useless
  `JMP SHORT $+2` at the end of the last branch.

Due to lazy code generation, values returned from inlined functions always
preserve their type during the instruction generated for the `return` statement,
and are only later cast into whatever the call site requires. Thus, certain
type-mismatched instructions in arithmetic expressions can only be generated by
returning one of the operands from an inlined function – unlike `static_cast`
and `reinterpret_cast`, which can be optimized away.

Class methods inline to their ideal representation if all of these are true:

* returns `void` || (returns `*this` && is at the first nesting level of
  inlining)
* takes no parameters || takes only built-in, scalar-type parameters

Examples:

* A class method (first nesting level) calling an overloaded operator (second
  nesting level) returning `*this` will generate (needless) instructions
  equivalent to `MOV AX, *this`. Thus, any overloaded `=`, `+=`, `-=`, etc.
  operator should always return `void`.

  **Certainty**: See the examples in `9d121c7`. This is what allows us to use
  custom types with overloaded assignment operators, with the resulting code
  generation being indistinguishable from equivalent C preprocessor macros.

* Returning *anything else* but `void` or `*this` will first store that result
  in `AX`, leading any branches at the call site to then refer to `AX`.

  **Certainty**: Maybe Borland (not Turbo) C++ has an optimization option
  against it?

### Boilerplate for constructors defined outside the class declaration

```c++
struct MyClass {
    // Members…

    MyClass();
};

MyClass::MyClass() {
    // Initialization…
}
```

Resulting ASM:

```asm
; MyClass::MyClass(MyClass* this)
; Exact instructions differ depending on the memory model. Model-independent
; ASM instructions are in UPPERCASE.
@MyClass@$bctr$qv proc
    PUSH BP
    MOV BP, SP
    ; (saving SI and DI, if used in constructor code)
    ; (if this, 0)
    JNZ @@ctor_code
    PUSH sizeof(MyClass)
    CALL @$bnew$qui ; operator new(uint)
    POP CX
    ; (this = value_returned_from_new)
    ; (if this)
    JZ @@ret

@@ctor_code:
    ; Initialization…

@@ret:
    ; (retval = this)
    ; (restoring DI and SI, if used in constructor code)
    POP BP
    RET
@MyClass@$bctr$qv endp
```

* Arrays of nontrivially constructible objects are always constructed via
  `_vector_new_`. Conversely, no `_vector_new_` = no array.

## Limits of decompilability

### `MOV BX, SP`-style functions, or others with no standard stack frame

These almost certainly weren't compiled from C. By disabling stack frames
using `#pragma option -k-`, it *might* be possible to still get the exact same
code out of Turbo C++ – even though it will most certainly look horrible, and
barely more readable than assembly (or even less so), with tons of inline ASM
and register pseudovariables. However, it's futile to even try if the function
contains one of the following:

<a id="clobbering-di"></a>

* References to the `SI` or `DI` registers. In that case, Turbo C++ always
  inserts

  * a `PUSH (SI|DI)` at the beginning (after any `PUSH BP; MOV BP, SP`
    instructions and *before* anything else)
  * and a `POP (SI|DI)` before returning.

  **Certainty:** Confirmed through reverse-engineering `TCC.EXE`, no way
  around it.

## Compiler bugs

* Dereferencing a `far` pointer constructed from the `_FS` and `_GS`
  pseudoregisters emits wrong segment prefix opcodes – 0x46 (`INC SI`) and
  0x4E (`DEC SI`) rather than the correct 0x64 and 0x65, respectively.

  **Workaround**: Not happening when compiling via TASM (`-B` on the command
  line, or `#pragma inline`).

* Any emitted call to `SCOPY@` will disable the stack cleanup optimization
  generated by [`-O`](#-o-optimize-jumps) for all generated code in a
  translation unit that follows the `SCOPY@` call.

----

[Bug:]: #compiler-bugs
[function pointers]: #function-pointers