ReC98/Research/Borland C++ decompilation.md

## Local variables

| | |
|-|-|
| `DX` | First 8-bit variable declared *if no other function is called*<br />Second 16-bit variable declared *if no other function is called* |
| `[bp-1]` | First 8-bit variable declared *otherwise* |
| `SI` | First 16-bit variable declared |
| `DI` | Second 16-bit variable declared *if other functions are called* |

Example:

|      ASM | Declaration sequence in C |
|----------|---------------------------|
|     `SI` | `int near *var_1;`        |
| `[bp-1]` | `char var_2;`             |
| `[bp-2]` | `char var_3;`             |

### Grouping

Any structures or classes that contain more than a single scalar-type member
are grouped according to their declaration order, and placed *after* (that is,
further away from BP) than all scalar-type variables. This means that it's not
possible to bundle a set of variables with the same meaning into a structure
(e.g. pointers to all 4 VRAM planes) if a scalar-type variable is placed
inbetween two of these structure instances on the stack: Those structure
instances would be grouped and always placed next to each other, no matter
where the scalar-type variable is declared in relation to them.

## Signedness

| | |
|-|-|
| `MOV al, var`<br />`MOV ah, 0`| `var` is *unsigned char* |
| `MOV al, var`<br />`CBW` | `var` is *char*, `AX` is *int* |

## Integer arithmetic

| | |
|-|-|
| `ADD [m8], imm8` | Only achievable through a C++ method operating on a  member? |
| `MOV AL, [m8]`<br />`ADD AL, imm8`<br />`MOV [m8], AL` | Opposite; *not* an inlined function |
| `CWD`<br />`SUB AX, DX`<br />`SAR AX, 1` | `AX / 2`, `AX` is *int* |
| `MOV [new_var], AX`<br />`CWD`<br />`XOR AX, DX`<br />`SUB AX, DX` | `abs(AX)`, defined in `<stdlib.h>`. `AX` is *int* |

* When bit-testing the a variable with a 16-bit mask via `&` in a conditional
  expression, the `TEST` is optimized to cover just the high or low byte, if
  possible:
  ```c
  long v = 0xFFFFFFFF; // Works regardless of size or signedness
  char       b00 = (v & 0x00000001) != 0; // TEST  BYTE PTR [v + 0], 1
  char       b08 = (v & 0x00000100) != 0; // TEST  BYTE PTR [v + 1], 1
  char       b16 = (v & 0x00010000) != 0; // TEST DWORD PTR [v + 0], 0x00010000
  char       b24 = (v & 0x01000000) != 0; // TEST DWORD PTR [v + 0], 0x01000000
  char b00_to_15 = (v & 0x0000FFFF) != 0; // TEST  WORD PTR [v + 0], 0xFFFF
  char b16_to_31 = (v & 0xFFFF0000) != 0; // TEST DWORD PTR [v + 0], 0xFFFF0000
  char b08_to_23 = (v & 0x00FFFF00) != 0; // TEST DWORD PTR [v + 0], 0x00FFFF00
  ```

### Arithmetic on a register *after* assigning it to a variable?

Assigment is part of the C expression. If it's a comparison, that comparison
must be spelled out to silence the `Possibly incorrect assignment` warning.

| | |
|-|-|
| `CALL somefunc`<br />`MOV ??, AX`<br />`OR AX, AX`<br />`JNZ ↑` | `while(( ?? = somefunc() ) != NULL)` |

### `SUB ??, imm` vs. `ADD ??, -imm`

`SUB` means that `??` is unsigned. Might require suffixing `imm` with `u` in
case it's part of an arithmetic expression that was promoted to `int`.

## Floating-point arithmetic

* Since the x87 FPU can only load from memory, all temporary results of
  arithmetic are spilled to one single compiler-generated variable (`fpu_tmp`)
  on the stack, which is reused across all of the function:

  | | |
  |-|-|
  | `MOV  AX, myint`<br />`INC  AX`<br />`MOV  fpu_tmp, ax`<br />`FILD fpu_tmp`<br />`FSTP ret` | `float ret = (myint + 1)` |

* The same `fpu_tmp` variable is also used as the destination for `FNSTSW`,
  used in comparisons.

* Performing arithmetic or comparisons between `float` and `double` variables
  *always* `FLD`s the `float` first, before emitting the corresponding FPU
  instruction for the `double`, regardless of how the variables are placed in
  the expression. The instruction order only matches the expression order for
  literals:

  ```c++
  char ret;
  float f;
  double d;

  ret = (f > d);     // FLD f,     FCOMP d
  ret = (d > f);     // FLD f,     FCOMP d

  ret = (d > 3.14f); // FLD d,     FCOMP 3.14f
  ret = (3.14f > d); // FLD 3.14f, FCOMP d
  ret = (f > 3.14);  // FLD f,     FCOMP 3.14 + 4
  ret = (3.14 > f);  // FLD 3.14,  FCOMP f + 4
  ```

## Assignments

| | |
|-|-|
| `MOV ???, [SI+????]` | Only achievable through pointer arithmetic? |

* When assigning to a array element at a variable or non-0 index, the array
  element address is typically evaluated before the expression to be assigned.
  But when assigning
  * the result of any arithmetic expression of a *16-bit type*
  * to an element of a `far` array of a *16-bit type*,

  the expression will be evaluated first, if its signedness differs from that
  of the array:

  ```c
  int far *s;
  unsigned int far *u;
  int s1, s2;
  unsigned int u1, u2;

  s[1] = (s1 | s2); // LES BX, [s]; MOV AX, s1; OR AX, s2; MOV ES:[BX+2], AX
  s[1] = (s1 | u2); // MOV AX, s1; OR AX, u2; LES BX, [s]; MOV ES:[BX+2], AX
  s[1] = (u1 | u2); // MOV AX, u1; OR AX, u2; LES BX, [s]; MOV ES:[BX+2], AX

  u[1] = (s1 | s2); // MOV AX, s1; OR AX, s2; LES BX, [u]; MOV ES:[BX+2], AX
  u[1] = (s1 | u2); // LES BX, [u]; MOV AX, s1; OR AX, u2; MOV ES:[BX+2], AX
  u[1] = (u1 | u2); // LES BX, [u]; MOV AX, u1; OR AX, u2; MOV ES:[BX+2], AX
  ```

* Assigning `AX` to multiple variables in a row also indicates multiple
  assignment in C:

  ```c
  // Applies to all storage durations
  int a, b, c;

  a = 0;  // MOV [a], 0
  b = 0;  // MOV [b], 0
  c = 0;  // MOV [c], 0

  a = b = c = 0; // XOR AX, AX;  MOV [c], AX; MOV [b], AX;  MOV [a], AX;
                 // Note the opposite order of variables!
  ```

* For trivially copyable structures, copy assignments are optimized to an
  equivalent of `memcpy()`:

  | Structure size | (no flags)  |  -G                  |
  |----------------|-------------|----------------------|
  |              1 | via `AL`    | via `AL`             |
  |              2 | via `AX`    | via `AX`             |
  |              3 | `SCOPY@`    | via `AX` and `AL`    |
  |              4 | via `DX:AX` | via `DX:AX`          |
  |        5, 7, 9 | `SCOPY@`    | via `AX` and `AL`    |
  |           6, 8 | `SCOPY@`    | via `AX`             |
  |  10, 12, 14, … | `SCOPY@`    | `REP MOVSW`          |
  |  11, 13, 15, … | `SCOPY@`    | `REP MOVSW`, `MOVSB` |

  (With the `-3` flag, `EAX` is used instead of `DX:AX` in the 4-byte case,
  but everything else stays the same.)

  Breaking triviality by overloading `operator =` in any of the structure
  members also breaks this optimization. In some cases, it might be possible
  to recreate it, by simulating triviality in an overloaded copy assignment
  operator inside the class in question:

  ```c++
  struct Nontrivial {
    nontrivial_char_t e[100];
    // Functions containing local classes aren't expanded inline, so...
    struct Trivial {
      char e[100];
    };

    void operator =(const Nontrivial &other) {
      reinterpret_cast<Trivial &>(*this) = (
          reinterpret_cast<const Trivial &>(other)
      );
    }
  };
  ```

  However, this only generates identical code to the original optimization if
  passing the `other` parameter can be inlined, which isn't always the case.

## `switch` statements

* Sequence of the individual cases is identical in both C and ASM
* Multiple cases with the same offset in the table, to code that doesn't
  return? Code was compiled with `-O`

## Function calls

### `NOP` insertion

Happens for every `far` call to outside of the current translation unit, even
if both the caller and callee end up being linked into the same code segment.

**Certainty:** Seems like there *might* be a way around that, apart from
temporarily spelling out these calls in ASM until both functions are compiled
as part of the same translation unit. Found nothing so far, though.

### Pushing byte arguments to functions

Borland C++ just pushes the entire word. Will cause IDA to mis-identify
certain local variables as `word`s when they aren't.

### Pushing pointers

When passing a `near` pointer to a function that takes a `far` one, the
segment argument is sometimes `PUSH`ed immediately, before evaluating the
offset:

```c++
#pragma option -ml

struct s100 {
  char c[100];
};

extern s100 structs[5];

void __cdecl process(s100 *element);

void foo(int i) {
  process((s100 near *)(&structs[i])); // PUSH DS; (AX = offset); PUSH AX;
  process((s100 far *)(&structs[i]));  // (AX = offset); PUSH DS; PUSH AX;
}
```

## Flags

### `-Z` (Suppress register reloads)

* The tracked contents of `ES` are reset after a conditional statement. If the
  original code had more `LES` instructions than necessary, this indicates a
  specific layout of conditional branches:

  ```c++
  struct foo {
    char a, b;

    char es_not_reset();
    char es_reset();
  };

  char foo::es_not_reset() {
    return (
      a    // LES BX, [bp+this]
      && b // `this` still remembered in ES, not reloaded
    );
  }

  char foo::es_reset() {
    if(a) return 1; // LES BX, [bp+this]
    // Tracked contents of ES are reset
    if(b) return 1; // LES BX, [bp+this]
    return 0;
  }
  ```

### `-3` (80386 Instructions) + `-Z` (Suppress register reloads)

Bundles two consecutive 16-bit function parameters into a single 32-bit one,
passed via a single 32-bit `PUSH`. Currently confirmed to happen for literals
and structure members whose memory layout matches the parameter list and
calling convention. Signedness doesn't matter.

Won't happen for two consecutive 8-bit parameters.

```c
// Works for all storage durations
struct {          int  x, y; } p;
struct { unsigned int  x, y; } q;

void __cdecl foo_c(char x, char y);
void __cdecl foo_s(int x, int y);
void __cdecl foo_u(unsigned int x, unsigned int y);

foo_s(640, 400); // PUSH LARGE 1900280h
foo_u(640, 400); // PUSH LARGE 1900280h
foo_s(p.x, p.y); // PUSH LARGE [p]
foo_u(p.x, p.y); // PUSH LARGE [p]
foo_s(q.x, q.y); // PUSH LARGE [p]
foo_u(q.x, q.y); // PUSH LARGE [p]
foo_c(100, 200); // PUSH 200; PUSH 100
```

### `-O` (Optimize jumps)

Also merges individual `ADD SP, imm8` or `POP CX` stack-clearing instructions
after `__cdecl` function calls into a single one with their combined parameter
size.

Inhibited by:

* identical variable declarations within more than one scope – the
  optimizer will only merge the code *after* the last ASM reference to that
  declared variable. Yes, even though the emitted ASM would be identical:

  ```c
  if(a) {
      int v = set_v();
      do_something_else();
      use(v);
  } else if(underline) {
      // Second declaration of [v]. Even though it's assigned to the same stack
      // offset, the second `PUSH w` call will still be emitted separately.
      // Thus, jump optimization only reuses the `CALL use` instruction.
      // Move the `int v;` declaraion to the beginning of the function to avoid
      // this.
      int v = set_v();
      use(v);
  }
  ```

* distinct instances of assignments of local variables in registers to itself

* inlined calls to empty functions

## Inlining

Always worth a try to get rid of a potential macro. Some edge cases don't
inline optimally though:

* Assignments to a pointer in `SI` – that pointer is moved to `DI`,
  [clobbering that register](#clobbering-di). Try a [class method](#C++)
  instead.
* Nested `if` statements – inlining will always generate a useless
  `JMP SHORT $+2` at the end of the last branch.

## Initialization

Any initialization of a variable with static storage duration (even a `const`
one) that involves function calls (even those that would regularly inline)
will emit a `#pragma startup` function to perform that initialization at
runtime.
This extends to C++ constructors, making macros the only way to initialize
such variables with arithmetic expressions at compile time.

```c
#define FOO(x) (x << 1)

inline char foo(const char x)  {
    return FOO(x);
}

const char static_storage[3] = { FOO(1), foo(2), FOO(3) };
```
Resulting ASM (abbreviated):
```asm
  .data
static_storage  db  2, 0, 6

  .code
@_STCON_$qv	proc	near
	push bp
	mov  bp, sp
	mov  static_storage[1], 4
	pop	 bp
	ret
@_STCON_$qv	endp
```

## Padding bytes in code segments

* Usually, padding `0x00` bytes are only emitted to word-align `switch` jump
  tables with `-a2`. Anywhere else, it typically indicates the start or end of
  a word-aligned `SEGMENT` compiled from assembly. There are two potential
  workarounds though:

  * The `-WX` option (Create DPMI application) *will* enforce word alignment
    for the code segment, at the cost of slightly different code generation in
    certain places. Since it also adds an additional `INC BP` instruction
    before `PUSH BP`, and an additional `DEC BP` instruction after `POP BP`,
    this option can only really be used in translation units with disabled
    stack frames (`-k-`).

  * `#pragma codestring \x00` unconditionally emits a `0x00` byte. However,
    this won't respect different alignment requirements of surrounding
    translation units.

  **Certainty**: Reverse-engineering `TCC.EXE` confirmed that these are the
  only ways.

## C++

In C++ mode, the value of a `const` scalar-type variable declared at global
scope is always inlined, and not emitted into the data segment. Also, no
externally visible symbol for the variable is emitted into the .OBJ file, even
if the variable was not declared `static`. This makes such variables largely
equivalent to `#define` macros.

Class methods inline to their ideal representation if all of these are true:

* returns `void` || (returns `*this` && is at the first nesting level of
  inlining)
* takes no parameters || takes only built-in, scalar-type parameters

Examples:

* A class method (first nesting level) calling an overloaded operator (second
  nesting level) returning `*this` will generate (needless) instructions
  equivalent to `MOV AX, *this`. Thus, any overloaded `=`, `+=`, `-=`, etc.
  operator should always return `void`.

  **Certainty**: See the examples in `9d121c7`. This is what allows us to use
  custom types with overloaded assignment operators, with the resulting code
  generation being indistinguishable from equivalent C preprocessor macros.

* Returning *anything else* but `void` or `*this` will first store that result
  in `AX`, leading any branches at the call site to then refer to `AX`.

  **Certainty**: Maybe Borland (not Turbo) C++ has an optimization option
  against it?

### Boilerplate for constructors defined outside the class declaration

```c++
struct MyClass {
  // Members…

  MyClass();
};

MyClass::MyClass() {
  // Initialization…
}
```

Resulting ASM:

```asm
; MyClass::MyClass(MyClass* this)
; Exact instructions differ depending on the memory model. Model-independent
; ASM instructions are in UPPERCASE.
@MyClass@$bctr$qv proc
  PUSH  BP
  MOV   BP, SP
  ; (saving SI and DI, if used in constructor code)
  ; (if this, 0)
  JNZ   @@ctor_code
  PUSH  sizeof(MyClass)
  CALL  @$bnew$qui      ; operator new(uint)
  POP   CX
  ; (this = value_returned_from_new)
  ; (if this)
  JZ    @@ret

@@ctor_code:
  ; Initialization…

@@ret:
  ; (retval = this)
  ; (restoring DI and SI, if used in constructor code)
  POP   BP
  RET
@MyClass@$bctr$qv endp
```

## Limits of decompilability

### `MOV BX, SP`-style functions, or others with no standard stack frame

These almost certainly weren't compiled from C. By disabling stack frames
using `#pragma option -k-`, it *might* be possible to still get the exact same
code out of Turbo C++ – even though it will most certainly look horrible, and
barely more readable than assembly (or even less so), with tons of inline ASM
and register pseudovariables. However, it's futile to even try if the function
contains one of the following:

<a id="clobbering-di"></a>

* References to the `SI` or `DI` registers. In that case, Turbo C++ always
  inserts

  * a `PUSH (SI|DI)` at the beginning (after any `PUSH BP; MOV BP, SP`
  instructions and *before* anything else)
  * and a `POP (SI|DI)` before returning.

  **Certainty:** Confirmed through reverse-engineering `TCC.EXE`, no way
  around it.

### Compiler bugs

* Dereferencing a `far` pointer constructed from the `_FS` and `_GS`
  pseudoregisters emits wrong segment prefix opcodes – 0x46 (`INC SI`) and
  0x4E (`DEC SI`) rather than the correct 0x64 and 0x65, respectively.

  **Workaround**: Not happening when compiling via TASM (`-B` on the command
  line, or `#pragma inline`).