15 KiB
Local variables
DX |
First 8-bit variable declared if no other function is called Second 16-bit variable declared if no other function is called |
[bp-1] |
First 8-bit variable declared otherwise |
SI |
First 16-bit variable declared |
DI |
Second 16-bit variable declared if other functions are called |
Example:
ASM | Declaration sequence in C |
---|---|
SI |
int near *var_1; |
[bp-1] |
char var_2; |
[bp-2] |
char var_3; |
Grouping
Any structures or classes that contain more than a single scalar-type member are grouped according to their declaration order, and placed after (that is, further away from BP) than all scalar-type variables. This means that it's not possible to bundle a set of variables with the same meaning into a structure (e.g. pointers to all 4 VRAM planes) if a scalar-type variable is placed inbetween two of these structure instances on the stack: Those structure instances would be grouped and always placed next to each other, no matter where the scalar-type variable is declared in relation to them.
Signedness
MOV al, var MOV ah, 0 |
var is unsigned char |
MOV al, var CBW |
var is char, AX is int |
Integer arithmetic
ADD [m8], imm8 |
Only achievable through a C++ method operating on a member? |
MOV AL, [m8] ADD AL, imm8 MOV [m8], AL |
Opposite; not an inlined function |
CWD SUB AX, DX SAR AX, 1 |
AX / 2 , AX is int |
MOV [new_var], AX CWD XOR AX, DX SUB AX, DX |
abs(AX) , defined in <stdlib.h> . AX is int |
- When bit-testing the a variable with a 16-bit mask via
&
in a conditional expression, theTEST
is optimized to cover just the high or low byte, if possible:long v = 0xFFFFFFFF; // Works regardless of size or signedness char b00 = (v & 0x00000001) != 0; // TEST BYTE PTR [v + 0], 1 char b08 = (v & 0x00000100) != 0; // TEST BYTE PTR [v + 1], 1 char b16 = (v & 0x00010000) != 0; // TEST DWORD PTR [v + 0], 0x00010000 char b24 = (v & 0x01000000) != 0; // TEST DWORD PTR [v + 0], 0x01000000 char b00_to_15 = (v & 0x0000FFFF) != 0; // TEST WORD PTR [v + 0], 0xFFFF char b16_to_31 = (v & 0xFFFF0000) != 0; // TEST DWORD PTR [v + 0], 0xFFFF0000 char b08_to_23 = (v & 0x00FFFF00) != 0; // TEST DWORD PTR [v + 0], 0x00FFFF00
Arithmetic on a register after assigning it to a variable?
Assigment is part of the C expression. If it's a comparison, that comparison
must be spelled out to silence the Possibly incorrect assignment
warning.
CALL somefunc MOV ??, AX OR AX, AX JNZ ↑ |
while(( ?? = somefunc() ) != NULL) |
SUB ??, imm
vs. ADD ??, -imm
SUB
means that ??
is unsigned. Might require suffixing imm
with u
in
case it's part of an arithmetic expression that was promoted to int
.
Floating-point arithmetic
-
Since the x87 FPU can only load from memory, all temporary results of arithmetic are spilled to one single compiler-generated variable (
fpu_tmp
) on the stack, which is reused across all of the function:MOV AX, myint
INC AX
MOV fpu_tmp, ax
FILD fpu_tmp
FSTP ret
float ret = (myint + 1)
-
The same
fpu_tmp
variable is also used as the destination forFNSTSW
, used in comparisons. -
Performing arithmetic or comparisons between
float
anddouble
variables alwaysFLD
s thefloat
first, before emitting the corresponding FPU instruction for thedouble
, regardless of how the variables are placed in the expression. The instruction order only matches the expression order for literals:char ret; float f; double d; ret = (f > d); // FLD f, FCOMP d ret = (d > f); // FLD f, FCOMP d ret = (d > 3.14f); // FLD d, FCOMP 3.14f ret = (3.14f > d); // FLD 3.14f, FCOMP d ret = (f > 3.14); // FLD f, FCOMP 3.14 + 4 ret = (3.14 > f); // FLD 3.14, FCOMP f + 4
Assignments
MOV ???, [SI+????] |
Only achievable through pointer arithmetic? |
-
When assigning to a array element at a variable or non-0 index, the array element address is typically evaluated before the expression to be assigned. But when assigning
- the result of any arithmetic expression of a 16-bit type
- to an element of a
far
array of a 16-bit type,
the expression will be evaluated first, if its signedness differs from that of the array:
int far *s; unsigned int far *u; int s1, s2; unsigned int u1, u2; s[1] = (s1 | s2); // LES BX, [s]; MOV AX, s1; OR AX, s2; MOV ES:[BX+2], AX s[1] = (s1 | u2); // MOV AX, s1; OR AX, u2; LES BX, [s]; MOV ES:[BX+2], AX s[1] = (u1 | u2); // MOV AX, u1; OR AX, u2; LES BX, [s]; MOV ES:[BX+2], AX u[1] = (s1 | s2); // MOV AX, s1; OR AX, s2; LES BX, [u]; MOV ES:[BX+2], AX u[1] = (s1 | u2); // LES BX, [u]; MOV AX, s1; OR AX, u2; MOV ES:[BX+2], AX u[1] = (u1 | u2); // LES BX, [u]; MOV AX, u1; OR AX, u2; MOV ES:[BX+2], AX
-
Assigning
AX
to multiple variables in a row also indicates multiple assignment in C:// Applies to all storage durations int a, b, c; a = 0; // MOV [a], 0 b = 0; // MOV [b], 0 c = 0; // MOV [c], 0 a = b = c = 0; // XOR AX, AX; MOV [c], AX; MOV [b], AX; MOV [a], AX; // Note the opposite order of variables!
-
For trivially copyable structures, copy assignments are optimized to an equivalent of
memcpy()
:Structure size (no flags) -G 1 via AL
via AL
2 via AX
via AX
3 SCOPY@
via AX
andAL
4 via DX:AX
via DX:AX
5, 7, 9 SCOPY@
via AX
andAL
6, 8 SCOPY@
via AX
10, 12, 14, … SCOPY@
REP MOVSW
11, 13, 15, … SCOPY@
REP MOVSW
,MOVSB
(With the
-3
flag,EAX
is used instead ofDX:AX
in the 4-byte case, but everything else stays the same.)Breaking triviality by overloading
operator =
in any of the structure members also breaks this optimization. In some cases, it might be possible to recreate it, by simulating triviality in an overloaded copy assignment operator inside the class in question:struct Nontrivial { nontrivial_char_t e[100]; // Functions containing local classes aren't expanded inline, so... struct Trivial { char e[100]; }; void operator =(const Nontrivial &other) { reinterpret_cast<Trivial &>(*this) = ( reinterpret_cast<const Trivial &>(other) ); } };
However, this only generates identical code to the original optimization if passing the
other
parameter can be inlined, which isn't always the case.
switch
statements
- Sequence of the individual cases is identical in both C and ASM
- Multiple cases with the same offset in the table, to code that doesn't
return? Code was compiled with
-O
Function calls
NOP
insertion
Happens for every far
call to outside of the current translation unit, even
if both the caller and callee end up being linked into the same code segment.
Certainty: Seems like there might be a way around that, apart from temporarily spelling out these calls in ASM until both functions are compiled as part of the same translation unit. Found nothing so far, though.
Pushing byte arguments to functions
Borland C++ just pushes the entire word. Will cause IDA to mis-identify
certain local variables as word
s when they aren't.
Pushing pointers
When passing a near
pointer to a function that takes a far
one, the
segment argument is sometimes PUSH
ed immediately, before evaluating the
offset:
#pragma option -ml
struct s100 {
char c[100];
};
extern s100 structs[5];
void __cdecl process(s100 *element);
void foo(int i) {
process((s100 near *)(&structs[i])); // PUSH DS; (AX = offset); PUSH AX;
process((s100 far *)(&structs[i])); // (AX = offset); PUSH DS; PUSH AX;
}
Flags
-Z
(Suppress register reloads)
-
The tracked contents of
ES
are reset after a conditional statement. If the original code had moreLES
instructions than necessary, this indicates a specific layout of conditional branches:struct foo { char a, b; char es_not_reset(); char es_reset(); }; char foo::es_not_reset() { return ( a // LES BX, [bp+this] && b // `this` still remembered in ES, not reloaded ); } char foo::es_reset() { if(a) return 1; // LES BX, [bp+this] // Tracked contents of ES are reset if(b) return 1; // LES BX, [bp+this] return 0; }
-3
(80386 Instructions) + -Z
(Suppress register reloads)
Bundles two consecutive 16-bit function parameters into a single 32-bit one,
passed via a single 32-bit PUSH
. Currently confirmed to happen for literals
and structure members whose memory layout matches the parameter list and
calling convention. Signedness doesn't matter.
Won't happen for two consecutive 8-bit parameters.
// Works for all storage durations
struct { int x, y; } p;
struct { unsigned int x, y; } q;
void __cdecl foo_c(char x, char y);
void __cdecl foo_s(int x, int y);
void __cdecl foo_u(unsigned int x, unsigned int y);
foo_s(640, 400); // PUSH LARGE 1900280h
foo_u(640, 400); // PUSH LARGE 1900280h
foo_s(p.x, p.y); // PUSH LARGE [p]
foo_u(p.x, p.y); // PUSH LARGE [p]
foo_s(q.x, q.y); // PUSH LARGE [p]
foo_u(q.x, q.y); // PUSH LARGE [p]
foo_c(100, 200); // PUSH 200; PUSH 100
-O
(Optimize jumps)
Also merges individual ADD SP, imm8
or POP CX
stack-clearing instructions
after __cdecl
function calls into a single one with their combined parameter
size.
Inhibited by:
-
identical variable declarations within more than one scope – the optimizer will only merge the code after the last ASM reference to that declared variable. Yes, even though the emitted ASM would be identical:
if(a) { int v = set_v(); do_something_else(); use(v); } else if(underline) { // Second declaration of [v]. Even though it's assigned to the same stack // offset, the second `PUSH w` call will still be emitted separately. // Thus, jump optimization only reuses the `CALL use` instruction. // Move the `int v;` declaraion to the beginning of the function to avoid // this. int v = set_v(); use(v); }
-
distinct instances of assignments of local variables in registers to itself
-
inlined calls to empty functions
Inlining
Always worth a try to get rid of a potential macro. Some edge cases don't inline optimally though:
- Assignments to a pointer in
SI
– that pointer is moved toDI
, clobbering that register. Try a class method instead. - Nested
if
statements – inlining will always generate a uselessJMP SHORT $+2
at the end of the last branch.
Initialization
Any initialization of a variable with static storage duration (even a const
one) that involves function calls (even those that would regularly inline)
will emit a #pragma startup
function to perform that initialization at
runtime.
This extends to C++ constructors, making macros the only way to initialize
such variables with arithmetic expressions at compile time.
#define FOO(x) (x << 1)
inline char foo(const char x) {
return FOO(x);
}
const char static_storage[3] = { FOO(1), foo(2), FOO(3) };
Resulting ASM (abbreviated):
.data
static_storage db 2, 0, 6
.code
@_STCON_$qv proc near
push bp
mov bp, sp
mov static_storage[1], 4
pop bp
ret
@_STCON_$qv endp
C++
Class methods inline to their ideal representation if all of these are true:
- returns
void
|| (returns*this
&& is at the first nesting level of inlining) - takes no parameters || takes only built-in, scalar-type parameters
Examples:
-
A class method (first nesting level) calling an overloaded operator (second nesting level) returning
*this
will generate (needless) instructions equivalent toMOV AX, *this
. Thus, any overloaded=
,+=
,-=
, etc. operator should always returnvoid
.Certainty: See the examples in
9d121c7
. This is what allows us to use custom types with overloaded assignment operators, with the resulting code generation being indistinguishable from equivalent C preprocessor macros. -
Returning anything else but
void
or*this
will first store that result inAX
, leading any branches at the call site to then refer toAX
.Certainty: Maybe Borland (not Turbo) C++ has an optimization option against it?
Boilerplate for constructors defined outside the class declaration
struct MyClass {
// Members…
MyClass();
};
MyClass::MyClass() {
// Initialization…
}
Resulting ASM:
; MyClass::MyClass(MyClass* this)
; Exact instructions differ depending on the memory model. Model-independent
; ASM instructions are in UPPERCASE.
@MyClass@$bctr$qv proc
PUSH BP
MOV BP, SP
; (saving SI and DI, if used in constructor code)
; (if this, 0)
JNZ @@ctor_code
PUSH sizeof(MyClass)
CALL @$bnew$qui ; operator new(uint)
POP CX
; (this = value_returned_from_new)
; (if this)
JZ @@ret
@@ctor_code:
; Initialization…
@@ret:
; (retval = this)
; (restoring DI and SI, if used in constructor code)
POP BP
RET
@MyClass@$bctr$qv endp
Limits of decompilability
MOV BX, SP
-style functions, or others with no standard stack frame
These almost certainly weren't compiled from C. By disabling stack frames
using #pragma option -k-
, it might be possible to still get the exact same
code out of Turbo C++ – even though it will most certainly look horrible, and
barely more readable than assembly (or even less so), with tons of inline ASM
and register pseudovariables. However, it's futile to even try if the function
contains one of the following:
-
A reference to the
DI
register. In that case, Turbo C++ always inserts aPUSH DI
at the beginning (before theMOV BX, SP
), and aPOP DI
before returning.Certainty: Confirmed through reverse-engineering
TCC.EXE
, no way around it.
Compiler bugs
-
Dereferencing a
far
pointer constructed from the_FS
and_GS
pseudoregisters emits wrong segment prefix opcodes – 0x46 (INC SI
) and 0x4E (DEC SI
) rather than the correct 0x64 and 0x65, respectively.Workaround: Not happening when compiling via TASM (
-B
on the command line, or#pragma inline
).