Pattern Language
The Pattern Language is ImHex's custom-built programming language used to create binary patterns/templates. These patterns are applied to binary data in order to parse it and display the decoded values neatly in a tree hierarchy. The syntax follows the same style as other C-like languages and is therefore easy to read, understand, learn and use. This document is meant as an overview of all the features the Pattern Language has.
THIS DOCUMENTATION IS OUTDATED AND ONLY KEPT FOR LEGACY REASONS
Check out the new and much more complete Documentation over at https://imhex.werwolv.net/docs
Table of Contents
- Comments
- Built-in Types
- Endian specification
- Variable Placements
- Arrays
- Structs
- Unions
- Pointers
- Enums
- Bitfields
- Type Aliasing
- Attributes
- Mathematical Expressions
- Conditionals
- Preprocessor
Comments
Comments are a simple way to add documentation or instructions for other developers to your code or to remove parts of it temporarily. There are two styles of comments available: Single line comments and multi line comments.
Single line comments are started with // (two forward slashes) and include everything after them until the next new line.
Multi line comments are started with /* (a forward slash followed by a star) and include everything after them until a */ (a star followed by a forward slash) is found. Multi line comments cannot be nested.
/* This is a
multi line
comment
*/
// This is a single line comment
Built-in Types
Built-in types are the fundamental types used in the language. Supported are various unsigned, signed and floating point types, as well as a few special types.
Unsigned Types: u8, u16, u32, u64, u128
Signed Types: s8, s16, s32, s64, s128
Floating point Types: float, double
Special Types: char, bool
Unsigned and signed types denote their size in bits in the name of the type: s8 is 1 byte long, u32 is 4 bytes long and so on.
Floating point types use the same sizes and encodings as on the host system, which in most cases means 32 bits for float and 64 bits for double, both using the IEEE 754 encoding.
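The following Python sketch decodes raw bytes the way a float or double variable would on a typical IEEE 754 host. It models the bytes themselves, not ImHex. Little endian byte order and the helper names decode_float and decode_double are assumptions made for illustration.

```python
import struct

def decode_float(raw: bytes) -> float:
    """Interpret 4 little endian bytes as an IEEE 754 single precision value."""
    return struct.unpack("<f", raw)[0]

def decode_double(raw: bytes) -> float:
    """Interpret 8 little endian bytes as an IEEE 754 double precision value."""
    return struct.unpack("<d", raw)[0]

print(decode_float(bytes.fromhex("0000803f")))          # 1.0  (0x3F800000)
print(decode_double(bytes.fromhex("000000000000f03f")))  # 1.0  (0x3FF0000000000000)
```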
The special types char and bool are both one byte long and for the most part the same as s8 and u8. The only difference is that they produce a more relevant output in the pattern data view.
Endian specification
Every type may be prefixed with either be or le to set whether the variable should be treated as big endian or little endian. Without a prefix, the default endianness is used.
be u32 bigEndianVariable @ 0x00;
le u32 littleEndianVariable @ 0x00;
u32 defaultEndianVariable @ 0x00;
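For intuition, this Python sketch (not ImHex code) decodes the same four bytes with both byte orders, mirroring the be and le prefixes above. The helper read_u32 is hypothetical.

```python
def read_u32(data: bytes, offset: int, endian: str) -> int:
    """Decode 4 bytes at offset as an unsigned 32 bit integer ("big" or "little")."""
    return int.from_bytes(data[offset:offset + 4], endian)

data = bytes([0x12, 0x34, 0x56, 0x78])
big = read_u32(data, 0x00, "big")        # like: be u32 bigEndianVariable @ 0x00;
little = read_u32(data, 0x00, "little")  # like: le u32 littleEndianVariable @ 0x00;
print(hex(big), hex(little))  # 0x12345678 0x78563412
```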
Variable Placements
To start extracting data from a binary file, variables need to be declared and placed at some offset within the data. This is done using the following syntax:
<type> <variableName> @ <expression>;
// Example
u32 headerMagic @ 0x00;
s8 type @ 0x1234;
Doing this will cause the 4 bytes from address 0x00 to 0x03 to be parsed as an unsigned 32 bit value and the 1 byte at offset 0x1234 to be parsed as a signed 8 bit value. These results will then be displayed in the Pattern Data View within ImHex.
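Here is a rough Python equivalent of the two placements above, using the standard struct module on a hypothetical in-memory buffer. Little endian byte order is an assumption of this sketch. ImHex itself uses whatever default endianness is configured.

```python
import struct

# Hypothetical 0x1235-byte buffer standing in for the loaded file.
data = bytearray(0x1235)
data[0x00:0x04] = (0x464C457F).to_bytes(4, "little")  # bytes 7F 45 4C 46
data[0x1234] = 0xFE  # a single byte that a signed 8 bit read interprets as -2

# u32 headerMagic @ 0x00;  -> 4 bytes (0x00..0x03), unsigned
(header_magic,) = struct.unpack_from("<I", data, 0x00)
# s8 type @ 0x1234;        -> 1 byte, signed
(type_value,) = struct.unpack_from("<b", data, 0x1234)

print(hex(header_magic), type_value)  # 0x464c457f -2
```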
Arrays
Arrays are used to parse a list of values that all share the same type and are placed contiguously in memory. To place an array at a specific offset, the variable placement syntax can again be combined with the array syntax:
<type> <variableName>[<expression>] @ <expression>;
// Example
u32 ids[0x100] @ 0x50;
This will cause a new branch node to appear which contains the decoded values of all entries within the array.
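The byte range an array covers follows directly from its element size and count. This Python sketch models ids[0x100] @ 0x50, which covers 0x100 * 4 = 0x400 bytes, from 0x50 to 0x44F. The helper read_u32_array is hypothetical, and little endian order is assumed.

```python
import struct

def read_u32_array(data: bytes, offset: int, count: int) -> list[int]:
    """Decode count contiguous little endian u32 values starting at offset."""
    return list(struct.unpack_from(f"<{count}I", data, offset))

count = 0x100
# Hypothetical data: 0x50 zero bytes, then the values 0, 1, 2, ... as u32s.
data = bytes(0x50) + b"".join(i.to_bytes(4, "little") for i in range(count))
ids = read_u32_array(data, 0x50, count)
end = 0x50 + count * 4 - 1  # address of the last byte covered by the array
print(len(ids), hex(end))  # 256 0x44f
```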
Strings
Strings are a special kind of array that does not necessarily need a size to be specified. They can be created by declaring an array of chars:
char sizedString[13];
char unsizedString[];
If no size is specified, the string will end at the next null terminator (0x00).
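An unsized string can be modeled as a scan for the terminating 0x00 byte. In this Python sketch, read_c_string is a hypothetical helper. Returning everything up to the end of the data when no terminator exists is an assumption of the sketch, not documented ImHex behavior.

```python
def read_c_string(data: bytes, offset: int) -> str:
    """Read chars from offset up to (not including) the next 0x00 byte."""
    end = data.find(b"\x00", offset)
    if end == -1:  # sketch assumption: no terminator means read to the end
        end = len(data)
    return data[offset:end].decode("latin-1")  # one char per byte

data = b"Hello, World!\x00trailing bytes"
text = read_c_string(data, 0)
print(text, len(text))  # Hello, World! 13
```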
Structs
Structs can be used to group multiple types together in order to form a new type. All members of the struct are placed right after each other in memory with no padding inserted between them. Therefore the size of the complete struct is the sum of the sizes of all its members.
struct <typeName> {
<variableDeclaration>
...
};
// Example
struct Header {
u8 magic[4];
u32 type;
bool flag;
};
Header header @ 0x00;
This code will create a new type named Header, which may again be placed at any point in memory using the variable placement syntax. Structs can also be nested to create more complex types, each of which creates a new branch node in the Pattern Data View.
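Because members are packed with no padding, Header occupies 4 + 4 + 1 = 9 bytes. Python's struct module can model that layout, because the "<" prefix selects standard sizes with no alignment. Little endian byte order is an assumption of this sketch.

```python
import struct

# Packed model of: struct Header { u8 magic[4]; u32 type; bool flag; };
# "<" = little endian (assumed), standard sizes, no alignment padding.
HEADER = struct.Struct("<4sI?")

raw = b"\x7fELF" + (2).to_bytes(4, "little") + b"\x01"
magic, type_, flag = HEADER.unpack_from(raw, 0)
print(HEADER.size, magic, type_, flag)  # 9 b'\x7fELF' 2 True
```

For comparison, a C compiler would typically add trailing padding and report 12 bytes for an equivalent C struct.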
Padding
If padding between members is needed, it may be manually inserted using the padding keyword.
struct PaddedData {
u8 index;
padding[7];
u64 height;
u32 checksum;
};
This will create a 7 byte gap between the index and height members, which will not be displayed in the Pattern Data View.
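The effect of padding[7] on member offsets can be modeled with struct's "x" pad byte, which skips a byte without producing a value. In this sketch index sits at offset 0, height at 8 and checksum at 16, for 20 bytes total. Little endian order is assumed.

```python
import struct

# Packed model of PaddedData; "7x" skips 7 bytes like padding[7].
PADDED_DATA = struct.Struct("<B7xQI")
height_offset = struct.calcsize("<B7x")     # bytes before height
checksum_offset = struct.calcsize("<B7xQ")  # bytes before checksum

raw = bytes([3]) + bytes(7) + (1080).to_bytes(8, "little") + (0xDEADBEEF).to_bytes(4, "little")
index, height, checksum = PADDED_DATA.unpack(raw)  # padding yields no value
print(PADDED_DATA.size, index, height, hex(checksum))  # 20 3 1080 0xdeadbeef
```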
Unions
Syntactically, unions look and work exactly the same as structs. The difference, however, is that all members are placed at the same address on top of each other, in contrast to structs, where all members are placed after each other (the same as in C/C++). Therefore the size of a union is the size of the largest member within the union.
union <typeName> {
<variableDeclaration>
...
};
// Example
union Color {
u32 rgba;
u8 components[4];
};
Color color @ 0x100;
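To show what "on top of each other" means for Color, this Python sketch reads the same four bytes through both member views. Little endian byte order is an assumption. The union's size is the largest member size, which is 4 bytes here.

```python
# Model of: union Color { u32 rgba; u8 components[4]; };
raw = bytes([0x11, 0x22, 0x33, 0x44])  # the 4 bytes at the union's address

rgba = int.from_bytes(raw, "little")  # the u32 view (little endian assumed)
components = list(raw)                # the u8[4] view of the very same bytes
member_sizes = {"rgba": 4, "components": 4 * 1}
union_size = max(member_sizes.values())  # size of the largest member

print(hex(rgba), components, union_size)  # 0x44332211 [17, 34, 51, 68] 4
```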
Pointers
A pointer is a member that points to another place in memory. It uses the value at its address as an offset from the start of the current data to find the location of the value that it points to.
To define a pointer, first the type of the value being pointed to is specified, followed by a * (star) and the name of the variable. After the : (colon), the size of the pointer is required. This needs to be an integral built-in type, which specifies how the pointer's own bytes are interpreted as an offset.
<typeName> *<pointerName> : <builtinTypeName>;
//Example
struct Child {
u32 value;
};
struct Parent {
Child *child : u16;
};
Parent parent @ 0x200;
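The pointer lookup can be sketched in Python as two reads. The first reads the u16 stored inside Parent, and the second reads the Child at the offset that value names. The buffer contents, the helper read_parent and little endian byte order are all assumptions for illustration.

```python
import struct

def read_parent(data: bytes, parent_offset: int) -> tuple[int, int]:
    """Model Parent { Child *child : u16; } placed at parent_offset.

    Returns (pointer_value, child.value). The u16 stored in Parent is used as
    an offset from the start of the data.
    """
    (pointer_value,) = struct.unpack_from("<H", data, parent_offset)  # the u16 pointer
    (child_value,) = struct.unpack_from("<I", data, pointer_value)    # Child { u32 value; }
    return pointer_value, child_value

data = bytearray(0x400)
data[0x200:0x202] = (0x300).to_bytes(2, "little")       # parent.child points to 0x300
data[0x300:0x304] = (0xCAFEBABE).to_bytes(4, "little")  # Child.value stored at 0x300
print([hex(v) for v in read_parent(bytes(data), 0x200)])  # ['0x300', '0xcafebabe']
```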
Enums
Enums are types whose value is restricted to a fixed set of named values. When placed in memory, the Pattern Data View will show the matching enum entry name instead of the numerical value.
Every enum has an underlying type which is used to specify the size of the enum when placed in memory. u32 will create a 4 byte enum, s8 will create a 1 byte enum.
Every enum entry can be set to a distinct value using the <identifier> = <expression> syntax as seen below. If no value is specified for an entry, its value will be the value of the previous entry plus one. Counting starts at zero.
enum <typeName> : <builtinTypeName> {
<enumEntry>
...
};
// Example
enum Architecture : u8 {
x86 = 0x20,
x64, // Value 0x21
ARM32 = 0x35,
ARM64 // Value 0x36
};
Architecture arch @ 0x100;
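The auto-increment rule above can be expressed as a short Python function. resolve_enum is a hypothetical helper that models only the value assignment and the name lookup shown in the Pattern Data View. It does not model how ImHex parses enums.

```python
def resolve_enum(entries):
    """Assign values to (name, value_or_None) pairs: an entry without a value
    gets the previous entry's value plus one, and counting starts at zero."""
    resolved = {}
    next_value = 0
    for name, value in entries:
        if value is None:
            value = next_value
        resolved[name] = value
        next_value = value + 1
    return resolved

architecture = resolve_enum([("x86", 0x20), ("x64", None), ("ARM32", 0x35), ("ARM64", None)])
print({k: hex(v) for k, v in architecture.items()})
# {'x86': '0x20', 'x64': '0x21', 'ARM32': '0x35', 'ARM64': '0x36'}

names_by_value = {v: k for k, v in architecture.items()}
print(names_by_value[0x35])  # ARM32 (the name shown instead of the number)
```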
Bitfields
If you're trying to parse a region of memory that is not aligned to the usual 8 bit boundaries or has variables that are smaller than 8 bits (such as bit flags), a bitfield can be used.
Bitfields allow variables to be specified with a custom number of bits. This is done using the <identifier> : <expression> syntax, where the identifier before the colon specifies the field name and the expression after the colon specifies the size of the field in bits.
No padding is inserted between members; however, the size of the entire bitfield is rounded up to the next 8 bit boundary.
bitfield <typeName> {
<bitfieldEntry>
...
};
// Example
bitfield Permission {
r : 1;
w : 1;
x : 1;
};
Permission perm @ 0x20;
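The size rule and the field extraction can be sketched in Python. Note the assumption that the first declared field occupies the least significant bit. This document does not specify the bit order ImHex uses, so the unpack_bitfield helper (hypothetical) only illustrates the idea.

```python
def bitfield_size(widths):
    """Total size in bytes: the summed bit widths rounded up to a whole byte."""
    return (sum(widths) + 7) // 8

def unpack_bitfield(value, fields):
    """Split an integer into named fields, assuming (hypothetically) that the
    first field occupies the least significant bits."""
    result = {}
    shift = 0
    for name, width in fields:
        result[name] = (value >> shift) & ((1 << width) - 1)
        shift += width
    return result

permission_fields = [("r", 1), ("w", 1), ("x", 1)]
print(bitfield_size([w for _, w in permission_fields]))  # 1 (3 bits -> 1 byte)
print(unpack_bitfield(0b101, permission_fields))         # {'r': 1, 'w': 0, 'x': 1}
```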
Type Aliasing
To give an existing type a new name, a using declaration can be used. This does not replace the old name of the type; it creates a new type with a new name that is the same as the old type. Therefore both names can be used afterwards.
using <newTypeName> = <oldTypeName>;
// Example
using uint32_t = u32;
using Header = ElfHeader;
Attributes
Attributes are a way to change additional settings of variables, such as how they are displayed.
<type> <variableName> [[attributeName("attributeValue")]];
// Example
struct Test {
u32 magic [[name("Header Magic")]];
u8 type [[comment("Test type")]];
};
Available attributes are:
- [[name("New name")]]: Overrides the name of the variable displayed in the pattern data view
- [[color("FF00FFFF")]]: Overrides the color of the variable. The value is an RGBA8 color
- [[comment("Comment")]]: Adds a comment to the variable that appears as a tooltip when hovering over it in the pattern data view
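The following Python sketch assumes that the eight hex digits of the color attribute are ordered red, green, blue, alpha, as the name RGBA8 suggests. It shows how a value like "FF00FFFF" splits into channels. parse_rgba8 is a hypothetical helper and is not part of ImHex.

```python
def parse_rgba8(text: str) -> tuple[int, int, int, int]:
    """Split an "RRGGBBAA" hex string into its four 8 bit channels."""
    if len(text) != 8:
        raise ValueError("expected 8 hex digits")
    value = int(text, 16)
    return ((value >> 24) & 0xFF, (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)

print(parse_rgba8("FF00FFFF"))  # (255, 0, 255, 255): opaque magenta under this assumption
```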
Mathematical Expressions
In any place where a numeric value is required, a mathematical expression can be inserted. This can be as simple as 1 + 1, but can also get much more complex by accessing values within structs or enum constants. These expressions work the same as in most other languages, with the following operators being supported:
- Addition: a + b
- Subtraction: a - b
- Multiplication: a * b
- Division: a / b
- Modulus: a % b
- Bit shift left: a << b
- Bit shift right: a >> b
- Bitwise AND: a & b
- Bitwise OR: a | b
- Bitwise XOR: a ^ b
- Equality comparison: a == b
- Inequality comparison: a != b
- Greater-than comparison: a > b
- Greater-than-or-equals comparison: a >= b
- Less-than comparison: a < b
- Less-than-or-equals comparison: a <= b
- Boolean AND: a && b
- Boolean OR: a || b
- Boolean XOR: a ^^ b
- Ternary operator: a ? b : c
- Current offset: $
Additionally, variable names and the dot . operator may be used to access the value of variables in these expressions.
struct SubHeader {
u16 numEntries;
};
struct Entry {
// ...
};
struct Header {
u32 magic;
SubHeader subHeader;
Entry entries[subHeader.numEntries + 5];
};
To use constants in an expression, the :: scope resolution operator can be used.
enum Offsets : u32 {
Header = 0x00,
SectionList = 0x1000,
StringList = 0x5000
};
Section sections[10] @ Offsets::SectionList;
As seen in the Header example, member access may be used to create arrays whose size depends on the values of other members, and similar things.
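The following Python sketch shows how the Header example's array size is resolved while parsing. numEntries is read first, and only after that is the length of entries known. Entry's members are elided in the example, so ENTRY_SIZE = 4 is a hypothetical value. Little endian order is also assumed.

```python
import struct

ENTRY_SIZE = 4  # hypothetical: Entry's members are elided ("// ...") in the example

def header_layout(data: bytes, offset: int = 0):
    """Model of Header { u32 magic; SubHeader subHeader; Entry entries[...]; }.

    Returns (num_entries, entry_count, header_size), where entry_count is
    subHeader.numEntries + 5 as in the array size expression.
    """
    _magic, num_entries = struct.unpack_from("<IH", data, offset)  # 4 + 2 bytes, packed
    entry_count = num_entries + 5
    header_size = struct.calcsize("<IH") + entry_count * ENTRY_SIZE
    return num_entries, entry_count, header_size

data = (0x1234).to_bytes(4, "little") + (3).to_bytes(2, "little")
print(header_layout(data))  # (3, 8, 38)
```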
Built-in Function calls
Additional functionality for mathematical expressions is provided through built-in functions. All built-in functions take zero or more parameters and return a new numerical value as a result.
ElfHeader header @ findSequence(0, 0x7F, 'E', 'L', 'F');
The following functions are currently supported:
u64 findSequence(u32 index, u8 ... bytes)
- Finds the Nth occurrence (specified by the index parameter) of the list of bytes provided afterwards
- The address at which this sequence was found will be returned
u(size * 8) readUnsigned(u64 address, u8 size)
- Reads size bytes at address and returns their unsigned value
- Allowed sizes are 1, 2, 4, 8 and 16
s(size * 8) readSigned(u64 address, u8 size)
- Reads size bytes at address and returns their signed value
- Allowed sizes are 1, 2, 4, 8 and 16
u64 addressof(string path)
- Returns the address of a variable
u64 nextAfter(string path)
- Returns the address of the first byte after a variable
u64 alignTo(u64 value, u64 alignment)
- Returns value aligned to alignment
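Two of these functions can be approximated in Python to clarify their results. find_sequence treats index as 0-based, as the findSequence(0, ...) call above suggests. The assumption that align_to rounds up to the next multiple is not confirmed by this document. Both helper names are hypothetical.

```python
def find_sequence(data: bytes, index: int, *seq: int) -> int:
    """Return the address of occurrence number `index` (0 = first) of seq in data."""
    needle = bytes(seq)
    pos = -1
    for _ in range(index + 1):
        pos = data.find(needle, pos + 1)
        if pos == -1:
            raise ValueError("sequence not found")
    return pos

def align_to(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (assumed semantics)."""
    return (value + alignment - 1) // alignment * alignment

data = b"\x00\x7fELF\x00\x00\x7fELF"
print(find_sequence(data, 0, 0x7F, ord("E"), ord("L"), ord("F")))  # 1
print(find_sequence(data, 1, 0x7F, ord("E"), ord("L"), ord("F")))  # 7
print(hex(align_to(0x1234, 0x10)))  # 0x1240
```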
Conditionals
Sometimes structs may have different members depending on some condition. This is where if, else and else if statements come into play.
Inside structs and unions, these may be used to only evaluate certain members if some condition is met.
enum Type : u8 {
Height,
PValue,
IValue,
DValue
};
struct Message {
Type type;
if (type == Type::Height) {
u8 index;
u32 height;
}
else if (type == Type::PValue || type == Type::IValue)
float value;
else if (type == Type::DValue)
double value;
};
Message messages[10] @ 0x100;
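Because each Message can take a different branch, the elements of messages[10] can differ in size. This Python sketch models that by starting every element right after the previous one. The helpers parse_message and parse_messages and the little endian byte order are assumptions of the sketch. The Type values follow the enum rule of counting from zero.

```python
import struct

HEIGHT, P_VALUE, I_VALUE, D_VALUE = range(4)  # enum Type : u8, counting from zero

def parse_message(data: bytes, offset: int):
    """Decode one Message at offset; returns (fields, size_in_bytes)."""
    (msg_type,) = struct.unpack_from("<B", data, offset)
    body = offset + 1
    if msg_type == HEIGHT:
        index, height = struct.unpack_from("<BI", data, body)
        return {"type": msg_type, "index": index, "height": height}, 1 + 1 + 4
    elif msg_type in (P_VALUE, I_VALUE):
        (value,) = struct.unpack_from("<f", data, body)
        return {"type": msg_type, "value": value}, 1 + 4
    elif msg_type == D_VALUE:
        (value,) = struct.unpack_from("<d", data, body)
        return {"type": msg_type, "value": value}, 1 + 8
    return {"type": msg_type}, 1  # no branch matches: only the type member exists

def parse_messages(data: bytes, offset: int, count: int):
    """Like Message messages[count] @ offset: each element follows the previous one."""
    messages = []
    for _ in range(count):
        fields, size = parse_message(data, offset)
        messages.append(fields)
        offset += size
    return messages

data = (bytes([HEIGHT, 7]) + (100).to_bytes(4, "little")
        + bytes([P_VALUE]) + struct.pack("<f", 1.5)
        + bytes([D_VALUE]) + struct.pack("<d", 2.5))
print(parse_messages(data, 0, 3))
```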
Preprocessor
The preprocessor can be used to modify the source code before it is processed by the lexer.
Defines
A #define preprocessor instruction replaces a name with something else.
For example, the following statement will cause the preprocessor to replace every occurrence of the sequence PI with 3.14159265. The preprocessor is not aware of any syntax of the language; it performs a simple find and replace.
#define PI 3.14159265
Includes
An #include directive takes the content of the file mentioned in the directive and pastes it into the current file.
#include <cstdint.hexpat>
// or
#include "cstdint.hexpat"
It can be used to add the content of files found in the includes folder or of any files relative to it.
Pragmas
Pragmas are meta-instructions used to configure the Pattern Language evaluator or ImHex in general. The following pragma directives are available:
#pragma endian [little|big|native]
- Sets the default endianness of all created variables to little, big or native endian
#pragma MIME <mime/type>
- Sets the MIME type of the files this pattern is relevant for.
- If this file is present in the patterns folder and a file matching this MIME type is loaded, ImHex will ask the user whether they want to load this pattern.