From 3a210fb377c36f4f3e33d9434dfccecd89301dab Mon Sep 17 00:00:00 2001
From: WerWolv
Date: Fri, 8 Jan 2021 01:31:21 +0100
Subject: [PATCH] Added preprocessor directives, endian specification and strings

---
 Pattern-Language-Guide.md | 85 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/Pattern-Language-Guide.md b/Pattern-Language-Guide.md
index 016ce6e..eb4f99e 100644
--- a/Pattern-Language-Guide.md
+++ b/Pattern-Language-Guide.md
@@ -6,8 +6,10 @@ This document is meant as an overview of all the features the Pattern Language h
 ## Table of Contents
 - [Comments](#comments)
 - [Built-in Types](#built-in-types)
+- [Endian specification](#endian-specification)
 - [Variable Placements](#variable-placements)
 - [Arrays](#arrays)
+  - [Strings](#strings)
 - [Structs](#structs)
 - [Padding](#padding)
 - [Unions](#unions)
@@ -15,6 +17,12 @@ This document is meant as an overview of all the features the Pattern Language h
 - [Enums](#enums)
 - [Bitfields](#bitfields)
 - [Mathematical Expressions](#mathematical-expressions)
+  - [Built-in Function calls](#built-in-function-calls)
+- [Conditionals](#conditionals)
+- [Preprocessor](#preprocessor)
+  - [Defines](#defines)
+  - [Includes](#includes)
+  - [Pragmas](#pragmas)
 
 ## Comments
 
@@ -46,6 +54,16 @@ Unsigned and signed types denote their size in bits in the name of the type. `s8
 
 Floating point types use the same sizes and encodings as their host system which in most cases is 32 bit for floats and 64 bit for doubles with the IEEE 754 encoding. The special types `char` and `bool` are both one byte long and for the most part the same as `s8` and `u8`. The only difference is, they produce a more relevant output in the pattern data view.
 
+## Endian specification
+
+Any type may be prefixed with either `be` or `le` to specify whether the variable should be treated as big endian or little endian.
+
+```cpp
+be u32 bigEndianVariable @ 0x00;
+le u32 littleEndianVariable @ 0x00;
+u32 defaultEndianVariable @ 0x00;
+```
+
 ## Variable Placements
 
 To get started with extracting data from binary data, variables need to be defined and they need to be placed at some offset within the data.
@@ -77,6 +95,19 @@ This will cause a new branch node to appear which contains the decoded values of
 
 ![Arrays](https://puu.sh/H4LJc/2702c5d94b.png)
 
+### Strings
+
+Strings are a special kind of array that does not necessarily need a size specified. They can be created by declaring an array of `char`s.
+
+```
+char sizedString[13];
+char unsizedString[];
+```
+
+If no size is specified, the string ends at the next null terminator (`0x00`).
+
+![Strings](https://puu.sh/H4NYy/374ec8af26.png)
+
 ## Structs
 
 Structs can be used to group multiple types together in order to form a new type. All members of the struct will be placed right after each other in memory with no padding inserted between them. Therefore the size of the complete struct will be the sizes of all members summed up.
@@ -275,6 +306,26 @@ Section sections[10] @ Offsets::SectionList;
 
 As seen above, this may be used to create arrays whose size depends on the value of other members and similar things.
 
+### Built-in Function calls
+
+Additional functionality for mathematical expressions is provided through built-in functions. All built-in functions take zero or more numerical values as parameters and return a new numerical value as a result.
+
+```cpp
+ElfHeader header @ findSequence(0, 0x7F, 'E', 'L', 'F');
+```
+
+The following functions are currently supported:
+
+- `u64 findSequence(u32 index, u8 ... bytes)`
+  - Finds the Nth occurrence (specified by the `index` parameter) of the list of `bytes` provided afterwards.
+  - The address at which this sequence was found will be returned
+- `u(size * 8) readUnsigned(u64 address, u8 size)`
+  - Reads `size` bytes at `address` and returns their unsigned value
+  - Allowed sizes are 1, 2, 4, 8 and 16
+- `s(size * 8) readSigned(u64 address, u8 size)`
+  - Reads `size` bytes at `address` and returns their signed value
+  - Allowed sizes are 1, 2, 4, 8 and 16
+
 ## Conditionals
 
 Sometimes structs may have different members depending on some condition. This is where `if`, `else` and `else if` statements come into play.
@@ -303,3 +354,37 @@ struct Message {
 Message messages[10] @ 0x100;
 ```
 
+## Preprocessor
+
+The preprocessor can be used to modify the source code before it is processed by the lexer.
+
+### Defines
+
+A `define` preprocessor instruction replaces a name with something else.
+
+For example, the following statement will cause the preprocessor to replace every occurrence of the sequence `PI` with `3.14159265`. The preprocessor is not aware of the language's syntax; it performs a simple find and replace.
+```cpp
+#define PI 3.14159265
+```
+
+### Includes
+
+An `include` directive takes the content of the file named in the directive and pastes it into the current file.
+
+```cpp
+#include
+// or
+#include "cstdint.hexpat"
+```
+
+It can be used to add the content of files found in the `includes` folder or any files relative to it.
+
+### Pragmas
+
+`pragma`s are meta-instructions used to configure the Pattern Language evaluator or ImHex in general. The following `pragma` directives are available:
+
+- `#pragma endian [little|big|native]`
+  - Sets the default endianness of all variables created to big, little or native endian
+- `#pragma MIME `
+  - Sets the MIME type of the files this pattern is relevant for.
+  - If this file is present in the `patterns` folder and a file is loaded that matches this MIME type, ImHex will ask the user if they want to load this pattern.
\ No newline at end of file