Documentation changes to clarify FlatBuffer internals.
Change-Id: I3759a07385f0d8d172ca2f88ac1759b71bee5a6a
parent 41a6d35e74
commit 66de19ace8
@@ -61,20 +61,20 @@ $(document).ready(function(){initNavTree('md__internals.html','');});
 <p>The format also doesn't contain information for format identification and versioning, which is also by design. FlatBuffers is a statically typed system, meaning the user of a buffer needs to know what kind of buffer it is. FlatBuffers can of course be wrapped inside other containers where needed, or you can use its union feature to dynamically identify multiple possible sub-objects stored. Additionally, it can be used together with the schema parser if full reflective capabilities are desired.</p>
 <p>Versioning is something that is intrinsically part of the format (the optionality / extensibility of fields), so the format itself does not need a version number (it's a meta-format, in a sense). We're hoping that this format can accommodate all data needed. If format breaking changes are ever necessary, it would become a new kind of format rather than just a variation.</p>
 <h3>Offsets</h3>
-<p>The most important and generic offset type (see <code>flatbuffers.h</code>) is <code>offset_t</code>, which is currently always a <code>uint32_t</code>, and is used to refer to all tables/unions/strings/vectors. 32bit is intentional, since we want to keep the format binary compatible between 32 and 64bit systems, and a 64bit offset would bloat the size for almost all uses. A version of this format with 64bit (or 16bit) offsets is easy to set when needed. Unsigned means they can only point in one direction, which typically is forward (towards a higher memory location). Any backwards offsets will be explicitly marked as such.</p>
-<p>The format starts with an <code>offset_t</code> to the root object in the buffer.</p>
+<p>The most important and generic offset type (see <code>flatbuffers.h</code>) is <code>uoffset_t</code>, which is currently always a <code>uint32_t</code>, and is used to refer to all tables/unions/strings/vectors (these are never stored in-line). 32bit is intentional, since we want to keep the format binary compatible between 32 and 64bit systems, and a 64bit offset would bloat the size for almost all uses. A version of this format with 64bit (or 16bit) offsets is easy to set when needed. Unsigned means they can only point in one direction, which typically is forward (towards a higher memory location). Any backwards offsets will be explicitly marked as such.</p>
+<p>The format starts with an <code>uoffset_t</code> to the root object in the buffer.</p>
 <p>We have two kinds of objects, structs and tables.</p>
 <h3>Structs</h3>
 <p>These are the simplest, and as mentioned, intended for simple data that benefits from being extra efficient and doesn't need versioning / extensibility. They are always stored inline in their parent (a struct, table, or vector) for maximum compactness. Structs define a consistent memory layout where all components are aligned to their size, and structs aligned to their largest scalar member. This is done independent of the alignment rules of the underlying compiler to guarantee a cross platform compatible layout. This layout is then enforced in the generated code.</p>
 <h3>Tables</h3>
-<p>These start with an <code>soffset_t</code> to a vtable (signed version of <code>offset_t</code>, since vtables may be stored anywhere), followed by all the fields as aligned scalars. Unlike structs, not all fields need to be present. There is no set order and layout.</p>
+<p>These start with an <code>soffset_t</code> to a vtable (signed version of <code>uoffset_t</code>, since vtables may be stored anywhere), followed by all the fields as aligned scalars (or offsets). Unlike structs, not all fields need to be present. There is no set order and layout.</p>
 <p>To be able to access fields regardless of these uncertainties, we go through a vtable of offsets. Vtables are shared between any objects that happen to have the same vtable values.</p>
-<p>The elements of a vtable are all of type <code>voffset_t</code>, which is currently a <code>uint16_t</code>. The first element is the number of elements of the vtable, including this one. The second one is the size of the object, in bytes (including the vtable offset). This size is used for streaming, to know how many bytes to read to be able to access all fields of the object. The remaining elements are N the offsets, where N is the amount of field declared in the schema when the code that constructed this buffer was compiled (thus, the size of the table is N + 2).</p>
+<p>The elements of a vtable are all of type <code>voffset_t</code>, which is a <code>uint16_t</code>. The first element is the number of elements of the vtable, including this one. The second one is the size of the object, in bytes (including the vtable offset). This size is used for streaming, to know how many bytes to read to be able to access all fields of the object. The remaining elements are the N offsets, where N is the amount of fields declared in the schema when the code that constructed this buffer was compiled (thus, the size of the table is N + 2).</p>
 <p>All accessor functions in the generated code for tables contain the offset into this table as a constant. This offset is checked against the first field (the number of elements), to protect against newer code reading older data. If this offset is out of range, or the vtable entry is 0, that means the field is not present in this object, and the default value is return. Otherwise, the entry is used as offset to the field to be read.</p>
 <h3>Strings and Vectors</h3>
-<p>Strings are simply a vector of bytes, and are always null-terminated. Vectors are stored as contiguous aligned scalar elements prefixed by a count.</p>
+<p>Strings are simply a vector of bytes, and are always null-terminated. Vectors are stored as contiguous aligned scalar elements prefixed by a 32bit element count (not including any null termination).</p>
 <h3>Construction</h3>
-<p>The current implementation constructs these buffers backwards, since that significantly reduces the amount of bookkeeping and simplifies the construction API.</p>
+<p>The current implementation constructs these buffers backwards (starting at the highest memory address of the buffer), since that significantly reduces the amount of bookkeeping and simplifies the construction API.</p>
 <h3>Code example</h3>
 <p>Here's an example of the code that gets generated for the <code>samples/monster.fbs</code>. What follows is the entire file, broken up by comments: </p><pre class="fragment">// automatically generated, do not modify

@@ -97,7 +97,7 @@ root_type Monster;
 <li>16 bit: <code>short ushort</code></li>
 <li>32 bit: <code>int uint float</code></li>
 <li>64 bit: <code>long ulong double</code></li>
-<li>Vector of any other type (denoted with <code>[type]</code>). Nesting vectors require you wrap the inner vector in a struct/table rather than writing <code>[[type]]</code>.</li>
+<li>Vector of any other type (denoted with <code>[type]</code>). Nesting vectors is not supported, instead you can wrap the inner vector in a table.</li>
 <li><code>string</code>, which may only hold UTF-8 or 7-bit ASCII. For other text encodings or general binary data use vectors (<code>[byte]</code> or <code>[ubyte]</code>) instead.</li>
 <li>References to other tables or structs, enums or unions (see below).</li>
 </ul>
@@ -43,8 +43,9 @@ than just a variation.
 ### Offsets

 The most important and generic offset type (see `flatbuffers.h`) is
-`offset_t`, which is currently always a `uint32_t`, and is used to
-refer to all tables/unions/strings/vectors. 32bit is
+`uoffset_t`, which is currently always a `uint32_t`, and is used to
+refer to all tables/unions/strings/vectors (these are never stored
+in-line). 32bit is
 intentional, since we want to keep the format binary compatible between
 32 and 64bit systems, and a 64bit offset would bloat the size for almost
 all uses. A version of this format with 64bit (or 16bit) offsets is easy to set
@@ -52,7 +53,7 @@ when needed. Unsigned means they can only point in one direction, which
 typically is forward (towards a higher memory location). Any backwards
 offsets will be explicitly marked as such.

-The format starts with an `offset_t` to the root object in the buffer.
+The format starts with an `uoffset_t` to the root object in the buffer.

 We have two kinds of objects, structs and tables.
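
As a rough illustration of the root offset described in this hunk, the sketch below decodes the leading `uoffset_t` and follows it. The helper names `ReadUOffset` and `FollowUOffset` are hypothetical (they are not taken from `flatbuffers.h`). Two things are assumed rather than stated in the text above: that scalars are stored little-endian, and that an offset is measured from the offset's own position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using uoffset_t = std::uint32_t;  // mirrors the typedef described in the text

// Hypothetical helper: decode a little-endian uoffset_t at byte position `pos`.
// Assumption: the buffer stores scalars little-endian.
inline uoffset_t ReadUOffset(const std::uint8_t *buf, std::size_t pos) {
  assert(buf != nullptr);
  return static_cast<uoffset_t>(buf[pos]) |
         static_cast<uoffset_t>(buf[pos + 1]) << 8 |
         static_cast<uoffset_t>(buf[pos + 2]) << 16 |
         static_cast<uoffset_t>(buf[pos + 3]) << 24;
}

// Hypothetical helper: follow the offset stored at `pos`.
// Assumption: the offset is relative to its own position, so an unsigned
// value can only reach a higher address ("forward").
inline std::size_t FollowUOffset(const std::uint8_t *buf, std::size_t pos) {
  return pos + ReadUOffset(buf, pos);
}
```

Under these assumptions, `FollowUOffset(buf, 0)` would give the position of the root object, since the buffer starts with a `uoffset_t` to it.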
@@ -71,20 +72,20 @@ code.
 ### Tables

 These start with an `soffset_t` to a vtable (signed version of
-`offset_t`, since vtables may be stored anywhere), followed by all the
-fields as aligned scalars. Unlike structs, not all fields need to be
-present. There is no set order and layout.
+`uoffset_t`, since vtables may be stored anywhere), followed by all the
+fields as aligned scalars (or offsets). Unlike structs, not all fields
+need to be present. There is no set order and layout.

 To be able to access fields regardless of these uncertainties, we go
 through a vtable of offsets. Vtables are shared between any objects that
 happen to have the same vtable values.

-The elements of a vtable are all of type `voffset_t`, which is currently
+The elements of a vtable are all of type `voffset_t`, which is
 a `uint16_t`. The first element is the number of elements of the vtable,
 including this one. The second one is the size of the object, in bytes
 (including the vtable offset). This size is used for streaming, to know
 how many bytes to read to be able to access all fields of the object.
-The remaining elements are N the offsets, where N is the amount of field
+The remaining elements are the N offsets, where N is the amount of fields
 declared in the schema when the code that constructed this buffer was
 compiled (thus, the size of the table is N + 2).
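
The vtable layout above can be sketched as a small lookup function. This is a minimal sketch, not the generated accessor code: `FieldOffset` is a hypothetical name, and the vtable is assumed to be already decoded into an array of `voffset_t`. Treating an out-of-range slot or a 0 entry as "field not present" follows the accessor description in the HTML hunk of this same change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using voffset_t = std::uint16_t;  // element type of a vtable, per the text

// Hypothetical helper: look up the field with 0-based schema index
// `field_index` in a vtable laid out as
//   [element count, object size in bytes, offset of field 0, ...].
// Returns 0 when the field is not present, in which case a caller would
// fall back to the schema default.
inline voffset_t FieldOffset(const voffset_t *vtable, std::size_t field_index) {
  assert(vtable != nullptr);
  const std::size_t slot = field_index + 2;  // skip the two header elements
  if (slot >= vtable[0]) return 0;  // writer's schema did not know this field
  return vtable[slot];              // a stored 0 also means "not present"
}
```

For a vtable `{4, 12, 4, 0}` (two fields, 12-byte object), field 0 lives at offset 4 in the object, field 1 is absent, and field 2 is beyond what the writer knew about, so both of the latter yield 0.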
@@ -100,11 +101,13 @@ field to be read.

 Strings are simply a vector of bytes, and are always
 null-terminated. Vectors are stored as contiguous aligned scalar
-elements prefixed by a count.
+elements prefixed by a 32bit element count (not including any
+null termination).

 ### Construction

-The current implementation constructs these buffers backwards, since
+The current implementation constructs these buffers backwards (starting
+at the highest memory address of the buffer), since
 that significantly reduces the amount of bookkeeping and simplifies the
 construction API.
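
To make the two descriptions in this hunk concrete, here is a minimal sketch that writes a string into a buffer from the highest address downwards. `BackwardBuffer` is a hypothetical class for illustration only and is not the real builder API. It assumes little-endian storage for the 32-bit count, uses a fixed capacity, and leaves out alignment padding entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical illustration of back-to-front construction.
class BackwardBuffer {
 public:
  explicit BackwardBuffer(std::size_t capacity)
      : buf_(capacity), cur_(capacity) {}

  // Copies `len` bytes directly below everything written so far.
  void Push(const void *data, std::size_t len) {
    assert(len <= cur_);  // fixed capacity in this sketch, no growth
    cur_ -= len;
    std::memcpy(buf_.data() + cur_, data, len);
  }

  // Writes a string back to front: terminator first, then the bytes, then
  // the 32-bit count, so the count ends up at the lowest address.
  // The count excludes the terminator, as described in the text.
  void PushString(const std::string &s) {
    const std::uint8_t nul = 0;
    Push(&nul, 1);
    Push(s.data(), s.size());
    const auto len = static_cast<std::uint32_t>(s.size());
    const std::uint8_t le[4] = {static_cast<std::uint8_t>(len),
                                static_cast<std::uint8_t>(len >> 8),
                                static_cast<std::uint8_t>(len >> 16),
                                static_cast<std::uint8_t>(len >> 24)};
    Push(le, 4);
  }

  const std::uint8_t *data() const { return buf_.data() + cur_; }
  std::size_t size() const { return buf_.size() - cur_; }

 private:
  std::vector<std::uint8_t> buf_;
  std::size_t cur_;  // index of the lowest byte written so far
};
```

Because anything written earlier sits at a higher address, an object written later can refer to it with a forward-pointing unsigned offset, which may be part of why this direction reduces bookkeeping.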
@@ -88,8 +88,7 @@ Builtin scalar types are:
 - 64 bit: `long ulong double`

 - Vector of any other type (denoted with `[type]`). Nesting vectors
-require you wrap the inner vector in a struct/table rather than
-writing `[[type]]`.
+is not supported, instead you can wrap the inner vector in a table.

 - `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
 or general binary data use vectors (`[byte]` or `[ubyte]`) instead.