flatbuffers/docs/html/md__internals.html

170 lines
14 KiB
HTML

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.7"/>
<title>FlatBuffers: FlatBuffer Internals</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
$(window).load(resizeHeight);
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td style="padding-left: 0.5em;">
<div id="projectname">FlatBuffers
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.7 -->
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('md__internals.html','');});
</script>
<div id="doc-content">
<div class="header">
<div class="headertitle">
<div class="title">FlatBuffer Internals </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>This section is entirely optional for the use of FlatBuffers. In normal usage, you should never need the information contained herein. If you're interested however, it should give you more of an appreciation of why FlatBuffers is both efficient and convenient.</p>
<h3>Format components</h3>
<p>A FlatBuffer is a binary file and in-memory format consisting mostly of scalars of various sizes, all aligned to their own size. Each scalar is also always represented in little-endian format, as this corresponds to all commonly used CPUs today. FlatBuffers will also work on big-endian machines, but will be slightly slower because of additional byte-swap intrinsics.</p>
<p>On purpose, the format leaves a lot of details about where exactly things live in memory undefined, e.g. fields in a table can have any order, and objects to some extend can be stored in many orders. This is because the format doesn't need this information to be efficient, and it leaves room for optimization and extension (for example, fields can be packed in a way that is most compact). Instead, the format is defined in terms of offsets and adjacency only. This may mean two different implementations may produce different binaries given the same input values, and this is perfectly valid.</p>
<h3>Format identification</h3>
<p>The format also doesn't contain information for format identification and versioning, which is also by design. FlatBuffers is a statically typed system, meaning the user of a buffer needs to know what kind of buffer it is. FlatBuffers can of course be wrapped inside other containers where needed, or you can use its union feature to dynamically identify multiple possible sub-objects stored. Additionally, it can be used together with the schema parser if full reflective capabilities are desired.</p>
<p>Versioning is something that is intrinsically part of the format (the optionality / extensibility of fields), so the format itself does not need a version number (it's a meta-format, in a sense). We're hoping that this format can accommodate all data needed. If format breaking changes are ever necessary, it would become a new kind of format rather than just a variation.</p>
<h3>Offsets</h3>
<p>The most important and generic offset type (see <code>flatbuffers.h</code>) is <code>uoffset_t</code>, which is currently always a <code>uint32_t</code>, and is used to refer to all tables/unions/strings/vectors (these are never stored in-line). 32bit is intentional, since we want to keep the format binary compatible between 32 and 64bit systems, and a 64bit offset would bloat the size for almost all uses. A version of this format with 64bit (or 16bit) offsets is easy to set when needed. Unsigned means they can only point in one direction, which typically is forward (towards a higher memory location). Any backwards offsets will be explicitly marked as such.</p>
<p>The format starts with an <code>uoffset_t</code> to the root object in the buffer.</p>
<p>We have two kinds of objects, structs and tables.</p>
<h3>Structs</h3>
<p>These are the simplest, and as mentioned, intended for simple data that benefits from being extra efficient and doesn't need versioning / extensibility. They are always stored inline in their parent (a struct, table, or vector) for maximum compactness. Structs define a consistent memory layout where all components are aligned to their size, and structs aligned to their largest scalar member. This is done independent of the alignment rules of the underlying compiler to guarantee a cross platform compatible layout. This layout is then enforced in the generated code.</p>
<h3>Tables</h3>
<p>These start with an <code>soffset_t</code> to a vtable. This is a signed version of <code>uoffset_t</code>, since vtables may be stored anywhere relative to the object. This offset is substracted (not added) from the object start to arrive at the vtable start. This offset is followed by all the fields as aligned scalars (or offsets). Unlike structs, not all fields need to be present. There is no set order and layout.</p>
<p>To be able to access fields regardless of these uncertainties, we go through a vtable of offsets. Vtables are shared between any objects that happen to have the same vtable values.</p>
<p>The elements of a vtable are all of type <code>voffset_t</code>, which is a <code>uint16_t</code>. The first element is the size of the vtable in bytes, including the size element. The second one is the size of the object, in bytes (including the vtable offset). This size could be used for streaming, to know how many bytes to read to be able to access all fields of the object. The remaining elements are the N offsets, where N is the amount of fields declared in the schema when the code that constructed this buffer was compiled (thus, the size of the table is N + 2).</p>
<p>All accessor functions in the generated code for tables contain the offset into this table as a constant. This offset is checked against the first field (the number of elements), to protect against newer code reading older data. If this offset is out of range, or the vtable entry is 0, that means the field is not present in this object, and the default value is return. Otherwise, the entry is used as offset to the field to be read.</p>
<h3>Strings and Vectors</h3>
<p>Strings are simply a vector of bytes, and are always null-terminated. Vectors are stored as contiguous aligned scalar elements prefixed by a 32bit element count (not including any null termination).</p>
<h3>Construction</h3>
<p>The current implementation constructs these buffers backwards (starting at the highest memory address of the buffer), since that significantly reduces the amount of bookkeeping and simplifies the construction API.</p>
<h3>Code example</h3>
<p>Here's an example of the code that gets generated for the <code>samples/monster.fbs</code>. What follows is the entire file, broken up by comments: </p><pre class="fragment">// automatically generated, do not modify
#include "flatbuffers/flatbuffers.h"
namespace MyGame {
namespace Sample {
</pre><p>Nested namespace support. </p><pre class="fragment">enum {
Color_Red = 0,
Color_Green = 1,
Color_Blue = 2,
};
inline const char **EnumNamesColor() {
static const char *names[] = { "Red", "Green", "Blue", nullptr };
return names;
}
inline const char *EnumNameColor(int e) { return EnumNamesColor()[e]; }
</pre><p>Enums and convenient reverse lookup. </p><pre class="fragment">enum {
Any_NONE = 0,
Any_Monster = 1,
};
inline const char **EnumNamesAny() {
static const char *names[] = { "NONE", "Monster", nullptr };
return names;
}
inline const char *EnumNameAny(int e) { return EnumNamesAny()[e]; }
</pre><p>Unions share a lot with enums. </p><pre class="fragment">struct Vec3;
struct Monster;
</pre><p>Predeclare all data types since circular references between types are allowed (circular references between object are not, though). </p><pre class="fragment">MANUALLY_ALIGNED_STRUCT(4) Vec3 {
private:
float x_;
float y_;
float z_;
public:
Vec3(float x, float y, float z)
: x_(flatbuffers::EndianScalar(x)), y_(flatbuffers::EndianScalar(y)), z_(flatbuffers::EndianScalar(z)) {}
float x() const { return flatbuffers::EndianScalar(x_); }
float y() const { return flatbuffers::EndianScalar(y_); }
float z() const { return flatbuffers::EndianScalar(z_); }
};
STRUCT_END(Vec3, 12);
</pre><p>These ugly macros do a couple of things: they turn off any padding the compiler might normally do, since we add padding manually (though none in this example), and they enforce alignment chosen by FlatBuffers. This ensures the layout of this struct will look the same regardless of compiler and platform. Note that the fields are private: this is because these store little endian scalars regardless of platform (since this is part of the serialized data). <code>EndianScalar</code> then converts back and forth, which is a no-op on all current mobile and desktop platforms, and a single machine instruction on the few remaining big endian platforms. </p><pre class="fragment">struct Monster : private flatbuffers::Table {
const Vec3 *pos() const { return GetStruct&lt;const Vec3 *&gt;(4); }
int16_t mana() const { return GetField&lt;int16_t&gt;(6, 150); }
int16_t hp() const { return GetField&lt;int16_t&gt;(8, 100); }
const flatbuffers::String *name() const { return GetPointer&lt;const flatbuffers::String *&gt;(10); }
const flatbuffers::Vector&lt;uint8_t&gt; *inventory() const { return GetPointer&lt;const flatbuffers::Vector&lt;uint8_t&gt; *&gt;(14); }
int8_t color() const { return GetField&lt;int8_t&gt;(16, 2); }
};
</pre><p>Tables are a bit more complicated. A table accessor struct is used to point at the serialized data for a table, which always starts with an offset to its vtable. It derives from <code>Table</code>, which contains the <code>GetField</code> helper functions. GetField takes a vtable offset, and a default value. It will look in the vtable at that offset. If the offset is out of bounds (data from an older version) or the vtable entry is 0, the field is not present and the default is returned. Otherwise, it uses the entry as an offset into the table to locate the field. </p><pre class="fragment">struct MonsterBuilder {
flatbuffers::FlatBufferBuilder &amp;fbb_;
flatbuffers::uoffset_t start_;
void add_pos(const Vec3 *pos) { fbb_.AddStruct(4, pos); }
void add_mana(int16_t mana) { fbb_.AddElement&lt;int16_t&gt;(6, mana, 150); }
void add_hp(int16_t hp) { fbb_.AddElement&lt;int16_t&gt;(8, hp, 100); }
void add_name(flatbuffers::Offset&lt;flatbuffers::String&gt; name) { fbb_.AddOffset(10, name); }
void add_inventory(flatbuffers::Offset&lt;flatbuffers::Vector&lt;uint8_t&gt;&gt; inventory) { fbb_.AddOffset(14, inventory); }
void add_color(int8_t color) { fbb_.AddElement&lt;int8_t&gt;(16, color, 2); }
MonsterBuilder(flatbuffers::FlatBufferBuilder &amp;_fbb) : fbb_(_fbb) { start_ = fbb_.StartTable(); }
flatbuffers::Offset&lt;Monster&gt; Finish() { return flatbuffers::Offset&lt;Monster&gt;(fbb_.EndTable(start_, 7)); }
};
</pre><p><code>MonsterBuilder</code> is the base helper struct to construct a table using a <code>FlatBufferBuilder</code>. You can add the fields in any order, and the <code>Finish</code> call will ensure the correct vtable gets generated. </p><pre class="fragment">inline flatbuffers::Offset&lt;Monster&gt; CreateMonster(flatbuffers::FlatBufferBuilder &amp;_fbb, const Vec3 *pos, int16_t mana, int16_t hp, flatbuffers::Offset&lt;flatbuffers::String&gt; name, flatbuffers::Offset&lt;flatbuffers::Vector&lt;uint8_t&gt;&gt; inventory, int8_t color) {
MonsterBuilder builder_(_fbb);
builder_.add_inventory(inventory);
builder_.add_name(name);
builder_.add_pos(pos);
builder_.add_hp(hp);
builder_.add_mana(mana);
builder_.add_color(color);
return builder_.Finish();
}
</pre><p><code>CreateMonster</code> is a convenience function that calls all functions in <code>MonsterBuilder</code> above for you. Note that if you pass values which are defaults as arguments, it will not actually construct that field, so you can probably use this function instead of the builder class in almost all cases. </p><pre class="fragment">inline const Monster *GetMonster(const void *buf) { return flatbuffers::GetRoot&lt;Monster&gt;(buf); }
</pre><p>This function is only generated for the root table type, to be able to start traversing a FlatBuffer from a raw buffer pointer. </p><pre class="fragment">}; // namespace MyGame
}; // namespace Sample</pre> </div></div><!-- contents -->
</div><!-- doc-content -->
<!-- Google Analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-49880327-7', 'auto');
ga('send', 'pageview');
</script>
</body>
</html>