128 lines
5.6 KiB
Markdown
128 lines
5.6 KiB
Markdown
|
# FlatBuffers white paper
|
||
|
|
||
|
This document tries to shed some light on to the "why" of FlatBuffers, a
|
||
|
new serialization library.
|
||
|
|
||
|
## Motivation
|
||
|
|
||
|
Back in the good old days, performance was all about instructions and
|
||
|
cycles. Nowadays, processing units have run so far ahead of the memory
|
||
|
subsystem, that making an efficient application should start and finish
|
||
|
with thinking about memory. How much you use of it. How you lay it out
|
||
|
and access it. How you allocate it. When you copy it.
|
||
|
|
||
|
Serialization is a pervasive activity in a lot programs, and a common
|
||
|
source of memory inefficiency, with lots of temporary data structures
|
||
|
needed to parse and represent data, and inefficient allocation patterns
|
||
|
and locality.
|
||
|
|
||
|
If it would be possible to do serialization with no temporary objects,
|
||
|
no additional allocation, no copying, and good locality, this could be
|
||
|
of great value. The reason serialization systems usually don't manage
|
||
|
this is because it goes counter to forwards/backwards compatability, and
|
||
|
platform specifics like endianness and alignment.
|
||
|
|
||
|
FlatBuffers is what you get if you try anyway.
|
||
|
|
||
|
In particular, FlatBuffers focus is on mobile hardware (where memory
|
||
|
size and memory bandwidth is even more constrained than on desktop
|
||
|
hardware), and applications that have the highest performance needs:
|
||
|
games.
|
||
|
|
||
|
## FlatBuffers
|
||
|
|
||
|
*This is a summary of FlatBuffers functionality, with some rationale.
|
||
|
A more detailed description can be found in the FlatBuffers
|
||
|
documentation.*
|
||
|
|
||
|
### Summary
|
||
|
|
||
|
A FlatBuffer is a binary buffer containing nested objects (structs,
|
||
|
tables, vectors,..) organized using offsets so that the data can be
|
||
|
traversed in-place just like any pointer-based data structure. Unlike
|
||
|
most in-memory data structures however, it uses strict rules of
|
||
|
alignment and endianness (always little) to ensure these buffers are
|
||
|
cross platform. Additionally, for objects that are tables, FlatBuffers
|
||
|
provides forwards/backwards compatibility and general optionality of
|
||
|
fields, to support most forms of format evolution.
|
||
|
|
||
|
You define your object types in a schema, which can then be compiled to
|
||
|
C++ or Java for low to zero overhead reading & writing.
|
||
|
Optionally, JSON data can be dynamically parsed into buffers.
|
||
|
|
||
|
### Tables
|
||
|
|
||
|
Tables are the cornerstone of FlatBuffers, since format evolution is
|
||
|
essential for most applications of serialization. Typically, dealing
|
||
|
with format changes is something that can be done transparently during
|
||
|
the parsing process of most serialization solutions out there.
|
||
|
But a FlatBuffer isn't parsed before it is accessed.
|
||
|
|
||
|
Tables get around this by using an extra indirection to access fields,
|
||
|
through a *vtable*. Each table comes with a vtable (which may be shared
|
||
|
between multiple tables with the same layout), and contains information
|
||
|
where fields for this particular kind of instance of vtable are stored.
|
||
|
The vtable may also indicate that the field is not present (because this
|
||
|
FlatBuffer was written with an older version of the software, of simply
|
||
|
because the information was not necessary for this instance, or deemed
|
||
|
deprecated), in which case a default value is returned.
|
||
|
|
||
|
Tables have a low overhead in memory (since vtables are small and
|
||
|
shared) and in access cost (an extra indirection), but provide great
|
||
|
flexibility. Tables may even cost less memory than the equivalent
|
||
|
struct, since fields do not need to be stored when they are equal to
|
||
|
their default.
|
||
|
|
||
|
FlatBuffers additionally offers "naked" structs, which do not offer
|
||
|
forwards/backwards compatibility, but can be even smaller (useful for
|
||
|
very small objects that are unlikely to change, like e.g. a coordinate
|
||
|
pair or a RGBA color).
|
||
|
|
||
|
### Schemas
|
||
|
|
||
|
While schemas reduce some generality (you can't just read any data
|
||
|
without having its schema), they have a lot of upsides:
|
||
|
|
||
|
- Most information about the format can be factored into the generated
|
||
|
code, reducing memory needed to store data, and time to access it.
|
||
|
|
||
|
- The strong typing of the data definitions means less error
|
||
|
checking/handling at runtime (less can go wrong).
|
||
|
|
||
|
- A schema enables us to access a buffer without parsing.
|
||
|
|
||
|
FlatBuffer schemas are fairly similar to those of the incumbent,
|
||
|
Protocol Buffers, and generally should be readable to those familiar
|
||
|
with the C family of languages. We chose to improve upon the features
|
||
|
offered by .proto files in the following ways:
|
||
|
|
||
|
- Deprecation of fields instead of manual field id assignment.
|
||
|
Extending an object in a .proto means hunting for a free slot among
|
||
|
the numbers (preferring lower numbers since they have a more compact
|
||
|
representation). Besides being inconvenient, it also makes removing
|
||
|
fields problematic: you either have to keep them, not making it
|
||
|
obvious that this field shouldn't be read/written anymore, and still
|
||
|
generating accessors. Or you remove it, but now you risk that
|
||
|
there's still old data around that uses that field by the time
|
||
|
someone reuses that field id, with nasty consequences.
|
||
|
|
||
|
- Differentiating between tables and structs (see above). Effectively
|
||
|
all table fields are `optional`, and all struct fields are
|
||
|
`required`.
|
||
|
|
||
|
- Having a native vector type instead of `repeated`. This gives you a
|
||
|
length without having to collect all items, and in the case of
|
||
|
scalars provides for a more compact representation, and one that
|
||
|
guarantees adjacency.
|
||
|
|
||
|
- Having a native `union` type instead of using a series of `optional`
|
||
|
fields, all of which must be checked individually.
|
||
|
|
||
|
- Being able to define defaults for all scalars, instead of having to
|
||
|
deal with their optionality at each access.
|
||
|
|
||
|
- A parser that can deal with both schemas and data definitions (JSON
|
||
|
compatible) uniformly.
|
||
|
|
||
|
|