Writing a schema {#flatbuffers_guide_writing_schema}
================

The syntax of the schema language (aka IDL, [Interface Definition Language][])
should look quite familiar to users of any of the C family of
languages, and also to users of other IDLs. Let's look at an example
first:

    // example IDL file

    namespace MyGame;

    attribute "priority";

    enum Color : byte { Red = 1, Green, Blue }

    union Any { Monster, Weapon, Pickup }

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

    table Monster {
      pos:Vec3;
      mana:short = 150;
      hp:short = 100;
      name:string;
      friendly:bool = false (deprecated, priority: 1);
      inventory:[ubyte];
      color:Color = Blue;
      test:Any;
    }

    root_type Monster;

(`Weapon` and `Pickup` are not defined as part of this example.)

### Tables

Tables are the main way of defining objects in FlatBuffers, and consist of a
name (here `Monster`) and a list of fields. Each field has a name, a type, and
optionally a default value. If the default value is not specified in the schema,
it will be `0` for scalar types, or `null` for other types. Some languages
support setting a scalar's default to `null`. This makes the scalar optional.

Fields do not have to appear in the wire representation, and you can choose
to omit fields when constructing an object. You have the flexibility to add
fields without fear of bloating your data. This design is also FlatBuffers'
mechanism for forward and backwards compatibility. Note that:

- You can add new fields in the schema ONLY at the end of a table
  definition. Older data will still read correctly, and give you the default
  value when read. Older code will simply ignore the new field.
  If you want to have flexibility to use any order for fields in your
  schema, you can manually assign ids (much like Protocol Buffers);
  see the `id` attribute below.

- You cannot delete fields you don't use anymore from the schema,
  but you can simply stop writing them into your data for almost the same
  effect. Additionally you can mark them as `deprecated` as in the example
  above, which will prevent the generation of accessors in the
  generated C++, as a way to enforce the field not being used any more.
  (Careful: this may break code!)

- You may change field names and table names, if you're ok with your
  code breaking until you've renamed them there too.

See "Schema evolution examples" below for more on this topic.

### Structs

Similar to a table, only now none of the fields are optional (so no defaults
either), and fields may not be added or be deprecated. Structs may only contain
scalars or other structs. Use this for simple objects where you are very sure
no changes will ever be made (as is quite clear in the example `Vec3`). Structs
use less memory than tables and are even faster to access (they are always
stored in-line in their parent object, and use no virtual table).

### Types

Built-in scalar types are:

- 8 bit: `byte` (`int8`), `ubyte` (`uint8`), `bool`

- 16 bit: `short` (`int16`), `ushort` (`uint16`)

- 32 bit: `int` (`int32`), `uint` (`uint32`), `float` (`float32`)

- 64 bit: `long` (`int64`), `ulong` (`uint64`), `double` (`float64`)

The type names in parentheses are alias names such that for example
`uint8` can be used in place of `ubyte`, and `int32` can be used in
place of `int` without affecting code generation.

Built-in non-scalar types:

- Vector of any other type (denoted with `[type]`). Nesting vectors
  is not supported; instead, you can wrap the inner vector in a table.

- `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
  or general binary data use vectors (`[byte]` or `[ubyte]`) instead.

- References to other tables or structs, enums or unions (see
  below).

You can't change types of fields once they're used, with the exception
of same-size data where a `reinterpret_cast` would give you a desirable result,
e.g. you could change a `uint` to an `int` if no values in current data use the
high bit yet.
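
As a rough illustration of the `uint` to `int` case, the C++ sketch below reinterprets the stored bits the way a reader built from the changed schema would, and checks whether the high bit is clear. `ReadUintBitsAsInt` and `SurvivesUintToInt` are hypothetical helpers, not FlatBuffers API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads the 4 bytes of a field that was written as `uint`
// the way a reader built from a schema that now says `int` would.
// std::memcpy is the portable C++ spelling of this bit-level reinterpretation.
int32_t ReadUintBitsAsInt(uint32_t stored) {
  static_assert(sizeof(uint32_t) == sizeof(int32_t), "same-size change only");
  int32_t value;
  std::memcpy(&value, &stored, sizeof value);
  return value;
}

// The type change is harmless for a stored value only if its high bit is clear.
bool SurvivesUintToInt(uint32_t stored) {
  return (stored & 0x80000000u) == 0;
}
```

A stored `42` reads back as `42` under either type, while `0xFFFFFFFF` (which uses the high bit) reads back as `-1`, which is why the change is only safe before any data uses that bit.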

### Arrays

Arrays are a convenience short-hand for a fixed-length collection of elements.
Arrays can be used to replace the following schema:

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

with the following schema:

    struct Vec3 {
      v:[float:3];
    }

Both representations are binary equivalent.

Arrays are currently only supported in a `struct`.
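
The binary equivalence can be checked with plain C++ structs that mirror the two schemas. These are hand-written stand-ins, not `flatc` output, and the size check assumes a mainstream ABI where three `float` members get no padding.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical C++ mirrors of the two schema variants (not generated code).
struct Vec3Fields { float x; float y; float z; };
struct Vec3Array { float v[3]; };

// Same size and alignment: three consecutive 4-byte floats in both cases.
static_assert(sizeof(Vec3Fields) == sizeof(Vec3Array), "same size");
static_assert(alignof(Vec3Fields) == alignof(Vec3Array), "same alignment");

// Copying the bytes of one layout into the other lines element i of the
// array up with the i-th named field.
Vec3Array ToArray(const Vec3Fields& f) {
  Vec3Array a;
  std::memcpy(&a, &f, sizeof a);
  return a;
}
```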

### Default, Optional and Required Values

There are three mutually exclusive reactions to the non-presence of a table's
field in the binary data:

1. Default valued fields will return the default value (as defined in the schema).
2. Optional valued fields will return some form of `null` depending on the
   local language. (In a sense, `null` is the default value.)
3. Required fields will cause an error. FlatBuffers verifiers would
   consider the whole buffer invalid. See the `required` tag below.

When writing a schema, values are a sequence of digits. Values may be optionally
followed by a decimal point (`.`) and more digits, for float constants, or
optionally prefixed by a `-`. Floats may also be in scientific notation,
optionally ending with an `e` or `E`, followed by a `+` or `-` and more digits.
Values can also be the keyword `null`.

Only scalar values can have defaults; non-scalar (string/vector/table) fields
default to `null` when not present.

You generally do not want to change default values after they're initially
defined. Fields that have the default value are not actually stored in the
serialized data (see also Gotchas below). Values explicitly written by code
generated from the old version of the schema, if they happen to be the default,
will be read as a different value by code generated from the new schema. This is
slightly less bad when converting an optional scalar into a default valued
scalar, since non-presence would not be overloaded with a previous default value.
There are situations, however, where this may be desirable, especially if you
can ensure a simultaneous rebuild of all code.
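
A toy model makes the hazard concrete. The sketch below is not the FlatBuffers wire format; it only assumes, as stated above, that a value equal to the default is not stored and that a missing field reads as the reader's default. `WriteScalar` and `ReadScalar` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model of one scalar table field: std::nullopt means "not present in
// the buffer". This is an illustration, not the real encoding.
using StoredField = std::optional<int16_t>;

// Writer side: a value equal to the writer's schema default is not stored.
StoredField WriteScalar(int16_t value, int16_t schema_default) {
  if (value == schema_default) return std::nullopt;
  return value;
}

// Reader side: a missing field yields the reader's schema default.
int16_t ReadScalar(const StoredField& stored, int16_t schema_default) {
  return stored.value_or(schema_default);
}
```

Writing `mana = 150` under the example's default of 150 stores nothing, so a reader built from a hypothetical new schema with a default of 200 reports 200. A non-default value such as 100 is stored and survives either way.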

### Enums

Define a sequence of named constants, each with a given value, or
increasing by one from the previous one. The default first value
is `0`. As you can see in the enum declaration, you specify the underlying
integral type of the enum with `:` (in this case `byte`), which then determines
the type of any fields declared with this enum type.

Only integer types are allowed, i.e. `byte`, `ubyte`, `short`, `ushort`, `int`,
`uint`, `long` and `ulong`.

Typically, enum values should only ever be added, never removed (there is no
deprecation for enums). This requires code to handle forwards compatibility
itself, by handling unknown enum values.
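
A hand-written C++ mirror of the example `Color` enum shows both the implicit numbering and the defensive handling of unknown values. `ColorName` is a hypothetical helper, and real generated code may look different.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hand-written mirror of `enum Color : byte { Red = 1, Green, Blue }`
// (not flatc output): members after Red increase by one.
enum class Color : int8_t { Red = 1, Green, Blue };
static_assert(static_cast<int>(Color::Green) == 2, "Green follows Red");
static_assert(static_cast<int>(Color::Blue) == 3, "Blue follows Green");

// Older code must tolerate values added by a newer schema, so anything not
// matched by a case falls through to a fallback instead of being assumed known.
std::string ColorName(int8_t raw) {
  switch (static_cast<Color>(raw)) {
    case Color::Red: return "Red";
    case Color::Green: return "Green";
    case Color::Blue: return "Blue";
  }
  return "Unknown";  // e.g. a value 4 added by a later schema version
}
```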

### Unions

Unions share a lot of properties with enums, but instead of new names
for constants, you use names of tables. You can then declare
a union field, which can hold a reference to any of those types, and
additionally a field with the suffix `_type` is generated that holds
the corresponding enum value, allowing you to know which type to cast
to at runtime.

It's possible to give an alias name to a type in a union. This way a type can
even be used to mean different things depending on the name used:

    table PointPosition { x:uint; y:uint; }
    table MarkerPosition {}
    union Position {
      Start:MarkerPosition,
      Point:PointPosition,
      Finish:MarkerPosition
    }

Unions contain a special `NONE` marker to denote that no value is stored, so
that name cannot be used as an alias.
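
One way to picture the two fields behind a union is a tag plus a payload. The sketch below hand-codes a C++ mirror of `union Position` (hypothetical types, not `flatc` output), assuming `NONE` is 0 and the members are numbered 1, 2, 3 in declaration order. `Start` and `Finish` share a payload type but remain distinct tags.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical mirror of `union Position`: NONE is 0, members follow in order.
enum class PositionType : uint8_t { NONE = 0, Start = 1, Point = 2, Finish = 3 };

struct PointPosition { uint32_t x; uint32_t y; };
struct MarkerPosition {};  // payload of both Start and Finish (carries no data)

// A union field is really two fields: the `_type` tag and the value. The value
// is reduced here to the only payload type that carries data.
struct PositionField {
  PositionType position_type = PositionType::NONE;
  PointPosition point{};  // meaningful only when position_type == Point
};

// The tag, not the payload type, tells the aliases apart.
std::string Describe(const PositionField& f) {
  switch (f.position_type) {
    case PositionType::NONE: return "none";
    case PositionType::Start: return "start marker";
    case PositionType::Finish: return "finish marker";
    case PositionType::Point:
      return "point " + std::to_string(f.point.x) + "," + std::to_string(f.point.y);
  }
  return "unknown";
}
```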

Unions are a good way to be able to send multiple message types as a FlatBuffer.
Note that because a union field is really two fields, it must always be
part of a table; it cannot be the root of a FlatBuffer by itself.

If you have a need to distinguish between different FlatBuffers in a more
open-ended way, for example for use as files, see the file identification
feature below.

There is experimental support, only in C++, for a vector of unions (and
types). In the example IDL file above, use `[Any]` to add a vector of `Any` to
the `Monster` table. There is also experimental support for other types besides
tables in unions, in particular structs and strings. There's no direct support
for scalars in unions, but they can be wrapped in a struct at no space cost.

### Namespaces

Namespace declarations generate the corresponding namespace in C++ for all
helper code, and packages in Java. You can use `.` to specify nested
namespaces / packages.

### Includes

You can include other schema files in your current one, e.g.:

    include "mydefinitions.fbs";

This makes it easier to refer to types defined elsewhere. `include`
automatically ensures each file is parsed just once, even when referred to
more than once.
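
The once-only behavior amounts to remembering which files have already been seen. This is a minimal sketch with a hypothetical `IncludeTracker` class; it assumes paths are already normalized, which a real parser would have to take care of itself.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical include bookkeeping: parse a file the first time it is seen
// and skip it on every later `include`. Assumes canonical path strings.
class IncludeTracker {
 public:
  // Returns true only the first time a given path is encountered.
  bool ShouldParse(const std::string& path) {
    return parsed_.insert(path).second;
  }

 private:
  std::set<std::string> parsed_;
};
```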

When using the `flatc` compiler to generate code for schema definitions,
only definitions in the current file will be generated, not those from the
included files (you generate code for those separately).

### Root type

This declares what you consider to be the root table of the serialized
data. This is particularly important for parsing JSON data, which doesn't
include object type information.

### File identification and extension

Typically, a FlatBuffer binary buffer is not self-describing, i.e. it
needs you to know its schema to parse it correctly. But if you
want to use a FlatBuffer as a file format, it would be convenient
to be able to have a "magic number" in there, like most file formats
have, to be able to do a sanity check to see if you're reading the
kind of file you're expecting.

Now, you can always prefix a FlatBuffer with your own file header,
but FlatBuffers has a built-in way to add an identifier to a
FlatBuffer that takes up minimal space, and keeps the buffer
compatible with buffers that don't have such an identifier.

You can specify in a schema, similar to `root_type`, that you intend
for this type of FlatBuffer to be used as a file format:

    file_identifier "MYFI";

Identifiers must always be exactly 4 characters long. These 4 characters
will end up as bytes at offsets 4-7 (inclusive) in the buffer.

For any schema that has such an identifier, `flatc` will automatically
add the identifier to any binaries it generates (with `-b`),
and generated calls like `FinishMonsterBuffer` also add the identifier.
If you have specified an identifier and wish to generate a buffer
without one, you can always still do so by calling
`FlatBufferBuilder::Finish` explicitly.

After loading a buffer, you can use a call like
`MonsterBufferHasIdentifier` to check if the identifier is present.
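
Checking the identifier only requires comparing the 4 bytes at offsets 4-7. `HasFileIdentifier` below is a hypothetical stand-in for generated helpers such as `MonsterBufferHasIdentifier`, not their actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: compares bytes 4-7 of a buffer with a 4-character
// identifier such as "MYFI". The array reference only accepts 4-character
// string literals; the size check rejects truncated input.
bool HasFileIdentifier(const uint8_t* buf, std::size_t size,
                       const char (&id)[5]) {
  constexpr std::size_t kOffset = 4;
  constexpr std::size_t kLength = 4;
  if (buf == nullptr || size < kOffset + kLength) return false;
  return std::memcmp(buf + kOffset, id, kLength) == 0;
}
```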

Note that this is best for open-ended uses such as files. If you simply wanted
to send one of a set of possible messages over a network for example, you'd
be better off with a union.

Additionally, by default `flatc` will output binary files as `.bin`.
This declaration in the schema will change that to whatever you want:

    file_extension "ext";

### RPC interface declarations

You can declare RPC calls in a schema that define a set of functions
that take a FlatBuffer as an argument (the request) and return a FlatBuffer
as the response (both of which must be table types):

    rpc_service MonsterStorage {
      Store(Monster):StoreResponse;
      Retrieve(MonsterId):Monster;
    }

What code this produces and how it is used depends on the language and RPC
system used. There is preliminary support for gRPC through the `--grpc` code
generator; see `grpc/tests` for an example.

### Comments & documentation

Comments may be written as in most C-based languages. Additionally, a triple
comment (`///`) on a line by itself signals that a comment is documentation
for whatever is declared on the line after it
(table/struct/field/enum/union/element), and the comment is output
in the corresponding C++ code. Multiple such lines per item are allowed.

### Attributes

Attributes may be attached to a declaration, behind a field/enum value,
or after the name of a table/struct/enum/union. These may either have
a value or not. Some attributes like `deprecated` are understood by
the compiler; user defined ones need to be declared with the attribute
declaration (like `priority` in the example above), and are
available to query if you parse the schema at runtime.
This is useful if you write your own code generators/editors etc., and
you wish to add additional information specific to your tool (such as a
help text).

Currently understood attributes:

- `id: n` (on a table field): manually set the field identifier to `n`.
  If you use this attribute, you must use it on ALL fields of this table,
  and the numbers must be a contiguous range from 0 onwards.
  Additionally, since a union type effectively adds two fields, its
  id must be that of the second field (the first field is the type
  field and not explicitly declared in the schema).
  For example, if the last field before the union field had id 6,
  the union field should have id 8, and the union's type field will
  implicitly be 7.
  IDs allow the fields to be placed in any order in the schema.
  When a new field is added to the schema it must use the next available ID.
- `deprecated` (on a field): do not generate accessors for this field
  anymore; code should stop using this data. Old data may still contain this
  field, but it won't be accessible anymore by newer code. Note that if you
  deprecate a field that was previously required, old code may fail to validate
  new data (when using the optional verifier).
- `required` (on a non-scalar table field): this field must always be set.
  By default, fields do not need to be present in the binary. This is
  desirable, as it helps with forwards/backwards compatibility, and
  flexibility of data structures. By specifying this attribute, you make
  non-presence an error for both reader and writer. The reading code may access
  the field directly, without checking for null. If the constructing code does
  not initialize this field, it will get an assert, and also the verifier
  will fail on buffers that have missing required fields. Both adding and
  removing this attribute may be forwards/backwards incompatible as readers
  will be unable to read old or new data, respectively, unless the data happens
  to always have the field set.
- `force_align: size` (on a struct): force the alignment of this struct
  to be something higher than what it is naturally aligned to. Causes
  these structs to be aligned to that amount inside a buffer, IF that
  buffer is allocated with that alignment (which is not necessarily
  the case for buffers accessed directly inside a `FlatBufferBuilder`).
  Note: currently not guaranteed to have an effect when used with
  `--object-api`, since that may allocate objects at alignments less than
  what you specify with `force_align`.
- `force_align: size` (on a vector): force the alignment of this vector to be
  something different than what the element size would normally dictate.
  Note: currently this only works for generated C++ code.
- `bit_flags` (on an unsigned enum): the values of this field indicate bits,
  meaning that any unsigned value N specified in the schema will end up
  representing 1<<N, or if you don't specify values at all, you'll get
  the sequence 1, 2, 4, 8, ...
- `nested_flatbuffer: "table_name"` (on a field): this indicates that the field
  (which must be a vector of ubyte) contains flatbuffer data, for which the
  root type is given by `table_name`. The generated code will then produce
  a convenient accessor for the nested FlatBuffer.
- `flexbuffer` (on a field): this indicates that the field
  (which must be a vector of ubyte) contains flexbuffer data. The generated
  code will then produce a convenient accessor for the FlexBuffer root.
- `key` (on a field): this field is meant to be used as a key when sorting
  a vector of the type of table it sits in. Can be used for in-place
  binary search.
- `hash` (on a field): this is an (un)signed 32/64 bit integer field, whose
  value during JSON parsing is allowed to be a string, which will then be
  stored as its hash. The value of the attribute is the hashing algorithm to
  use, one of `fnv1_32`, `fnv1_64`, `fnv1a_32`, `fnv1a_64`.
- `original_order` (on a table): since elements in a table do not need
  to be stored in any particular order, they are often optimized for
  space by sorting them by size. This attribute stops that from happening.
  There should generally not be any reason to use this flag.
- `native_*`: several attributes have been added to support the [C++ object
  based API](@ref flatbuffers_cpp_object_based_api). All such attributes
  are prefixed with the term `native_`.
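
For the `hash` attribute, the four algorithm names refer to the standard FNV-1 and FNV-1a hashes. The sketch below is a textbook implementation over the bytes of a string, using the published FNV offset bases and primes; treat it as an illustration rather than the parser's own code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Standard FNV offset bases and primes.
constexpr uint32_t kFnv32Offset = 2166136261u;              // 0x811c9dc5
constexpr uint32_t kFnv32Prime = 16777619u;                 // 0x01000193
constexpr uint64_t kFnv64Offset = 14695981039346656037ull;  // 0xcbf29ce484222325
constexpr uint64_t kFnv64Prime = 1099511628211ull;          // 0x100000001b3

// FNV-1: multiply by the prime, then XOR in the next byte.
template <typename T>
T Fnv1(const std::string& s, T offset, T prime) {
  T h = offset;
  for (unsigned char c : s) { h *= prime; h ^= c; }
  return h;
}

// FNV-1a: XOR in the next byte, then multiply by the prime.
template <typename T>
T Fnv1a(const std::string& s, T offset, T prime) {
  T h = offset;
  for (unsigned char c : s) { h ^= c; h *= prime; }
  return h;
}

// One function per algorithm name accepted by the attribute.
uint32_t fnv1_32(const std::string& s) { return Fnv1<uint32_t>(s, kFnv32Offset, kFnv32Prime); }
uint32_t fnv1a_32(const std::string& s) { return Fnv1a<uint32_t>(s, kFnv32Offset, kFnv32Prime); }
uint64_t fnv1_64(const std::string& s) { return Fnv1<uint64_t>(s, kFnv64Offset, kFnv64Prime); }
uint64_t fnv1a_64(const std::string& s) { return Fnv1a<uint64_t>(s, kFnv64Offset, kFnv64Prime); }
```

For example, `fnv1a_32("a")` yields `0xe40c292c` and `fnv1_32("a")` yields `0x050c5d7e`; the only difference between the variants is the order of the XOR and the multiplication.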

## JSON Parsing

The same parser that parses the schema declarations above is also able
to parse JSON objects that conform to this schema. So, unlike other JSON
parsers, this parser is strongly typed, and parses directly into a FlatBuffer
(see the compiler documentation on how to do this from the command line, or
the C++ documentation on how to do this at runtime).

Besides needing a schema, there are a few other changes to how it parses
JSON:

- It accepts field names with and without quotes, like many JSON parsers
  already do. It outputs them without quotes as well, though it can be made
  to output them using the `strict_json` flag.
- If a field has an enum type, the parser will recognize symbolic enum
  values (with or without quotes) instead of numbers, e.g.
  `field: EnumVal`. If a field is of integral type, you can still use
  symbolic names, but values need to be prefixed with their type and
  need to be quoted, e.g. `field: "Enum.EnumVal"`. For enums
  representing flags, you may place multiple inside a string
  separated by spaces to OR them, e.g.
  `field: "EnumVal1 EnumVal2"` or `field: "Enum.EnumVal1 Enum.EnumVal2"`.
- Similarly, unions need to be specified with two fields, much like
  you do when serializing from code. E.g. for a field `foo`, you must
  add a field `foo_type: FooOne` right before the `foo` field, where
  `FooOne` would be the table out of the union you want to use.
- A field that has the value `null` (e.g. `field: null`) is intended to
  have the default value for that field (thus has the same effect as if
  that field wasn't specified at all).
- It has some built-in conversion functions, so you can write for example
  `rad(180)` wherever you'd normally write `3.14159`.
  It currently supports the following functions: `rad`, `deg`, `cos`, `sin`,
  `tan`, `acos`, `asin`, `atan`.
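
Combining flag names follows the `bit_flags` numbering described under Attributes. The sketch below assumes a hypothetical `bit_flags` enum named `Abilities` with members `Fly`, `Swim` and `Climb` (so 1, 2 and 4) and ORs the space-separated names from a JSON string value; it is not the flatc parser.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical `bit_flags` enum declared without explicit values, so its
// members get 1, 2, 4 (bit positions 0, 1, 2).
const std::map<std::string, uint32_t> kAbilities = {
    {"Fly", 1u << 0}, {"Swim", 1u << 1}, {"Climb", 1u << 2}};

// Splits the string on whitespace and ORs the named flags together,
// accepting both "Name" and "Abilities.Name". Unknown names throw.
uint32_t ParseFlags(const std::string& text) {
  const std::string prefix = "Abilities.";
  std::istringstream in(text);
  std::string token;
  uint32_t result = 0;
  while (in >> token) {
    if (token.compare(0, prefix.size(), prefix) == 0) token.erase(0, prefix.size());
    auto it = kAbilities.find(token);
    if (it == kAbilities.end()) throw std::invalid_argument("unknown flag: " + token);
    result |= it->second;
  }
  return result;
}
```

With these assumptions, `"Fly Climb"` parses to 5 and `"Abilities.Swim Abilities.Climb"` to 6.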

When parsing JSON, it recognizes the following escape codes in strings:

- `\n` - linefeed.
- `\t` - tab.
- `\r` - carriage return.
- `\b` - backspace.
- `\f` - form feed.
- `\"` - double quote.
- `\\` - backslash.
- `\/` - forward slash.
- `\uXXXX` - 16-bit Unicode code point, converted to the equivalent UTF-8
  representation.
- `\xXX` - 8-bit binary hexadecimal number XX. This is the only one that is
  not in the JSON spec (see http://json.org/), but is needed to be able to
  encode arbitrary binary in strings to text and back without losing
  information (e.g. the byte 0xFF can't be represented in standard JSON).

It also generates these escape codes back again when generating JSON from a
binary representation.
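
The `\uXXXX` conversion is the standard UTF-8 encoding of a 16-bit code point. A minimal C++ sketch, using a hypothetical `EncodeUtf8` helper rather than the parser's own code:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encodes one 16-bit code point from a `\uXXXX` escape as 1 to 3 UTF-8 bytes.
// Surrogate pairs (used in JSON for code points above U+FFFF) are not handled.
std::string EncodeUtf8(uint16_t cp) {
  std::string out;
  if (cp < 0x80) {             // 0xxxxxxx
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {     // 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {                     // 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

For example, `\u00E9` becomes the two bytes `C3 A9` and `\u20AC` becomes `E2 82 AC`.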

When parsing numbers, the parser is more flexible than JSON.
The format of numeric literals is closer to that of C/C++.
According to the [grammar](@ref flatbuffers_grammar), it accepts the following
numerical literals:

- An integer literal can have any number of leading zero `0` digits.
  Unlike C/C++, the parser ignores a leading zero, not interpreting it as the
  beginning of an octal number.
  The numbers `[081, -00094]` are equal to `[81, -94]` decimal integers.
- The parser accepts unsigned and signed hexadecimal integer numbers.
  For example: `[0x123, +0x45, -0x67]` are equal to `[291, 69, -103]` decimals.
- The format of floating-point numbers is fully compatible with the C/C++ format.
  If a modern C++ compiler is used the parser accepts hexadecimal and special
  floating-point literals as well:
  `[-1.0, 2., .3e0, 3.e4, 0x21.34p-5, -inf, nan]`.

  The following conventions for floating-point numbers are used:
  - The exponent suffix of a hexadecimal floating-point number is mandatory.
  - Parsed `NaN` is converted to an unsigned IEEE-754 quiet-NaN value.

  Extended floating-point support was tested with:
  - x64 Windows: `MSVC2015` and higher.
  - x64 Linux: `LLVM 6.0`, `GCC 4.9` and higher.

  For details, see the [Use in C++](@ref flatbuffers_guide_use_cpp) section.

- For compatibility with a JSON lint tool, all numeric literals of scalar
  fields can be wrapped in quoted strings:
  `"1", "2.0", "0x48A", "0x0C.0Ep-1", "-inf", "true"`.
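
The integer rules can be approximated with `std::strtoll` by choosing the base explicitly: base 10 for decimals (so leading zeros never mean octal) and base 16 when a `0x` prefix follows the optional sign. `ParseIntegerLiteral` is a hypothetical helper covering integers only; floating-point literals and the `true` keyword are out of scope.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Optional sign, then either a 0x/0X hexadecimal number or a decimal number
// whose leading zeros are ignored. A quoted value is unquoted first.
long long ParseIntegerLiteral(std::string s) {
  if (s.size() >= 2 && s.front() == '"' && s.back() == '"') s = s.substr(1, s.size() - 2);
  std::size_t i = (!s.empty() && (s[0] == '+' || s[0] == '-')) ? 1 : 0;
  bool hex = s.size() > i + 1 && s[i] == '0' && (s[i + 1] == 'x' || s[i + 1] == 'X');
  // Base 10 skips leading zeros instead of switching to octal; base 16 lets
  // strtoll consume the sign and the optional 0x prefix itself.
  errno = 0;
  char* end = nullptr;
  long long value = std::strtoll(s.c_str(), &end, hex ? 16 : 10);
  if (end == s.c_str() || *end != '\0' || errno == ERANGE)
    throw std::invalid_argument("not an integer literal: " + s);
  return value;
}
```

Under this sketch `081` gives 81, `-0x67` gives -103, and the quoted `"0x48A"` gives 1162.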

## Guidelines

### Efficiency

FlatBuffers is all about efficiency, but to realize that efficiency you
require an efficient schema. There are usually multiple choices on
how to represent data that have vastly different size characteristics.

It is very common nowadays to represent any kind of data as dictionaries
(as in e.g. JSON), because of their flexibility and extensibility. While
it is possible to emulate this in FlatBuffers (as a vector
of tables with key and value(s)), this is a bad match for a strongly
typed system like FlatBuffers, leading to relatively large binaries.
FlatBuffer tables are more flexible than classes/structs in most systems,
since having a large number of fields, only a few of which are actually
used, is still efficient. You should thus try to organize your data
as much as possible such that you can use tables where you might be
tempted to use a dictionary.

Similarly, strings as values should only be used when they are
truly open-ended. If you can, always use an enum instead.

FlatBuffers doesn't have inheritance, so the way to represent a set
of related data structures is a union. Unions do have a cost, however,
so an alternative to a union is to have a single table that has
all the fields of all the data structures you are trying to
represent, if they are relatively similar / share many fields.
Again, this is efficient because non-present fields are cheap.

FlatBuffers supports the full range of integer sizes, so try to pick
the smallest size needed, rather than defaulting to int/long.

Remember that you can share data (refer to the same string/table
within a buffer), so factoring out repeating data into its own
data structure may be worth it.

### Style guide

Identifiers in a schema are meant to translate to many different programming
languages, so using the style of your "main" language is generally a bad idea.

For this reason, below is a suggested style guide to adhere to, to keep schemas
consistent for interoperation regardless of the target language.

Where possible, the code generators for specific languages will generate
identifiers that adhere to the language style, based on the schema identifiers.

- Table, struct, enum and rpc names (types): UpperCamelCase.
- Table and struct field names: snake_case. This is translated to lowerCamelCase
  automatically for some languages, e.g. Java.
- Enum values: UpperCamelCase.
- Namespaces: UpperCamelCase.
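
As an illustration of the snake_case to lowerCamelCase translation, here is a hypothetical helper; the code generators' actual rules (for digits, repeated underscores, or reserved words) may differ.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Drops each underscore and upper-cases the letter that follows it.
// Leading underscores are dropped without capitalizing the next letter.
std::string SnakeToLowerCamel(const std::string& snake) {
  std::string out;
  bool upper_next = false;
  for (char c : snake) {
    if (c == '_') {
      upper_next = !out.empty();
      continue;
    }
    out += upper_next ? static_cast<char>(std::toupper(static_cast<unsigned char>(c))) : c;
    upper_next = false;
  }
  return out;
}
```

For example, a field named `inventory_size` would surface as `inventorySize`, while `hp` stays `hp`.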

Formatting (this is less important, but still worth adhering to):

- Opening brace: on the same line as the start of the declaration.
- Spacing: Indent by 2 spaces. None around `:` for types, on both sides for `=`.

For an example, see the schema at the top of this file.

## Gotchas

### Schemas and version control

FlatBuffers relies on new field declarations being added at the end, and earlier
declarations to not be removed, but be marked deprecated when needed. We think
this is an improvement over the manual number assignment that happens in
Protocol Buffers (and which is still an option using the `id` attribute
mentioned above).

One place where this is possibly problematic, however, is source control. If user
A adds a field, generates new binary data with this new schema, then tries to
commit both to source control after user B already committed a new field also,
and just auto-merges the schema, the binary files are now invalid compared to
the new schema.

The solution of course is that you should not be generating binary data before
your schema changes have been committed, ensuring consistency with the rest of
the world. If this is not practical for you, use explicit field ids, which
should always generate a merge conflict if two people try to allocate the same
id.

### Schema evolution examples (tables)

Some examples to clarify what happens as you change a schema:

If we have the following original schema:

    table { a:int; b:int; }

And we extend it:

    table { a:int; b:int; c:int; }

This is ok. Code compiled with the old schema reading data generated with the
new one will simply ignore the presence of the new field. Code compiled with the
new schema reading old data will get the default value for `c` (which is 0
in this case, since it is not specified).

    table { a:int (deprecated); b:int; }

This is also ok. Code compiled with the old schema reading newer data will now
always get the default value for `a` since it is not present. Code compiled
with the new schema now cannot read nor write `a` anymore (any existing code
that tries to do so will result in compile errors), but can still read
old data (it will ignore the field).

    table { c:int; a:int; b:int; }

This is NOT ok, as this makes the schemas incompatible. Old code reading newer
data will interpret `c` as if it was `a`, and new code reading old data
accessing `a` will instead receive `b`.
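
The failure mode can be reproduced with a toy model in which, absent `id` attributes, a field's slot is its position in the declaration. `WriteBySlot` and `ReadBySlot` are hypothetical helpers; only the position-to-slot rule is taken from the text above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model: a schema is the list of field names in declaration order, and a
// buffer only records values per slot (never per name).
using Schema = std::vector<std::string>;
using Buffer = std::map<int, int>;  // slot -> stored value

int SlotOf(const Schema& schema, const std::string& field) {
  for (int i = 0; i < static_cast<int>(schema.size()); ++i)
    if (schema[i] == field) return i;
  return -1;
}

Buffer WriteBySlot(const Schema& schema, const std::map<std::string, int>& values) {
  Buffer buf;
  for (const auto& [name, value] : values) buf[SlotOf(schema, name)] = value;
  return buf;
}

// A missing slot reads as the default, 0.
int ReadBySlot(const Schema& schema, const Buffer& buf, const std::string& field) {
  auto it = buf.find(SlotOf(schema, field));
  return it == buf.end() ? 0 : it->second;
}
```

Old code reading a buffer written with the reordered schema gets `c`'s value when asking for `a`, and new code reading old data gets `b`'s value for `a`, exactly as described above.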

    table { c:int (id: 2); a:int (id: 0); b:int (id: 1); }

This is ok. If your intent was to order/group fields in a way that makes sense
semantically, you can do so using explicit id assignment. Now we are compatible
with the original schema, and the fields can be ordered in any way, as long as
we keep the sequence of ids.

    table { b:int; }

NOT ok. We can only remove a field by deprecation, regardless of whether we use
explicit ids or not.

    table { a:uint; b:uint; }

This is MAYBE ok, and only in the case where the type change is the same size,
like here. If old data never contained any negative numbers, this will be
safe to do.

    table { a:int = 1; b:int = 2; }

Generally NOT ok. Older data that held 0 values did not store those values in
the buffer, relying on the default value to recreate them. These values will
now appear as `1` and `2` instead. There may be cases in which this
is ok, but care must be taken.

    table { aa:int; bb:int; }

Occasionally ok. You've renamed fields, which will break all code (and JSON
files!) that use this schema, but as long as the change is obvious, this is not
incompatible with the actual binary buffers, since those only ever address
fields by id/offset.

### Schema evolution examples (unions)

Suppose we have the following schema:

```
union Foo { A, B }
```

We can add another variant at the end:

```
union Foo { A, B, another_a: A }
```

and this will be okay. Old code will not recognize `another_a`.
However, if we add `another_a` anywhere but the end, e.g.

```
union Foo { A, another_a: A, B }
```

this is not okay. When new code writes `another_a`, old code will
misinterpret it as `B` (and vice versa). However, you can explicitly
set the union's "discriminant" value like so:

```
union Foo { A = 1, another_a: A = 3, B = 2 }
```

This is okay.

```
union Foo { original_a: A = 1, another_a: A = 3, B = 2 }
```

Renaming union members (here `A` becomes `original_a`) will break code and any
saved human readable representations, such as json files, but the binary
buffers will be the same.
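
The same effect can be shown with C++ enums standing in for the stored discriminant. This assumes, consistent with the explicit values above, that `NONE` is 0 and unnumbered members count up from the previous value; the enums and the `Tag` helper are illustrative, not generated code.

```cpp
#include <cassert>
#include <cstdint>

// Discriminants for three versions of `union Foo`.
enum class FooV1 : uint8_t { NONE, A, B };                       // A=1, B=2
enum class FooInsertedMid : uint8_t { NONE, A, another_a, B };  // B moves to 3
enum class FooExplicit : uint8_t { NONE, A = 1, another_a = 3, B = 2 };

// The byte stored in the `_type` field is all that old code gets to see.
constexpr uint8_t Tag(FooInsertedMid v) { return static_cast<uint8_t>(v); }
constexpr uint8_t Tag(FooExplicit v) { return static_cast<uint8_t>(v); }

// Inserting in the middle: new `another_a` reuses the value old code knows as B.
static_assert(Tag(FooInsertedMid::another_a) == static_cast<uint8_t>(FooV1::B),
              "old readers would decode another_a as B");
// Explicit values keep B at 2 and give another_a a fresh value.
static_assert(Tag(FooExplicit::B) == static_cast<uint8_t>(FooV1::B), "B unchanged");
static_assert(Tag(FooExplicit::another_a) == 3, "new variant gets a new tag");
```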

<br>

### Testing whether a field is present in a table

Most serialization formats (e.g. JSON or Protocol Buffers) make it very
explicit in the format whether a field is present in an object or not,
allowing you to use this as "extra" information.

FlatBuffers will not write fields that are equal to their default value,
sometimes resulting in significant space savings. However, this also means we
cannot disambiguate the meaning of non-presence as "written default value" or
"not written at all". This only applies to scalar fields since only they support
default values. Unless otherwise specified, their default is 0.

If you care about the presence of scalars, most languages support "optional
scalars." You can set `null` as the default value in the schema. `null` is a
value that's outside of all types, so the field will always be written if
`add_field` is called. The generated field accessor should use the local
language's canonical optional type.

Some `FlatBufferBuilder` implementations have an option called `force_defaults`
that circumvents this "not writing defaults" behavior; you can then use
`IsFieldPresent` to query presence.

Another option that works in all languages is to wrap a scalar field in a
struct. This way it will return null if it is not present. This will be slightly
less ergonomic but structs don't take up any more space than the scalar they
represent.
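
A toy writer contrasts the cases described above. `FieldWriter`, `AddScalar` and `AddOptionalScalar` are hypothetical names, and `std::nullopt` stands for "not written"; this models the described behavior, not any real builder's API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy writer for one scalar field; an empty result means "not written".
struct FieldWriter {
  bool force_defaults = false;

  // Default-valued scalar: a value equal to the default is skipped unless
  // force_defaults is set, so a reader cannot tell "wrote 0" from "absent".
  std::optional<int32_t> AddScalar(int32_t value, int32_t def) const {
    if (value == def && !force_defaults) return std::nullopt;
    return value;
  }

  // Optional scalar (schema default `null`): no value equals the default,
  // so every value that is added gets written.
  std::optional<int32_t> AddOptionalScalar(std::optional<int32_t> value) const {
    return value;
  }
};
```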

[Interface Definition Language]: https://en.wikipedia.org/wiki/Interface_description_language

## Writing your own code generator

See [our intermediate representation](@ref intermediate_representation).