Sorted Vector & binary search functionality.

Bug: 16659276
Tested: on Linux & Windows.

Change-Id: Ie7a73810345fad4cf0a3ad03dfaa5464e3ed5ac8
Wouter van Oortmerssen 2015-01-07 17:51:31 -08:00
parent 73582b145c
commit 3550899987
11 changed files with 166 additions and 30 deletions

View File

@ -101,7 +101,15 @@ $(document).ready(function(){initNavTree('md__cpp_usage.html','');});
<div class="fragment"><div class="line"><span class="keyword">auto</span> inv = monster-&gt;inventory();</div> <div class="fragment"><div class="line"><span class="keyword">auto</span> inv = monster-&gt;inventory();</div>
<div class="line">assert(inv);</div> <div class="line">assert(inv);</div>
<div class="line">assert(inv-&gt;Get(9) == 9);</div> <div class="line">assert(inv-&gt;Get(9) == 9);</div>
</div><!-- fragment --><h3>Direct memory access</h3> </div><!-- fragment --><h3>Storing maps / dictionaries in a FlatBuffer</h3>
<p>FlatBuffers doesn't support maps natively, but there is support to emulate their behavior with vectors and binary search, which means you can have fast lookups directly from a FlatBuffer without having to unpack your data into a <code>std::map</code> or similar.</p>
<p>To use it:</p><ul>
<li>Designate one of the fields in a table as the "key" field. You do this by setting the <code>key</code> attribute on this field, e.g. <code>name:string (key)</code>. You may only have one key field, and it must be of string or scalar type.</li>
<li>Write out tables of this type as usual, collect their offsets in an array or vector.</li>
<li>Instead of <code>CreateVector</code>, call <code>CreateVectorOfSortedTables</code>, which will first sort all offsets such that the tables they refer to are sorted by the key field, then serialize it.</li>
<li>Now when you're accessing the FlatBuffer, you can use <code>Vector::LookupByKey</code> instead of just <code>Vector::Get</code> to access elements of the vector, e.g.: <code>myvector-&gt;LookupByKey("Fred")</code>, which returns a pointer to the corresponding table type, or <code>nullptr</code> if not found. <code>LookupByKey</code> performs a binary search, so should have a similar speed to <code>std::map</code>, though may be faster because of better caching. <code>LookupByKey</code> only works if the vector has been sorted; it will likely not find elements if it hasn't been sorted.</li>
</ul>
<h3>Direct memory access</h3>
<p>As you can see from the above examples, all elements in a buffer are accessed through generated accessors. This is because everything is stored in little endian format on all platforms (the accessor performs a swap operation on big endian machines), and also because the layout of things is generally not known to the user.</p> <p>As you can see from the above examples, all elements in a buffer are accessed through generated accessors. This is because everything is stored in little endian format on all platforms (the accessor performs a swap operation on big endian machines), and also because the layout of things is generally not known to the user.</p>
<p>For structs, layout is deterministic and guaranteed to be the same across platforms (scalars are aligned to their own size, and structs themselves to their largest member), and you are allowed to access this memory directly by using <code>sizeof()</code> and <code>memcpy</code> on the pointer to a struct, or even an array of structs.</p> <p>For structs, layout is deterministic and guaranteed to be the same across platforms (scalars are aligned to their own size, and structs themselves to their largest member), and you are allowed to access this memory directly by using <code>sizeof()</code> and <code>memcpy</code> on the pointer to a struct, or even an array of structs.</p>
<p>To compute offsets to sub-elements of a struct, make sure they are structs themselves, as then you can use the pointers to figure out the offset without having to hardcode it. This is handy for use of arrays of structs with calls like <code>glVertexAttribPointer</code> in OpenGL or similar APIs.</p> <p>To compute offsets to sub-elements of a struct, make sure they are structs themselves, as then you can use the pointers to figure out the offset without having to hardcode it. This is handy for use of arrays of structs with calls like <code>glVertexAttribPointer</code> in OpenGL or similar APIs.</p>

View File

@ -146,6 +146,7 @@ root_type Monster;
<li><code>force_align: size</code> (on a struct): force the alignment of this struct to be something higher than what it is naturally aligned to. Causes these structs to be aligned to that amount inside a buffer, IF that buffer is allocated with that alignment (which is not necessarily the case for buffers accessed directly inside a <code>FlatBufferBuilder</code>).</li> <li><code>force_align: size</code> (on a struct): force the alignment of this struct to be something higher than what it is naturally aligned to. Causes these structs to be aligned to that amount inside a buffer, IF that buffer is allocated with that alignment (which is not necessarily the case for buffers accessed directly inside a <code>FlatBufferBuilder</code>).</li>
<li><code>bit_flags</code> (on an enum): the values of this field indicate bits, meaning that any value N specified in the schema will end up representing 1&lt;&lt;N, or if you don't specify values at all, you'll get the sequence 1, 2, 4, 8, ...</li> <li><code>bit_flags</code> (on an enum): the values of this field indicate bits, meaning that any value N specified in the schema will end up representing 1&lt;&lt;N, or if you don't specify values at all, you'll get the sequence 1, 2, 4, 8, ...</li>
<li><code>nested_flatbuffer: "table_name"</code> (on a field): this indicates that the field (which must be a vector of ubyte) contains flatbuffer data, for which the root type is given by <code>table_name</code>. The generated code will then produce a convenient accessor for the nested FlatBuffer.</li> <li><code>nested_flatbuffer: "table_name"</code> (on a field): this indicates that the field (which must be a vector of ubyte) contains flatbuffer data, for which the root type is given by <code>table_name</code>. The generated code will then produce a convenient accessor for the nested FlatBuffer.</li>
<li><code>key</code> (on a field): this field is meant to be used as a key when sorting a vector of the type of table it sits in. Can be used for in-place binary search.</li>
</ul> </ul>
<h2>JSON Parsing</h2> <h2>JSON Parsing</h2>
<p>The same parser that parses the schema declarations above is also able to parse JSON objects that conform to this schema. So, unlike other JSON parsers, this parser is strongly typed, and parses directly into a FlatBuffer (see the compiler documentation on how to do this from the command line, or the C++ documentation on how to do this at runtime).</p> <p>The same parser that parses the schema declarations above is also able to parse JSON objects that conform to this schema. So, unlike other JSON parsers, this parser is strongly typed, and parses directly into a FlatBuffer (see the compiler documentation on how to do this from the command line, or the C++ documentation on how to do this at runtime).</p>

View File

@ -157,6 +157,32 @@ Similarly, we can access elements of the inventory array:
assert(inv->Get(9) == 9); assert(inv->Get(9) == 9);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
### Storing maps / dictionaries in a FlatBuffer
FlatBuffers doesn't support maps natively, but there is support to
emulate their behavior with vectors and binary search, which means you
can have fast lookups directly from a FlatBuffer without having to unpack
your data into a `std::map` or similar.
To use it:
- Designate one of the fields in a table as the "key" field. You do this
by setting the `key` attribute on this field, e.g.
`name:string (key)`.
You may only have one key field, and it must be of string or scalar type.
- Write out tables of this type as usual, collect their offsets in an
array or vector.
- Instead of `CreateVector`, call `CreateVectorOfSortedTables`,
which will first sort all offsets such that the tables they refer to
are sorted by the key field, then serialize it.
- Now when you're accessing the FlatBuffer, you can use `Vector::LookupByKey`
instead of just `Vector::Get` to access elements of the vector, e.g.:
`myvector->LookupByKey("Fred")`, which returns a pointer to the
corresponding table type, or `nullptr` if not found.
`LookupByKey` performs a binary search, so should have a similar speed to
`std::map`, though may be faster because of better caching. `LookupByKey`
only works if the vector has been sorted; it will likely not find elements
if it hasn't been sorted.
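
The steps above can be illustrated without any FlatBuffers headers. The following is a minimal standalone sketch, not the library's API: `Entry`, `SortByKey` and the free function `LookupByKey` are hypothetical stand-ins for a generated table type, `CreateVectorOfSortedTables` and `Vector::LookupByKey` respectively. It assumes a string key and only shows the sort-then-binary-search idea on a plain `std::vector`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a generated table with a string key field.
struct Entry {
  std::string name;  // plays the role of `name:string (key)`
  int hp;

  // Ordering used while sorting, analogous to KeyCompareLessThan.
  bool KeyCompareLessThan(const Entry *o) const { return name < o->name; }

  // Three-way comparison against a lookup value, analogous to
  // KeyCompareWithValue: negative, zero or positive like strcmp.
  int KeyCompareWithValue(const char *val) const {
    return std::strcmp(name.c_str(), val);
  }
};

// Sort the elements by key before they would be serialized.
void SortByKey(std::vector<Entry> &v) {
  std::sort(v.begin(), v.end(), [](const Entry &a, const Entry &b) {
    return a.KeyCompareLessThan(&b);
  });
}

// Binary search over a vector that is assumed to be sorted by key.
// Returns a pointer to the matching element, or nullptr if absent.
const Entry *LookupByKey(const std::vector<Entry> &v, const char *key) {
  std::size_t start = 0;
  std::size_t span = v.size();
  while (span) {
    std::size_t middle = span / 2;
    const Entry &e = v[start + middle];
    int comp = e.KeyCompareWithValue(key);
    if (comp > 0) {
      span = middle;    // Element is greater than key: search lower half.
    } else if (comp < 0) {
      ++middle;         // Element is less than key: search upper half.
      start += middle;
      span -= middle;
    } else {
      return &e;        // Found.
    }
  }
  return nullptr;
}
```

With entries named "Fred", "Barney" and "Wilma", calling `SortByKey` first and then `LookupByKey(v, "Fred")` returns the matching element, while a name that is not present returns `nullptr`. Skipping the sort step breaks the search's precondition, which is why lookups on an unsorted vector can miss elements that are actually there.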
### Direct memory access ### Direct memory access
As you can see from the above examples, all elements in a buffer are As you can see from the above examples, all elements in a buffer are

View File

@ -275,6 +275,9 @@ Current understood attributes:
(which must be a vector of ubyte) contains flatbuffer data, for which the (which must be a vector of ubyte) contains flatbuffer data, for which the
root type is given by `table_name`. The generated code will then produce root type is given by `table_name`. The generated code will then produce
a convenient accessor for the nested FlatBuffer. a convenient accessor for the nested FlatBuffer.
- `key` (on a field): this field is meant to be used as a key when sorting
a vector of the type of table it sits in. Can be used for in-place
binary search.
## JSON Parsing ## JSON Parsing

View File

@ -288,6 +288,31 @@ public:
return reinterpret_cast<const uint8_t *>(&length_ + 1); return reinterpret_cast<const uint8_t *>(&length_ + 1);
} }
template<typename K> return_type LookupByKey(K key) const {
auto span = size();
uoffset_t start = 0;
// Perform binary search for key.
while (span) {
// Compare against middle element of current span.
auto middle = span / 2;
auto table = Get(start + middle);
auto comp = table->KeyCompareWithValue(key);
if (comp > 0) {
// Greater than. Adjust span and try again.
span = middle;
} else if (comp < 0) {
// Less than. Adjust span and try again.
middle++;
start += middle;
span -= middle;
} else {
// Found element.
return table;
}
}
return nullptr; // Key not found.
}
protected: protected:
// This class is only used to access pre-existing data. Don't ever // This class is only used to access pre-existing data. Don't ever
// try to construct these manually. // try to construct these manually.
@ -304,6 +329,10 @@ template<typename T> static inline size_t VectorLength(const Vector<T> *v) {
struct String : public Vector<char> { struct String : public Vector<char> {
const char *c_str() const { return reinterpret_cast<const char *>(Data()); } const char *c_str() const { return reinterpret_cast<const char *>(Data()); }
bool operator <(const String &o) const {
return strcmp(c_str(), o.c_str()) < 0;
}
}; };
// Simple indirection for buffer allocation, to allow this to be overridden // Simple indirection for buffer allocation, to allow this to be overridden
@ -646,6 +675,40 @@ class FlatBufferBuilder FLATBUFFERS_FINAL_CLASS {
return Offset<Vector<T>>(EndVector(len)); return Offset<Vector<T>>(EndVector(len));
} }
template<typename T> Offset<Vector<T>> CreateVector(const std::vector<T> &v) {
return CreateVector(v.data(), v.size());
}
template<typename T> Offset<Vector<const T *>> CreateVectorOfStructs(
const T *v, size_t len) {
NotNested();
StartVector(len * sizeof(T) / AlignOf<T>(), AlignOf<T>());
PushBytes(reinterpret_cast<const uint8_t *>(v), sizeof(T) * len);
return Offset<Vector<const T *>>(EndVector(len));
}
template<typename T> Offset<Vector<const T *>> CreateVectorOfStructs(
const std::vector<T> &v) {
return CreateVectorOfStructs(v.data(), v.size());
}
template<typename T> Offset<Vector<Offset<T>>> CreateVectorOfSortedTables(
Offset<T> *v, size_t len) {
std::sort(v, v + len,
[this](const Offset<T> &a, const Offset<T> &b) -> bool {
auto table_a = reinterpret_cast<T *>(buf_.data_at(a.o));
auto table_b = reinterpret_cast<T *>(buf_.data_at(b.o));
return table_a->KeyCompareLessThan(table_b);
}
);
return CreateVector(v, len);
}
template<typename T> Offset<Vector<Offset<T>>> CreateVectorOfSortedTables(
std::vector<Offset<T>> *v) {
return CreateVectorOfSortedTables(v->data(), v->size());
}
// Specialized version for non-copying use cases. Write the data any time // Specialized version for non-copying use cases. Write the data any time
// later to the returned buffer pointer `buf`. // later to the returned buffer pointer `buf`.
uoffset_t CreateUninitializedVector(size_t len, size_t elemsize, uoffset_t CreateUninitializedVector(size_t len, size_t elemsize,
@ -662,23 +725,6 @@ class FlatBufferBuilder FLATBUFFERS_FINAL_CLASS {
reinterpret_cast<uint8_t **>(buf)); reinterpret_cast<uint8_t **>(buf));
} }
template<typename T> Offset<Vector<T>> CreateVector(const std::vector<T> &v) {
return CreateVector(v.data(), v.size());
}
template<typename T> Offset<Vector<const T *>> CreateVectorOfStructs(
const T *v, size_t len) {
NotNested();
StartVector(len * sizeof(T) / AlignOf<T>(), AlignOf<T>());
PushBytes(reinterpret_cast<const uint8_t *>(v), sizeof(T) * len);
return Offset<Vector<const T *>>(EndVector(len));
}
template<typename T> Offset<Vector<const T *>> CreateVectorOfStructs(
const std::vector<T> &v) {
return CreateVectorOfStructs(v.data(), v.size());
}
static const size_t kFileIdentifierLength = 4; static const size_t kFileIdentifierLength = 4;
// Finish serializing a buffer by writing the root offset. // Finish serializing a buffer by writing the root offset.

View File

@ -186,11 +186,14 @@ struct Definition {
}; };
struct FieldDef : public Definition { struct FieldDef : public Definition {
FieldDef() : deprecated(false), required(false), padding(0), used(false) {} FieldDef() : deprecated(false), required(false), key(false), padding(0),
used(false) {}
Value value; Value value;
bool deprecated; bool deprecated; // Field is allowed to be present in old data, but can't be
bool required; // written in new data nor accessed in new code.
bool required; // Field must always be present.
bool key; // Field functions as a key for creating sorted vectors.
size_t padding; // Bytes to always pad after this field. size_t padding; // Bytes to always pad after this field.
bool used; // Used during JSON parsing to check for repeated fields. bool used; // Used during JSON parsing to check for repeated fields.
}; };
@ -200,6 +203,7 @@ struct StructDef : public Definition {
: fixed(false), : fixed(false),
predecl(true), predecl(true),
sortbysize(true), sortbysize(true),
has_key(false),
minalign(1), minalign(1),
bytesize(0) bytesize(0)
{} {}
@ -214,6 +218,7 @@ struct StructDef : public Definition {
bool fixed; // If it's struct, not a table. bool fixed; // If it's struct, not a table.
bool predecl; // If it's used before it was defined. bool predecl; // If it's used before it was defined.
bool sortbysize; // Whether fields come in the declaration or size order. bool sortbysize; // Whether fields come in the declaration or size order.
bool has_key; // It has a key field.
size_t minalign; // What the whole object needs to be aligned to. size_t minalign; // What the whole object needs to be aligned to.
size_t bytesize; // Size if fixed. size_t bytesize; // Size if fixed.
}; };
@ -271,6 +276,7 @@ class Parser {
namespaces_.push_back(new Namespace()); namespaces_.push_back(new Namespace());
known_attributes_.insert("deprecated"); known_attributes_.insert("deprecated");
known_attributes_.insert("required"); known_attributes_.insert("required");
known_attributes_.insert("key");
known_attributes_.insert("id"); known_attributes_.insert("id");
known_attributes_.insert("force_align"); known_attributes_.insert("force_align");
known_attributes_.insert("bit_flags"); known_attributes_.insert("bit_flags");

View File

@ -245,6 +245,24 @@ static void GenTable(const Parser &parser, StructDef &struct_def,
code += "_nested_root() { return flatbuffers::GetRoot<"; code += "_nested_root() { return flatbuffers::GetRoot<";
code += nested_root->name + ">(" + field.name + "()->Data()); }\n"; code += nested_root->name + ">(" + field.name + "()->Data()); }\n";
} }
// Generate a comparison function for this field if it is a key.
if (field.key) {
code += " bool KeyCompareLessThan(const " + struct_def.name;
code += " *o) const { return ";
if (field.value.type.base_type == BASE_TYPE_STRING) code += "*";
code += field.name + "() < ";
if (field.value.type.base_type == BASE_TYPE_STRING) code += "*";
code += "o->" + field.name + "(); }\n";
code += " int KeyCompareWithValue(";
if (field.value.type.base_type == BASE_TYPE_STRING) {
code += "const char *val) const { return strcmp(" + field.name;
code += "()->c_str(), val); }\n";
} else {
code += GenTypeBasic(parser, field.value.type, false);
code += " val) const { return " + field.name + "() < val ? -1 : ";
code += field.name + "() > val; }\n";
}
}
} }
} }
// Generate a verifier function that can check a buffer from an untrusted // Generate a verifier function that can check a buffer from an untrusted

View File

@ -395,6 +395,17 @@ void Parser::ParseField(StructDef &struct_def) {
if (field.required && (struct_def.fixed || if (field.required && (struct_def.fixed ||
IsScalar(field.value.type.base_type))) IsScalar(field.value.type.base_type)))
Error("only non-scalar fields in tables may be 'required'"); Error("only non-scalar fields in tables may be 'required'");
field.key = field.attributes.Lookup("key") != nullptr;
if (field.key) {
if (struct_def.has_key)
Error("only one field may be set as 'key'");
struct_def.has_key = true;
if (!IsScalar(field.value.type.base_type)) {
field.required = true;
if (field.value.type.base_type != BASE_TYPE_STRING)
Error("'key' field must be string or scalar type");
}
}
auto nested = field.attributes.Lookup("nested_flatbuffer"); auto nested = field.attributes.Lookup("nested_flatbuffer");
if (nested) { if (nested) {
if (nested->type.base_type != BASE_TYPE_STRING) if (nested->type.base_type != BASE_TYPE_STRING)

View File

@ -30,7 +30,7 @@ table Monster {
pos:Vec3 (id: 0); pos:Vec3 (id: 0);
hp:short = 100 (id: 2); hp:short = 100 (id: 2);
mana:short = 150 (id: 1); mana:short = 150 (id: 1);
name:string (id: 3, required); name:string (id: 3, required, key);
color:Color = Blue (id: 6); color:Color = Blue (id: 6);
inventory:[ubyte] (id: 5); inventory:[ubyte] (id: 5);
friendly:bool = false (deprecated, priority: 1, id: 4); friendly:bool = false (deprecated, priority: 1, id: 4);

View File

@ -125,6 +125,8 @@ struct Monster FLATBUFFERS_FINAL_CLASS : private flatbuffers::Table {
int16_t mana() const { return GetField<int16_t>(6, 150); } int16_t mana() const { return GetField<int16_t>(6, 150); }
int16_t hp() const { return GetField<int16_t>(8, 100); } int16_t hp() const { return GetField<int16_t>(8, 100); }
const flatbuffers::String *name() const { return GetPointer<const flatbuffers::String *>(10); } const flatbuffers::String *name() const { return GetPointer<const flatbuffers::String *>(10); }
bool KeyCompareLessThan(const Monster *o) const { return *name() < *o->name(); }
int KeyCompareWithValue(const char *val) const { return strcmp(name()->c_str(), val); }
const flatbuffers::Vector<uint8_t> *inventory() const { return GetPointer<const flatbuffers::Vector<uint8_t> *>(14); } const flatbuffers::Vector<uint8_t> *inventory() const { return GetPointer<const flatbuffers::Vector<uint8_t> *>(14); }
Color color() const { return static_cast<Color>(GetField<int8_t>(16, 8)); } Color color() const { return static_cast<Color>(GetField<int8_t>(16, 8)); }
Any test_type() const { return static_cast<Any>(GetField<uint8_t>(18, 0)); } Any test_type() const { return static_cast<Any>(GetField<uint8_t>(18, 0)); }

View File

@ -81,10 +81,19 @@ std::string CreateFlatBufferTest() {
// create monster with very few fields set: // create monster with very few fields set:
// (same functionality as CreateMonster below, but sets fields manually) // (same functionality as CreateMonster below, but sets fields manually)
flatbuffers::Offset<Monster> mlocs[3];
auto fred = builder.CreateString("Fred"); auto fred = builder.CreateString("Fred");
MonsterBuilder mb(builder); auto barney = builder.CreateString("Barney");
mb.add_name(fred); auto wilma = builder.CreateString("Wilma");
auto mloc2 = mb.Finish(); MonsterBuilder mb1(builder);
mb1.add_name(fred);
mlocs[0] = mb1.Finish();
MonsterBuilder mb2(builder);
mb2.add_name(barney);
mlocs[1] = mb2.Finish();
MonsterBuilder mb3(builder);
mb3.add_name(wilma);
mlocs[2] = mb3.Finish();
// Create an array of strings: // Create an array of strings:
flatbuffers::Offset<flatbuffers::String> strings[2]; flatbuffers::Offset<flatbuffers::String> strings[2];
@ -92,12 +101,12 @@ std::string CreateFlatBufferTest() {
strings[1] = builder.CreateString("fred"); strings[1] = builder.CreateString("fred");
auto vecofstrings = builder.CreateVector(strings, 2); auto vecofstrings = builder.CreateVector(strings, 2);
// Create an array of tables: // Create an array of sorted tables, can be used with binary search when read:
auto vecoftables = builder.CreateVector(&mloc2, 1); auto vecoftables = builder.CreateVectorOfSortedTables(mlocs, 3);
// shortcut for creating monster with all fields set: // shortcut for creating monster with all fields set:
auto mloc = CreateMonster(builder, &vec, 150, 80, name, inventory, Color_Blue, auto mloc = CreateMonster(builder, &vec, 150, 80, name, inventory, Color_Blue,
Any_Monster, mloc2.Union(), // Store a union. Any_Monster, mlocs[1].Union(), // Store a union.
testv, vecofstrings, vecoftables, 0); testv, vecofstrings, vecoftables, 0);
FinishMonsterBuffer(builder, mloc); FinishMonsterBuffer(builder, mloc);
@ -163,9 +172,15 @@ void AccessFlatBufferTest(const std::string &flatbuf) {
// Example of accessing a vector of tables: // Example of accessing a vector of tables:
auto vecoftables = monster->testarrayoftables(); auto vecoftables = monster->testarrayoftables();
TEST_EQ(vecoftables->Length(), 1U); TEST_EQ(vecoftables->Length(), 3U);
for (auto it = vecoftables->begin(); it != vecoftables->end(); ++it) for (auto it = vecoftables->begin(); it != vecoftables->end(); ++it)
TEST_EQ(strcmp(it->name()->c_str(), "Fred"), 0); TEST_EQ(strlen(it->name()->c_str()) >= 4, true);
TEST_EQ(strcmp(vecoftables->Get(0)->name()->c_str(), "Barney"), 0);
TEST_EQ(strcmp(vecoftables->Get(1)->name()->c_str(), "Fred"), 0);
TEST_EQ(strcmp(vecoftables->Get(2)->name()->c_str(), "Wilma"), 0);
TEST_NOTNULL(vecoftables->LookupByKey("Barney"));
TEST_NOTNULL(vecoftables->LookupByKey("Fred"));
TEST_NOTNULL(vecoftables->LookupByKey("Wilma"));
// Since Flatbuffers uses explicit mechanisms to override the default // Since Flatbuffers uses explicit mechanisms to override the default
// compiler alignment, double check that the compiler indeed obeys them: // compiler alignment, double check that the compiler indeed obeys them: