perkeep/doc/schema
mpl db2604f981 pkg/schema: break static-sets in subsets for large directories
The current maximum size for a schema blob is 1MB. For a large enough
directory (~20000 children), the resulting static-set JSON schema is
over that maximum size.

We could increase that maximum, but we would eventually hit the maximum
blob size (16MB), which would only allow for ~300000 children. Even if
that is an uncommon size, it is technically possible to have such large
directories, so I don't think it would be reasonable to restrict users
to such a limit. So it does not seem like enough of a solution.
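
The figures above can be checked with a rough back-of-the-envelope calculation. This is only a sketch: the constant names are made up for illustration, and the per-child cost is derived from the ~20000 estimate rather than measured on real blobs.

```go
package main

import "fmt"

// Limits quoted in the text above. The names are illustrative, not
// necessarily the identifiers used in the Perkeep source.
const (
	maxSchemaBlobSize = 1 << 20  // 1MB limit for a schema blob
	maxBlobSize       = 16 << 20 // 16MB limit for any blob
)

// bytesPerChild estimates the JSON footprint of one member, assuming
// `children` members fill `limit` bytes.
func bytesPerChild(limit, children int) int {
	return limit / children
}

// maxChildren estimates how many members fit in `limit` bytes at
// `perChild` bytes each.
func maxChildren(limit, perChild int) int {
	return limit / perChild
}

func main() {
	per := bytesPerChild(maxSchemaBlobSize, 20000)
	fmt.Println(per)                           // 52
	fmt.Println(maxChildren(maxBlobSize, per)) // 322638
}
```

At roughly 52 bytes per child, a 16MB blob holds a little over 300000 children, which matches the estimate above.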

The solution proposed in this CL is to spread the children of a
directory (when there are more of them than a given maximum, here set to
10000) across several static-sets, recursively if needed. These
static-sets (subsets of the full set of children) are stored in the new
"mergeSets" field of their parent static-set schema. The actual fileRefs
or dirRefs are still stored in the "members" field of the subset they were
spread into. The "mergeSets" and "members" fields of a static-set are
therefore mutually exclusive.
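
The splitting described above can be sketched as follows. This is a minimal illustration, not the code from this CL: the `staticSet` type, `split`, `count`, and `makeRefs` are hypothetical, and subsets are held as in-memory pointers, whereas the real schema would reference them by blob ref.

```go
package main

import "fmt"

// maxMembers is the per-static-set limit mentioned in the text (10000).
const maxMembers = 10000

// staticSet is a hypothetical in-memory stand-in for a static-set schema
// blob. At most one of Members or MergeSets is populated.
type staticSet struct {
	Members   []string     // fileRefs or dirRefs
	MergeSets []*staticSet // subsets of the children
}

// split spreads refs over static-sets holding at most limit entries each,
// adding further levels of merge sets while the subsets themselves
// outnumber limit. limit must be at least 2 for the grouping to terminate.
func split(refs []string, limit int) *staticSet {
	if limit < 2 {
		panic("limit must be at least 2")
	}
	if len(refs) <= limit {
		return &staticSet{Members: refs}
	}
	// First level: leaf subsets that hold the actual refs.
	var level []*staticSet
	for start := 0; start < len(refs); start += limit {
		end := start + limit
		if end > len(refs) {
			end = len(refs)
		}
		level = append(level, &staticSet{Members: refs[start:end]})
	}
	// Group subsets under parents until the top level fits in one set.
	for len(level) > limit {
		var parents []*staticSet
		for start := 0; start < len(level); start += limit {
			end := start + limit
			if end > len(level) {
				end = len(level)
			}
			parents = append(parents, &staticSet{MergeSets: level[start:end]})
		}
		level = parents
	}
	return &staticSet{MergeSets: level}
}

// count returns the total number of members reachable from s.
func count(s *staticSet) int {
	n := len(s.Members)
	for _, sub := range s.MergeSets {
		n += count(sub)
	}
	return n
}

// makeRefs builds n placeholder refs; real refs would be blob refs.
func makeRefs(n int) []string {
	refs := make([]string, n)
	for i := range refs {
		refs[i] = fmt.Sprintf("ref-%d", i)
	}
	return refs
}

func main() {
	root := split(makeRefs(25000), maxMembers)
	fmt.Println(len(root.Members), len(root.MergeSets), count(root)) // 0 3 25000
}
```

With 25000 children and the 10000 limit, the root holds three subsets and no members of its own, which illustrates why the two fields never need to be set together.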

Fixes #924

Change-Id: Ibe47b50795d5288fe904d3cce0cc7f780d313408
2018-02-09 01:36:38 +01:00
README.md website: first pass of s/Camlistore/Perkeep/ on contents 2017-12-18 16:46:08 +01:00
TODO website: keep doc/schema flat 2016-04-26 17:34:47 -07:00
attributes.md website: keep doc/schema flat 2016-04-26 17:34:47 -07:00
blob-magic.md website: first pass of s/Camlistore/Perkeep/ on contents 2017-12-18 16:46:08 +01:00
bytes.md website: convert schema docs to markdown 2016-04-26 16:32:38 -07:00
common.md website: keep doc/schema flat 2016-04-26 17:34:47 -07:00
delete.md website: keep doc/schema flat 2016-04-26 17:34:47 -07:00
directory.md website: keep doc/schema flat 2016-04-26 17:34:47 -07:00
fifo.md website: keep doc/schema flat 2016-04-26 17:34:47 -07:00
file.md website: keep doc/schema flat 2016-04-26 17:34:47 -07:00
inode.md website: keep doc/schema flat 2016-04-26 17:34:47 -07:00
keep.md website: keep doc/schema flat 2016-04-26 17:34:47 -07:00
permanode.md doc: warn that multi claims are not implemented 2018-01-24 01:34:52 +01:00
share.md website: first pass of s/Camlistore/Perkeep/ on contents 2017-12-18 16:46:08 +01:00
socket.md website: keep doc/schema flat 2016-04-26 17:34:47 -07:00
static-set.md pkg/schema: break static-sets in subsets for large directories 2018-02-09 01:36:38 +01:00
symlink.md website: keep doc/schema flat 2016-04-26 17:34:47 -07:00

README.md

Schema

At the lowest layer, Perkeep doesn't care what you put in it (everything is just dumb bytes) and you're free to adopt your own data model. However, the upper layers of Perkeep standardize on a common schema to represent various classes of data.

Schema blobs are JSON objects with at least two attributes always set: camliVersion, which is always 1, and camliType, which tells you the type of metadata the blob contains.
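
As a rough illustration of the two always-present attributes, the sketch below decodes a blob and checks them using only the Go standard library. The `header` type and `parseHeader` function are hypothetical names for this example and are not Perkeep's own parsing API; the sample JSON is likewise made up.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// header holds the two attributes every schema blob is described as carrying.
type header struct {
	Version int    `json:"camliVersion"`
	Type    string `json:"camliType"`
}

// parseHeader decodes data and checks that camliVersion is 1 and that
// camliType is present. Other attributes in the blob are ignored.
func parseHeader(data []byte) (header, error) {
	var h header
	if err := json.Unmarshal(data, &h); err != nil {
		return header{}, err
	}
	if h.Version != 1 {
		return header{}, errors.New("camliVersion must be 1")
	}
	if h.Type == "" {
		return header{}, errors.New("camliType is missing")
	}
	return h, nil
}

func main() {
	blob := []byte(`{"camliVersion": 1, "camliType": "static-set", "members": []}`)
	h, err := parseHeader(blob)
	fmt.Println(h.Type, err) // static-set <nil>
}
```

Decoding into a small struct and ignoring the remaining fields lets a caller dispatch on camliType before committing to a type-specific structure.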

Here are some of the data types we've started to formalize a JSON schema for:

  • Bytes

  • Common Attributes

  • Delete Claim

  • Directory

  • FIFO

  • Files: traditional filesystems. Files, directories, inodes, symlinks, etc. Uses the file, directory, symlink, and inode camliTypes.

  • Inode

  • "Keep" claims: Normally, any object that isn't referenced by a permanode could theoretically be garbage collected. Keep claims prevent that from happening. Indicated by the keep camliType.

  • Permanodes: the immutable root "anchor" of mutable Perkeep objects (see terminology). Users create signed claim schema blobs which reference a permanode and define some mutation for the permanode.

    Permanodes are used to model many kinds of mutable data, including mutable files, dynamic directories, and more.

    Uses the permanode and claim camliTypes.

  • Permanode Attributes

  • Share Claim

  • Socket

  • Static Sets: Immutable lists of other blobs by their refs. Indicated by the static-set camliType.

  • Symlink