<!DOCTYPE html>
<!--
Google I/O 2011 HTML slides template
Authors: Luke Mahé (code)
Marcin Wichary (code and design)
Dominic Mazzoni (browser compatibility)
Charles Chen (ChromeVox support)
URL: http://code.google.com/p/io-2011-slides/
-->
< html >
< head >
< title > Camlistore< / title >
< meta charset = 'utf-8' / >
< script src = 'slides.js' > < / script >
< / head >
< style >
.smaller {
font-size: 80%;
}
ul li ul {
margin-top: 1.5em;
margin-bottom: 1em;
}
ul li ul li {
margin-top: 1em;
font-size: 80%;
}
ul li ul.dense li {
margin-top: 0em;
margin-bottom: 0em;
font-size: 80%;
}
h1.center {
text-align: center;
font-style: italic;
}
< / style >
< body style = 'display: none' >
< section class = 'slides layout-regular' >
< article >
< h1 >
Camlistore
< / h1 >
< p >
Brad Fitzpatrick
< br >
2011-05-07
< / p >
< / article >
< article >
< h3 >
Who am I?
< / h3 >
< ul class = 'nobuild' >
< li >
Brad Fitzpatrick &lt;brad@danga.com&gt;
< / li >
< li > Perl Hacker since 1994< / li >
< li > Projects:
< table > < tr valign = 'top' >
< th > Danga / 6A (Perl)< / th >
< th > Google< / th >
< / tr >
< tr valign = 'top' >
< td class = 'nobuild' >
< div > LiveJournal< / div >
< div > memcached< / div >
< div > Perlbal< / div >
< div > MogileFS< / div >
< div class = 'blue' > OpenID< / div >
< / td >
< td class = 'nobuild' >
< div > < nobr > Social Graph API (< span class = 'blue' > XFN / FOAF< / span > )< / nobr > < / div >
< div class = 'blue' > WebFinger< / div >
< div class = 'blue' > PubSubHubbub< / div >
< div > Android< / div >
< div > Go< / div >
< / td >
< / tr >
< / table >
< div style = 'font-size: 70%; margin-top: 1em' > * < span class = 'blue' > decentralized social< / span > < / div >
< / li >
< / ul >
< / article >
< article >
< h3 >
But why am I in Brazil?
< / h3 >
< ul class = 'nobuild' >
< li >
< i > "Hey, want to come speak at a Perl conference in Brazil?"< / i >
< / li >
< li > "Yes, totally, but... I don't write much Perl these days. :-("< / li >
< li style = "margin-top: 2em" > < i > "You could speak on memcached."< / i > < / li >
< li > "But that's an old topic, no?"< / li >
< li style = "margin-top: 2em" > < i > "You have any new project you're excited about?"< / i > < / li >
< / ul >
< / article >
< article >
< h1 align = 'center' >
Camlistore!
< / h1 >
< / article >
< article >
< h3 >
Camlistore
< / h3 >
< ul >
< li > New open source project< / li >
< li > Almost a year old< / li >
< li > Still in development< / li >
< li > Starting to be useful :-)< / li >
< li > Hard to easily describe...< / li >
< / ul >
< / article >
< article >
< q >
Camlistore is a way to store, sync, share, model and back up content
< / q >
< div class = 'author' >
camlistore.org
< / div >
< / article >
< article >
< h3 >
Motivation
< / h3 >
< ul >
< li > I've written too many Content Management Systems
< ul >
< li > blogs, comments, photos, emails, backups, scanned paperwork, ...< / li >
< li > is a scanned photo a scan, a photo, or a blog post? who cares.< / li >
< li > write < b > one CMS to rule them all< / b > < / li >
< li > ... or at least a good framework for higher-level CMSes< / li >
< / ul >
< / li >
< / ul >
< / article >
< article >
< h3 >
Motivation (cont)
< / h3 >
< ul >
< li > I still want to help solve the Decentralized Social Network Problem
< ul >
< li > protocols, not companies< / li >
< li > gmail, hotmail: hosted versions of SMTP, IMAP< / li >
< li > ... but I can run my own SMTP/IMAP server if I want.< / li >
< li > ... or change my SMTP/IMAP provider< / li >
< / ul >
< / li >
< / ul >
< / article >
< article >
< h3 >
Motivation (cont)
< / h3 >
< ul >
< li > I wanted something conceptually simple.< / li >
< li > HTTP interfaces, not language-specific< / li >
< li > I use lots of machines; don't want to think about sync or conflicts.< / li >
< li > Data archaeology: should be easy and obvious to
reconstruct in 20 or 100 years< / li >
< / ul >
< / article >
< article >
< h3 >
The Product
< / h3 >
< ul >
< li > one private dumping ground to store anything< / li >
< li > backups, filesystems, objects, photos, likes, bookmarks, shares, my website, ...< / li >
< li > live backup my phone< / li >
< li > live replicate / sync my dumping ground between my house & laptop & Amazon & Google< / li >
< li > web UI (à la gmail, docs.google.com, etc.) or FUSE filesystem< / li >
< li > Easy for end-users; powerful for dorks< / li >
< / ul >
< / article >
< article >
< h3 >
Security Model
< / h3 >
< ul >
< li > < i > < b > your< / b > < / i > private repo, for life< / li >
< li > everything private by default< / li >
< li > grant friends or the world access to specific objects/trees< / li >
< li > web UI or CLI tools let you share< / li >
< / ul >
< / article >
< article >
< h1 class = 'center' >
So what's with the silly name?
< / h1 >
< / article >
< article >
< h3 >
Camlistore
< / h3 >
< ul >
< li > Content-< / li >
< li > Addressable< / li >
< li > Multi-< / li >
< li > Layer-< / li >
< li > Indexed< / li >
< li > Storage< / li >
< / ul >
< / article >
< article >
< h3 >
Content-Addressable
< / h3 >
< ul >
< li > At the core, everything is stored & addressed by its digest (e.g. SHA1, MD5, etc)< / li >
< li > e.g. < tt class = 'smaller' > "sha1-0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"< / tt > for the blob < tt class = 'smaller' > "foo"< / tt > < / li >
< li > Great properties:
< ul >
< li > no versions of content: changing it changes its digest too< / li >
< li > no versions: no sync conflicts< / li >
< li > no versions: perfect caching (have it or don't)< / li >
< / ul >
< / li >
< / ul >
< / article >
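< article >
< h3 > Sketch: naming a blob by its digest< / h3 >
< p > A minimal sketch of how a blob gets its name, assuming the "digestalg-hexdigest" form shown above (SHA-1, lowercase hex). The function name < tt > blobRef< / tt > is hypothetical; it uses only Go's standard < tt > crypto/sha1< / tt > package.< / p >

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// blobRef names a blob by its content: the digest algorithm, a dash,
// and the lowercase hex SHA-1 digest of the bytes.
func blobRef(blob []byte) string {
	return fmt.Sprintf("sha1-%x", sha1.Sum(blob))
}

func main() {
	fmt.Println(blobRef([]byte("foo")))
	// Changing the content changes the name: there is no "version 2" of a blob.
	fmt.Println(blobRef([]byte("foo\n")))
}
```

< p > Anyone holding the bytes can recompute the name, which is what makes caching and sync conflict-free.< / p >
< / article >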
< article >
< h3 >
Multi-Layer, Indexed
< / h3 >
< ul >
< li > Unix philosophy: small pieces with well-defined interfaces that can be chained or composed< / li >
< li > Camlistore pieces include:
< ul class = 'dense' >
< li style = 'margin-top: 1em' > Blob storage: memory, disk, S3, Google, MySQL index, etc< / li >
< li > Schema< / li >
< li > Signing< / li >
< li > Replication< / li >
< li > Indexing (e.g. replicate from disk to a MySQL index)< / li >
< li > Search< / li >
< li > HTML UI< / li >
< / ul >
< / li >
< / ul >
< / article >
< article >
< h3 > Logically< / h3 >
< img src = 'arch.png' width = '100%' / >
< / article >
< article >
< h3 > In reality< / h3 >
< ul >
< li > End-users: use a hosted version< / li >
< li > Dorks: single server binary with all the logical pieces< / li >
< / ul >
< / article >
< article >
< h2 >
From the bottom up...
< / h2 >
< / article >
< article >
< h2 >
Blob Server
< / h2 >
< / article >
< article >
< h3 > Blob Server: how dumb it is< / h3 >
< ul >
< li > "Blob" == zero or more bytes. < i class = 'red' > no< / i > metadata< / li >
< li > private operations, for the owner of the data only:
< ul >
< li class = 'green' > get(blobref) → blob< / li >
< li class = 'green' > stat(blobref+) → [(blobref, size), ...]< / li >
< li class = 'green' > put(blobref, blob)< / li >
< li class = 'green' > enumerate(..) → [(blobref, size)...] (sorted by blobref)< / li >
< / ul >
< / li >
< li > no public (non-owner) access< / li >
< li > HTTP interface: < tt > GET /camli/sha1-xxxxxxx HTTP/1.1< / tt > < / li >
< li > < span class = 'green' > delete(blobref)< / span > is disabled by default: a privileged op for GC or replication queues only< / li >
< / ul >
< / article >
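< article >
< h3 > Sketch: an in-memory blob server< / h3 >
< p > To make the four private operations concrete, here is a hypothetical in-memory blob server in Go. The names (< tt > MemStore< / tt >, < tt > SizedRef< / tt >) are illustrative, not Camlistore's API, and rejecting a put whose blobref does not match the content is an assumption that follows from content addressing.< / p >

```go
package main

import (
	"crypto/sha1"
	"errors"
	"fmt"
	"sort"
)

// SizedRef pairs a blobref with the blob's size, as stat and enumerate return.
type SizedRef struct {
	Ref  string
	Size int
}

// MemStore keeps refs and bytes, and nothing else: no names, types or times.
type MemStore struct {
	blobs map[string][]byte
}

func NewMemStore() *MemStore { return &MemStore{blobs: map[string][]byte{}} }

func sha1Ref(b []byte) string { return fmt.Sprintf("sha1-%x", sha1.Sum(b)) }

// Put stores blob under ref, rejecting it if ref is not the blob's digest.
func (s *MemStore) Put(ref string, blob []byte) error {
	if sha1Ref(blob) != ref {
		return errors.New("blobref does not match content")
	}
	s.blobs[ref] = append([]byte(nil), blob...) // keep a private copy
	return nil
}

// Get returns the blob's bytes, or false if the server lacks it.
func (s *MemStore) Get(ref string) ([]byte, bool) {
	b, ok := s.blobs[ref]
	return b, ok
}

// Stat reports sizes for whichever of refs the server has.
func (s *MemStore) Stat(refs ...string) []SizedRef {
	var out []SizedRef
	for _, r := range refs {
		if b, ok := s.blobs[r]; ok {
			out = append(out, SizedRef{r, len(b)})
		}
	}
	return out
}

// Enumerate lists every blob, sorted by blobref.
func (s *MemStore) Enumerate() []SizedRef {
	out := make([]SizedRef, 0, len(s.blobs))
	for r, b := range s.blobs {
		out = append(out, SizedRef{r, len(b)})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Ref < out[j].Ref })
	return out
}

func main() {
	s := NewMemStore()
	foo := []byte("foo")
	if err := s.Put(sha1Ref(foo), foo); err != nil {
		fmt.Println(err)
	}
	fmt.Println(s.Stat(sha1Ref(foo), "sha1-missing"))
	fmt.Println(s.Put("sha1-wrong", foo))
}
```

< p > Delete is deliberately absent, matching the slide: it would be a separate, privileged operation.< / p >
< / article >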
< article >
< h3 > Blob Server: seriously, no metadata< / h3 >
< ul >
< li > no filenames< / li >
< li > no "mime types"< / li >
< li > no "{create,mod,access} time"< / li >
< li > size is implicit< / li >
< li > blob: just some bytes< / li >
< li > metadata? layers above.< / li >
< / ul >
< / article >
< article >
< h1 class = 'center' >
Uh, what can you do with that?
< / h1 >
< / article >
< article >
< h3 > Uh, what can you do with that?< / h3 >
< ul >
< li > with just a blob server?< / li >
< li > not much< / li >
< li > but let's start with an easy example...< / li >
< / ul >
< / article >
< article >
< h1 class = 'center' >
Filesystem Backups
< / h1 >
< / article >
< article >
< h3 > Filesystem Backups< / h3 >
< ul >
< li > previous project: brackup
< ul >
< li > good: Perl, slice/dice/encrypt S3 backup, content-addressed, good iterative backups< / li >
< li > bad: large (several MB) "backup manifest" text files< / li >
< / ul >
< / li >
< li > fossil/venti, git, etc: directories content-addressed by content of their children, hash trees, etc< / li >
< li > git: "tree objects", "commit objects", etc< / li >
< li > Camlistore: "schema blobs"< / li >
< / ul >
< / article >
< article >
< h3 > Schema: how to model your content< / h3 >
< ul >
< li > Camlistore defines < i > one possible< / i > schema< / li >
< li > but blobserver doesn't know about it at all< / li >
< li > tools generate schema,< / li >
< li > indexer + search understand the schema.< / li >
< / ul >
< / article >
< article >
< h3 > Schema Blobs< / h3 >
< ul >
< li > so if all blobs are just dumb blobs of bytes with no metadata,< / li >
< li > how do you store metadata?< / li >
< li > as blobs themselves!< / li >
< / ul >
< / article >
< article >
< h3 > Minimal Schema Blob< / h3 >
< section >
< pre > {
"camliVersion": 1,
"camliType": "whatever"
}< / pre >
< / section >
< p > Whitespace doesn't matter; it just has to be valid JSON in its
entirety. Use whatever JSON libraries you've got.< / p >
< p > That one is named< br / > < tt class = 'smaller' > sha1-19e851fe3eb3d1f3d9d1cefe9f92c6f3c7d754f6< / tt > < / p >
< p > or perhaps: < tt class = 'smaller' > sha512-2c6746aba012337aaf113fd63c24d994a0703d33eb5d6ed58859e45dc4e02dcf< br / > dae5c4d46c5c757fb85d5aff342245fe4edb780c028a6f3c994c1295236c931e< / tt > < / p >
< / article >
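< article >
< h3 > Sketch: recognizing a schema blob< / h3 >
< p > A sketch of how a tool might recognize a schema blob, assuming the rule from the Terminology slide: the whole blob is one valid JSON object with < tt > camliVersion< / tt > and < tt > camliType< / tt >. The function name < tt > schemaType< / tt > is hypothetical.< / p >

```go
package main

import (
	"crypto/sha1"
	"encoding/json"
	"fmt"
)

// schemaType reports the camliType of blob if it is a schema blob: a blob
// whose entire contents are one JSON object with camliVersion and camliType.
func schemaType(blob []byte) (string, bool) {
	var obj map[string]interface{}
	if err := json.Unmarshal(blob, &obj); err != nil {
		return "", false // not valid JSON in its entirety: just an ordinary blob
	}
	if _, ok := obj["camliVersion"]; !ok {
		return "", false
	}
	t, ok := obj["camliType"].(string)
	return t, ok
}

func main() {
	compact := []byte(`{"camliVersion": 1, "camliType": "whatever"}`)
	pretty := []byte("{\n  \"camliVersion\": 1,\n  \"camliType\": \"whatever\"\n}")
	t1, _ := schemaType(compact)
	t2, _ := schemaType(pretty)
	fmt.Println(t1, t2)
	// Same schema, different bytes, therefore different blobrefs.
	fmt.Printf("sha1-%x\nsha1-%x\n", sha1.Sum(compact), sha1.Sum(pretty))
}
```

< p > Whitespace does not affect whether a blob is a schema blob, but it does change the bytes, so the two spellings above get different names.< / p >
< / article >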
< article >
< h3 > Schema blob; type "file"< / h3 >
< section > < pre > {"camliVersion": 1,
< span style = 'background: #fff' > "camliType": "file",< / span >
"fileName": "foo.dat",
"unixPermission": "0644",
...,
"size": 6000133,
"contentParts": [
{"blobRef": "sha1-...dead", "size": 111},
{"blobRef": "sha1-...beef", "size": 5000000, "offset": 492 },
{"size": 1000000},
{"blobRef": "digalg-blobref", "size": 22}
]
}< / pre > < / section >
< / article >
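< article >
< h3 > Sketch: reassembling a file from contentParts< / h3 >
< p > A sketch of reading a "file" schema blob back into bytes. It assumes, as the example suggests, that < tt > offset< / tt > is a starting position inside the referenced blob and that a part with only a < tt > size< / tt > (no < tt > blobRef< / tt >) stands for that many zero bytes. The refs < tt > sha1-a< / tt > and < tt > sha1-b< / tt > are made-up placeholders.< / p >

```go
package main

import (
	"bytes"
	"fmt"
)

// ContentPart mirrors one entry of a "file" schema blob's contentParts.
// The zero-fill rule for a missing BlobRef is an assumption.
type ContentPart struct {
	BlobRef string
	Size    int
	Offset  int
}

// assemble rebuilds file contents from parts, fetching referenced blobs via
// fetch. A part without a BlobRef contributes Size zero bytes.
func assemble(parts []ContentPart, fetch func(string) ([]byte, bool)) ([]byte, error) {
	var buf bytes.Buffer
	for i, p := range parts {
		if p.BlobRef == "" {
			buf.Write(make([]byte, p.Size))
			continue
		}
		blob, ok := fetch(p.BlobRef)
		if !ok {
			return nil, fmt.Errorf("part %d: missing blob %s", i, p.BlobRef)
		}
		if p.Offset+p.Size > len(blob) {
			return nil, fmt.Errorf("part %d: range exceeds blob %s", i, p.BlobRef)
		}
		buf.Write(blob[p.Offset : p.Offset+p.Size])
	}
	return buf.Bytes(), nil
}

func main() {
	blobs := map[string][]byte{"sha1-a": []byte("hello, "), "sha1-b": []byte("xxworld")}
	fetch := func(ref string) ([]byte, bool) { b, ok := blobs[ref]; return b, ok }
	parts := []ContentPart{{BlobRef: "sha1-a", Size: 7}, {BlobRef: "sha1-b", Size: 5, Offset: 2}, {Size: 2}}
	data, err := assemble(parts, fetch)
	fmt.Printf("%q %v\n", data, err)
	// The part sizes in the slide's example add up to its declared file size.
	fmt.Println(111+5000000+1000000+22 == 6000133)
}
```

< p > Because large files are split into parts, two files that share a chunk share that chunk's blob.< / p >
< / article >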
< article >
< h3 > Schema blob; type "directory"< / h3 >
< section > < pre > {"camliVersion": 1,
< span style = 'background: #fff' > "camliType": "directory",< / span >
"fileName": "foodir",
"unixPermission": "0755",
...,
"entries": < span style = 'background: #fff' > "sha1-c3764bc2138338d5e2936def18ff8cc9cda38455"< / span >
}< / pre > < / section >
< / article >
< article >
< h3 > Schema blob; type "static-set"< / h3 >
< section > < pre > {"camliVersion": 1,
< span style = 'background: #fff' > "camliType": "static-set",< / span >
"members": [
"sha1-xxxxxxxxxxxx",
"sha1-xxxxxxxxxxxx",
"sha1-xxxxxxxxxxxx",
"sha1-xxxxxxxxxxxx",
"sha1-xxxxxxxxxxxx",
"sha1-xxxxxxxxxxxx"
]
}< / pre > < / section >
< / article >
< article >
< h3 >
Backup a directory...
< / h3 >
< section > < pre > $ camput --file $HOME
sha1-8659a52f726588dc44d38dfb22d84a4da2902fed< / pre > < / section >
< p > (As with git/hg/fossil, that identifier represents everything below it.)< / p >
< p > Iterative backups are cheap, the identifier is easy to share, etc.< / p >
< p > But how will you remember that identifier? (later)< / p >
< / article >
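< article >
< h3 > Sketch: one identifier for a whole tree< / h3 >
< p > Why one identifier covers everything: each directory's blob contains its children's refs, so a change anywhere changes every ref up to the root, while untouched subtrees keep their refs and need not be uploaded again. This sketch uses a hypothetical < tt > Node< / tt > type and a simplified text format for directory blobs, not the real schema JSON.< / p >

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"sort"
	"strings"
)

// Node is a hypothetical in-memory tree: a file has Data, a directory has Children.
type Node struct {
	Data     string
	Children map[string]*Node
}

// put returns n's blobref, storing every blob it creates in store. A
// directory's blob lists "ref name" lines for its children in name order,
// a simplified stand-in for directory/static-set schema blobs.
func put(n *Node, store map[string]string) string {
	var blob string
	if n.Children == nil {
		blob = n.Data
	} else {
		names := make([]string, 0, len(n.Children))
		for name := range n.Children {
			names = append(names, name)
		}
		sort.Strings(names)
		var b strings.Builder
		for _, name := range names {
			fmt.Fprintf(&b, "%s %s\n", put(n.Children[name], store), name)
		}
		blob = b.String()
	}
	ref := fmt.Sprintf("sha1-%x", sha1.Sum([]byte(blob)))
	store[ref] = blob
	return ref
}

func main() {
	store := map[string]string{}
	home := &Node{Children: map[string]*Node{
		"notes.txt": {Data: "hello"},
		"photos":    {Children: map[string]*Node{"cat.jpg": {Data: "meow"}}},
	}}
	before := put(home, store)
	photos := home.Children["photos"]
	photosBefore := put(photos, store)
	home.Children["notes.txt"].Data = "hello, world"
	after := put(home, store)
	fmt.Println(before != after)                    // the root identifier changed
	fmt.Println(photosBefore == put(photos, store)) // the untouched subtree did not
	fmt.Println(len(store), "blobs stored")
}
```

< p > The old root blob is still in the store, so the earlier snapshot remains reachable from its identifier.< / p >
< / article >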
< article >
< h3 >
But what about mutable data?
< / h3 >
< ul >
< li > immutable data is easy to represent & reference
< ul >
< li > < tt class = 'smaller' > sha1-8659a52f726588dc44d38dfb22d84a4da2902fed< / tt > is an immutable snapshot< / li >
< / ul >
< / li >
< li > how to represent mutable data in an immutable, content-addressed world?< / li >
< li > how to share a reference to a mutable object when changing an object mutates its name?< / li >
< / ul >
< / article >
< article >
< h1 class = 'center' >
Objects & "Permanodes"
< / h1 >
< / article >
< article >
< h3 >
Terminology
< / h3 >
< ul >
< li > < span class = 'red' > blob< / span > : just a dumb, immutable series of bytes< / li >
< li > < span class = 'red' > schema blob< / span > : a blob that's a valid JSON object w/ camliVersion & camliType< / li >
< li > < span class = 'red' > signed schema blob< / span > aka "< span class = 'red' > claim< / span > ": a schema blob with an embedded OpenPGP signature< / li >
< li > < span class = 'red' > object< / span > : something mutable. represented as an anchor "< span class = 'blue' > permanode< / span > " + a set of mutations (< span class = 'blue' > claims< / span > )< / li >
< li > < span class = 'red' > permanode< / span > : a stable reference. an anchor. just a < span class = 'blue' > signed schema blob< / span > , but with almost no content...< / li >
< / ul >
< / article >
< article >
< h3 >
Permanode
< / h3 >
< section > < pre > < span style = 'font-weight: bold' class = 'blue' > $ camput --permanode< / span >
sha1-ea799271abfbf85d8e22e4577f15f704c8349026
< span style = 'font-weight: bold' class = "blue" > $ camget sha1-ea799271abfbf85d8e22e4577f15f704c8349026< / span >
< span style = "background: #ff7" > {"camliVersion": 1,
"camliSigner": "sha1-c4da9d771661563a27704b91b67989e7ea1e50b8",
< span style = 'font-weight: bold' > "camliType": "permanode"< / span > ,
"random": "oj)r}$Wa/[J|XQThNdhE"< / span >
,"camliSig":"iQEcBAABAgAGBQJNRxceAAoJEGjzeDN/6vt8ihIH/Aov7FRIq4dODAPWGDwqL
1X9Ko2ZtSSO1lwHxCQVdCMquDtAdI3387fDlEG/ALoT/LhmtXQgYTt8QqDxVdu
EK1or6/jqo3RMQ8tTgZ+rW2cj9f3Q/dg7el0Ngoq03hyYXdo3whxCH2x0jajSt4RCc
gdXN6XmLlOgD/LVQEJ303Du1OhCvKX1A40BIdwe1zxBc5zkLmoa8rClAlHdqwo
gxYFY4cwFm+jJM5YhSPemNrDe8W7KT6r0oA7SVfOan1NbIQUel65xwIZBD0ah
CXBx6WXvfId6AdiahnbZiBup1fWSzxeeW7Y2/RQwv5IZ8UgfBqRHvnxcbNmScrzl
p3V3ZoY"}< / pre > < / section >
< / article >
< article >
< h3 >
Backup a directory...
< / h3 >
< section > < pre > < span style = 'font-weight: bold' > $ camput --file $HOME< / span >
sha1-8659a52f726588dc44d38dfb22d84a4da2902fed
< span style = 'font-weight: bold' > $ camput --permanode --file $HOME< / span >
sha1-ea799271abfbf85d8e22e4577f15f704c8349026
< span style = 'font-weight: bold' > $ camput --permanode --name="Brad's home directory" --file $HOME< / span >
sha1-ea799271abfbf85d8e22e4577f15f704c8349026< / pre > < / section >
< ul >
< li > all the file data blobs, file/dir schema blobs,< / li >
< li > a new permanode, owned by you< / li >
< li > a mutation: permanode's content attribute == directory root< / li >
< li > a mutation: permanode's name attribute == "Brad's home directory"< / li >
< / ul >
< / article >
< article class = 'fill' >
< p > < img src = "fsbackup.png" height = "100%" / > < / p >
< / article >
< article >
< h1 class = 'center' >
Modeling non-filesystem objects
< / h1 >
< / article >
< article >
< h3 > Example: a photo gallery< / h3 >
< ul >
< li > Photos are objects< / li >
< li > Galleries (sets) are objects< / li >
< li > Photos are members of galleries< / li >
< li > Photos & galleries have attributes (single-valued: "title", multi-valued: "tag")< / li >
< li > Photos might be updated over time:
< ul >
< li > EXIF GPS updated, cropping, white balance< / li >
< li > don't want to break links!< / li >
< / ul >
< / li >
< / ul >
< / article >
< article class = 'fill' >
< p > < img src = "blobjects.png" width = "100%" / > < / p >
< / article >
< article >
< h1 class = 'center' >
How to make sense of that?
< / h1 >
< / article >
< article >
< h1 class = 'center' >
Indexing & Search
< / h1 >
< / article >
< article >
< h3 >
Indexing: summary
< / h3 >
< p style = 'margin-top: 2em' > For each blob, build an index of:
< ul >
< li > directed graph of inter-blob references< / li >
< li > (permanode, time) => resolved attributes< / li >
< li > (permanode, time) => set memberships< / li >
< li > etc...< / li >
< / ul >
< / article >
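< article >
< h3 > Sketch: indexing inter-blob references< / h3 >
< p > A sketch of the first index in the list, the graph of inter-blob references, stored as reverse edges (which blobs point at a given ref). Treating any JSON string that looks like a SHA-1 blobref as a reference is a simplifying assumption; the names < tt > Index< / tt > and < tt > refsIn< / tt > are hypothetical.< / p >

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"sort"
)

// Assumption: any JSON string shaped like a sha1 blobref is a reference.
var blobRefPattern = regexp.MustCompile(`^sha1-[0-9a-f]{40}$`)

// refsIn returns the blobrefs mentioned anywhere in a schema blob, sorted.
func refsIn(blob []byte) []string {
	var v interface{}
	if json.Unmarshal(blob, &v) != nil {
		return nil // not JSON: an opaque data blob references nothing
	}
	var refs []string
	var walk func(interface{})
	walk = func(x interface{}) {
		switch t := x.(type) {
		case string:
			if blobRefPattern.MatchString(t) {
				refs = append(refs, t)
			}
		case []interface{}:
			for _, e := range t {
				walk(e)
			}
		case map[string]interface{}:
			for _, e := range t {
				walk(e)
			}
		}
	}
	walk(v)
	sort.Strings(refs)
	return refs
}

// Index maps each blobref to the blobs that point at it (reverse edges).
type Index map[string][]string

// Add records the references made by the blob named ref.
func (ix Index) Add(ref string, blob []byte) {
	for _, target := range refsIn(blob) {
		ix[target] = append(ix[target], ref)
	}
}

func main() {
	ix := Index{}
	dir := []byte(`{"camliVersion": 1, "camliType": "directory",
		"entries": "sha1-c3764bc2138338d5e2936def18ff8cc9cda38455"}`)
	ix.Add("sha1-dirblob", dir) // hypothetical ref of the directory blob
	fmt.Println(ix["sha1-c3764bc2138338d5e2936def18ff8cc9cda38455"])
}
```

< p > Questions like "which galleries contain this photo?" then become lookups of reverse edges rather than scans over every blob.< / p >
< / article >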
< article >
< h3 >
Indexing & Replication
< / h3 >
< ul >
< li > indexing is real-time, no polling< / li >
< li > MySQL index speaks the blob server protocol< / li >
< li > blobs replicate to MySQL (etc.) just as they do to Amazon S3 (etc.)< / li >
< / ul >
< center > < img src = 'repl.png' / > < / center >
< / article >
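< article >
< h3 > Sketch: replication as a sorted diff< / h3 >
< p > Because enumerate returns blobrefs in sorted order, a replicator can diff source and destination in a single merge pass and copy only what is missing. A sketch with hypothetical function names; < tt > get< / tt > and < tt > put< / tt > stand in for the blob server operations, and the refs in < tt > main< / tt > are placeholders.< / p >

```go
package main

import "fmt"

// missing returns refs present in src but absent from dst. Both inputs must
// be sorted, as enumerate results are, so one linear merge pass suffices.
func missing(src, dst []string) []string {
	var out []string
	j := 0
	for _, r := range src {
		for j < len(dst) && dst[j] < r {
			j++ // skip destination refs that sort before r
		}
		if j == len(dst) || dst[j] != r {
			out = append(out, r)
		}
	}
	return out
}

// replicate copies every blob the destination lacks and returns how many.
func replicate(src, dst []string, get func(string) []byte, put func(string, []byte)) int {
	todo := missing(src, dst)
	for _, r := range todo {
		put(r, get(r))
	}
	return len(todo)
}

func main() {
	disk := map[string][]byte{"sha1-aa": []byte("a"), "sha1-bb": []byte("b"), "sha1-cc": []byte("c")}
	index := map[string][]byte{"sha1-bb": []byte("b")}
	n := replicate([]string{"sha1-aa", "sha1-bb", "sha1-cc"}, []string{"sha1-bb"},
		func(r string) []byte { return disk[r] },
		func(r string, b []byte) { index[r] = b })
	fmt.Println(n, len(index))
}
```

< p > Since blobs are immutable, a blob already present at the destination never needs to be compared byte by byte: same ref, same content.< / p >
< / article >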
< article >
< h3 >
Search
< / h3 >
< ul >
< li > Permanodes created by $who, sorted by date desc, type "photo", tagged "funny"< / li >
< li > My recent backups with attribute "hostname" == "camlistore.org"< / li >
< li > All friends' galleries in which this photo appears< / li >
< li > etc...< / li >
< / ul >
< p > ...similar to your email, or docs.google.com. "My stuff" or "My bookmarks".< / p >
< / article >
< article >
< h3 >
Privacy Model
< / h3 >
< ul >
< li > all your blobs & objects & searches are private< / li >
< li > nothing is public by default< / li >
< / ul >
< / article >
< article >
< h1 >
What if you want to share with friends, or globally publish something?
< / h1 >
< / article >
< article >
< h3 >
Sharing & Share Blobs
< / h3 >
< ul >
< li > the act of sharing means creating a new < span class = 'red' > share claim< / span > : just another blob, signed.< / li >
< / ul >
< / article >
< article class = 'fill' >
< h3 >
Image filling the slide (with optional header)
< / h3 >
< p >
< img src = 'images/example-cat.jpg' >
< / p >
< div class = 'source white' >
Cat
< / div >
< / article >
< article >
< h3 >
Thank you!
< / h3 >
< ul >
< li > Brad Fitzpatrick, brad@danga.com< / li >
< li > camlistore.org< / li >
< / ul >
< / article >
< / section >
< / body >
< / html >