<!DOCTYPE html>
<!--
TODO;
* sharing,
* sharing an object vs. blob vs. search
* replication queue/details,
* tree sync vs. graph sync,
* project status, impl details, google, go
- python app engine version
- android client
- android gpg
- perl test suite
Google I/O 2011 HTML slides template
URL: http://code.google.com/p/io-2011-slides/
-->
<html>
<head>
<title>Camlistore</title>
<meta charset='utf-8' />
<script src='slides.js'></script>
</head>
<style>
/* Your individual styles here, or just use inline styles if that's
what you want. */
.smaller {
font-size: 80%;
}
ul li ul {
margin-top: 1.5em;
margin-bottom: 1em;
}
ul li ul li {
margin-top: 1em;
font-size: 80%;
}
ul li ul.dense li {
margin-top: 0em;
margin-bottom: 0em;
font-size: 80%;
}
h1.center {
text-align: center;
font-style: italic;
}
</style>
<body style='display: none'>
<section class='slides layout-regular'>
<!-- Your slides (<article>s) go here. Delete or comment out the
slides below. -->
<article>
<h1>
Camlistore
</h1>
<p>
Brad Fitzpatrick
<br>
2011-05-07
</p>
</article>
<article>
<h3>
Who am I?
</h3>
<ul class='nobuild'>
<li>
Brad Fitzpatrick &lt;brad@danga.com&gt;
</li>
<li>Perl Hacker since 1994</li>
<li>Projects:
<table><tr valign='top'>
<th>Danga / 6A (Perl)</th>
<th>Google</th>
</tr>
<tr valign='top'>
<td class='nobuild'>
<div>LiveJournal</div>
<div>memcached</div>
<div>Perlbal</div>
<div>MogileFS</div>
<div class='blue'>OpenID</div>
</td>
<td class='nobuild'>
<div><nobr>Social Graph API (<span class='blue'>XFN / FOAF</span>)</nobr></div>
<div class='blue'>WebFinger</div>
<div class='blue'>PubSubHubbub</div>
<div>Android</div>
<div>Go</div>
</td>
</tr>
</table>
<div style='font-size: 70%; margin-top: 1em'>* <span class='blue'>decentralized social</span></div>
</li>
</ul>
</article>
<article>
<h3>
But why am I in Brazil?
</h3>
<ul class='nobuild'>
<li>
<i>"Hey, want to come speak at a Perl conference in Brazil?"</i>
</li>
<li>"Yes, totally, but... I don't write much Perl these days. :-("</li> <!-- " -->
<li style="margin-top: 2em"><i>"You could speak on memcached."</i></li>
<li>"But that's an old topic, no?"</li>
<li style="margin-top: 2em"><i>"You have any new project you're excited about?"</i></li>
</ul>
</article>
<article>
<h1 align='center'>
Camlistore!
</h1>
</article>
<article>
<h3>
Camlistore
</h3>
<ul>
<li>New open source project</li>
<li>Almost a year old</li>
<li>Still in development</li>
<li>Starting to be useful :-)</li>
<li>Hard to easily describe...</li>
</ul>
</article>
<article>
<q>
Camlistore is a way to store, sync, share, model and back up content
</q>
<div class='author'>
camlistore.org
</div>
</article>
<article>
<h3>
Motivation
</h3>
<ul>
<li>I've written too many Content Management Systems
<ul>
<li>blogs, comments, photos, emails, backups, scanned paperwork, ...</li>
<li>is a scanned photo a scan, a photo, or a blog post? who cares.</li>
<li>write <b>one CMS to rule them all</b></li>
<li>... or at least a good framework for higher-level CMSes</li>
</ul>
</li>
</ul>
</article>
<article>
<h3>
Motivation (cont)
</h3>
<ul>
<li>I still want to help solve the Decentralized Social Network Problem
<ul>
<li>protocols, not companies</li>
<li>gmail, hotmail: hosted versions of SMTP, IMAP</li>
<li>... but I can run my own SMTP/IMAP server if I want.</li>
<li>... or change my SMTP/IMAP provider</li>
</ul>
</li>
</ul>
</article>
<article>
<h3>
Motivation (cont)
</h3>
<ul>
<li>I wanted something conceptually simple.</li>
<li>HTTP interfaces, not language-specific</li>
<li>I use lots of machines; don't want to think about sync or conflicts.</li>
<li>Data archaeology: should be easy and obvious to
reconstruct in 20 or 100 years</li>
</ul>
</article>
<article>
<h3>
The Product
</h3>
<ul>
<li>one private dumping ground to store anything</li>
<li>backups, filesystems, objects, photos, likes, bookmarks, shares, my website, ...</li>
<li>live backup of my phone</li>
<li>live replicate / sync my dumping ground between my house &amp; laptop &amp; Amazon &amp; Google</li>
<li>web UI (à la gmail, docs.google.com, etc) or FUSE filesystem</li>
<li>Easy for end-users; powerful for dorks</li>
</ul>
</article>
<article>
<h3>
Security Model
</h3>
<ul>
<li><i><b>your</b></i> private repo, for life</li>
<li>everything private by default</li>
<li>grant access to specific objects/trees to friends or the world</li>
<li>web UI or CLI tools let you share</li>
</ul>
</article>
<article>
<h1 class='center'>
So what's with the silly name?
</h1>
</article>
<article>
<h3>
Camlistore
</h3>
<ul>
<li>Content-</li>
<li>Addressable</li>
<li>Multi-</li>
<li>Layer-</li>
<li>Indexed</li>
<li>Storage</li>
</ul>
</article>
<article>
<h3>
Content-Addressable
</h3>
<ul>
<li>At the core, everything is stored &amp; addressed by its digest (e.g. SHA1, MD5)</li>
<li>e.g. <tt class='smaller'>"sha1-0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"</tt> for the blob <tt class='smaller'>"foo"</tt></li>
<li>Great properties:
<ul>
<li>no versions of content: changing the content changes its digest too</li>
<li>no versions: no sync conflicts</li>
<li>no versions: perfect caching (have it or don't)</li>
</ul>
</li>
</ul>
</article>
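<article>
<h3>
Content-Addressable: a sketch
</h3>
<p class='smaller'>A minimal Python sketch of naming a blob by its digest. The helper name <tt>blobref</tt> is made up for illustration and is not a Camlistore API; the "alg-hexdigest" format follows the example on the previous slide.</p>
<section><pre>
```python
import hashlib


def blobref(data: bytes, alg: str = "sha1") -> str:
    """Return a name of the form "alg-hexdigest" for the given bytes."""
    return alg + "-" + hashlib.new(alg, data).hexdigest()


print(blobref(b"foo"))
# sha1-0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33
print(blobref(b"foo") == blobref(b"foo"))   # True: same bytes, same name
print(blobref(b"foo") == blobref(b"foo!"))  # False: any change gives a new name
```
</pre></section>
</article>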
<article>
<h3>
Multi-Layer, Indexed
</h3>
<ul>
<li>Unix philosophy: small pieces with well-defined interfaces that can be chained or composed</li>
<li>Camlistore pieces include:
<ul class='dense'>
<li style='margin-top: 1em'>Blob storage: memory, disk, S3, Google, MySQL index, etc</li>
<li>Schema</li>
<li>Signing</li>
<li>Replication</li>
<li>Indexing: (e.g. replicate from disk to MySQL index)</li>
<li>Search</li>
<li>HTML UI</li>
</ul>
</li>
</ul>
</article>
<article>
<h3>Logically</h3>
<img src='arch.png' width='100%'/>
</article>
<article>
<h3>In reality</h3>
<ul>
<li>End-users: use a hosted version</li>
<li>Dorks: single server binary with all the logical pieces</li>
</ul>
</article>
<article>
<h2>
From the bottom up...
</h2>
</article>
<article>
<h2>
Blob Server
</h2>
</article>
<article>
<h3>Blob Server: how dumb it is</h3>
<ul>
<li>"Blob" == zero or more bytes. <i class='red'>no</i> meta-data</li>
<li>private operations, to owner of data only:
<ul>
<li class='green'>get(blobref) → blob</li>
<li class='green'>stat(blobref+) → [(blobref, size), ...]</li>
<li class='green'>put(blobref, blob)</li>
<li class='green'>enumerate(..) → [(blobref, size)...] (sorted by blobref)</li>
</ul>
</li>
<li>no public (non-owner) access</li>
<li>HTTP interface: <tt>GET /camli/sha1-xxxxxxx HTTP/1.1</tt></li>
<li><span class='green'>delete(blobref)</span> is disabled by default, privileged op for GC or replication queues only</li>
</ul>
</article>
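<article>
<h3>Blob Server: a toy sketch</h3>
<p class='smaller'>A toy in-memory version of the four operations, as a hedged Python sketch. The class name <tt>MemoryBlobServer</tt> and the check on put that a blob hashes to its blobref are assumptions for illustration, and only sha1 names are handled.</p>
<section><pre>
```python
import hashlib


class MemoryBlobServer:
    """Toy in-memory store exposing only get, stat, put and enumerate."""

    def __init__(self):
        self._blobs = {}  # blobref string -> bytes

    def put(self, ref, blob):
        # Assumption: refuse blobs whose bytes do not hash to the claimed name.
        if ref != "sha1-" + hashlib.sha1(blob).hexdigest():
            raise ValueError("blobref does not match content")
        self._blobs[ref] = blob

    def get(self, ref):
        return self._blobs[ref]

    def stat(self, *refs):
        # Report (blobref, size) only for the blobs that are present.
        return [(r, len(self._blobs[r])) for r in refs if r in self._blobs]

    def enumerate(self):
        # All blobs as (blobref, size), sorted by blobref.
        return [(r, len(self._blobs[r])) for r in sorted(self._blobs)]


server = MemoryBlobServer()
for data in (b"foo", b"bar", b""):  # a blob is zero or more bytes
    server.put("sha1-" + hashlib.sha1(data).hexdigest(), data)
print(server.enumerate())
```
</pre></section>
<p class='smaller'>Note what is missing: no filenames, no types, no timestamps. The size comes from the bytes themselves.</p>
</article>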
<article>
<h3>Blob Server: seriously, no metadata</h3>
<ul>
<li>no filenames</li>
<li>no "mime types"</li>
<li>no "{create,mod,access} time"</li>
<li>size is implicit</li>
<li>blob: just some bytes</li>
<li>metadata? layers above.</li>
</ul>
</article>
<article>
<h1 class='center'>
Uh, what can you do with that?
</h1>
</article>
<article>
<h3>Uh, what can you do with that?</h3>
<ul>
<li>with just a blob server?</li>
<li>not much</li>
<li>but let's start with an easy example...</li>
</ul>
</article>
<article>
<h1 class='center'>
Filesystem Backups
</h1>
</article>
<article>
<h3>Filesystem Backups</h3>
<ul>
<li>previous project: brackup
<ul>
<li>good: Perl, slice/dice/encrypt S3 backup, content-addressed, good iterative backups</li>
<li>bad: large (several MB) "backup manifest" text files</li>
</ul>
</li>
<li>fossil/venti, git, etc: directories content-addressed by content of their children, hash trees, etc</li>
<li>git: "tree objects", "commit objects", etc</li>
<li>Camlistore: "schema blobs"</li>
</ul>
</article>
<article>
<h3>Schema: how to model your content</h3>
<ul>
<li>Camlistore defines <i>one possible</i> schema</li>
<li>but the blob server doesn't know about it at all</li>
<li>tools generate schema,</li>
<li>indexer + search understand the schema.</li>
</ul>
</article>
<article>
<h3>Schema Blobs</h3>
<ul>
<li>so if all blobs are just dumb blobs of bytes with no metadata,</li>
<li>how do you store metadata?</li>
<li>as blobs themselves!</li>
</ul>
</article>
<article>
<h3>Minimal Schema Blob</h3>
<section>
<pre>{
"camliVersion": 1,
"camliType": "whatever"
}</pre>
</section>
<p>Whitespace doesn't matter. Just must be valid JSON in its
entirety. Use whatever JSON libraries you've got.</p>
<p>That one is named<br/><tt class='smaller'>sha1-19e851fe3eb3d1f3d9d1cefe9f92c6f3c7d754f6</tt></p>
<p>or perhaps: <tt class='smaller'>sha512-2c6746aba012337aaf113fd63c24d994a0703d33eb5d6ed58859e45dc4e02dcf<br/>dae5c4d46c5c757fb85d5aff342245fe4edb780c028a6f3c994c1295236c931e</tt></p>
</article>
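<article>
<h3>Schema blob or plain blob? A sketch</h3>
<p class='smaller'>A sketch of telling a schema blob from an ordinary blob, assuming only the rule stated here: the whole blob parses as a JSON object carrying camliVersion and camliType. The function name <tt>schema_type</tt> is hypothetical.</p>
<section><pre>
```python
import json


def schema_type(blob):
    """Return the camliType if blob is a schema blob, else None."""
    try:
        obj = json.loads(blob.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None  # not text or not JSON: an ordinary blob
    if not isinstance(obj, dict):
        return None  # valid JSON, but not an object
    if "camliVersion" not in obj or "camliType" not in obj:
        return None
    return obj["camliType"]


print(schema_type(b'{"camliVersion": 1, "camliType": "whatever"}'))  # whatever
print(schema_type(b"\x89PNG not json"))  # None
```
</pre></section>
</article>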
<!-- END -->
<article>
<h3>Schema blob; type "file"</h3>
<section><pre>{"camliVersion": 1,
<span style='background: #fff'>"camliType": "file",</span>
"fileName": "foo.dat",
"unixPermission": "0644",
...,
"size": 6000133,
"contentParts": [
{"blobRef": "sha1-...dead", "size": 111},
{"blobRef": "sha1-...beef", "size": 5000000, "offset": 492 },
{"size": 1000000},
{"blobRef": "digalg-blobref", "size": 22}
]
}</pre></section>
</article>
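<article>
<h3>Reading a "file" schema blob: a sketch</h3>
<p class='smaller'>A sketch of rebuilding file bytes from contentParts. Assumed semantics, not confirmed by the slide: offset skips that many bytes into the referenced blob, size is the number of bytes taken, and a part with no blobRef stands for a run of zero bytes. The refs and the <tt>fetch</tt> callable are made up.</p>
<section><pre>
```python
def read_file(schema, fetch):
    """Concatenate the contentParts of a "file" schema blob.

    fetch is a hypothetical callable mapping a blobref to its bytes.
    """
    out = bytearray()
    for part in schema["contentParts"]:
        size = part["size"]
        if "blobRef" in part:
            offset = part.get("offset", 0)
            out += fetch(part["blobRef"])[offset:offset + size]
        else:
            out += bytes(size)  # assumed: no blobRef means a run of zeros
    if len(out) != schema["size"]:
        raise ValueError("parts do not add up to the declared size")
    return bytes(out)


blobs = {"ref-a": b"hello ", "ref-b": b"xxworldxx"}  # made-up refs
schema = {
    "camliVersion": 1, "camliType": "file", "size": 14,
    "contentParts": [
        {"blobRef": "ref-a", "size": 6},
        {"blobRef": "ref-b", "size": 5, "offset": 2},
        {"size": 3},
    ],
}
print(read_file(schema, blobs.__getitem__))
```
</pre></section>
<p class='smaller'>The part sizes on the slide add up the same way: 111 + 5000000 + 1000000 + 22 = 6000133.</p>
</article>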
<article>
<h3>Schema blob; type "directory"</h3>
<section><pre>{"camliVersion": 1,
<span style='background: #fff'>"camliType": "directory",</span>
"fileName": "foodir",
"unixPermission": "0755",
...,
"entries": <span style='background: #fff'>"sha1-c3764bc2138338d5e2936def18ff8cc9cda38455"</span>
}</pre></section>
</article>
<article>
<h3>Schema blob; type "static-set"</h3>
<section><pre>{"camliVersion": 1,
<span style='background: #fff'>"camliType": "static-set",</span>
"members": [
"sha1-xxxxxxxxxxxx",
"sha1-xxxxxxxxxxxx",
"sha1-xxxxxxxxxxxx",
"sha1-xxxxxxxxxxxx",
"sha1-xxxxxxxxxxxx",
"sha1-xxxxxxxxxxxx"
]
}</pre></section>
</article>
<article>
<h3>
Backup a directory...
</h3>
<section><pre>$ camput --file $HOME
sha1-8659a52f726588dc44d38dfb22d84a4da2902fed</pre></section>
<p>(like git/hg/fossil, that identifier represents everything beneath it.)</p>
<p>Iterative backups are cheap, easy identifier to share, etc</p>
<p>But how will you remember that identifier? (later)</p>
</article>
<article>
<h3>
But what about mutable data?
</h3>
<ul>
<li>immutable data is easy to represent &amp; reference
<ul>
<li><tt class='smaller'>sha1-8659a52f726588dc44d38dfb22d84a4da2902fed</tt> is an immutable snapshot</li>
</ul>
</li>
<li>how to represent mutable data in an immutable, content-addressed world?</li>
<li>how to share a reference to a mutable object when changing an object mutates its name?</li>
</ul>
</article>
<article>
<h1 class='center'>
Objects & "Permanodes"
</h1>
</article>
<article>
<h3>
Terminology
</h3>
<ul>
<li><span class='red'>blob</span>: just dumb, immutable series of bytes</li>
<li><span class='red'>schema blob</span>: a blob that's a valid JSON object w/ camliVersion & camliType</li>
<li><span class='red'>signed schema blob</span> aka "<span class='red'>claim</span>": a schema blob with an embedded OpenPGP signature</li>
<li><span class='red'>object</span>: something mutable. represented as an anchor "<span class='blue'>permanode</span>" + a set of mutations (<span class='blue'>claims</span>)</li>
<li><span class='red'>permanode</span>: a stable reference. an anchor. just a <span class='blue'>signed schema blob</span>, but of almost no content...</li>
</ul>
</article>
<article>
<h3>
Permanode
</h3>
<section><pre><span style='font-weight: bold' class='blue'>$ camput --permanode</span>
sha1-ea799271abfbf85d8e22e4577f15f704c8349026
<span style='font-weight: bold' class="blue">$ camget sha1-ea799271abfbf85d8e22e4577f15f704c8349026</span>
<span style="background: #ff7">{"camliVersion": 1,
"camliSigner": "sha1-c4da9d771661563a27704b91b67989e7ea1e50b8",
<span style='font-weight: bold'>"camliType": "permanode"</span>,
"random": "oj)r}$Wa/[J|XQThNdhE"</span>
,"camliSig":"iQEcBAABAgAGBQJNRxceAAoJEGjzeDN/6vt8ihIH/Aov7FRIq4dODAPWGDwqL
1X9Ko2ZtSSO1lwHxCQVdCMquDtAdI3387fDlEG/ALoT/LhmtXQgYTt8QqDxVdu
EK1or6/jqo3RMQ8tTgZ+rW2cj9f3Q/dg7el0Ngoq03hyYXdo3whxCH2x0jajSt4RCc
gdXN6XmLlOgD/LVQEJ303Du1OhCvKX1A40BIdwe1zxBc5zkLmoa8rClAlHdqwo
gxYFY4cwFm+jJM5YhSPemNrDe8W7KT6r0oA7SVfOan1NbIQUel65xwIZBD0ah
CXBx6WXvfId6AdiahnbZiBup1fWSzxeeW7Y2/RQwv5IZ8UgfBqRHvnxcbNmScrzl
p3V3ZoY"}</pre></section>
</article>
<article>
<h3>
Backup a directory...
</h3>
<section><pre><span style='font-weight: bold'>$ camput --file $HOME</span>
sha1-8659a52f726588dc44d38dfb22d84a4da2902fed
<span style='font-weight: bold'>$ camput --permanode --file $HOME</span>
sha1-ea799271abfbf85d8e22e4577f15f704c8349026
<span style='font-weight: bold'>$ camput --permanode --name="Brad's home directory" --file $HOME</span>
sha1-ea799271abfbf85d8e22e4577f15f704c8349026</pre></section>
<ul>
<li>all the file data blobs, file/dir schema blobs,</li>
<li>a new permanode, owned by you</li>
<li>a mutation: permanode's content attribute == directory root</li>
<li>a mutation: permanode's name attribute == "Brad's home directory"</li>
</ul>
</article>
<article class='fill'>
<p><img src="fsbackup.png" height="100%"/></p>
</article>
<article>
<h1 class='center'>
Modeling non-filesystem objects
</h1>
</article>
<article>
<h3>Example: a photo gallery</h3>
<ul>
<li>Photos are objects</li>
<li>Galleries (sets) are objects</li>
<li>Photos are members of galleries</li>
<li>Photos & galleries have attributes (single-valued: "title", multi-valued: "tag")</li>
<li>Photos might be updated over time:
<ul>
<li>EXIF GPS updated, cropping, white balance</li>
<li>don't want to break links!</li>
</ul>
</li>
</ul>
</article>
<article class='fill'>
<p><img src="blobjects.png" width="100%"/></p>
</article>
<article>
<h1 class='center'>
How to make sense of that?
</h1>
</article>
<article>
<h1 class='center'>
Indexing & Search
</h1>
</article>
<article>
<h3>
Indexing: summary
</h3>
<p style='margin-top: 2em'>For each blob, build an index of:
<ul>
<li>directed graph of inter-blob references</li>
<li>(permanode, time) => resolved attributes</li>
<li>(permanode, time) => set memberships</li>
<li>etc...</li>
</ul>
</article>
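<article>
<h3>Indexing: a sketch</h3>
<p class='smaller'>A sketch of the (permanode, time) => resolved attributes lookup. The claim tuples and the "set" / "add" operation names are hypothetical stand-ins for signed claim blobs; they only illustrate single-valued versus multi-valued attributes.</p>
<section><pre>
```python
def resolve(claims, permanode, at):
    """Fold the claims made up to time `at` into {attribute: [values]}."""
    attrs = {}
    for when, node, op, name, value in sorted(claims, key=lambda c: c[0]):
        if node != permanode or when > at:
            continue
        if op == "set":
            attrs[name] = [value]  # single-valued: replace
        elif op == "add":
            attrs.setdefault(name, []).append(value)  # multi-valued: append
    return attrs


# Hypothetical claims: (time, permanode, operation, attribute, value)
claims = [
    (1, "perma-1", "set", "title", "Beach"),
    (2, "perma-1", "add", "tag", "funny"),
    (3, "perma-1", "set", "title", "Beach, 2011"),
    (4, "perma-1", "add", "tag", "brazil"),
]
print(resolve(claims, "perma-1", at=2))
print(resolve(claims, "perma-1", at=4))
```
</pre></section>
<p class='smaller'>The permanode name never changes; only the answer for a given time does.</p>
</article>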
<article>
<h3>
Indexing &amp; Replication
</h3>
<ul>
<li>indexing is real-time, no polling</li>
<li>MySQL index speaks the blob server protocol</li>
<li>blobs are replicated to the MySQL index (etc) just like to Amazon S3 (etc)</li>
</ul>
<center><img src='repl.png' /></center>
</article>
<article>
<h3>
Search
</h3>
<ul>
<li>Permanodes created by $who, sorted by date desc, type "photo", tagged "funny"</li>
<li>My recent backups with attribute "hostname" == "camlistore.org"</li>
<li>All friends' galleries in which this photo appears</li>
<li>etc...</li>
</ul>
<p>...similar to your email, or docs.google.com. "My stuff" or "My bookmarks".</p>
</article>
<article>
<h3>
Privacy Model
</h3>
<ul>
<li>all your blobs & objects & searches are private</li>
<li>nothing is public by default</li>
</ul>
</article>
<article>
<h1>
What if you want to share with friends, or globally publish something?
</h1>
</article>
<article>
<h3>
Sharing & Share Blobs
</h3>
<ul>
<li>the act of sharing involves creating a new <span class='red'>share claim</span>, just another blob, signed.</li>
</ul>
</article>
<article>
<h3>
Thank you!
</h3>
<ul>
<li>Brad Fitzpatrick, brad@danga.com</li>
<li>camlistore.org</li>
</ul>
</article>
</section>
</body>
</html>