There are two TODO lists. This file (good for airplanes) and the online bug tracker:

     https://github.com/perkeep/perkeep/issues

Offline list:

-- fix the presubmit's gofmt to be happy about emacs:

    go fmt perkeep.org/cmd... perkeep.org/dev... perkeep.org/misc... perkeep.org/pkg... perkeep.org/server...
    stat pkg/blobserver/.#multistream_test.go: no such file or directory
    exit status 2
    make: *** [fmt] Error 1

-- add HTTP handler for blobstreamer. stream a tar file? where to put
   continuation token? special file after each tar entry? special file
   at the end? HTTP Trailers? (but nobody supports them)

-- reindexing:
   * add streaming interface to localdisk? maybe, even though not ideal, but
     really: migrate my personal instance from localdisk to blobpacked + maybe
     diskpacked for loose blobs? start by migrating to blobpacked and
     measuring size of loose.
   * add blobserver.EnumerateAllUnsorted (which could use StreamBlobs
     if available, else use EnumerateAll, else maybe even use a new
     interface method that goes forever and can't resume at a point,
     but can be canceled, and localdisk could implement that at least)
   * add buffered sorted.KeyValue implementation: a memory one (of
     configurable max size) in front of a real disk one. add a Flush method
     to it. also Flush when memory gets big enough.
     In progress: pkg/sorted/buffer

-- stop using the "cond" blob router storage type in genconfig, as
   well as the /bs-and-index/ "replica" storage type, and just let the
   index register its own AddReceiveHook like the sync handler
   (pkg/server/sync.go). But whereas the sync handler only synchronously
   _enqueues_ the blob to replicate, the indexer should synchronously
   do the ReceiveBlob (ooo-reindex) on it too before returning.
   But the sync handler, despite technically only synchronously-enqueueing
   and being therefore async, is still very fast. It's likely the
   sync handler will therefore send a ReceiveBlob to the indexer
   at the ~same time the indexer is already indexing it.
   So the indexer should have some dup/merge suppression, and not do
   double work. singleflight should work. The loser should still consume
   the source io.Reader body and reply with the same error value.

-- ditch the importer.Interrupt type and pass along a context.Context
   instead, which has its Done channel for cancelation.

-- S3-only mode doesn't work with a local disk index (kvfile) because
   there's no directory for us to put the kv in.

-- fault injection in many more places with pkg/fault. maybe even in all
   handlers automatically somehow?

-- sync handler's shard validation doesn't retry on error.
   only reports the errors now.

-- export blobserver.checkHashReader and document it with
   the blob.Fetcher docs.

-- "filestogether" handler, putting related blobs (e.g. files)
   next to each other in bigger blobs / separate files, and recording
   offsets of small blobs into bigger ones

-- diskpacked doesn't seem to sync its index quickly enough.
   A new blob received + process exit + read in a new process
   doesn't find that blob. kv bug? Seems to need an explicit Close.
   This feels broken. Add tests & debug.

-- websocket upload protocol. different write & read on same socket,
   as opposed to HTTP, to have multiple chunks in flight.

-- extension to blobserver upload protocol to minimize fsyncs: maybe a
   client can say "no rush" on a bunch of data blobs first (which
   still don't get acked back over websocket until they've been
   fsynced), and then when the client uploads the schema/vivify blob,
   that websocket message won't have the "no rush" flag, calling the
   optional blobserver.Storage method to fsync (in the case of
   diskpacked/localdisk) and getting all the "uploaded" messages back
   for the data chunks that were written-but-not-synced.

-- measure FUSE operations, latency, round-trips, performance.
   see next item:

-- ... we probably need a "describe all chunks in file" HTTP handler.
   then FUSE (when it sees sequential access) can say "what's the
   list of all chunks in this file?"
   and then fetch them all at once. see next item:

-- ... HTTP handler to get multiple blobs at once. multi-download
   in multipart/mime body. we have this for stat and upload,
   but not download.

-- ... if we do blob fetching over websocket too, then we can support
   cancellation of blob requests. Then we can combine the previous
   two items: FUSE client can ask the server, over websockets, for a
   list of all chunks, and to also start streaming them all. assume a
   high-latency (but acceptable bandwidth) link. the chunks are
   already in flight, but some might be redundant. once the client
   figures out some might be redundant, it can issue "stop send"
   messages over that websocket connection to prevent dups. this
   should work on both "files" and "bytes" types.

-- cacher: configurable policy on max cache size. clean oldest
   things (consider mtime+atime) to get back under max cache size.
   maybe prefer keeping small things (metadata blobs) too,
   and only delete large data chunks.

-- UI: video, at least thumbnailing (use external program,
   like VLC or whatever nautilus uses?)

-- rename server.ImageHandler to ThumbnailRequest or something? It's
   not really a Handler in the normal sense. It's not built once and
   called repeatedly; it's built for every ServeHTTP request.

-- unexport more stuff from pkg/server. Cache, etc.

-- look into garbage from openpgp signing

-- make leveldb memdb's iterator struct only 8 bytes, pointing to a recycled
   object, and just nil out that pointer at EOF.

-- bring in the google glog package to third_party and use it in
   places that want selective logging (e.g. pkg/index/receive.go)

-- (Mostly done) verify all ReceiveBlob calls and see which should be
   blobserver.Receive instead, or ReceiveNoHash. git grep -E "\.ReceiveBlob\("
   And maybe ReceiveNoHash should go away and be replaced with a
   "ReceiveString" method which combines the blobref-from-string and
   ReceiveNoHash at once.

-- union storage target. sharder can be thought of as a specialization
   of union.
   sharder already unions, but has a hard-coded policy
   of where to put new blobs. union could be a library (used by sharder)
   with a pluggable policy on that.

-- support for running pk-mount under perkeepd. especially for OS X,
   where the lifetime of the background daemon will be the same as the
   user's login session.

-- website: add godoc for /server/perkeepd (also without a "go get"
   line)

-- tests for all cmd/* stuff, perhaps as part of some integration
   tests.

-- move most of pk-put into a library, not a package main.

-- server cron support: full syncs, pk-put file backups, integrity
   checks.

-- status in top right of UI: sync, crons. (in-progress, un-acked
   problems)

-- finish metadata compaction on the encryption blobserver.Storage wrapper.

-- get security review on encryption wrapper. (agl?)

-- peer-to-peer server and blobserver target to store encrypted blobs
   on strangers' hard drives. server will be open source so groups of
   friends/family can run their own for small circles, or some company
   could run a huge instance. spray encrypted backup chunks across
   friends' machines, and have central server(s) present challenges to
   the replicas to have them verify what they have and how big, and
   also occasionally say what the SHA-1("challenge" + blob-data) is.

-- sharing: make pk-get work with permanode sets too, not just
   "directory" and "file" things.

-- sharing: when hitting e.g. http://myserver/share/sha1-xxxxx, if
   a web browser and not a smart client (Accept header? User-Agent?)
   then redirect or render a cutesy gallery or file browser instead,
   still with machine-readable data for slurping.

-- rethink the directory schema so it can a) represent
   directories with millions of files (without making a >1MB or >16MB
   schema blob), probably forming a tree, similar to files.
   but rather than rolling checksum, just split lexically when nodes
   get too big.

-- delete mostly-obsolete camsigd. see big TODO in camsigd.go.

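
   Sketch for the peer-to-peer challenge idea in the list above. This is
   only a rough illustration of SHA-1("challenge" + blob-data): the
   function names, the hex encoding and the assumption that the verifier
   still holds the blob (or a precomputed answer) are all hypothetical,
   not an existing Perkeep API.

```go
package main

import (
	"crypto/sha1"
	"crypto/subtle"
	"encoding/hex"
)

// challengeDigest returns hex(SHA-1(challenge + blobData)), the answer a
// replica would send back for one blob. Name and encoding are assumptions
// made for this sketch.
func challengeDigest(challenge string, blobData []byte) string {
	h := sha1.New()
	h.Write([]byte(challenge))
	h.Write(blobData)
	return hex.EncodeToString(h.Sum(nil))
}

// verifyReplica reports whether a replica's answer matches the digest the
// verifier computes from its own copy of the blob.
func verifyReplica(challenge string, blobData []byte, answer string) bool {
	want := challengeDigest(challenge, blobData)
	return subtle.ConstantTimeCompare([]byte(want), []byte(answer)) == 1
}
```

   A fresh random challenge per check is what would keep a replica from
   caching old answers and throwing the blob away.
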
-- we used to be able to live-edit js/css files in server/perkeepd/ui when
   running under the App Engine dev_appserver.py. That's now broken with my
   latest efforts to revive it. The place to start looking is:
       server/perkeepd/ui/fileembed_appengine.go

-- should a "share" claim be not a claim but its own permanode, so it
   can be rescinded? right now you can't really unshare a "haveref"
   claim. or rather, TODO: verify we support "delete" claims to
   delete any claim, and verify the share system and indexer all
   support it. I think the indexer might, but not the share system.
   Also TODO: "pk-put delete" or "rescind" subcommand.
   Also TODO: document share claims in doc/schema/ and on website.

-- make the -transitive flag for "pk-put share -transitive" be a tri-state:
   unset, true, false, and unset should then mean default to true for "file"
   and "directory" schema blobs, and "false" for other things.

-- index: static directory recursive sizes: search: ask to see biggest
   directories?

-- index: index dates in filenames ("yyyy-mm-dd-Foo-Trip", "yyyy-mm blah", etc).

-- get webdav server working again, for mounting on Windows. This worked
   before Go 1 but bitrot when we moved pkg/fs to use the rsc/fuse.

-- BUG: osutil paths.go on OS X: should use Library everywhere instead of mix
   of Library and ~/.camlistore?

OLD:

-- add CORS support? Access-Control-Allow-Origin: * + w/ OPTIONS
   http://hacks.mozilla.org/2009/07/cross-site-xmlhttprequest-with-cors/

-- brackup integration, perhaps sans GPG? (requires Perl client?)

-- blobserver: clean up channel-closing consistency in blobserver interface
   (most close, one doesn't. all should probably close)

Android:

[ ] Fix wake locks in UploadThread. need to hold CPU + WiFi whenever
    something's enqueued at all and we're running. Move out of the Thread
    that's uploading itself.
[ ] GPG signing of blobs (brad)
    http://code.google.com/p/android-privacy-guard/
    http://www.thialfihar.org/projects/apg/
    (supports signing in code, but not an Intent?)
    http://code.google.com/p/android-privacy-guard/wiki/UsingApgForDevelopment
    ... mailed the author.

Client libraries:

[X] Go
[X] JavaScript
[/] Python (Brett); but see https://github.com/tsileo/camlipy
[ ] Perl
[ ] Ruby
[ ] PHP