There are two TODO lists: this file (good for airplanes) and the online bug tracker:
https://github.com/perkeep/perkeep/issues
Offline list:
-- fix the presubmit's gofmt to tolerate emacs lock files (.#*.go):
go fmt perkeep.org/cmd... perkeep.org/dev... perkeep.org/misc... perkeep.org/pkg... perkeep.org/server...
stat pkg/blobserver/.#multistream_test.go: no such file or directory
exit status 2
make: *** [fmt] Error 1
-- add HTTP handler for blobstreamer. stream a tar file? where to put
continuation token? special file after each tar entry? special file
at the end? HTTP Trailers? (but nobody supports them)
-- reindexing:
* add streaming interface to localdisk? maybe, even though not ideal, but
really: migrate my personal instance from localdisk to blobpacked +
maybe diskpacked for loose blobs? start by migrating to blobpacked and
measuring size of loose.
* add blobserver.EnumerateAllUnsorted (which could use StreamBlobs
if available, else use EnumerateAll, else maybe even use a new
interface method that goes forever and can't resume at a point,
but can be canceled, and localdisk could implement that at least)
* add buffered sorted.KeyValue implementation: a memory one (of
configurable max size) in front of a real disk one. add a Flush method
to it. also Flush when memory gets big enough.
In progress: pkg/sorted/buffer
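A minimal sketch of the buffered KeyValue idea, assuming a pared-down interface. KeyValue, mapKV and BufferedKV here are hypothetical; the real sorted.KeyValue has more methods and returns errors, and this is not what pkg/sorted/buffer does.

```go
package main

import (
	"fmt"
	"sort"
)

// KeyValue is a hypothetical, simplified stand-in for sorted.KeyValue.
type KeyValue interface {
	Get(key string) (string, bool)
	Set(key, value string)
}

// mapKV stands in for the real on-disk store.
type mapKV map[string]string

func (m mapKV) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

func (m mapKV) Set(key, value string) {
	m[key] = value
}

// BufferedKV holds writes in memory and pushes them to the backing store
// on Flush, or automatically once the buffer exceeds maxBytes.
type BufferedKV struct {
	buf      map[string]string
	size     int // approximate bytes of buffered keys+values
	maxBytes int
	back     KeyValue
}

func NewBufferedKV(back KeyValue, maxBytes int) *BufferedKV {
	return &BufferedKV{buf: make(map[string]string), maxBytes: maxBytes, back: back}
}

func (b *BufferedKV) Get(key string) (string, bool) {
	if v, ok := b.buf[key]; ok {
		return v, true // buffered writes shadow the backing store
	}
	return b.back.Get(key)
}

func (b *BufferedKV) Set(key, value string) {
	if old, ok := b.buf[key]; ok {
		b.size -= len(key) + len(old)
	}
	b.buf[key] = value
	b.size += len(key) + len(value)
	if b.size > b.maxBytes {
		b.Flush()
	}
}

// Flush writes buffered entries to the backing store in key order,
// which may be friendlier to on-disk stores, then empties the buffer.
func (b *BufferedKV) Flush() {
	keys := make([]string, 0, len(b.buf))
	for k := range b.buf {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		b.back.Set(k, b.buf[k])
	}
	b.buf = make(map[string]string)
	b.size = 0
}

func main() {
	disk := mapKV{}
	kv := NewBufferedKV(disk, 16)
	kv.Set("a", "1")
	fmt.Println("on disk before flush:", len(disk))
	kv.Flush()
	fmt.Println("on disk after flush:", len(disk))
}
```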
-- stop using the "cond" blob router storage type in genconfig, as
well as the /bs-and-index/ "replica" storage type, and just let the
index register its own AddReceiveHook like the sync handler
(pkg/server/sync.go). But whereas the sync handler only synchronously
_enqueues_ the blob to replicate, the indexer should synchronously
do the ReceiveBlob (ooo-reindex) on it too before returning.
But the sync handler, despite technically only synchronously-enqueueing
and being therefore async, is still very fast. It's likely the
sync handler will therefore send a ReceiveBlob to the indexer
at the ~same time the indexer is already indexing it. So the indexer
should have some dup/merge suppression, and not do double work.
singleflight should work. The loser should still consume the
source io.Reader body and reply with the same error value.
-- ditch the importer.Interrupt type and pass along a context.Context
instead, which has its Done channel for cancelation.
-- S3-only mode doesn't work with a local disk index (kvfile) because
there's no directory for us to put the kv in.
-- fault injection many more places with pkg/fault. maybe even in all
handlers automatically somehow?
-- sync handler's shard validation doesn't retry on error.
only reports the errors now.
-- export blobserver.checkHashReader and document it with
the blob.Fetcher docs.
-- "filestogether" handler, putting related blobs (e.g. files)
next to each other in bigger blobs / separate files, and recording
offsets of small blobs into bigger ones
-- diskpacked doesn't seem to sync its index quickly enough.
A new blob received + process exit + read in a new process
doesn't find that blob. kv bug? Seems to need an explicit Close.
This feels broken. Add tests & debug.
-- websocket upload protocol. different write & read on same socket,
as opposed to HTTP, to have multiple chunks in flight.
-- extension to blobserver upload protocol to minimize fsyncs: maybe a
client can say "no rush" on a bunch of data blobs first (which
still don't get acked back over websocket until they've been
fsynced), and then when the client uploads the schema/vivify blob,
that websocket message won't have the "no rush" flag, calling the
optional blobserver.Storage method to fsync (in the case of
diskpacked/localdisk) and getting all the "uploaded" messages back
for the data chunks that were written-but-not-synced.
-- measure FUSE operations, latency, round-trips, performance.
see next item:
-- ... we probably need a "describe all chunks in file" HTTP handler.
then FUSE (when it sees sequential access) can say "what's the
list of all chunks in this file?" and then fetch them all at once.
see next item:
-- ... HTTP handler to get multiple blobs at once. multi-download
in multipart/mime body. we have this for stat and upload, but
not download.
-- ... if we do blob fetching over websocket too, then we can support
cancellation of blob requests. Then we can combine the previous
two items: FUSE client can ask the server, over websockets, for a
list of all chunks, and to also start streaming them all. assume a
high-latency (but acceptable bandwidth) link. the chunks are
already in flight, but some might be redundant. once the client figures
out some might be redundant, it can issue "stop send" messages over
that websocket connection to prevent dups. this should work on
both "files" and "bytes" types.
-- cacher: configurable policy on max cache size. clean oldest
things (consider mtime+atime) to get back under max cache size.
maybe prefer keeping small things (metadata blobs) too,
and only delete large data chunks.
-- UI: video, at least thumbnailing (use external program,
like VLC or whatever nautilus uses?)
-- rename server.ImageHandler to ThumbnailRequest or something? It's
not really a Handler in the normal sense. It's not built once and
called repeatedly; it's built for every ServeHTTP request.
-- unexport more stuff from pkg/server. Cache, etc.
-- look into garbage from openpgp signing
-- make leveldb memdb's iterator struct only 8 bytes, pointing to a recycled
object, and just nil out that pointer at EOF.
-- bring in the google glog package to third_party and use it in
places that want selective logging (e.g. pkg/index/receive.go)
-- (Mostly done) verify all ReceiveBlob calls and see which should be
blobserver.Receive instead, or ReceiveNoHash. git grep -E
"\.ReceiveBlob\(" And maybe ReceiveNoHash should go away and be
replaced with a "ReceiveString" method which combines the
blobref-from-string and ReceiveNoHash at once.
-- union storage target. sharder can be thought of as a specialization
of union. sharder already unions, but has a hard-coded policy
of where to put new blobs. union could be a library (used by sharder)
with a pluggable policy for that.
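A sketch of "union with a pluggable put policy", using a toy Store interface rather than blobserver.Storage. Store, memStore, Union, PutPolicy and both policies are hypothetical; shardPolicy only imitates the idea of a fixed hash-based placement, not sharder's actual hash.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a toy blob store, not blobserver.Storage.
type Store interface {
	Get(ref string) ([]byte, bool)
	Put(ref string, data []byte)
}

type memStore map[string][]byte

func (m memStore) Get(ref string) ([]byte, bool) {
	b, ok := m[ref]
	return b, ok
}

func (m memStore) Put(ref string, data []byte) {
	m[ref] = data
}

// PutPolicy picks which member receives a new blob.
type PutPolicy func(ref string, members []Store) Store

// Union reads from any member and writes where the policy says.
type Union struct {
	Members []Store
	Policy  PutPolicy
}

var errNotFound = errors.New("blob not found")

func (u *Union) Get(ref string) ([]byte, error) {
	for _, s := range u.Members {
		if b, ok := s.Get(ref); ok {
			return b, nil
		}
	}
	return nil, errNotFound
}

func (u *Union) Put(ref string, data []byte) {
	u.Policy(ref, u.Members).Put(ref, data)
}

// shardPolicy hashes the ref to choose a member, like a sharder would.
func shardPolicy(ref string, members []Store) Store {
	var h uint32
	for i := 0; i < len(ref); i++ {
		h = h*31 + uint32(ref[i])
	}
	return members[h%uint32(len(members))]
}

// firstPolicy always writes to the first member.
func firstPolicy(ref string, members []Store) Store {
	return members[0]
}

func main() {
	u := &Union{Members: []Store{memStore{}, memStore{}}, Policy: shardPolicy}
	u.Put("sha1-aaa", []byte("hello"))
	b, err := u.Get("sha1-aaa")
	fmt.Println(string(b), err)
}
```

With this split, sharder would be just a Union whose Policy is fixed to a hash-based choice.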
-- support for running pk-mount under perkeepd. especially for OS X,
where the lifetime of the background daemon will be the same as the
user's login session.
-- website: add godoc for /server/perkeepd (also without a "go get"
line)
-- tests for all cmd/* stuff, perhaps as part of some integration
tests.
-- move most of pk-put into a library, not a package main.
-- server cron support: full syncs, pk-put file backups, integrity
checks.
-- status in top right of UI: sync, crons. (in-progress, un-acked
problems)
-- finish metadata compaction on the encryption blobserver.Storage wrapper.
-- get security review on encryption wrapper. (agl?)
-- peer-to-peer server and blobserver target to store encrypted blobs
on strangers' hard drives. server will be open source so groups of
friends/family can run their own for small circles, or some company
could run a huge instance. spray encrypted backup chunks across
friends' machines, and have central server(s) present challenges to
the replicas to have them verify what they have and how big, and
also occasionally say what the SHA-1("challenge" + blob-data) is.
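A sketch of the SHA-1("challenge" + blob-data) check. How the central server knows the right answer isn't specified above; here it is assumed to precompute a few random challenge/answer pairs while it still has the blob. newChallenge, proof, precompute and verify are hypothetical names. Challenges are fixed-length hex, so the concatenation is unambiguous.

```go
package main

import (
	"crypto/rand"
	"crypto/sha1"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// newChallenge returns a random 16-byte nonce, hex-encoded.
func newChallenge() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// proof computes hex(SHA-1(challenge + blob)), the replica's answer.
func proof(challenge string, blob []byte) string {
	h := sha1.New()
	h.Write([]byte(challenge))
	h.Write(blob)
	return hex.EncodeToString(h.Sum(nil))
}

// precompute builds n challenge/answer pairs while the blob is at hand,
// so the server can later check replicas without keeping the blob.
func precompute(blob []byte, n int) (map[string]string, error) {
	pairs := make(map[string]string, n)
	for len(pairs) < n {
		c, err := newChallenge()
		if err != nil {
			return nil, err
		}
		pairs[c] = proof(c, blob)
	}
	return pairs, nil
}

// verify compares a replica's answer to the expected one in constant time.
func verify(expected, got string) bool {
	return subtle.ConstantTimeCompare([]byte(expected), []byte(got)) == 1
}

func main() {
	blob := []byte("some encrypted chunk")
	pairs, err := precompute(blob, 2)
	if err != nil {
		panic(err)
	}
	for c, want := range pairs {
		fmt.Println(c, verify(want, proof(c, blob)))
	}
}
```

Each precomputed pair should be used only once; a replica that has seen a challenge could cache the answer and drop the data.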
-- sharing: make camget work with permanode sets too, not just
"directory" and "file" things.
-- sharing: when hitting e.g. http://myserver/share/sha1-xxxxx, if
a web browser and not a smart client (Accept header? User-Agent?)
then redirect or render a cutesy gallery or file browser instead,
still with machine-readable data for slurping.
-- rethink the directory schema so it can a) represent directories
with millions of files (without making a >1MB or >16MB schema blob),
probably forming a tree, similar to files. but rather than using a rolling checksum,
just split lexically when nodes get too big.
-- delete mostly-obsolete camsigd. see big TODO in camsigd.go.
-- we used to be able to live-edit js/css files in server/perkeepd/ui when
running under the App Engine dev_appserver.py. That's now broken with my
latest efforts to revive it. The place to start looking is:
server/perkeepd/ui/fileembed_appengine.go
-- should a "share" claim be not a claim but its own permanode, so it
can be rescinded? right now you can't really unshare a "haveref"
claim. or rather, TODO: verify we support "delete" claims to
delete any claim, and verify the share system and indexer all
support it. I think the indexer might, but not the share system.
Also TODO: "pk-put delete" or "rescind" subcommand.
Also TODO: document share claims in doc/schema/ and on website.
-- make the -transitive flag for "pk-put share -transitive" be a tri-state:
unset, true, false, and unset should then default to true for "file"
and "directory" schema blobs, and "false" for other things.
-- index: static directory recursive sizes: search: ask to see biggest directories?
-- index: index dates in filenames ("yyyy-mm-dd-Foo-Trip", "yyyy-mm blah", etc).
-- get webdav server working again, for mounting on Windows. This worked before Go 1
but bit-rotted when we moved pkg/fs to use rsc/fuse.
-- BUG: osutil paths.go on OS X: should use Library everywhere instead of mix of
Library and ~/.camlistore?
OLD:
-- add CORS support? Access-Control-Allow-Origin: * + handle OPTIONS preflight
http://hacks.mozilla.org/2009/07/cross-site-xmlhttprequest-with-cors/
-- brackup integration, perhaps sans GPG? (requires Perl client?)
-- blobserver: clean up channel-closing consistency in blobserver interface
(most close, one doesn't. all should probably close)
Android:
[ ] Fix wake locks in UploadThread. need to hold CPU + WiFi whenever
something's enqueued at all and we're running. Move out of the Thread
that's uploading itself.
[ ] GPG signing of blobs (brad)
http://code.google.com/p/android-privacy-guard/
http://www.thialfihar.org/projects/apg/
(supports signing in code, but not an Intent?)
http://code.google.com/p/android-privacy-guard/wiki/UsingApgForDevelopment
... mailed the author.
Client libraries:
[X] Go
[X] JavaScript
[/] Python (Brett); but see https://github.com/tsileo/camlipy
[ ] Perl
[ ] Ruby
[ ] PHP