perkeep/cmd
mpl da9020ec71 cmd/camput: compact LevelDB on HaveCache setup
This CL is about levelDB as the HaveCache for camput, and there are
several aspects to it. To describe it, I'll take the particular example
where you want to add many permanodes (~33k) to a given set, with
camput. Something like:

for _, blob := range blobs {
	do("camput attr -add sha1-foobar camliMember " + blob)
}

In a "normal" levelDB use case, everytime the number of level-0 .ldb
files goes over 4 (by default), a background compaction task is
started to transform these SST into level-1 ones, and remove the level-0
ones.
However, since our particular camput call is very short lived
(especially on a local Perkeep), not only might there be not enough time
for the compaction to be triggered, but even if it is, when the DB is
flushed (on a Close call), any ongoing compactions are cancelled. This
makes level-0 compactions very unlikely to happen on short-lived camput
calls. As a result, the number of level-0 files keeps growing until
levelDB fails while trying to open them all, because it hits the current
process ulimit.

Now, in this CL, what we propose is to systematically force a compaction
as soon as the HaveCache is opened. It is not scheduled concurrently, so
we are sure that the compaction happens before the DB actually gets used
by camput. This seems to make sure that the number of level-0 tables
never grows too much. With this change, I was able to run the above
example on 33K blobs without hitting the ulimit error.

However, it should be noted that potential problems might remain. The
compaction for levels above 0 is triggered based only on the total size
of the level (e.g. at 100MB by default for level-1), and not on the
number of files. Since we're creating many tiny tables (basically 1
entry per table), the number of files grows very fast while the total
size does not, and the compaction does not get triggered, even if forced
with CompactRange. This does not seem to be a problem for our use case,
as levelDB does not seem to need to open many of the level-1 files at
the same time, so we're not hitting the ulimit problem because of that.

If needed, there's at least one way this problem (if it is one?) could
be fixed: make the compaction trigger on other conditions, such as
number of files per level. I've experimented with it (forcing the
level-1 compaction to trigger at the 100 files limit), and it seems to
be working. But I had to do change the goleveldb code itself, and I
don't think levelDB implementations are supposed to do that.

For information, at the end of the run on the 33K blobs:
$ du -sch *.ldb
...
83M	total
$ ll | wc -l
20988

And indeed, when asking for leveldb.stats on the table:
 Level |   Tables   |    Size(MB)   |
-------+------------+---------------+
   0   |          1 |       0.00015 |
   1   |      20981 |       3.47307 |

Also, update github.com/syndtr/goleveldb to
34011bf325bce385408353a30b101fe5e923eb6e
And remove github.com/syndtr/gosnappy as goleveldb does not use it
anymore.

Also apply this change to StatCache.

Fixes #1008

Change-Id: If9f790a003e67f3c075881470e52e5f2174afa73
2018-01-16 00:46:11 +01:00
..
camget all: rename cammount to pk-mount 2018-01-08 09:54:41 -08:00
camput cmd/camput: compact LevelDB on HaveCache setup 2018-01-16 00:46:11 +01:00
camtool vendor/github.com/go-sql-driver/mysql: upgrade to v1.3.0 2018-01-12 10:02:14 -08:00
pk-deploy Rename camdeploy to pk-deploy. 2018-01-02 22:10:37 -08:00
pk-devimport Rename import paths from camlistore.org to perkeep.org. 2018-01-01 16:03:34 -08:00
pk-mount all: rename cammount to pk-mount 2018-01-08 09:54:41 -08:00