For reasons yet to be explained, it can happen that zip blobs with the
exact same overall data contents get written to the large blobpacked.
Some of their packed schema blobs might differ though, e.g. if the
file names are different.
This is a problem, at least because of the added storage space used,
and also because it is not supposed to happen: there is a wholeRef
check on reception of a file.
However, it seems harmless to let these duplicates get indexed as z:
rows.
Therefore, this change improves the inspection of duplicates and the
logging around them a bit, to help users delete them by hand if they
decide to do so, but it also adds detected duplicates as found zip
blobs to the index. This means finding duplicates is no longer
considered a failure of the recovery process.
Updates issue #1079
Change-Id: I5fec2f2226818eefd03ffce82c768de867fad6b0
As discussed in issue #1068, part of the problem with reindexing and
then checking the integrity of the index is that neither b: nor w:
meta rows are keyed by the blobRef of the packed blob (the "zipRef").
This can lead, among other things, to reindexing actually erasing all
trace of a zipRef: it may have been in the value of a row that gets
subsequently overwritten by a row with a different value but an
identical key (since e.g. w: rows are keyed by wholeRef).
To address that issue, this change introduces a new kind of meta row,
prefixed by z: , which indexes the following information:
key: blobref of the zip, prefixed by "z:"
value: size of the zip, blobref of the contents of the whole file, size
of the whole file, position in the whole file of the data in the zip,
size of the data in the zip.
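For illustration, a minimal sketch of assembling such a row, with
hypothetical variable names (the authoritative serialization lives in
pkg/blobserver/blobpacked):

    // Hedged sketch of building a z: meta row; names are hypothetical,
    // and the real serialization is defined in pkg/blobserver/blobpacked.
    package main

    import "fmt"

    func zipMetaRow(zipRef string, zipSize uint32, wholeRef string, wholeSize, dataOff, dataLen int64) (key, value string) {
        key = "z:" + zipRef
        // size of the zip, wholeRef, size of the whole file,
        // position of the zip's data in the whole file, size of that data.
        value = fmt.Sprintf("%d %s %d %d %d", zipSize, wholeRef, wholeSize, dataOff, dataLen)
        return key, value
    }

    func main() {
        k, v := zipMetaRow("sha1-0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33", 4096,
            "sha1-2aae6c35c94fcfb415dbe95f408b9ce91ee846ed", 1<<20, 0, 4000)
        fmt.Println(k, "=>", v)
    }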
Fixes #1068
Change-Id: Iae61fe823cda0accb22e55ea075407a2b8fd11f8
Also:
* lowercase the SHOUTYCASE log prefix
* close the meta index on Close (from mpl's CL)
For #1068
Change-Id: I26d9d77338ac850a20d9d631c0424129f6f98fe2
Check that all blobpacked zips are in the blobpacked meta, and
vice-versa (that all entries in meta do exist as a zip in large
storage).
As the current recovery code would not fix the case of stale entries
(blobRefs, in the blobpacked index, of large blobs that no longer
exist), this change also adds a new recovery mode, which wipes the
existing blobpacked index before rebuilding it.
To support this, the recovery var in the blobpacked pkg, as well as
flagRecovery in camlistored.go, have been changed from bools to ints,
to account for the fact that there are now several recovery modes.
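For illustration, a minimal sketch of recovery as an int-valued mode
instead of a bool (the mode names here are hypothetical, not
necessarily those used in the code):

    // Hedged sketch: several recovery modes expressed as ints.
    package main

    import "fmt"

    type recoveryMode int

    const (
        noRecovery   recoveryMode = iota // normal startup
        fastRecovery                     // rebuild missing/invalid index entries
        wipeRecovery                     // wipe the blobpacked index, then rebuild it
    )

    func main() {
        mode := wipeRecovery
        fmt.Println("recovery mode:", mode, "wipe first:", mode == wipeRecovery)
    }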
Fixes #946
Change-Id: I1fe76b805af34933e362d70c9f27bfd5403e3f3a
- correct logging calls that logged functions instead of their return values
- use ID vs Id naming
- use correct function names in comments
Change-Id: I61562cef7ebac7337ec6c85312cdf7915cb1a84b
I had intended for this to be a small change.
I was going to just add context.Context to the BlobReceiver interface,
but then I saw blob.Fetcher could also use one, so I decided to do two
in one CL.
And then it got a bit infectious and ended up touching everything.
I ended up doing SubFetch in the process by necessity.
At a certain point I finally started using context.TODO() in a few
spots, but not too many. But removing context.TODO() will come in the
future. There are more blob storage interfaces lacking context, too,
like RemoveBlobs.
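For reference, a hedged sketch of the shape of the two interfaces
after the change (see pkg/blob and pkg/blobserver for the
authoritative definitions):

    // Sketch only; the real definitions live in pkg/blob and pkg/blobserver.
    package sketch

    import (
        "context"
        "io"

        "perkeep.org/pkg/blob"
    )

    // BlobReceiver gains a context.Context as its first argument.
    type BlobReceiver interface {
        ReceiveBlob(ctx context.Context, br blob.Ref, source io.Reader) (blob.SizedRef, error)
    }

    // Fetcher gains one too.
    type Fetcher interface {
        Fetch(ctx context.Context, ref blob.Ref) (rc io.ReadCloser, size uint32, err error)
    }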
Updates #733
Change-Id: Idf273180b3f8e397ac5929c6d7f520ccc5cdce08
Remove the blob.SHA{1,224}From{Bytes,String} constructors too. No
longer used. This adds blob.RefFromBytes which was missing. We had
blob.RefFromString. Now everything uses blob.RefFrom* instead of
specifying a hash function.
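A usage sketch of the hash-agnostic constructors (the printed prefix
depends on whatever default hash the blob package currently uses):

    // Sketch: constructing refs without naming a hash function.
    package main

    import (
        "fmt"

        "perkeep.org/pkg/blob"
    )

    func main() {
        r1 := blob.RefFromString("hello world")
        r2 := blob.RefFromBytes([]byte("hello world"))
        fmt.Println(r1 == r2) // true: same bytes, same default hash
        fmt.Println(r1)       // prefix depends on the default hash, e.g. "sha224-..."
    }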
Some tests set a flag to force use of SHA-1 because there was too much
golden data to update. We can remove those one-by-one over time as we
fix up tests.
Updates #537
Change-Id: Ibe6428089a6221594c2b751f53f98b03b5a28dc2
This addresses a long-standing TODO in the BlobStatter interface to
clean it up. Just like all new Go programmers, I misused channels in
APIs. I should've cleaned this up years ago.
While here, I also added a context.
The rest should get contexts later.
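A hedged before/after sketch of the BlobStatter interface (consult
pkg/blobserver for the exact definition):

    // Sketch only; see pkg/blobserver for the authoritative interface.
    package sketch

    import (
        "context"

        "perkeep.org/pkg/blob"
    )

    // Before: results were written to a channel, which pushed channel
    // plumbing onto every implementation and caller.
    type oldBlobStatter interface {
        StatBlobs(dest chan<- blob.SizedRef, blobs []blob.Ref) error
    }

    // After: a context, plus a callback invoked per stat result.
    type BlobStatter interface {
        StatBlobs(ctx context.Context, blobs []blob.Ref, fn func(blob.SizedRef) error) error
    }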
This also cleans up a few things here & there.
The pkg/client statting no longer does batching, which added a lot of
complexity. There was a comment saying something like "once we have
SPDY, we can delete this". Well, we have HTTP/2 now, so seems
deletable.
All tests pass.
Change-Id: I034ce07d9b70e5cc9e5482213368993e638d4bc8
Notably: pkg/misc all moves.
And pkg/googlestorage is deleted, since it's not used. Only the
x/net/http2/h2demo code used to use it, but that ended in
https://go-review.googlesource.com/33230 (our vendored code is old).
So just nuke that dir for now. When it's refreshed, it'll either be
gone (dep prune) or new enough to not need googlestorage.
Also move pkg/pools, pkg/leak, and pkg/geocode to internal.
More remains.
Change-Id: I2640c4d18424062fdb8461ba451f1ce26719ae9d
Part of the project renaming, issue #981.
After this, users will need to mv their $GOPATH/src/camlistore.org to
$GOPATH/src/perkeep.org. Sorry.
This doesn't yet rename the tools like camlistored, camput, camget,
camtool, etc.
Also, this only moves the lru package to internal. More will move to
internal later.
Also, this doesn't yet remove the "/pkg/" directory. That'll likely
happen later.
This updates some docs, but not all.
devcam test now passes again, even with Go 1.10 (which requires that
vet checks be clean too). So a bunch of vet errors are fixed in this
CL too, and a bunch of other broken tests (introduced during the past
week of merging the CL backlog) are now fixed.
Change-Id: If580db1691b5b99f8ed6195070789b1f44877dd4
After the rollsum fix in 4723d0f452
landed, the way files were cut into blobs changed, and hence some
expected hashsums changed as well.
This CL adjusts such expectations.
Change-Id: I44fc1f5ce1922d7bc99f9a8096ef4b8d212571dc
This switches most usages of the pre-1.7 context library to use the
standard library. Remaining usages are in:
app/publisher/main.go
pkg/fs/...
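The conversion itself is mechanical; a minimal sketch:

    // Sketch of the mechanical change: the standard library "context"
    // package (Go 1.7+) replaces golang.org/x/net/context.
    package main

    import (
        "context" // was: "golang.org/x/net/context"
        "fmt"
    )

    func main() {
        ctx := context.Background()
        fmt.Println(ctx.Err()) // <nil>
    }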
Change-Id: Ia74acc39499dcb39892342a2c9a2776537cf49f1
When the blobpacked index gets corrupted/destroyed, it needs to be
rebuilt and hence camlistored refuses to start without the -recovery
flag being set if it detects it is needed.
This is a problem on GCE, because camlistored is managed by systemd,
so the only way to restart camlistored with -recovery is to edit the
systemd service, issue 'systemctl daemon-reload' and 'systemctl
restart camlistored', and then revert the change so that subsequent
restarts of camlistored are back to normal. This is not user-friendly.
This change adds a check on camlistored startup based on the presence
and value (as a boolean) of the "camlistore-recovery" key as an instance
attribute. Therefore, the user only has to add the
("camlistore-recovery", "true") attribute in the Custom metadata of
their Google Cloud instance page to switch to recovery mode. (And
restart the instance).
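A hedged sketch of such a check, using the
cloud.google.com/go/compute/metadata package (the actual camlistored
wiring differs in detail):

    // Sketch: read the "camlistore-recovery" instance attribute on GCE
    // to decide whether to start in recovery mode. Not the exact code.
    package main

    import (
        "log"
        "strconv"

        "cloud.google.com/go/compute/metadata"
    )

    func recoveryRequested() bool {
        if !metadata.OnGCE() {
            return false
        }
        v, err := metadata.InstanceAttributeValue("camlistore-recovery")
        if err != nil {
            return false
        }
        b, err := strconv.ParseBool(v)
        return err == nil && b
    }

    func main() {
        log.Printf("start in recovery mode: %v", recoveryRequested())
    }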
As an additional feature (for non-GCE users), this change also adds the
option to restart the server in recovery mode from the status page of
the web UI.
Change-Id: I44f5ca293ddd0a0033fc5d9c2edca1bac0ee9c8f
When reindexing on a (My)SQL-based sorted.KeyValue, we should recreate
the database schema from scratch, which means dropping the tables.
However, index.Reindex just calls Wipe on the newly created
sorted.KeyValue, which only deletes the rows, and does not drop the
tables.
Therefore, this CL changes the implementation of Wipe in the MySQL case,
so that it takes care of dropping the tables, and doing everything that
needs to be done afterwards to set up the sorted.KeyValue.
In addition, with the introduction of the sorted.NeedWipeError, we detect
upon initialization of a sorted.KeyValue if it failed because it needed
a schema update. If that is the case, and we're in reindex mode, we can
fix the sorted.KeyValue with a Wipe and carry on.
Finally, we introduce the new sorted.NewKeyValueMaybeWipe function that
automatically wipes a KeyValue when a NeedWipeError was returned upon
its creation.
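Roughly, the wipe-and-retry logic behind sorted.NewKeyValueMaybeWipe
looks like this (a hedged sketch with stand-in types, not the exact
implementation):

    // Hedged sketch of the NewKeyValueMaybeWipe logic with stand-in
    // types; the real code lives in pkg/sorted.
    package sketch

    import (
        "errors"
        "fmt"
    )

    // wipeableKV stands in for a sorted.KeyValue that can also Wipe itself.
    type wipeableKV interface {
        Wipe() error
    }

    // needWipeError stands in for sorted.NeedWipeError.
    type needWipeError struct{ msg string }

    func (e needWipeError) Error() string { return e.msg }

    // newKVMaybeWipe assumes the constructor returns a usable kv even
    // alongside a NeedWipeError, so that kv can be wiped and reused.
    func newKVMaybeWipe(newKV func() (wipeableKV, error)) (wipeableKV, error) {
        kv, err := newKV()
        if err == nil {
            return kv, nil
        }
        var nwe needWipeError
        if !errors.As(err, &nwe) {
            return nil, err // some other failure: do not wipe
        }
        if err := kv.Wipe(); err != nil {
            return nil, fmt.Errorf("error wiping KeyValue: %v", err)
        }
        return kv, nil
    }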
Next: do the same with the other SQL-based sorted implementations.
Fixes #806
Change-Id: I2032781cbf453a364880bd3e2e8b3c09aac7aed9
The -recovery flag of camlistored now forces the blobpacked index to
be rebuilt, regardless of its state.
Fixes #876
Change-Id: I4e6bd5374ec68d7bb32de9fc119abbc881707625
compress/flate changed at tip. Compressed zips can have different
digests now.
Updates camlistore/camlistore#710
Change-Id: Ida572ca88fe4729c57772db85f69876d28c3742f
The import path was added to the Go file that included the package
documentation, if one existed. Otherwise, I used what seemed to be the
primary file for the package.
Fixes #689
Change-Id: If51be0e86529fd6f179e80af6781e639f8550fd2
Most of it was replaced with vendor/go4.org/types and
vendor/go4.org/readerutil.
u32 went where needed in pkg/blobserver/*
invertedBool went in pkg/types/serverconfig
atomics64 went in pkg/fs
Change-Id: I230426cda35be4b45ed67e869f14e6fdae89be22
Previously pkg/jsonconfig and pkg/errorutil
Copied from go4.org at rev d1b8a2fb2de6160036e4801aa5e4d855571078b8
Change-Id: I673ed55b0825baa2607289b6082f205100261d7a
They were internal packages (under pkg), which we are now moving to
go4.org, so we in turn need to vendor them in now.
Change-Id: I92224f731404d0bd4ca1c57492bed37cb3367ed4
testSubFetcher in blobserver/storagetest was already checking that we'd
get specific error messages in the case of negative input parameters or
an out-of-range offset.
This change rationalizes these constraints with named errors
(ErrNegativeSubFetch and ErrOutOfRangeOffsetSubFetch) specified
in the SubFetcher interface.
It also fixes the googlestorage and s3 implementations so that they pass
the aforementioned test.
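A hedged sketch of the resulting contract (with a stand-in Ref type;
see pkg/blob for the real definitions):

    // Sketch of the SubFetcher contract after this change; the
    // authoritative definitions live in pkg/blob.
    package sketch

    import (
        "errors"
        "io"
    )

    type Ref struct{} // stand-in for blob.Ref

    var (
        // ErrNegativeSubFetch is returned for a negative offset or length.
        ErrNegativeSubFetch = errors.New("invalid negative subfetch parameters")
        // ErrOutOfRangeOffsetSubFetch is returned when the offset is
        // past the end of the blob.
        ErrOutOfRangeOffsetSubFetch = errors.New("subfetch offset greater than blob size")
    )

    // SubFetcher reads length bytes of a blob starting at offset.
    type SubFetcher interface {
        SubFetch(ref Ref, offset, length int64) (io.ReadCloser, error)
    }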
Change-Id: I25b72b842855b90ee3cab44c90654581dccf4b8e