A bunch of mine had a larger initial ftyp box, which broke the second
part of the pattern. But the second part of the pattern doesn't matter
anyway. This only needs to casually recognize them. A later full
parse will determine what they really are.
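
A minimal sketch of such a casual check, assuming only the box type at
bytes 4..8 is compared. The function name is hypothetical and this is not
the actual pkg/magic code:

```go
package main

import (
	"bytes"
	"fmt"
)

// looksLikeISOMedia reports whether hdr begins with an ISO base media file
// format "ftyp" box. Only the box type at bytes 4..8 is checked; the 4-byte
// box size before it and the brands after it are ignored, so an unusually
// large ftyp box is still recognized.
func looksLikeISOMedia(hdr []byte) bool {
	return len(hdr) >= 8 && bytes.Equal(hdr[4:8], []byte("ftyp"))
}

func main() {
	small := []byte("\x00\x00\x00\x18ftypmp42") // 24-byte ftyp box
	large := []byte("\x00\x00\x00\x28ftypisom") // 40-byte ftyp box
	fmt.Println(looksLikeISOMedia(small), looksLikeISOMedia(large))
}
```

Anything that passes this check would still go through the later full
parse.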
This also adds some new debugging when CAMLI_DEBUG is true.
Change-Id: Ib4adc9b5447a64ba4682624e42b55f1d65779ef7
Otherwise, claims that are actually from the same signer end up being
treated as from different signers, because some of the claims were
signed by the sha1 version and others by the sha224 one.
TODO in follow-up CLs: similar fixes in rest of the corpus, such as with
claimPtrsAttrValue. See if non-corpus index functions/methods suffer
from the same problem.
Change-Id: Icbc70e97edc569f46e575d79aaf4359b33996053
I had intended for this to be a small change.
I was going to just add context.Context to the BlobReceiver interface,
but then I saw blob.Fetcher could also use one, so I decided to do two
in one CL.
And then it got a bit infectious and ended up touching everything.
I ended up doing SubFetch in the process by necessity.
At a certain point I finally started using context.TODO() in a few
spots, but not too many. But removing context.TODO() will come in the
future. There are more blob storage interfaces lacking context, too,
like RemoveBlobs.
Updates #733
Change-Id: Idf273180b3f8e397ac5929c6d7f520ccc5cdce08
Remove the blob.SHA{1,224}From{Bytes,String} constructors too. No
longer used. This adds blob.RefFromBytes which was missing. We had
blob.RefFromString. Now everything uses blob.RefFrom* instead of
specifying a hash function.
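
To illustrate the idea of constructors that pick the hash themselves, here
is a rough sketch. The lowercase function names are hypothetical stand-ins
for blob.RefFromBytes and blob.RefFromString, and the use of SHA-224 with a
"sha224-" prefix is an assumption of this sketch:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// refFromBytes returns a blobref-style string for b. The hash function is
// chosen here, in one place, rather than by each caller. SHA-224 is an
// assumption of this sketch.
func refFromBytes(b []byte) string {
	return fmt.Sprintf("sha224-%x", sha256.Sum224(b))
}

// refFromString is the string counterpart of refFromBytes.
func refFromString(s string) string {
	return refFromBytes([]byte(s))
}

func main() {
	fmt.Println(refFromString("foo") == refFromBytes([]byte("foo")))
}
```

With this shape, switching the hash only requires touching the one
function that computes it.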
Some tests set a flag to force use of SHA-1 because there was too much
golden data to update. We can remove those one-by-one over time as we
fix up tests.
Updates #537
Change-Id: Ibe6428089a6221594c2b751f53f98b03b5a28dc2
Part of the project renaming, issue #981.
After this, users will need to mv their $GOPATH/src/camlistore.org to
$GOPATH/src/perkeep.org. Sorry.
This doesn't yet rename the tools like camlistored, camput, camget,
camtool, etc.
Also, this only moves the lru package to internal. More will move to
internal later.
Also, this doesn't yet remove the "/pkg/" directory. That'll likely
happen later.
This updates some docs, but not all.
devcam test now passes again, even with Go 1.10 (which requires that vet
checks be clean too). So a bunch of vet failures are fixed in this CL
too, along with a bunch of other tests that were broken during the past
week of merging the CL backlog.
Change-Id: If580db1691b5b99f8ed6195070789b1f44877dd4
This switches most usages of the pre-1.7 context library to use the
standard library. Remaining usages are in:
app/publisher/main.go
pkg/fs/...
Change-Id: Ia74acc39499dcb39892342a2c9a2776537cf49f1
If a zip archive is created without specifying the modtimes of the
files, they'll end up with a default modtime set to the MSDOS epoch
(1980-01-01 modulo some timezone and silly details), which is a common
enough occurrence.
Even when the index has better information, such as the EXIF time,
when clients of the index (the web UI, through the search package) sort
by creation time, they use the oldest indexed time available, which is
unfortunate in that case.
Therefore, this CL makes the indexer ignore the oldest time found, if it
is before the MSDOS epoch, and if we have another time available, when
receiving a file.
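
The rule can be sketched roughly as follows. The names pickFileTime and
day are hypothetical, and the epoch is taken in UTC, which is an assumption
(the real value may be shifted by a timezone offset):

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// msdosEpoch is 1980-01-01 in UTC. Real zip tools may shift it by a timezone
// offset; UTC is an assumption of this sketch.
var msdosEpoch = time.Date(1980, time.January, 1, 0, 0, 0, 0, time.UTC)

// day is a small helper that builds a UTC date.
func day(y, m, d int) time.Time {
	return time.Date(y, time.Month(m), d, 0, 0, 0, 0, time.UTC)
}

// pickFileTime returns the oldest of the candidate times, except that the
// oldest one is dropped when it is before msdosEpoch and another candidate
// exists. ok is false when there are no candidates.
func pickFileTime(candidates ...time.Time) (t time.Time, ok bool) {
	if len(candidates) == 0 {
		return time.Time{}, false
	}
	sorted := append([]time.Time(nil), candidates...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Before(sorted[j]) })
	if len(sorted) > 1 && sorted[0].Before(msdosEpoch) {
		sorted = sorted[1:]
	}
	return sorted[0], true
}

func main() {
	zipDefault := day(1979, 12, 31) // pre-epoch modtime from a zip archive
	exif := day(2014, 7, 4)
	t, _ := pickFileTime(zipDefault, exif)
	fmt.Println(t.Format("2006-01-02"))
}
```

When the pre-epoch time is the only one available, it is still returned,
since there is nothing better to fall back on.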
Also fixed the use of hardcoded value of keyFileTimes.name, to help with
reading/searching code.
Change-Id: I9c2c39b319fdf6cd5214cab8928dd025451077ac
We relied on missTrackFetcher to return errMissingDep when the
underlying Fetch() returned os.ErrNotExist. The caller could then know
how to act if some indexing operation failed because of an errMissingDep
error.
This was wrong for 2 reasons:
1) if a function fn(tf blob.Fetcher) error does:
    if _, _, err := tf.Fetch(br); err != nil {
        return fmt.Errorf("wrapping this error in a nicer error message: %v", err)
    }
when we call err := fn(tf), we lose the ability to directly determine
whether err is an errMissingDep. We'd have to parse the error string,
which is gross.
This is exactly what happens in populateDir, when we call
dr.StaticSet().
And in order to fix issue #738, we want to be able to tell when a call
to dr.StaticSet() failed because the underlying Fetch() operation
failed.
2) The blob.Fetcher interface specifically states that os.ErrNotExist
should be returned when a blob is not found. We were breaking that rule
by returning errMissingDep.
In order to address both 1) and 2), it seemed like we could add an err
field to missTrackFetcher to keep track of when an os.ErrNotExist
occurred during a Fetch, and let Fetch return an os.ErrNotExist.
However, that would not work, as a missTrackFetcher is used concurrently
by several callers, so a given caller wouldn't be able to tell whether
"its" Fetch failed or a Fetch from a concurrent caller failed.
Therefore, we introduce trackErrorsFetcher, which has such an error field
and wraps the missTrackFetcher. All the callers can keep on sharing
the missTrackFetcher, but each of them initializes its own
trackErrorsFetcher, and can check its error field when a failed call to a
function is suspected to be the result of a failed Fetch.
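
A simplified sketch of the wrapper idea follows. The fetcher interface
here is a hypothetical stand-in for blob.Fetcher (string refs, byte
slices), and mapFetcher, describe and sawNotExist are made up for the
example; only the name trackErrorsFetcher comes from this CL:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sync"
)

// fetcher is a simplified, hypothetical stand-in for blob.Fetcher.
type fetcher interface {
	Fetch(ref string) ([]byte, error)
}

// mapFetcher is a toy shared fetcher that returns os.ErrNotExist for
// unknown refs, as the Fetcher contract asks.
type mapFetcher map[string][]byte

func (m mapFetcher) Fetch(ref string) ([]byte, error) {
	if b, ok := m[ref]; ok {
		return b, nil
	}
	return nil, os.ErrNotExist
}

// trackErrorsFetcher wraps a (possibly shared) fetcher and records the
// errors seen through this particular wrapper. Each caller owns its wrapper.
type trackErrorsFetcher struct {
	f    fetcher
	mu   sync.Mutex
	errs []error
}

func (tf *trackErrorsFetcher) Fetch(ref string) ([]byte, error) {
	b, err := tf.f.Fetch(ref)
	if err != nil {
		tf.mu.Lock()
		tf.errs = append(tf.errs, err)
		tf.mu.Unlock()
	}
	return b, err // the original error is passed through unchanged
}

// sawNotExist reports whether any Fetch through this wrapper failed with
// os.ErrNotExist.
func (tf *trackErrorsFetcher) sawNotExist() bool {
	tf.mu.Lock()
	defer tf.mu.Unlock()
	for _, err := range tf.errs {
		if errors.Is(err, os.ErrNotExist) {
			return true
		}
	}
	return false
}

// describe wraps the fetch error with %v, so its identity is lost to callers.
func describe(f fetcher, ref string) error {
	if _, err := f.Fetch(ref); err != nil {
		return fmt.Errorf("describing %s: %v", ref, err)
	}
	return nil
}

func main() {
	shared := mapFetcher{"a": []byte("x")}
	tf := &trackErrorsFetcher{f: shared}
	err := describe(tf, "missing")
	fmt.Println(errors.Is(err, os.ErrNotExist), tf.sawNotExist())
}
```

The returned error no longer matches os.ErrNotExist, but the caller's own
wrapper still knows that one of its fetches hit a missing blob.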
Also added a test to demonstrate that issue #738 is fixed.
Fixes #738
Change-Id: Ia5c3081b71c77be1e8cff0bbc847ade68f019bf9
There's a race that takes place at the end of the reindexing process.
In func (x *Index) Reindex(), we wait for all the reindexing goroutines
to be done, with wg.Wait. However, any (or all) of those goroutines
could have triggered (they call indexBlob, which calls
blobserver.Receive, which calls noteBlobIndexed, which can send on
tickleOoo) an asynchronous out of order reindexing which is NOT waited
on.
The race can be trivially demonstrated by changing:
     WaitTickle:
        for range ix.tickleOoo {
    +       time.Sleep(5*time.Second)
in receive.go, and running:
    go test ./pkg/index/ -run TestReindex_*
This CL rewrites the out of order indexing implementation to make it
simpler, and in the process, fixes the above bug.
Fixes #756
Change-Id: If79fb1ad8869cefce4a095ef2becdba333732bce
When receiving a file, we were only trying to guess its MIME type
through its contents (pkg/magic). We're now making a better effort at it
by guessing from the filename extension if needed.
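
The fallback order can be sketched like this. guessMIME is a hypothetical
name, and http.DetectContentType is used here only as a stand-in for
pkg/magic's content sniffing:

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path/filepath"
)

// unknownType is what http.DetectContentType returns when it cannot
// identify the content.
const unknownType = "application/octet-stream"

// guessMIME first sniffs the content prefix and, if that is inconclusive,
// falls back to the filename extension. It returns "" when both fail.
func guessMIME(filename string, prefix []byte) string {
	if t := http.DetectContentType(prefix); t != unknownType {
		return t
	}
	return mime.TypeByExtension(filepath.Ext(filename))
}

func main() {
	// Unrecognizable content, so the ".png" extension decides.
	fmt.Println(guessMIME("photo.png", []byte{0, 1, 2}))
}
```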
Also:
pkg/magic: get rid of all the extra video extensions that are already
covered by mime.TypeByExtension. Because it's redundant and
confusing.
app/publisher, pkg/types/camtypes: also use mime.TypeByExtension as an
extra effort. Especially since a reindex would be necessary to benefit
from the pkg/index change.
There are other places in Camlistore that could use such an effort.
Maybe we should have a camtypes.*FileInfo.MIME() method that tries all
the ways to guess the MIME type of the file?
Change-Id: Ib9a2bc42af77c5394dac578ae415524b5111ad4e
A word of caution: related to the issue demonstrated by the added
tests, an infinite loop can also occur, as it already could in
TestReindex_LevelDB. As it is, after all, a consequence of a race, I
haven't been able to determine what exactly makes the loop occur. But
what I observed is:
1) It seems to occur much more easily with LevelDB, which is why I
ended up just disabling TestReindex_LevelDB.
2) I've never seen it happen in TestReindex_Kvfile, but who knows.
3) I've seen it happen rarely with TestShowReindexRace_Kvfile, but it
seems that adding to TestShowReindexRace_Kvfile the kind of timed kill
that I had added to TestReindex_LevelDB actually makes the loop happen
much more often. It then ends up eclipsing the original issue that we
want to demonstrate, which is why I decided against it.
TL;DR: if you use -show_reindex_race=true, be prepared to maybe
have to kill(1) the test manually.
Change-Id: I47fd3c55363c8d0dda17ad19665115cb96f3d58f
The describe requests were launching a storm of RLocks which weren't
safe in the presence of goroutines trying to acquire write locks.
Instead, make the corpus locking the responsibility of the caller and
add Lock/Unlock/RLock/RUnlock methods to the index and move locking up
a level.
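
A rough sketch of the caller-held locking pattern, with a hypothetical
index type and method names. The caller takes the read lock once, and the
helpers it calls assume the lock is already held, so no read lock is
re-acquired while a writer may be waiting (sync.RWMutex prohibits
recursive read locking for that reason):

```go
package main

import (
	"fmt"
	"sync"
)

// index is a toy stand-in for the real index and its corpus.
type index struct {
	mu    sync.RWMutex
	attrs map[string]string
}

// Locking is exposed so that callers can hold the lock across several calls.
func (ix *index) Lock()    { ix.mu.Lock() }
func (ix *index) Unlock()  { ix.mu.Unlock() }
func (ix *index) RLock()   { ix.mu.RLock() }
func (ix *index) RUnlock() { ix.mu.RUnlock() }

// attrLocked returns the value for key. The caller must hold the lock.
func (ix *index) attrLocked(key string) string { return ix.attrs[key] }

// setAttrLocked sets key to val. The caller must hold the write lock.
func (ix *index) setAttrLocked(key, val string) { ix.attrs[key] = val }

// describe takes the read lock once and performs every lookup under it,
// instead of each lookup acquiring its own read lock.
func describe(ix *index, keys ...string) []string {
	ix.RLock()
	defer ix.RUnlock()
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, ix.attrLocked(k))
	}
	return out
}

func main() {
	ix := &index{attrs: map[string]string{}}
	ix.Lock()
	ix.setAttrLocked("title", "vacation")
	ix.Unlock()
	fmt.Println(describe(ix, "title", "missing"))
}
```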
This also adds a fair bit of context.Context plumbing which was used
in earlier debugging.
Fixes camlistore/camlistore#709
Change-Id: I8d7254d1e1da541f8c080d62f5408aac807fd3b1
Most of it replaced with vendor/go4.org/types and
vendor/go4.org/readerutil
u32 went where needed in pkg/blobserver/*
invertedBool went in pkg/types/serverconfig
atomics64 went in pkg/fs
Change-Id: I230426cda35be4b45ed67e869f14e6fdae89be22
This change is in anticipation of moving pkg/images to go4.org, where it
should not depend on packages in third_party.
So:
third_party/github.com/nf/cr2 -> vendor/github.com/nf/cr2
third_party/github.com/rwcarlsen/goexif -> vendor/github.com/rwcarlsen/goexif
third_party/golang.org/x/image/tiff -> vendor/golang.org/x/image/tiff
Note that third_party/go/pkg/image/jpeg was also a dependency of
pkg/images. We had vendored image/jpeg from tip at the time because it
offered advantages over the version from Go1.3
(https://github.com/camlistore/camlistore/issues/463).
Since we now depend on Go1.5, we can go back to depend on the stdlib
version, so we simply remove third_party/go/pkg/image/jpeg and adjust
the imports accordingly.
Change-Id: Ifc8ffae0551102e644a0a0c67f3ff89e04df15c7
When reading EXIF data for large tiff files, our optimistic
file prefix sometimes isn't enough and we need to pass in
the whole file. We already did this in several places (image
decoding and indexing); this change adds it for finding the
modtime (for which we try to use EXIF data when
available). It pulls common functionality out into a
separate func and changes the existing uses of this pattern
to make use of the func.
Change-Id: I2b786775f168f47f46fb5ac707e3744991139a21
When reading EXIF data from larger TIFF files, we might fail
to read the EXIF data when we only pass in the in-memory
prefix. This change identifies when the third-party library
encounters a short read on a tag/EXIF data and triggers a
retry with the whole file by returning an ErrUnexpectedEOF.
Change-Id: Ie5cdc1613db6ccac49d91a69827f11ca3406a74b
So when you describe a file, you also get its wholeref.
TODO: we'll need to migrate old indexes to this new format on
start-up.
Change-Id: I4a3fb000d68bde46474275c2070ef285a6d6ecfc
Go's image.DecodeConfig needs more than 1MiB on some images (e.g. some
Lens Blur pics taken with Google Camera). Now we first try a 512KiB header
and retry with a full FileReader if that fails.
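
The prefix-then-retry approach can be sketched as follows. The open
callback stands in for creating a fresh FileReader, and all function names
are hypothetical:

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/png"
	"io"
)

// decodeConfig first tries to decode the image config from at most
// prefixSize bytes and, if that fails, retries with the whole stream.
// open must return a fresh reader positioned at the start each time.
func decodeConfig(open func() io.Reader, prefixSize int64) (image.Config, error) {
	conf, _, err := image.DecodeConfig(io.LimitReader(open(), prefixSize))
	if err == nil {
		return conf, nil
	}
	conf, _, err = image.DecodeConfig(open())
	return conf, err
}

// bytesOpener adapts an in-memory file to the open callback.
func bytesOpener(data []byte) func() io.Reader {
	return func() io.Reader { return bytes.NewReader(data) }
}

// samplePNG returns an encoded w x h PNG, for demonstration only.
func samplePNG(w, h int) []byte {
	var buf bytes.Buffer
	if err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, w, h))); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	// A 10-byte prefix is too short for the PNG header, forcing the retry.
	conf, err := decodeConfig(bytesOpener(samplePNG(3, 2)), 10)
	fmt.Println(conf.Width, conf.Height, err)
}
```

The retry costs a second read of the file start, but only for the images
whose header does not fit in the prefix.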
https://camlistore.org/bugs/477
Change-Id: I286d15d86a69951737d94dd3692d4e9e1992b324
This pulls the changes from the current HEAD of
https://github.com/rwcarlsen/goexif
(eb2943811adc24a1a40d6dc0525995d4f8563d08)
Notable changes:
- Removed explicit panics in favor of error returns
- renamed TypeCategory to Format and made format calculated upon
decoding rather than repeatedly for every format call
- Merged contributions from Camlistore (exif.LatLong(), exif.DateTime()
etc.)
- Change String method to just return the string value - and don't have
square brackets if only a single value
- add separate Int and Int64 retrieval methods
- Doc updates
Minor changes in camlistore.org/pkg/* were necessary to reflect
changes in the API (handling of returned errors) and in names of
exported fields and methods.
Change-Id: I50412b5e68d2c9ca766ff2ad1a4ac26926baccab
In anticipation of github.com/camlistore/goexif being closed, this
commit renames the goexif folder in third_party to match the
upstream on GitHub.
The affected import paths have been rewritten accordingly.
Change-Id: I5a8871efd01987944b7f5e93979307857ae16fe7
This pulls the changes from the current HEAD of
https://github.com/rwcarlsen/goexif
(rev cf045e9d6ba052fd348f82394d364cca4937589a)
Changes to goexif from upstream:
- Add support for reading Nikon and Canon maker notes
- Adds parser registration to exif package
- Renamed cmd to exifstat
- Renamed exported fields and methods in goexif/tiff
- adds support for bare tiff images. bug fix and minor cosmetics
- support pulling the thumbnail
- adds thumbnail support to exif pkg
- tiff defines DataType and constants for the datatype values
- Update convertVals and TypeCategory to use new constants for DataType
- Renamed test data dir in exif tests
- created type+constants for raw tiff tag data types
Not merged from upstream:
- ~1 MB of test JPGs in goexif/exif/samples
Minor changes in camlistore.org/pkg/* were necessary to reflect the
name changes in the exported fields and methods.
Change-Id: I0fdcad2d7b5e01e0d4160a5eb52b8ec750d353cf
problem: the out-of-order mechanism based on the outOfOrderIndexerLoop
was not working for some claims.
Let C be a delete claim on permanode P. If C was received before P was,
C was marked as being received with the "have" index row. However, for
the deletion to be marked in the index, some information about P is
needed (its meta row), so C could not be fully indexed upon reception.
Then, when P was finally received, the outOfOrderIndexerLoop would kick
in and retry indexing C. Which would fail, because a test based on the
"have" row would (wrongly) detect that C is already indexed and return
early.
In this patch:
-we introduce the "|indexed" suffix to the value part of the "have" row
(receive.go). If a blob is received but some of its dependencies are
missing, the have row value is written without the suffix. Upon
reception of a blob, we now test for the presence of the suffix in the
have row. If missing, the reception continues instead of returning
early. The existing mechanism that was detecting missing dependencies
for file blobs has been adapted to work with this suffix too.
-the index enumeration (enumstat.go), which relies on "have" rows, has
been adapted to work with the new "have" row format, while staying
compatible with the old format. And related tests have been added.
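
As an illustration of reading both row formats, here is a small sketch. It
assumes the "have" row value is the blob size in decimal, optionally
followed by the suffix; that layout and the function name are assumptions
made for the example:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const indexedSuffix = "|indexed"

// parseHaveValue parses a "have" row value of the assumed form "<size>" or
// "<size>|indexed". indexed reports whether the suffix was present, i.e.
// whether the blob was fully indexed and reception may return early.
func parseHaveValue(v string) (size uint64, indexed bool, err error) {
	indexed = strings.HasSuffix(v, indexedSuffix)
	size, err = strconv.ParseUint(strings.TrimSuffix(v, indexedSuffix), 10, 64)
	return size, indexed, err
}

func main() {
	fmt.Println(parseHaveValue("123|indexed"))
	fmt.Println(parseHaveValue("123"))
}
```

Enumeration only needs the size, so it can ignore the indexed flag and
accept rows with or without the suffix.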
http://camlistore.org/issue/454
Change-Id: I2559d08a12b2a4e0f0691fc7e31f1ed1f874625e
index.New was starting outOfOrderIndexerLoop in a goroutine. And
outOfOrderIndexerLoop had an if index.BlobSource == nil check, on which
it relied to go on. However, since BlobSource was public and unguarded,
the following sequence was possible:
    ix, _ := index.New()
    ix.BlobSource = bs
which is racy because the BlobSource assignment may or may not happen
before the check within outOfOrderIndexerLoop.
TestOutOfOrderIndexing was relying on the fact that, most of the time,
the assignment happens to occur before the check.
This patch:
-makes BlobSource (now blobSource) private, rendering the race impossible
outside of the index package.
-moves the initialization of blobSource, as well as the start of
outOfOrderIndexerLoop, to a single point, InitBlobSource (new method).
-makes sure all accesses to blobSource are guarded with the index mutex
(now a RWMutex).
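
The resulting shape can be sketched as below. Only the names blobSource,
InitBlobSource and the RWMutex come from this patch; the fetcher interface,
the channels and the loop body are simplified, hypothetical stand-ins, and
the sketch assumes InitBlobSource is called exactly once:

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// fetcher is a simplified, hypothetical stand-in for the blob source type.
type fetcher interface {
	Fetch(ref string) ([]byte, error)
}

type mapFetcher map[string][]byte

func (m mapFetcher) Fetch(ref string) ([]byte, error) {
	if b, ok := m[ref]; ok {
		return b, nil
	}
	return nil, os.ErrNotExist
}

type index struct {
	mu         sync.RWMutex
	blobSource fetcher     // guarded by mu
	retry      chan string // refs to retry out of order
	found      chan bool   // whether each retried ref could be fetched
}

func newIndex() *index {
	return &index{retry: make(chan string), found: make(chan bool)}
}

// InitBlobSource sets the blob source and only then starts the loop, so the
// loop can never observe a nil source.
func (ix *index) InitBlobSource(src fetcher) {
	ix.mu.Lock()
	ix.blobSource = src
	ix.mu.Unlock()
	go ix.outOfOrderLoop()
}

// source returns the blob source under the read lock.
func (ix *index) source() fetcher {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	return ix.blobSource
}

func (ix *index) outOfOrderLoop() {
	for ref := range ix.retry {
		_, err := ix.source().Fetch(ref)
		ix.found <- err == nil
	}
}

func main() {
	ix := newIndex()
	ix.InitBlobSource(mapFetcher{"a": []byte("x")})
	ix.retry <- "a"
	fmt.Println(<-ix.found)
	close(ix.retry)
}
```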
Context: while working on tests for http://camlistore.org/issue/454
Change-Id: I9605f26b41abd62b42880be0620b06ce143761bc
Index "MusicBrainz Album ID" ID3v2 frames as
"musicbrainzalbumid" media tags to facilitate downloading
cover art from coverartarchive.org.
Change-Id: Ie81017dd6f76ec355ee0d1daedfb7180cb70ad59