98 Commits

Author SHA1 Message Date
Alexandre Viau fb961cf310
make codebase go-vet-clean (#1379)
Co-authored-by: Bob Glickstein <bobg@emphatic.com>
2021-07-26 21:19:53 -04:00
aviau 231ba4233f pkg/schema: create CamliType type
Create a CamliType type in pkg/schema and use it in a couple of
packages.

It can be implemented in other packages as we go.
2021-01-19 00:47:58 -05:00
Brad Fitzpatrick 6b88e2a73f internal/images: broaden pattern that matches HEIC images
A bunch of mine had a larger initial ftyp box, which broke the second
part of the pattern. But the second part of the pattern doesn't matter
anyway.  This only needs to casually recognize them. A later full
parse will determine what they really are.

This also adds some new debugging when CAMLI_DEBUG is true.

Change-Id: Ib4adc9b5447a64ba4682624e42b55f1d65779ef7
2018-04-27 12:29:00 -07:00
Brad Fitzpatrick a13abdeb8c cmd/camput: add flag to specify hash function for raw blobs
Also a bit more logging around indexing in debug mode.

Change-Id: I2eb67cfec12cff102ba64b17de0369bde38e416a
2018-04-20 21:02:43 -07:00
Brad Fitzpatrick 61de9881da pkg/index: redo indexing of a blob when CAMLI_REDO_INDEX_ON_RECEIVE is set
For an upcoming HEIC reindexing tool.

Updates #969

Change-Id: If194904d770bb670fa581c4e5e09c303806bc784
2018-04-20 19:40:11 -07:00
mpl 3014bd7413 pkg/index: read EXIF bytes when HEIC file
Updates #969

Change-Id: I11cc1668b853fe5bc2d076addda1a60c08be1dc5
2018-04-18 17:39:38 +02:00
mpl 139cd8bd01 vendor: add go4.org/media/heif
At rev 7b81d6948d11710f710d0c4ef52daac1dc7c936b

Updates issue #969

Change-Id: I6f21de58c0865d3cbc8186b3a6834444b6d1206e
2018-04-13 00:10:11 +02:00
mpl 0c1b99b4c5 pkg/index: fix location in corpus to use signer ID instead of blobRef
Follow-up of ec66bcc871

Updates #537

Change-Id: Ib4dc62ae6be91c061dc6de4bfdd617785abdaab6
2018-01-23 19:11:52 +01:00
Mathieu Lonjaret 4ed9175f01 Merge "pkg/index: use the gpg ID (and not its blobRef) in corpus" 2018-01-19 16:56:38 +00:00
mpl ec66bcc871 pkg/index: use the gpg ID (and not its blobRef) in corpus
Otherwise, claims that are actually from the same signer end up being
treated as coming from different signers, because some of the claims were
signed by the sha1 version and others by the sha224 one.

TODO in follow-up CLs: similar fixes in rest of the corpus, such as with
claimPtrsAttrValue. See if non-corpus index functions/methods suffer
from the same problem.

Change-Id: Icbc70e97edc569f46e575d79aaf4359b33996053
2018-01-19 17:26:48 +01:00
Brad Fitzpatrick 194d4f9443 blobserver, all: add contexts to ReceiveBlob, Fetch & million resulting deps
I had intended for this to be a small change.

I was going to just add context.Context to the BlobReceiver interface,
but then I saw blob.Fetcher could also use one, so I decided to do two
in one CL.

And then it got a bit infectious and ended up touching everything.

I ended up doing SubFetch in the process by necessity.

At a certain point I finally started using context.TODO() in a few
spots, but not too many. But removing context.TODO() will come in the
future. There are more blob storage interfaces lacking context, too,
like RemoveBlobs.

Updates #733

Change-Id: Idf273180b3f8e397ac5929c6d7f520ccc5cdce08
2018-01-18 16:22:16 -08:00
Brad Fitzpatrick 38f10a7bd0 all, testhooks: use sha224 by default, add hook for some tests to use sha-1
Remove the blob.SHA{1,224}From{Bytes,String} constructors too. No
longer used. This adds blob.RefFromBytes which was missing. We had
blob.RefFromString. Now everything uses blob.RefFrom* instead of
specifying a hash function.

Some tests set a flag to force use of SHA-1 because there was too much
golden data to update. We can remove those one-by-one over time as we
fix up tests.

Updates #537

Change-Id: Ibe6428089a6221594c2b751f53f98b03b5a28dc2
2018-01-09 20:03:38 -08:00
Brad Fitzpatrick 57648c6b83 all: update copyright holder from Google Inc to The Perkeep Authors
The AUTHORS file is the list of copyright holders.
2018-01-03 16:52:49 -08:00
Brad Fitzpatrick eb0024f164 Merge "pkg/index: ignore unset msdos time when possible" 2018-01-03 06:18:09 +00:00
Brad Fitzpatrick c3d05cdce9 Move more packages out of pkg/ and into internal/
Moved hashutil, httputil, osutil, netutil,
images, media, magic, video, and rollsum.
2018-01-02 21:03:30 -08:00
Brad Fitzpatrick d6a0b05df0 Rename import paths from camlistore.org to perkeep.org.
Part of the project renaming, issue #981.

After this, users will need to mv their $GOPATH/src/camlistore.org to
$GOPATH/src/perkeep.org. Sorry.

This doesn't yet rename the tools like camlistored, camput, camget,
camtool, etc.

Also, this only moves the lru package to internal. More will move to
internal later.

Also, this doesn't yet remove the "/pkg/" directory. That'll likely
happen later.

This updates some docs, but not all.

devcam test now passes again, even with Go 1.10 (which requires vet
checks are clean too). So a bunch of vet tests are fixed in this CL
too, and a bunch of other broken tests are now fixed (introduced from
the past week of merging the CL backlog).

Change-Id: If580db1691b5b99f8ed6195070789b1f44877dd4
2018-01-01 16:03:34 -08:00
Paul Lindner 1383869054 all: lint fixes for "receiver name should be consistent with previous receiver name"
Change-Id: I05275cd20c92349e37365e2cbd29fa9f8d834101
2017-12-13 11:31:25 -08:00
Paul Lindner ba92702834 all: lint fixes for "should omit 2nd value from range"
Change-Id: I7bb19d376f96a39ecae7dbdb4d6808f704bae5fb
2017-12-13 11:31:25 -08:00
Paul Lindner b09cd377d7 Switch to stdlib context from golang.org/x/net/context
This switches most usages of the pre-1.7 context library to use the
standard library.  Remaining usages are in:

  app/publisher/main.go
  pkg/fs/...

Change-Id: Ia74acc39499dcb39892342a2c9a2776537cf49f1
2017-11-26 01:12:26 -08:00
mpl ee13a3060b pkg/index: ignore NaN in EXIF lat/long
Fixes #927

Change-Id: I40b151ca0af30a65263c1daf9597221136ccdf54
2017-05-23 22:36:44 +02:00
mpl 3182297641 pkg/index: ignore unset msdos time when possible
If a zip archive is created without specifying the modtimes of the
files, they'll end up with a default modtime set to the MSDOS epoch
(1980-01-01 modulo some timezone and silly details), which is a common
enough occurrence.

Even when the index has better information, such as the EXIF time,
when clients of the index (the web UI, through the search package) sort
by creation time, they use the oldest indexed time available, which is
unfortunate in that case.

Therefore, this CL makes the indexer ignore the oldest time found, if it
is before the MSDOS epoch, and if we have another time available, when
receiving a file.

Also fixed the use of hardcoded value of keyFileTimes.name, to help with
reading/searching code.
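
A minimal Go sketch of the rule described above, not the code from this CL. The names oldestUsable, date and unsetCutoff are hypothetical, and the one-day margin after the MSDOS epoch is an assumption made here to absorb the timezone details the message alludes to.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// msdosEpoch is the modtime zip entries end up with when none was set.
var msdosEpoch = time.Date(1980, time.January, 1, 0, 0, 0, 0, time.UTC)

// unsetCutoff is an assumed margin: anything at or before it is treated
// as "unset" (hypothetical value, not taken from the CL).
var unsetCutoff = msdosEpoch.Add(24 * time.Hour)

// date is a small helper to build a UTC day.
func date(y, m, d int) time.Time {
	return time.Date(y, time.Month(m), d, 0, 0, 0, 0, time.UTC)
}

// oldestUsable returns the oldest candidate time, skipping the oldest one
// when it looks like an unset MSDOS time and another candidate exists.
func oldestUsable(times ...time.Time) (time.Time, bool) {
	if len(times) == 0 {
		return time.Time{}, false
	}
	sorted := append([]time.Time(nil), times...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Before(sorted[j]) })
	if len(sorted) > 1 && !sorted[0].After(unsetCutoff) {
		return sorted[1], true
	}
	return sorted[0], true
}

func main() {
	zipModtime := date(1980, 1, 1) // default modtime from the archive
	exifTime := date(2015, 6, 1)   // better information from EXIF
	t, _ := oldestUsable(zipModtime, exifTime)
	fmt.Println("creation time used for sorting:", t.Format("2006-01-02"))
}
```

When the MSDOS-epoch time is the only one available, the sketch still returns it, matching the "if we have another time available" condition.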

Change-Id: I9c2c39b319fdf6cd5214cab8928dd025451077ac
2017-03-13 18:11:23 +01:00
mpl 1951498e63 pkg/index: use missing dep mechanism for static sets too
We relied on missTrackFetcher to return errMissingDep when the
underlying Fetch() returned os.ErrNotExist. The caller could then know
how to act if some indexing operation failed because of an errMissingDep
error.

This was wrong for 2 reasons:

1) if a function fn(tf blob.Fetcher) error does:

	if _, _, err := tf.Fetch(br); err != nil {
		return fmt.Errorf("wrapping this error in a nicer error message: %v", err)
	}

when we call err := fn(tf), we lose the ability to directly determine
whether err is an errMissingDep. We'd have to parse the error string,
which is gross.

This is exactly what happens in populateDir, when we call
dr.StaticSet().

And in order to fix issue #738, we want to be able to tell when a call
to dr.StaticSet() failed because the underlying Fetch() operation
failed.

2) The blob.Fetcher interface specifically states that os.ErrNotExist
should be returned when a blob is not found. We were breaking that rule
by returning errMissingDep.

In order to address both 1) and 2), it seemed like we could add an err
field to missTrackFetcher to keep track of when an os.ErrNotExist
occurred during a Fetch, and let Fetch return an os.ErrNotExist.
However, that would not work, as a missTrackFetcher is used concurrently
by several callers, so a given caller wouldn't be able to tell whether
"its" Fetch failed or a Fetch from a concurrent caller failed.

Therefore, we introduce trackErrorsFetcher, which has such an error field
and wraps the missTrackFetcher. All the callers can keep on sharing
the missTrackFetcher, but each of them initializes its own
trackErrorsFetcher, and can check its errors field when a failed function
call is suspected to be the result of a failed Fetch.

Also added a test to demonstrate that issue #738 is fixed.
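
The wrapping approach can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the CL's code: the fetcher interface here is a reduced stand-in for blob.Fetcher, and mapFetcher, loadStaticSet, indexDir and sawMissing are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sync"
)

// fetcher is a simplified stand-in for blob.Fetcher (assumed signature).
type fetcher interface {
	Fetch(ref string) ([]byte, error)
}

// mapFetcher plays the role of the shared fetcher. It follows the rule
// that a missing blob is reported as os.ErrNotExist.
type mapFetcher map[string][]byte

func (m mapFetcher) Fetch(ref string) ([]byte, error) {
	b, ok := m[ref]
	if !ok {
		return nil, os.ErrNotExist
	}
	return b, nil
}

// trackErrorsFetcher wraps a shared fetcher and remembers the errors seen
// by one caller only, so concurrent callers do not see each other's errors.
type trackErrorsFetcher struct {
	src  fetcher
	mu   sync.Mutex
	errs []error // guarded by mu
}

func (tf *trackErrorsFetcher) Fetch(ref string) ([]byte, error) {
	b, err := tf.src.Fetch(ref)
	if err != nil {
		tf.mu.Lock()
		tf.errs = append(tf.errs, err)
		tf.mu.Unlock()
	}
	return b, err
}

// sawMissing reports whether any recorded error was os.ErrNotExist.
func (tf *trackErrorsFetcher) sawMissing() bool {
	tf.mu.Lock()
	defer tf.mu.Unlock()
	for _, err := range tf.errs {
		if errors.Is(err, os.ErrNotExist) {
			return true
		}
	}
	return false
}

// loadStaticSet wraps the fetch error with %v, which hides its identity
// from the caller, as in reason 1) above.
func loadStaticSet(f fetcher, ref string) error {
	if _, err := f.Fetch(ref); err != nil {
		return fmt.Errorf("loading static set %s: %v", ref, err)
	}
	return nil
}

// indexDir gives each call its own tracker around the shared fetcher and
// uses it to tell whether a failure came from a missing blob.
func indexDir(shared fetcher, ref string) (missingDep bool, err error) {
	tf := &trackErrorsFetcher{src: shared}
	err = loadStaticSet(tf, ref)
	return err != nil && tf.sawMissing(), err
}

func main() {
	shared := mapFetcher{"sha224-aaa": []byte("static set")}
	missing, err := indexDir(shared, "sha224-bbb")
	fmt.Println("missing dependency:", missing, "error:", err)
}
```

The caller never has to parse the error string: the per-call tracker keeps the original error values.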

Fixes #738

Change-Id: Ia5c3081b71c77be1e8cff0bbc847ade68f019bf9
2017-03-03 00:11:42 +01:00
mpl 66d75cbcf9 pkg/index: simplify out of order indexing
There's a race that takes place at the end of the reindexing process.

In func (x *Index) Reindex(), we wait for all the reindexing goroutines
to be done, with wg.Wait. However, any (or all) of those goroutines
could have triggered (they call indexBlob, which calls
blobserver.Receive, which calls noteBlobIndexed, which can send on
tickleOoo) an asynchronous out of order reindexing which is NOT waited
on.

The race can be trivially demonstrated by changing:

  WaitTickle:
  	for range ix.tickleOoo {
+  		time.Sleep(5*time.Second)

in receive.go, and running:
  go test ./pkg/index/ -run TestReindex_*

This CL rewrites the out of order indexing implementation to make it
simpler, and in the process, fixes the above bug.
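
To illustrate the class of bug, here is a small Go sketch in which the asynchronous follow-up work is registered on the same WaitGroup before the worker that triggered it returns, so Wait covers it too. This is only one possible way to close such a race and is not the rewrite done in this CL; reindex and followUps are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// reindex runs n indexing workers. Each worker triggers one asynchronous
// follow-up (standing in for an out-of-order reindexing). The function
// returns how many follow-ups had completed by the time Wait returned.
func reindex(n int) int64 {
	var wg sync.WaitGroup
	var followUps int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Register the follow-up while this worker still holds its
			// own count, so the counter cannot drop to zero in between.
			wg.Add(1)
			go func() {
				defer wg.Done()
				atomic.AddInt64(&followUps, 1)
			}()
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&followUps)
}

func main() {
	fmt.Println("follow-ups finished before Wait returned:", reindex(8))
}
```

If the inner wg.Add and wg.Done were dropped, Wait could return while follow-ups are still running, which is the situation described above.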

Fixes #756

Change-Id: If79fb1ad8869cefce4a095ef2becdba333732bce
2017-02-17 20:29:02 +01:00
Brad Fitzpatrick db50bae0c4 pkg/index: read blob before acquiring index mutex
For #878

Change-Id: I8abaf5d923fc6dee7e8a9a3e84f82d4cf7484329
2016-11-08 08:47:59 -08:00
mpl 25652d66d9 pkg/index: use mime.TypeByExtension to record MIMEType
When receiving a file, we were only trying to guess its MIME type
through its contents (pkg/magic). We're now making a better effort at it
by guessing from the filename extension if needed.
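
A rough Go sketch of the "contents first, extension as a fallback" idea. It is an assumption-laden illustration: net/http's DetectContentType is used here as a stand-in for pkg/magic, and guessMIME is a hypothetical name.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path/filepath"
)

// guessMIME first sniffs the leading bytes of the file. If sniffing only
// yields the generic "application/octet-stream" (or there are no bytes),
// it falls back to the filename extension. It returns "" when both fail.
func guessMIME(name string, prefix []byte) string {
	if len(prefix) > 0 {
		// Stand-in for pkg/magic: the stdlib content sniffer.
		if t := http.DetectContentType(prefix); t != "application/octet-stream" {
			return t
		}
	}
	return mime.TypeByExtension(filepath.Ext(name))
}

func main() {
	fmt.Println(guessMIME("anim", []byte("GIF89a")))            // decided by contents
	fmt.Println(guessMIME("photo.png", []byte{0x00, 0x01, 0x02})) // decided by extension
}
```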

Also:

pkg/magic: get rid of all the extra video extensions that are already
covered by mime.TypeByExtension. Because it's redundant and
confusing.

app/publisher, pkg/types/camtypes: also use mime.TypeByExtension as an
extra effort. Especially since a reindex would be necessary to benefit
from the pkg/index change.
There are other places in Camlistore that could use such an effort.
Maybe we should have a camtypes.*FileInfo.MIME() method that tries all
the ways to guess the MIME type of the file?

Change-Id: Ib9a2bc42af77c5394dac578ae415524b5111ad4e
2016-09-06 16:26:09 +02:00
mpl f9a8e002b8 pkg/index: test showing issue #756
A word of caution: in relation to the issue demonstrated by the added
tests, an infinite loop can also occur, as it already could in
TestReindex_LevelDB. As it is, after all, a consequence of a race, I
haven't been able to determine what exactly makes the loop occur. But
what I observed is:

1) It seems to occur much more easily with LevelDB, which is why I
ended up just disabling TestReindex_LevelDB.
2) I've never seen it happen in TestReindex_Kvfile, but who knows.
3) I've seen it rarely happen with TestShowReindexRace_Kvfile, but it
seems that adding to TestShowReindexRace_Kvfile the kind of timed kill
that I had added to TestReindex_LevelDB actually makes the loop happen
much more often. And it ends up eclipsing the original issue that we
want to demonstrate, which is why I decided against it.

TL;DR: if you use -show_reindex_race=true , be prepared to maybe
have to kill(1) the test manually.

Change-Id: I47fd3c55363c8d0dda17ad19665115cb96f3d58f
2016-08-05 16:37:50 +02:00
Eric Drechsel 95f9a6b9a8 index: store exifgps keys without exponent
also check bounds of long, lat before storing
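
A short Go sketch of both points, under stated assumptions: validLatLong and formatCoord are hypothetical names, and using strconv.FormatFloat with the 'f' format is one way to avoid exponents, not necessarily what this change does.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// validLatLong reports whether the pair is a usable WGS84-style position:
// neither value is NaN, latitude is within [-90, 90] and longitude is
// within [-180, 180].
func validLatLong(lat, long float64) bool {
	if math.IsNaN(lat) || math.IsNaN(long) {
		return false
	}
	return lat >= -90 && lat <= 90 && long >= -180 && long <= 180
}

// formatCoord renders a coordinate without an exponent. The 'f' format
// with precision -1 uses the fewest digits that round-trip the value.
func formatCoord(v float64) string {
	return strconv.FormatFloat(v, 'f', -1, 64)
}

func main() {
	lat, long := 0.0000001, -122.4194
	if validLatLong(lat, long) {
		// The default %v formatting would switch to an exponent for
		// very small values; formatCoord does not.
		fmt.Println(formatCoord(lat), formatCoord(long))
	}
}
```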

Fixes #758

Change-Id: Ife59ebeec23210bcb821a47765319c76688f7daa
2016-05-16 09:39:30 -07:00
Brad Fitzpatrick e93e4f3822 Fix deadlock in search/index.
The describe requests were launching a storm of RLocks which weren't
safe in the presence of goroutines trying to acquire write locks.

Instead, make the corpus locking the responsibility of the caller and
add Lock/Unlock/RLock/RUnlock methods to the index and move locking up
a level.
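
A simplified Go sketch of "locking is the caller's responsibility". With sync.RWMutex, a goroutine that already holds a read lock must not take it again: once a writer is waiting in Lock, new RLock calls block, so nested read locks can deadlock. Moving the lock up one level avoids the nesting. The index type, titleLocked and describe below are hypothetical illustrations, not the actual search/index code.

```go
package main

import (
	"fmt"
	"sync"
)

// index exposes its lock so that callers lock once, at the top level.
type index struct {
	mu     sync.RWMutex
	titles map[string]string // permanode -> title, guarded by mu
}

func newIndex() *index { return &index{titles: make(map[string]string)} }

func (ix *index) Lock()    { ix.mu.Lock() }
func (ix *index) Unlock()  { ix.mu.Unlock() }
func (ix *index) RLock()   { ix.mu.RLock() }
func (ix *index) RUnlock() { ix.mu.RUnlock() }

// titleLocked takes no lock itself: the caller must hold at least the
// read lock. Helpers like this can call each other freely without ever
// nesting RLock calls.
func (ix *index) titleLocked(pn string) string { return ix.titles[pn] }

// setTitle is a writer; it takes the write lock for the whole update.
func (ix *index) setTitle(pn, title string) {
	ix.Lock()
	defer ix.Unlock()
	ix.titles[pn] = title
}

// describe is the "caller": it takes the read lock exactly once and then
// uses only *Locked helpers.
func describe(ix *index, pns ...string) []string {
	ix.RLock()
	defer ix.RUnlock()
	out := make([]string, 0, len(pns))
	for _, pn := range pns {
		out = append(out, ix.titleLocked(pn))
	}
	return out
}

func main() {
	ix := newIndex()
	ix.setTitle("pn1", "Holiday photos")
	fmt.Println(describe(ix, "pn1", "pn2"))
}
```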

This also adds a fair bit of context.Context plumbing which was used
in earlier debugging.

Fixes camlistore/camlistore#709

Change-Id: I8d7254d1e1da541f8c080d62f5408aac807fd3b1
2016-04-22 14:57:10 -07:00
Tamás Gulácsi 8d6b156a0b Misc syntax cleanup found by gosimple.
https://github.com/dominikh/go-simple

Thanks to Dominik Honnef for this great little tool!

Change-Id: I789b3a37e18f535df1ff0da47c0366ed01b2429e
2016-04-04 17:19:57 +02:00
mpl e0d719ba21 pkg/types: remove
Most of it replaced with vendor/go4.org/types and
vendor/go4.org/readerutil

u32 went where needed in pkg/blobserver/*
invertedBool went in pkg/types/serverconfig
atomics64 went in pkg/fs

Change-Id: I230426cda35be4b45ed67e869f14e6fdae89be22
2016-02-05 18:28:47 +01:00
Stephen Searles 23457fb56a adding keys to fields to make go vet happy
Change-Id: I28e38da6f5499c3284e647b1c123bcfc882120f7
2016-01-09 00:34:55 -08:00
mpl 6af01f6c71 vendor: move pkg/images dependencies from third_party
This change is in anticipation of moving pkg/images to go4.org, where it
should not depend on packages in third_party.

So:
third_party/github.com/nf/cr2 -> vendor/github.com/nf/cr2
third_party/github.com/rwcarlsen/goexif -> vendor/github.com/rwcarlsen/goexif
third_party/golang.org/x/image/tiff -> vendor/golang.org/x/image/tiff

Note that third_party/go/pkg/image/jpeg was also a dependency of
pkg/images. We had vendored image/jpeg from tip at the time because it
offered advantages over the version from Go1.3
(https://github.com/camlistore/camlistore/issues/463).
Since we now depend on Go1.5, we can go back to depend on the stdlib
version, so we simply remove third_party/go/pkg/image/jpeg and adjust
the imports accordingly.

Change-Id: Ifc8ffae0551102e644a0a0c67f3ff89e04df15c7
2015-12-18 22:15:33 +01:00
mpl a7ccb62bf6 vendor: mv github.com/hjfreyer/taglib-go from third_party
Also bump it to 0ef8bba9c41b66c12f60ce9833786838d2c2d3d8 to fix panic

Fixes #647

Change-Id: Ic348ef6a19446de6a027d93aab748224b5f46a1d
2015-10-27 23:18:07 +01:00
mpl 2b2ad502e5 pkg/index/receive: unless CAMLI_DEBUG_IMAGES, be less EXIF-verbose
Change-Id: Iec0c9acae268285980ce04943c58d56ae19fb711
2015-10-22 17:37:21 +02:00
Felix Geller 036f8b19af pkg/index: Pass whole file for finding modtime when needed
When reading EXIF data for large tiff files, our optimistic
file prefix sometimes isn't enough and we need to pass in
the whole file. We already did this in several places (image
decoding and indexing), this change adds it for finding the
modtime (for which we try to use EXIF data when
available). It pulls common functionality out into a
separate func and changes the existing uses of this pattern
to make use of the func.

Change-Id: I2b786775f168f47f46fb5ac707e3744991139a21
2015-05-26 08:16:10 +12:00
Felix Geller f4ff53bbac pkg/index: Retry whole file when EXIF data can't be read
When reading EXIF data from larger TIFF files, we might fail
to read the EXIF data when we only pass in the in-memory
prefix. This change identifies when the third-party library
encounters a short read on a tag/EXIF data and triggers a
retry with the whole file by returning an ErrUnexpectedEOF.
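
The retry pattern can be sketched in Go as below. decodeTag is a hypothetical stand-in for the EXIF decoder (it simply needs a given number of bytes), and indexEXIF is a hypothetical caller; only io.ReadFull and io.ErrUnexpectedEOF are real library behavior.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// decodeTag stands in for a decoder that must read `need` bytes. A short
// read surfaces as io.ErrUnexpectedEOF, courtesy of io.ReadFull.
func decodeTag(r io.Reader, need int) error {
	buf := make([]byte, need)
	_, err := io.ReadFull(r, buf)
	return err
}

// indexEXIF first tries the in-memory prefix of the file. Only when the
// decoder hit a short read does it retry with the whole file.
func indexEXIF(file []byte, prefixLen, need int) (usedWhole bool, err error) {
	if prefixLen > len(file) {
		prefixLen = len(file)
	}
	err = decodeTag(bytes.NewReader(file[:prefixLen]), need)
	if errors.Is(err, io.ErrUnexpectedEOF) {
		return true, decodeTag(bytes.NewReader(file), need)
	}
	return false, err
}

func main() {
	file := make([]byte, 100)
	usedWhole, err := indexEXIF(file, 10, 50)
	fmt.Println("retried with whole file:", usedWhole, "error:", err)
}
```

Other errors are returned as-is, so a genuinely corrupt file does not cause a pointless second pass.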

Change-Id: Ie5cdc1613db6ccac49d91a69827f11ca3406a74b
2015-05-25 07:52:50 +12:00
Brad Fitzpatrick 8229c19850 search, index: add WholeRef to pkg camtypes' FileInfo struct
So when you describe a file, you also get its wholeref.

TODO: we'll need to migrate old indexes to this new format on
start-up.

Change-Id: I4a3fb000d68bde46474275c2070ef285a6d6ecfc
2015-02-04 21:04:39 -08:00
mpl fda1399e9c index,camtool: try and cope better with broken exif
http://camlistore.org/issue/493

Change-Id: I40aebd67252cf82a3a5a143af6c258d7ed2aecda
2014-11-10 19:05:35 +01:00
Salmān Aljammāz 0d6e0c6425 index: avoid shadowing err when retrying to index a full file
Change-Id: Ie683739039116dfb2758c6647382afebaa6e1ece
2014-10-08 18:40:00 +01:00
mpl f15c5a7cd2 index/receive: address last comments from http://camlistore.org/r/3271
Change-Id: Id41278e5e01b9ea9310b392859709a3261dc3f52
2014-10-07 17:21:14 +02:00
Salmān Aljammāz e14c122c52 indexer: images: try a FileReader if the prefix is too small for DecodeConfig
Go's image.DecodeConfig needs more than 1MiB on some images (e.g. some
Lens Blur pics taken with Google Camera). Now we first try a 512KiB header
and retry with a full FileReader if that fails.

https://camlistore.org/bugs/477

Change-Id: I286d15d86a69951737d94dd3692d4e9e1992b324
2014-10-07 12:13:33 +00:00
Fabian Wickborn 59a451c2dc Merge upstream goexif
This pulls the changes from the current HEAD of
https://github.com/rwcarlsen/goexif
(eb2943811adc24a1a40d6dc0525995d4f8563d08)

Notable changes:
- Removed explicit panics in favor of error returns
- renamed TypeCategory to Format and made format calculated upon
  decoding rather than repeatedly for every format call
- Merged contributions from Camlistore (exif.LatLong(), exif.DateTime()
  etc.)
- Change String method to just return the string value - and don't have
  square brackets if only a single value
- add separate Int and Int64 retrieval methods
- Doc updates

Minor changes in camlistore.org/pkg/* were necessary to reflect
changes in the API (handling of returned errors) and in names of
exported fields and methods.

Change-Id: I50412b5e68d2c9ca766ff2ad1a4ac26926baccab
2014-09-17 10:40:38 +02:00
Fabian Wickborn 2aed1b8241 Renamed goexif folder to match upstream URL
In anticipation of github.com/camlistore/goexif being closed, this
commit renames the goexif folder in third_party to match the
upstream on GitHub.

The affected import paths have been rewritten accordingly.

Change-Id: I5a8871efd01987944b7f5e93979307857ae16fe7
2014-09-05 17:27:59 +02:00
Fabian Wickborn f0d9c04bc2 Merge goexif with upstream package
This pulls the changes from the current HEAD of
https://github.com/rwcarlsen/goexif
(rev cf045e9d6ba052fd348f82394d364cca4937589a)

Changes to goexif from upstream:
- Add support for reading Nikon and Canon maker notes
- Adds parser registration to exif package
- Renamed cmd to exifstat
- Renamed exported fields and methods in goexif/tiff
- adds support for bare tiff images. bug fix and minor cosmetics
- support pulling the thumbnail
- adds thumbnail support to exif pkg
- tiff defines DataType and constants for the datatype values
- Update convertVals and TypeCategory to use new constants for DataType
- Renamed test data dir in exif tests
- created type+constants for raw tiff tag data types

Not merged from upstream:
- ~1 MB of test JPGs in goexif/exif/samples

Minor changes in camlistore.org/pkg/* were necessary to reflect the
name changes in the exported fields and methods.

Change-Id: I0fdcad2d7b5e01e0d4160a5eb52b8ec750d353cf
2014-09-05 08:36:42 +02:00
mpl f1953edb88 Merge "index: actually reindex when out of order" 2014-08-14 20:00:00 +00:00
mpl 0628249db1 index: actually reindex when out of order
problem: the out-of-order mechanism based on the outOfOrderIndexerLoop
was not working for some claims.

Let C be a delete claim on permanode P. If C was received before P was,
C was marked as being received with the "have" index row. However, for
the deletion to be marked in the index, some information about P is
needed (its meta row), so C could not be fully indexed upon reception.
Then, when P was finally received, the outOfOrderIndexerLoop would kick
in and retry indexing C. Which would fail, because a test based on the
"have" row would (wrongly) detect that C is already indexed and return
early.

In this patch:

-we introduce the "|indexed" suffix in the value part of the "have" row
(receive.go). If a blob is received but some of its dependencies are
missing, the have row value is written without the suffix. Upon
reception of a blob, we now test for the presence of the suffix in the
have row. If missing, the reception continues instead of returning
early. The existing mechanism that was detecting missing dependencies
for file blobs has been adapted to work with this suffix too.

-the index enumeration (enumstat.go), which relies on "have" rows, has
been adapted to work with the new "have" row format, while staying
compatible with the old format. And related tests have been added.
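
A small Go sketch of how such a row value could be read. The exact layout is an assumption made for illustration (a decimal size, optionally followed by "|indexed", with the bare size being the older format), and parseHaveVal and shouldSkipReceive are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// indexedSuffix marks a blob whose indexing fully completed.
const indexedSuffix = "|indexed"

// parseHaveVal parses a "have" row value. It accepts both the assumed new
// form "<size>|indexed" and a bare "<size>".
func parseHaveVal(val string) (size uint64, indexed bool, err error) {
	sizeStr := strings.TrimSuffix(val, indexedSuffix)
	size, err = strconv.ParseUint(sizeStr, 10, 64)
	if err != nil {
		return 0, false, err
	}
	return size, sizeStr != val, nil
}

// shouldSkipReceive reports whether reception can return early: only when
// a row exists and carries the suffix. A row without the suffix means
// some dependencies were missing, so indexing must be attempted again.
func shouldSkipReceive(haveVal string, found bool) bool {
	if !found {
		return false
	}
	_, indexed, err := parseHaveVal(haveVal)
	return err == nil && indexed
}

func main() {
	for _, v := range []string{"1234|indexed", "1234"} {
		size, indexed, err := parseHaveVal(v)
		fmt.Println(v, "->", size, indexed, err, "skip:", shouldSkipReceive(v, true))
	}
}
```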

http://camlistore.org/issue/454

Change-Id: I2559d08a12b2a4e0f0691fc7e31f1ed1f874625e
2014-08-14 17:03:26 +02:00
Brad Fitzpatrick 286b53f119 index: use Exif.LatLong accessor
This code is now moved to the exif package.

Change-Id: Ifba2e0b6a96c076e75179528e8ea9a4c0641d843
2014-07-13 10:25:04 -07:00
Bill Thiede eb7f66fe28 jpeg: enable images/jpeg imported from Go tip.
Addresses https://camlistore.org/issue/463

Change-Id: Ie7b8f937ded78d95875f4cd13b024d0429136981
2014-07-02 21:22:15 -07:00
mpl 443f405962 index: fix data race on BlobSource, make it private.
index.New was starting outOfOrderIndexerLoop in a goroutine. And
outOfOrderIndexerLoop had an if index.BlobSource == nil check, on which
it relied to go on. However, since BlobSource was public and unguarded,
the following sequence was possible:

ix, _ := index.New()
ix.BlobSource = bs

which is racy because the BlobSource assignment may or may not happen
before the check within outOfOrderIndexerLoop.

TestOutOfOrderIndexing was relying on the fact that apparently most
of the time the assignment seems to be happening before the check.

This patch:
-makes BlobSource (now blobSource) private, rendering the race impossible
out of the index package.
-moves the initialization of blobSource, as well as the start of
outOfOrderIndexerLoop, to a single point, in InitBlobSource (new method).
-makes sure all accesses to blobSource are guarded with the index mutex
(now a RWMutex).
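
The shape of the fix can be sketched in Go as follows. This is a simplified illustration, not the index package's code: the source type, the retry and done channels, and the Close method are hypothetical, and only the idea (a private field, set at one point that also starts the loop, with every access under the mutex) comes from the message above.

```go
package main

import (
	"fmt"
	"sync"
)

// source is a simplified stand-in for a blob source: it reports whether
// the given ref can be fetched.
type source func(ref string) bool

type index struct {
	mu         sync.RWMutex
	blobSource source // guarded by mu; nil until InitBlobSource
	reindexed  int    // guarded by mu

	retry chan string   // refs to retry out of order
	done  chan struct{} // closed when the loop exits
}

func newIndex() *index {
	return &index{retry: make(chan string), done: make(chan struct{})}
}

// InitBlobSource is the single point where the source is set and the
// background loop is started, so the loop can never observe a nil source.
func (ix *index) InitBlobSource(src source) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	if ix.blobSource != nil {
		panic("InitBlobSource called more than once")
	}
	ix.blobSource = src
	go ix.outOfOrderLoop()
}

func (ix *index) outOfOrderLoop() {
	defer close(ix.done)
	for ref := range ix.retry {
		ix.mu.RLock()
		src := ix.blobSource
		ix.mu.RUnlock()
		if src(ref) {
			ix.mu.Lock()
			ix.reindexed++
			ix.mu.Unlock()
		}
	}
}

// Close stops the loop, waits for it, and returns the reindexed count.
func (ix *index) Close() int {
	close(ix.retry)
	<-ix.done
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	return ix.reindexed
}

func main() {
	ix := newIndex()
	ix.InitBlobSource(func(ref string) bool { return ref == "sha1-aaa" })
	ix.retry <- "sha1-aaa"
	ix.retry <- "sha1-bbb"
	fmt.Println("reindexed:", ix.Close())
}
```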

Context: while working on tests for http://camlistore.org/issue/454

Change-Id: I9605f26b41abd62b42880be0620b06ce143761bc
2014-06-27 22:29:29 +02:00
Daniel Erat aa391ecdd1 index: Index MusicBrainz album IDs from music files.
Index "MusicBrainz Album ID" ID3v2 frames as
"musicbrainzalbumid" media tags to facilitate downloading
cover art from coverartarchive.org.

Change-Id: Ie81017dd6f76ec355ee0d1daedfb7180cb70ad59
2014-05-05 20:45:58 -07:00