When reading EXIF data for large TIFF files, our optimistic
file prefix sometimes isn't enough and we need to pass in
the whole file. We already did this in several places (image
decoding and indexing); this change adds it for finding the
modtime (for which we try to use EXIF data when
available). It pulls the common functionality out into a
separate func and changes the existing uses of this pattern
to use that func.
Change-Id: I2b786775f168f47f46fb5ac707e3744991139a21
When reading EXIF data from larger TIFF files, we might fail
to read the EXIF data when we only pass in the in-memory
prefix. This change detects when the third-party library
encounters a short read on tag/EXIF data and triggers a
retry with the whole file by returning an ErrUnexpectedEOF.
Change-Id: Ie5cdc1613db6ccac49d91a69827f11ca3406a74b
So when you describe a file, you also get its wholeref.
TODO: we'll need to migrate old indexes to this new format on
start-up.
Change-Id: I4a3fb000d68bde46474275c2070ef285a6d6ecfc
Go's image.DecodeConfig needs more than 1MiB on some images (e.g. some
Lens Blur pics taken with Google Camera). Now we first try a 512KiB header
and retry with a full FileReader if that fails.
https://camlistore.org/bugs/477
Change-Id: I286d15d86a69951737d94dd3692d4e9e1992b324
This pulls the changes from the current HEAD of
https://github.com/rwcarlsen/goexif
(eb2943811adc24a1a40d6dc0525995d4f8563d08)
Notable changes:
- Removed explicit panics in favor of error returns
- renamed TypeCategory to Format and made format calculated upon
decoding rather than repeatedly for every format call
- Merged contributions from Camlistore (exif.LatLong(), exif.DateTime()
etc.)
- Change String method to return just the string value, without
square brackets if there is only a single value
- add separate Int and Int64 retrieval methods
- Doc updates
Minor changes in camlistore.org/pkg/* were necessary to reflect
changes in the API (handling of returned errors) and in names of
exported fields and methods.
Change-Id: I50412b5e68d2c9ca766ff2ad1a4ac26926baccab
In anticipation of github.com/camlistore/goexif being closed, this
commit renames the goexif folder in third_party to match the
upstream on GitHub.
The affected import paths have been rewritten accordingly.
Change-Id: I5a8871efd01987944b7f5e93979307857ae16fe7
This pulls the changes from the current HEAD of
https://github.com/rwcarlsen/goexif
(rev cf045e9d6ba052fd348f82394d364cca4937589a)
Changes to goexif from upstream:
- Add support for reading Nikon and Canon maker notes
- Adds parser registration to exif package
- Renamed cmd to exifstat
- Renamed exported fields and methods in goexif/tiff
- adds support for bare tiff images. bug fix and minor cosmetics
- support pulling the thumbnail
- adds thumbnail support to exif pkg
- tiff defines DataType and constants for the datatype values
- Update convertVals and TypeCategory to use new constants for DataType
- Renamed test data dir in exif tests
- created type+constants for raw tiff tag data types
Not merged from upstream:
- ~1 MB of test JPGs in goexif/exif/samples
Minor changes in camlistore.org/pkg/* were necessary to reflect the
name changes in the exported fields and methods.
Change-Id: I0fdcad2d7b5e01e0d4160a5eb52b8ec750d353cf
problem: the out-of-order mechanism based on the outOfOrderIndexerLoop
was not working for some claims.
Let C be a delete claim on permanode P. If C was received before P was,
C was marked as being received with the "have" index row. However, for
the deletion to be marked in the index, some information about P is
needed (its meta row), so C could not be fully indexed upon reception.
Then, when P was finally received, the outOfOrderIndexerLoop would kick
in and retry indexing C, which would fail because a check based on the
"have" row would (wrongly) conclude that C was already indexed and return
early.
In this patch:
- We introduce the "|indexed" suffix in the value part of the "have" row
(receive.go). If a blob is received but some of its dependencies are
missing, the have row value is written without the suffix. Upon
reception of a blob, we now check for the suffix in the have row; if it
is missing, indexing continues instead of returning early. The existing
mechanism that detects missing dependencies for file blobs has been
adapted to work with this suffix too.
- The index enumeration (enumstat.go), which relies on "have" rows, has
been adapted to the new "have" row format while staying compatible with
the old one. Related tests have been added.
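A minimal sketch of the new value format, assuming the "have" row value is the blob size in decimal, optionally followed by the suffix. The helper names are hypothetical.

```go
package main

import (
	"strconv"
	"strings"
)

// indexedSuffix marks a "have" row value whose blob is fully indexed.
const indexedSuffix = "|indexed"

// haveRowValue (hypothetical) builds the value part of a "have" row,
// assuming the value is the blob size in decimal.
func haveRowValue(size uint32, indexed bool) string {
	v := strconv.FormatUint(uint64(size), 10)
	if indexed {
		v += indexedSuffix
	}
	return v
}

// isFullyIndexed reports whether reception may return early for this blob.
func isFullyIndexed(val string) bool {
	return strings.HasSuffix(val, indexedSuffix)
}

// haveRowSize extracts the size from both the old ("654") and the new
// ("324|indexed") formats, which is what enumeration needs.
func haveRowSize(val string) (uint32, error) {
	n, err := strconv.ParseUint(strings.TrimSuffix(val, indexedSuffix), 10, 32)
	return uint32(n), err
}
```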
http://camlistore.org/issue/454
Change-Id: I2559d08a12b2a4e0f0691fc7e31f1ed1f874625e
index.New was starting outOfOrderIndexerLoop in a goroutine, and
outOfOrderIndexerLoop relied on an "index.BlobSource == nil" check to
decide whether to go on. However, since BlobSource was public and
unguarded, the following sequence was possible:
ix, _ := index.New()
ix.BlobSource = bs
which is racy because the BlobSource assignment may or may not happen
before the check within outOfOrderIndexerLoop.
TestOutOfOrderIndexing was relying on the fact that, most of the time,
the assignment apparently happens before the check.
This patch:
- makes BlobSource (now blobSource) private, making the race impossible
outside of the index package.
- moves the initialization of blobSource, as well as the start of
outOfOrderIndexerLoop, to a single place: InitBlobSource (new method).
- makes sure all accesses to blobSource are guarded by the index mutex
(now a RWMutex).
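The shape of the fix, as a sketch: Index, InitBlobSource and outOfOrderIndexerLoop mirror the description above, but the source interface, namedSource and the loopSaw channel are hypothetical, and the real loop does actual reindexing.

```go
package main

import (
	"errors"
	"sync"
)

// source is a hypothetical stand-in for what the index fetches blobs from.
type source interface{ Name() string }

// namedSource is a trivial source used for illustration.
type namedSource string

func (s namedSource) Name() string { return string(s) }

// Index keeps blobSource private; every access goes through mu.
type Index struct {
	mu         sync.RWMutex
	blobSource source
	loopSaw    chan string // hypothetical: lets the loop report what it saw
}

func newIndex() *Index { return &Index{loopSaw: make(chan string, 1)} }

// InitBlobSource is the only place that sets blobSource and the only place
// that starts outOfOrderIndexerLoop, so the loop never observes a nil source.
func (x *Index) InitBlobSource(bs source) error {
	if bs == nil {
		return errors.New("nil blob source")
	}
	x.mu.Lock()
	defer x.mu.Unlock()
	if x.blobSource != nil {
		return errors.New("blob source already initialized")
	}
	x.blobSource = bs
	go x.outOfOrderIndexerLoop() // its RLock waits until we unlock
	return nil
}

func (x *Index) getBlobSource() source {
	x.mu.RLock()
	defer x.mu.RUnlock()
	return x.blobSource
}

func (x *Index) outOfOrderIndexerLoop() {
	// The real loop reindexes out-of-order blobs; this only reports the source.
	x.loopSaw <- x.getBlobSource().Name()
}
```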
Context: while working on tests for http://camlistore.org/issue/454
Change-Id: I9605f26b41abd62b42880be0620b06ce143761bc
Index "MusicBrainz Album ID" ID3v2 frames as
"musicbrainzalbumid" media tags to facilitate downloading
cover art from coverartarchive.org.
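Sketch of the mapping, assuming the ID is stored in a TXXX frame described as "MusicBrainz Album Id" (the convention MusicBrainz Picard uses) and that the value is the release MBID the Cover Art Archive is keyed by. albumIDTag and coverArtURL are hypothetical helpers.

```go
package main

import (
	"regexp"
	"strings"
)

// mbidPattern matches a MusicBrainz identifier (a lowercase-hex UUID).
var mbidPattern = regexp.MustCompile(`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)

// albumIDTag (hypothetical) maps a TXXX frame to the media tag it would be
// indexed under; ok is false for frames that are not indexed.
func albumIDTag(txxxDesc, value string) (tag, val string, ok bool) {
	if !strings.EqualFold(txxxDesc, "MusicBrainz Album Id") {
		return "", "", false
	}
	value = strings.ToLower(strings.TrimSpace(value))
	if !mbidPattern.MatchString(value) {
		return "", "", false
	}
	return "musicbrainzalbumid", value, true
}

// coverArtURL (hypothetical) returns the Cover Art Archive front-cover URL.
func coverArtURL(mbid string) string {
	return "https://coverartarchive.org/release/" + mbid + "/front"
}
```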
Change-Id: Ie81017dd6f76ec355ee0d1daedfb7180cb70ad59
Also, upon server --reindex, check that no out-of-order blobs are
pending. From a quick reading, they shouldn't be, but I'm curious to
see. Will do a full reindex of my data later.
Change-Id: Idebf93cc264e55512afcfb99e47320dd0ae745d1
Keep track of missing dependencies both in memory and in the index's
underlying sorted.KeyValue. When we see a dependent blob arrive, see
if we can reindex things.
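The in-memory half of this bookkeeping, as a sketch (persisting the same information to the sorted.KeyValue is left out). depTracker and its methods are hypothetical names.

```go
package main

import (
	"sort"
	"sync"
)

// depTracker (hypothetical) remembers, in memory only, which blobs are
// waiting on which missing dependencies.
type depTracker struct {
	mu      sync.Mutex
	missing map[string]map[string]bool // blob -> dependencies still missing
	waiters map[string][]string        // dependency -> blobs waiting on it
}

func newDepTracker() *depTracker {
	return &depTracker{
		missing: make(map[string]map[string]bool),
		waiters: make(map[string][]string),
	}
}

// noteMissing records that blob could not be fully indexed because deps are absent.
func (t *depTracker) noteMissing(blob string, deps ...string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	m := t.missing[blob]
	if m == nil {
		m = make(map[string]bool)
		t.missing[blob] = m
	}
	for _, d := range deps {
		if !m[d] { // avoid registering the same wait twice
			m[d] = true
			t.waiters[d] = append(t.waiters[d], blob)
		}
	}
}

// received is called when dep arrives. It returns, sorted, the blobs that
// no longer miss anything and can therefore be reindexed.
func (t *depTracker) received(dep string) []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	var ready []string
	for _, blob := range t.waiters[dep] {
		delete(t.missing[blob], dep)
		if len(t.missing[blob]) == 0 {
			delete(t.missing, blob)
			ready = append(ready, blob)
		}
	}
	delete(t.waiters, dep)
	sort.Strings(ready)
	return ready
}
```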
Fixes camlistore.org/issue/102
Change-Id: I3d8cfc463e4b8c9d158be8f9656e772839b093b9
StreamingFetcher is now just Fetcher, and its FetchStreaming is now
just Fetch.
SeekFetcher is gone. Blobs are max 16 MB anyway, so we can slurp to
memory when needed. The main thing that cared about SeekFetcher
was the GET handler, ServeBlobref, because http.ServeContent needed
one for range requests. That's rewritten in an earlier commit, using
the FakeSeeker from another earlier commit.
A lot of code got simpler as a result.
Change-Id: Ib819413e48a8f9b8d97f596d0fbf771dab211f11
Not just in blob.SizedRef, but in blobserver.Fetch and
blobserver.FetchStreaming, too.
Blobs have a max size of 10-32 MB anyway, and the index.Corpus is now using
uint32 to save memory.
Change-Id: I1172445c2f9463fdaee55bfe0f1218d44be4aa53
Add disc and mediaref (a hash of the audio portion of the
file).
Also relocate taglib code to
third_party/github.com/hjfreyer/taglib-go.
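How a mediaref might be computed, as a sketch: it assumes the non-audio parts are a leading ID3v2 tag and a trailing 128-byte ID3v1 tag, and that the hash is SHA-1 formatted as "sha1-<hex>". The real definition in this change may differ; audioPortion and mediaRef are hypothetical.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// audioPortion (hypothetical) strips a leading ID3v2 tag and a trailing
// 128-byte ID3v1 tag, returning the bytes in between.
func audioPortion(b []byte) []byte {
	if len(b) >= 10 && string(b[:3]) == "ID3" {
		// The tag size is a 28-bit "syncsafe" integer: 7 bits per byte.
		size := int(b[6]&0x7f)<<21 | int(b[7]&0x7f)<<14 | int(b[8]&0x7f)<<7 | int(b[9]&0x7f)
		end := 10 + size
		if b[5]&0x10 != 0 { // footer present (ID3v2.4)
			end += 10
		}
		if end > len(b) {
			end = len(b)
		}
		b = b[end:]
	}
	if len(b) >= 128 && string(b[len(b)-128:len(b)-125]) == "TAG" {
		b = b[:len(b)-128]
	}
	return b
}

// mediaRef (hypothetical) hashes only the audio portion, so retagging a
// file would not change its mediaref.
func mediaRef(file []byte) string {
	return fmt.Sprintf("sha1-%x", sha1.Sum(audioPortion(file)))
}
```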
Change-Id: I58364f525b787484af894663125163095256d7c6
With the sync handler + indexer in same process subscribing to all
incoming blobs, we were indexing everything twice.
Fixes camlistore.org/issue/306
Change-Id: I7da54a0e18ac613eeae36d6db29b6cdb73a37196
Adds more tests to cover rotations with resize when used with
MaxWidth/MaxHeight; previously only ScaledWidth/ScaledHeight were
tested.
Improve tests to compare bounds when determining equality; otherwise
an image sized 0x0 is equal to all other images.
Sort test image filenames so test order is stable and obvious.
Keep more data in memory when indexing images upon receive. Some
largish CR2 files need more data or the EXIF parsing will fail.
Should address some or all of https://camlistore.org/issue/274
Change-Id: I80d90c33538c9d62ce4480ccb58c003e18ee6629
1) pkg/search: documented that deletion times do not
qualify as modtimes
2) pkg/index: got rid of DeletedAt, and keyDeletes
http://camlistore.org/issue/191
Change-Id: I39578913345454d36af4599e29e7053f46577846
This change:
1) Checks whether the incoming claim is a delete claim, using
GetBlobMeta.
2) Writes the keyDeleted and keyDeletes keys when it is a delete
claim, plus the usual keys when the target is a permanode.
Yet to be done in the next CLs:
1) update the index deletes cache upon reception of a delete claim
2) update most of the search functions so they use deletedAt properly
3) add new keys necessary for GetRecentPermanodes to give a fully
correct result.
I also made indextest.DumpIndex public because it turned out to be useful
for debugging within pkg/search/ as well.
http://camlistore.org/issue/191
Change-Id: I8d8b9d12a535b8b1de0018b4a0e359241f14d52a
index in sync, both at start-up and while running and receiving blobs.
They both use the same mechanism now.
Also adds KeyId to the index and Corpus, as the next step. Plenty more
row types remain...
Change-Id: Id79955ba25dc79d5fbd94b0e5248d33dcf71d97e
keep blob metadata in memory, and start testing all search queries in three modes:
classic index.Storage scanning, all in-memory with corpus scanned from the index.Storage,
and the in-memory corpus built up over time as blobs arrive.
Change-Id: I40536e498a63bece5bd4897cdbbd0cef78085f44
creates new package types/camtypes for misc types needed by both. It might eventually go away as
search matures.
Change-Id: Ib771ead7bea39936ba478b7e5d58de997060861b
When indexing upon a blob reception, we first populate
all the mutations in a map instead of in a batch mutation.
Then we transfer all the mutations in a batch and commit
it immediately. This makes the window when the batch mutation
is open much shorter, and will ease future indexing because
it allows reading from the index while writing the mutations
to the map.
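A toy sketch of the map-then-batch flow. memKV is an in-memory stand-in loosely modeled on a key/value store with BeginBatch/CommitBatch, not the real sorted.KeyValue; all names are hypothetical.

```go
package main

import "sync"

// memKV (hypothetical) is a toy in-memory key/value store with batches.
type memKV struct {
	mu   sync.RWMutex
	rows map[string]string
}

type batch struct{ sets map[string]string }

func newMemKV() *memKV { return &memKV{rows: make(map[string]string)} }

func (kv *memKV) Get(key string) (string, bool) {
	kv.mu.RLock()
	defer kv.mu.RUnlock()
	v, ok := kv.rows[key]
	return v, ok
}

func (kv *memKV) BeginBatch() *batch { return &batch{sets: make(map[string]string)} }

func (b *batch) Set(key, value string) { b.sets[key] = value }

// CommitBatch applies all of the batch's sets under one lock.
func (kv *memKV) CommitBatch(b *batch) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	for k, v := range b.sets {
		kv.rows[k] = v
	}
}

// commitMutations is the short final step: mutations were collected in a
// plain map while indexing (Get stays usable meanwhile), and only now is a
// batch opened, filled and committed.
func commitMutations(kv *memKV, mm map[string]string) {
	b := kv.BeginBatch()
	for k, v := range mm {
		b.Set(k, v)
	}
	kv.CommitBatch(b)
}
```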
Change-Id: I276282388f59ca543835bfa5ec64986453b23fe1
The index entry prefixed by "claim" had no keyType and
was always built "by hand" by concatenating fields with pipes.
This change adds the documented keyPermanodeClaim to fix
that.
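A sketch of what a keyType buys over hand-built strings: the part count is checked and the pipe joining lives in one place. The keyType fields and the keyPermanodeClaim part names are illustrative assumptions, not the index's actual definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// keyType (hypothetical shape) describes one kind of index row: a fixed
// prefix plus named parts, joined with pipes.
type keyType struct {
	name     string
	keyParts []string
}

// Key builds a row key, panicking on the wrong number of parts.
func (k keyType) Key(parts ...string) string {
	if len(parts) != len(k.keyParts) {
		panic(fmt.Sprintf("%s: got %d key parts, want %d", k.name, len(parts), len(k.keyParts)))
	}
	return k.name + "|" + strings.Join(parts, "|")
}

// keyPermanodeClaim: the part names are illustrative assumptions.
var keyPermanodeClaim = keyType{
	name:     "claim",
	keyParts: []string{"permanode", "signer", "claimDate", "claim"},
}
```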
Change-Id: Ic59f7dbcccc6b223b155d5bffbf8e636209800cb
This change introduces a new index entry
to help with finding the children of a static directory.
It also fixes ResolvePrefixHop so that it takes
into account static directories, and not only collections.
This is the first step to support publishing static directories.
http://camlistore.org/issue/179
Change-Id: I5666e5caa6c782004054ae4c19a6b6119d4fda8b
Does a few things:
1) Adds gotaglib to third_party. If you'd like to review that, feel
free, though there's a bit of organization I'd like to do first.
2) Adds an "audioTag" key type.
3) Indexes wholerefs by various audio tags. Doesn't yet add a map from
wholeref to tags, but I can add that next.
Change-Id: I8e2a5bc27260086bad3351ac57973d1ac23cff44
Move up a layer to the HTTP. Also, start to remove ContextWrapper
stuff. We've done it differently for App Engine instead, and will do
it differently yet moving forward.
Also add blobserver.Receive and use it in most places, moving checksum
verification up a layer.
Bunch of other cleanup and TODO fixing too.
Much simpler and cleaner.
Change-Id: I12e56c5d4e53bfcf82bdd8fb0b6d57c248ff605c
misc.CountingReader moves into readerutil.
pkg/atomics is folded into pkg/types.
pkg/test/testdep is folded into pkg/test, with better name/docs.
Old cruft from pkg/webserver is deleted.
Change-Id: I3f72d8b29804254ef944995fb085837c878f79f5