64 Commits

Author SHA1 Message Date
Felix Geller 036f8b19af pkg/index: Pass whole file for finding modtime when needed
When reading EXIF data for large tiff files, our optimistic
file prefix sometimes isn't enough and we need to pass in
the whole file. We already did this in several places (image
decoding and indexing), this change adds it for finding the
modtime (for which we try to use EXIF data when
available). It pulls common functionality out into a
separate func and changes the existing uses of this pattern
to make use of the func.

Change-Id: I2b786775f168f47f46fb5ac707e3744991139a21
2015-05-26 08:16:10 +12:00
Felix Geller f4ff53bbac pkg/index: Retry whole file when EXIF data can't be read
When reading EXIF data from larger TIFF files, we might fail
to read the EXIF data when we only pass in the in-memory
prefix. This change identifies when the third-party library
encounters a short read on a tag/EXIF data and triggers a
retry with the whole file by returning an ErrUnexpectedEOF.

Change-Id: Ie5cdc1613db6ccac49d91a69827f11ca3406a74b
2015-05-25 07:52:50 +12:00
Brad Fitzpatrick 8229c19850 search, index: add WholeRef to pkg camtypes' FileInfo struct
So when you describe a file, you also get its wholeref.

TODO: we'll need to migrate old indexes to this new format on
start-up.

Change-Id: I4a3fb000d68bde46474275c2070ef285a6d6ecfc
2015-02-04 21:04:39 -08:00
mpl fda1399e9c index,camtool: try and cope better with broken exif
http://camlistore.org/issue/493

Change-Id: I40aebd67252cf82a3a5a143af6c258d7ed2aecda
2014-11-10 19:05:35 +01:00
Salmān Aljammāz 0d6e0c6425 index: avoid shadowing err when retrying to index a full file
Change-Id: Ie683739039116dfb2758c6647382afebaa6e1ece
2014-10-08 18:40:00 +01:00
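
The commit message above does not show the code, so the following is only a generic illustration of the kind of bug that err shadowing causes in a retry path. All names (parse, errShort, indexShadowed, indexFixed) are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

var errShort = errors.New("short read")

// parse fails on the prefix and succeeds on the whole file (toy behavior).
func parse(whole bool) error {
	if whole {
		return nil
	}
	return errShort
}

// indexShadowed uses := inside the if block, which declares a second err
// scoped to that block. The outer err still holds the first failure.
func indexShadowed() error {
	err := parse(false)
	if err == errShort {
		err := parse(true) // new variable, shadows the outer err
		_ = err
	}
	return err
}

// indexFixed assigns to the existing err, so the retry result is returned.
func indexFixed() error {
	err := parse(false)
	if err == errShort {
		err = parse(true)
	}
	return err
}

func main() {
	fmt.Println("shadowed:", indexShadowed())
	fmt.Println("fixed:", indexFixed())
}
```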
mpl f15c5a7cd2 index/receive: address last comments from http://camlistore.org/r/3271
Change-Id: Id41278e5e01b9ea9310b392859709a3261dc3f52
2014-10-07 17:21:14 +02:00
Salmān Aljammāz e14c122c52 indexer: images: try a FileReader if the prefix is too small for DecodeConfig
Go's image.DecodeConfig needs more than 1MiB on some images (e.g. some
Lens Blur pics taken with Google Camera). Now we first try a 512KiB header
and retry with a full FileReader if that fails.

https://camlistore.org/bugs/477

Change-Id: I286d15d86a69951737d94dd3692d4e9e1992b324
2014-10-07 12:13:33 +00:00
Fabian Wickborn 59a451c2dc Merge upstream goexif
This pulls the changes from the current HEAD of
https://github.com/rwcarlsen/goexif
(eb2943811adc24a1a40d6dc0525995d4f8563d08)

Notable changes:
- Removed explicit panics in favor of error returns
- renamed TypeCategory to Format and made format calculated upon
  decoding rather than repeatedly for every format call
- Merged contributions from Camlistore (exif.LatLong(), exif.DateTime()
  etc.)
- Change String method to just return the string value - and don't have
  square brackets if only a single value
- add separate Int and Int64 retrieval methods
- Doc updates

Minor changes in camlistore.org/pkg/* were necessary to reflect
changes in the API (handling of returned errors) and in names of
exported fields and methods.

Change-Id: I50412b5e68d2c9ca766ff2ad1a4ac26926baccab
2014-09-17 10:40:38 +02:00
Fabian Wickborn 2aed1b8241 Renamed goexif folder to match upstream URL
In anticipation of github.com/camlistore/goexif being closed, this
commit renames the goexif folder in third_party to match the
upstream on GitHub.

The affected import paths have been rewritten accordingly.

Change-Id: I5a8871efd01987944b7f5e93979307857ae16fe7
2014-09-05 17:27:59 +02:00
Fabian Wickborn f0d9c04bc2 Merge goexif with upstream package
This pulls the changes from the current HEAD of
https://github.com/rwcarlsen/goexif
(rev cf045e9d6ba052fd348f82394d364cca4937589a)

Changes to goexif from upstream:
- Add support for reading Nikon and Canon maker notes
- Adds parser registration to exif package
- Renamed cmd to exifstat
- Renamed exported fields and methods in goexif/tiff
- adds support for bare tiff images. bug fix and minor cosmetics
- support pulling the thumbnail
- adds thumbnail support to exif pkg
- tiff defines DataType and constants for the datatype values
- Update convertVals and TypeCategory to use new constants for DataType
- Renamed test data dir in exif tests
- created type+constants for raw tiff tag data types

Not merged from upstream:
- ~1 MB of test JPGs in goexif/exif/samples

Minor changes in camlistore.org/pkg/* were necessary to reflect the
name changes in the exported fields and methods.

Change-Id: I0fdcad2d7b5e01e0d4160a5eb52b8ec750d353cf
2014-09-05 08:36:42 +02:00
mpl f1953edb88 Merge "index: actually reindex when out of order"
2014-08-14 20:00:00 +00:00
mpl 0628249db1 index: actually reindex when out of order
problem: the out-of-order mechanism based on the outOfOrderIndexerLoop
was not working for some claims.

Let C be a delete claim on permanode P. If C was received before P was,
C was marked as being received with the "have" index row. However, for
the deletion to be marked in the index, some information about P is
needed (its meta row), so C could not be fully indexed upon reception.
Then, when P was finally received, the outOfOrderIndexerLoop would kick
in and retry indexing C. That retry would fail, because a test based on
the "have" row would (wrongly) detect that C was already indexed and
return early.

In this patch:

-we introduce the "|indexed" suffix on the value part of the "have" row
(receive.go). If a blob is received but some of its dependencies are
missing, the have row value is written without the suffix. Upon
reception of a blob, we now test for the presence of the suffix in the
have row. If missing, the reception continues instead of returning
early. The existing mechanism that was detecting missing dependencies
for file blobs has been adapted to work with this suffix too.

-the index enumeration (enumstat.go), which relies on "have" rows, has
been adapted to work with the new "have" row format, while staying
compatible with the old format. And related tests have been added.

http://camlistore.org/issue/454

Change-Id: I2559d08a12b2a4e0f0691fc7e31f1ed1f874625e
2014-08-14 17:03:26 +02:00
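
To make the "|indexed" suffix from the commit above concrete, here is a small sketch of parsing a "have" row value in both the old and the new format. It assumes the value is the blob size in decimal, optionally followed by the suffix; that layout and the parseHaveValue name are assumptions for the example, not taken from receive.go or enumstat.go.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const indexedSuffix = "|indexed"

// parseHaveValue accepts "<size>" (old format, or received but not fully
// indexed) and "<size>|indexed" (fully indexed). Assumed layout.
func parseHaveValue(v string) (uint32, bool, error) {
	indexed := strings.HasSuffix(v, indexedSuffix)
	if indexed {
		v = strings.TrimSuffix(v, indexedSuffix)
	}
	n, err := strconv.ParseUint(v, 10, 32)
	if err != nil {
		return 0, false, err
	}
	return uint32(n), indexed, nil
}

func main() {
	for _, v := range []string{"1234", "1234|indexed"} {
		size, indexed, err := parseHaveValue(v)
		fmt.Println(size, indexed, err)
	}
}
```

With this shape, a receiver can return early only when the row exists and indexed is true, while enumeration reads the size in either format.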
Brad Fitzpatrick 286b53f119 index: use Exif.LatLong accessor
This code is now moved to the exif package.

Change-Id: Ifba2e0b6a96c076e75179528e8ea9a4c0641d843
2014-07-13 10:25:04 -07:00
Bill Thiede eb7f66fe28 jpeg: enable images/jpeg imported from Go tip.
Addresses https://camlistore.org/issue/463

Change-Id: Ie7b8f937ded78d95875f4cd13b024d0429136981
2014-07-02 21:22:15 -07:00
mpl 443f405962 index: fix data race on BlobSource, make it private.
index.New was starting outOfOrderIndexerLoop in a goroutine. And
outOfOrderIndexerLoop had an if index.BlobSource == nil check, on which
it relied to go on. However, since BlobSource was public and unguarded,
the following sequence was possible:

ix, _ := index.New()
ix.BlobSource = bs

which is racy because the BlobSource assignment may or may not happen
before the check within outOfOrderIndexerLoop.

TestOutOfOrderIndexing was relying on the fact that apparently most
of the time the assignment seems to be happening before the check.

This patch:
-makes BlobSource (now blobSource) private, rendering the race impossible
outside of the index package.
-moves the initialization of blobSource, as well as the start of
outOfOrderIndexerLoop, to a single point, in InitBlobSource (new method).
-makes sure all accesses to blobSource are guarded with the index mutex
(now a RWMutex).

Context: while working on tests for http://camlistore.org/issue/454

Change-Id: I9605f26b41abd62b42880be0620b06ce143761bc
2014-06-27 22:29:29 +02:00
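
The shape of the fix described above might look something like this. It is a simplified sketch: the Fetcher interface, the loopSaw channel and the loop body are invented for the example, and only the ideas of a private field, a single InitBlobSource entry point and RWMutex-guarded access come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// Fetcher is a stand-in for the real blob source interface (hypothetical).
type Fetcher interface {
	Fetch(ref string) (string, error)
}

type mapFetcher map[string]string

func (m mapFetcher) Fetch(ref string) (string, error) { return m[ref], nil }

type Index struct {
	mu         sync.RWMutex
	blobSource Fetcher   // private: only settable through InitBlobSource
	loopSaw    chan bool // example-only hook reporting what the loop observed
}

func New() *Index { return &Index{loopSaw: make(chan bool, 1)} }

// InitBlobSource sets the source under the write lock and only then
// starts the background loop, so the loop can never run before the set.
func (ix *Index) InitBlobSource(src Fetcher) {
	ix.mu.Lock()
	ix.blobSource = src
	ix.mu.Unlock()
	go ix.outOfOrderIndexerLoop()
}

// source reads the field under the read lock.
func (ix *Index) source() Fetcher {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	return ix.blobSource
}

func (ix *Index) outOfOrderIndexerLoop() {
	ix.loopSaw <- ix.source() != nil
}

func main() {
	ix := New()
	ix.InitBlobSource(mapFetcher{"ref": "data"})
	fmt.Println("loop saw a blob source:", <-ix.loopSaw)
}
```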
Daniel Erat aa391ecdd1 index: Index MusicBrainz album IDs from music files.
Index "MusicBrainz Album ID" ID3v2 frames as
"musicbrainzalbumid" media tags to facilitate downloading
cover art from coverartarchive.org.

Change-Id: Ie81017dd6f76ec355ee0d1daedfb7180cb70ad59
2014-05-05 20:45:58 -07:00
Brad Fitzpatrick bf2764cdfe index: rename the reindex method to indexBlob, to be less confusing.
Also, upon server --reindex, check that no out-of-order blobs are
pending.  From a quick reading, they shouldn't be, but I'm curious to
see.  Will do a full reindex of my data later.

Change-Id: Idebf93cc264e55512afcfb99e47320dd0ae745d1
2014-04-06 14:03:38 -07:00
Brad Fitzpatrick bf2f09cab3 index: reschedule indexing a claim blob if public key blob isn't yet available
Change-Id: Ie0174bf830eb4790080b2b5e7cdc4ea0af25406f
2014-04-02 13:39:36 -07:00
Brad Fitzpatrick bfc607fee7 index: reindex blobs when dependent blobs arrive out-of-order
Keep track of missing dependencies both in memory and in the index's
underlying sorted.KeyValue. When we see a dependent blob arrive, see
if we can reindex things.

Fixes camlistore.org/issue/102

Change-Id: I3d8cfc463e4b8c9d158be8f9656e772839b093b9
2014-03-15 08:44:09 -07:00
Brad Fitzpatrick bf94a73859 Get rid of SeekFetcher vs StreamingFetcher distinction and complexity.
StreamingFetcher is now just Fetcher, and its FetchStreaming is now
just Fetch.

SeekFetcher is gone. Blobs are max 16 MB anyway, so we can slurp to
memory when needed. The main thing that cared about SeekFetcher
was the GET handler, ServeBlobref, because http.ServeContent needed
one for range requests. That's rewritten in an earlier commit, using
the FakeSeeker from another earlier commit.

A lot of code got simpler as a result.

Change-Id: Ib819413e48a8f9b8d97f596d0fbf771dab211f11
2014-03-14 12:29:13 -07:00
Brad Fitzpatrick bf01b14961 index: move seekFetcherMissTracker up a layer
In prep for missing blob dependency rescheduling in indexer.

Change-Id: I1d492e6aa64cfb658daec17e4621d1453c6d3607
2014-03-14 09:14:46 -07:00
Tamás Gulácsi 97520583b8 Use 'uint32' instead of 'int64' for blob sizes everywhere.
Not just in blob.SizedRef, but in blobserver.Fetch and
blobserver.FetchStreaming, too.
Blobs have a max size of 10-32 MB anyway, and the index.Corpus is now using
uint32 to save memory.

Change-Id: I1172445c2f9463fdaee55bfe0f1218d44be4aa53
2014-02-08 17:58:12 +01:00
Daniel Erat 5603ea8e0d pkg/index: Index audio duration.
Add pkg/media with code to calculate MPEG audio duration.
Index it in a "durationms" property.

Change-Id: Ifb6251657cadc365ef3f5667a0512fde17575560
2014-01-25 10:40:06 -08:00
Daniel Erat 404548d31a pkg/index: Index more music-related properties.
Add disc and mediaref (a hash of the audio portion of the
file).

Also relocate taglib code to
third_party/github.com/hjfreyer/taglib-go.

Change-Id: I58364f525b787484af894663125163095256d7c6
2014-01-22 21:25:05 -08:00
Daniel Erat 704d3c6bfc pkg/index: Rename audiotag to mediatag.
Also fix up keys and values and add tests.

Change-Id: I7e6c5c4315705442e3517456f2ba16419af49f2f
2014-01-20 21:46:39 -08:00
Brad Fitzpatrick 5b03c3f8fb search, index: let media tags be searchable too.
git push from Dolores Park. Sorry, no tests. Dan Erat will tell me if
this doesn't work.

Change-Id: I557cc3d07983390b8a15b7756ee0825fced2f503
2014-01-20 15:47:36 -08:00
Brad Fitzpatrick 14b950496f index, corpus: prevent indexing dup blobs
With the sync handler + indexer in same process subscribing to all
incoming blobs, we were indexing everything twice.

Fixes camlistore.org/issue/306

Change-Id: I7da54a0e18ac613eeae36d6db29b6cdb73a37196
2013-12-30 20:17:47 -08:00
Brad Fitzpatrick a11ff22b8e camlistored: add --reindex flag; make sqlkv a sorted.Wiper
Change-Id: I6b16c1c32187fb754d3acdbe852d02a506236078
2013-12-23 19:07:17 -08:00
Brad Fitzpatrick a7b3f4ee01 index: index all photo EXIF tags
Change-Id: I00b2eebfc75de38eed5c212ac6d52e0da07297bc
2013-12-23 16:21:19 -08:00
Bill Thiede 2d4fb25c34 images: fix Decode when resize + rotate + max W/H.
Adds more tests to cover rotations with resize when used with
MaxWidth/MaxHeight, previously only ScaledWidth/ScaledHeight were
tested.

Improve tests to compare bounds when determining equality, otherwise
an image sized 0x0 is equal to all other images.

Sort test image filenames so test order is stable and obvious.

Keep more data in memory when indexing images upon receive.  Some
largish CR2 files need more data or the EXIF parsing will fail.

Should address some or all of https://camlistore.org/issue/274

Change-Id: I80d90c33538c9d62ce4480ccb58c003e18ee6629
2013-12-16 10:01:07 -08:00
Brad Fitzpatrick 91d735df4b index: start of re-indexing smartly when dependent blobs are missing
See https://camlistore.org/issue/102

Change-Id: Ia5f69475d8f47398bc228a96e7694d59edf277bf
2013-11-30 23:15:17 -08:00
mpl 6c75ceb8b5 pkg/index: do not record a keySignerAttrValue on DelAttributeClaim
Change-Id: Ib1f81fe4879de2be7d484a5a40cc6bf0449893d5
2013-11-30 00:56:09 +01:00
mpl 1ee5fd20c5 search: deletions are not modifications
1) pkg/search: documented that deletion times do not
qualify as modtimes

2) pkg/index: got rid of DeletedAt, and keyDeletes

http://camlistore.org/issue/191

Change-Id: I39578913345454d36af4599e29e7053f46577846
2013-11-29 00:29:57 +01:00
mpl 42e37d4456 pkg/index: update the deletes cache when receiving a delete claim
http://camlistore.org/issue/191

Change-Id: I49da2ef4e43675fba6a80db29ba96a473c159403
2013-11-27 18:44:39 +01:00
mpl c81f3147f6 pkg/index: write relevant keys when receiving a delete claim
This change:

1) Checks if the incoming claim is a delete claim with the use
of GetBlobMeta.

2) Writes the keyDeleted and keyDeletes keys when it's a delete
claim, plus the usual keys when the target is a permanode.

Yet to be done in the next CLs:
1) update the index deletes cache upon reception of a delete claim
2) update most of the search functions so they use deletedAt properly
3) add new keys necessary for GetRecentPermanodes to give a fully
correct result.

I also made indextest.DumpIndex public because it turned out to be useful
for debugging within pkg/search/ as well.

http://camlistore.org/issue/191

Change-Id: I8d8b9d12a535b8b1de0018b4a0e359241f14d52a
2013-11-19 18:02:12 +01:00
Brad Fitzpatrick e8603b1293 Put claims in memory too for in-memory search. Required index schema version bump.
Change-Id: I194d65476bddea111277cd0b1472c56b5527226b
2013-11-17 16:52:51 -08:00
Brad Fitzpatrick 3eb493599e in-memory search: better structure for keeping memory corpus and kv
index in sync, both at start-up and while running and receiving blobs.
They both use the same mechanism now.

Also adds KeyId to the index and Corpus, as the next step. Plenty more
row types remain...

Change-Id: Id79955ba25dc79d5fbd94b0e5248d33dcf71d97e
2013-11-17 09:41:45 -08:00
Brad Fitzpatrick f3cc3c7ed9 search: more in-memory search work. make tests verify Scan doesn't hit Storage.
also some string interning work.

Change-Id: I7864b56eb97318bce943afdca3b1212f4729a9a8
2013-11-16 18:50:01 -08:00
Brad Fitzpatrick 2984897ac7 search: more in-memory search work.
keep blob metadata in memory, and start of testing all search queries in three modes:
classic index.Storage scanning, all in-memory with corpus scanned from the index.Storage,
and the in-memory corpus built up over time as blobs arrive.

Change-Id: I40536e498a63bece5bd4897cdbbd0cef78085f44
2013-11-16 17:24:02 -08:00
Brad Fitzpatrick 705107ad80 search/index: invert dependency. search now depends on index.
creates new package types/camtypes for misc types needed by both. might eventually go away as
search matures.

Change-Id: Ib771ead7bea39936ba478b7e5d58de997060861b
2013-11-16 15:00:30 -08:00
mpl e03d923fe1 pkg/index: use a map to populate the mutations
When indexing upon a blob reception, we first populate
all the mutations in a map instead of in a batch mutation.
Then we transfer all the mutations in a batch and commit
it immediately. This makes the window when the batch mutation
is open much shorter, and will ease future indexing because
it allows reading from the index while writing the mutations
to the map.

Change-Id: I276282388f59ca543835bfa5ec64986453b23fe1
2013-11-15 01:23:21 +01:00
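
A small sketch of the "collect in a map, then commit in one short batch" idea from the commit above. The store type, the row keys and the indexBlob helper are hypothetical and much simpler than the real index storage.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a toy key/value index with a batch commit (hypothetical).
type store struct {
	mu      sync.Mutex
	rows    map[string]string
	commits int
}

// commitBatch applies all mutations while holding the lock once.
func (s *store) commitBatch(mutations map[string]string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k, v := range mutations {
		s.rows[k] = v
	}
	s.commits++
}

// indexBlob gathers every mutation in a plain map first. The store is
// not locked during this phase, so it stays readable; the batch is
// opened and committed only at the very end.
func indexBlob(s *store, ref string, size int) {
	mm := make(map[string]string)
	mm["have:"+ref] = fmt.Sprint(size)   // illustrative row
	mm["meta:"+ref] = fmt.Sprint(size) + "|example" // illustrative row
	s.commitBatch(mm)
}

func main() {
	s := &store{rows: make(map[string]string)}
	indexBlob(s, "sha1-abc", 42)
	fmt.Println(len(s.rows), s.commits)
}
```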
mpl 5031b01880 pkg/index: keyType keyPermanodeClaim for "claim" index entry
The index entry prefixed by "claim" had no keyType and
was always built "by hand" with pipes concatenation.
This change adds the documented keyPermanodeClaim to fix
that.

Change-Id: Ic59f7dbcccc6b223b155d5bffbf8e636209800cb
2013-11-08 16:20:43 +01:00
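
To illustrate why a declared key type is preferable to building keys "by hand", here is a hedged sketch. The keyType struct, its fields and the three components given to keyPermanodeClaim are invented for the example; the real definition in pkg/index may differ, and value escaping is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// keyType describes one kind of index row: its prefix and the names of
// the pipe-separated components that follow it (hypothetical shape).
type keyType struct {
	name  string
	parts []string
}

// Key builds the row key and panics on a wrong component count, which
// hand-written concatenation would silently let through.
func (k *keyType) Key(args ...string) string {
	if len(args) != len(k.parts) {
		panic(fmt.Sprintf("%s key wants %d components, got %d", k.name, len(k.parts), len(args)))
	}
	return k.name + "|" + strings.Join(args, "|")
}

// keyPermanodeClaim's component list here is illustrative only.
var keyPermanodeClaim = &keyType{
	name:  "claim",
	parts: []string{"permanode", "signer", "date"},
}

func main() {
	fmt.Println(keyPermanodeClaim.Key("sha1-perm", "keyid", "2013-11-08T16:20:43Z"))
}
```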
Brad Fitzpatrick 8319411ab4 Convert more ReceiveBlob into blobserver.Receive or blobserver.ReceiveNoHash
Change-Id: I9199555324b617167a6062a8b55ed09b449bae4f
2013-09-16 15:57:14 +01:00
mpl d488c576fc search: support for static directory children
This change introduces a new index entry
to help with finding the children of a static directory.
It also fixes ResolvePrefixHop so that it takes
into account static directories, and not only collections.

This is the first step to support publishing static directories.

http://camlistore.org/issue/179

Change-Id: I5666e5caa6c782004054ae4c19a6b6119d4fda8b
2013-09-10 23:06:48 +02:00
Brad Fitzpatrick 00d8ff5275 index: remove no-longer-necessary blob hash check
Change-Id: Ia2a79655832a840d37666b94a1f101042861c8ff
2013-09-08 12:38:20 -07:00
Hunter Freyer 6940b3991f Basic code to index id3 (and other audio) tags.
Does a few things:

1) Adds gotaglib to third_party. If you'd like to review that, feel
free, though there's a bit of organization I'd like to do first.

2) Adds an "audioTag" key type.

3) Indexes wholerefs by various audio tags. Doesn't yet add a map from
wholeref to tags, but I can add that next.

Change-Id: I8e2a5bc27260086bad3351ac57973d1ac23cff44
2013-09-02 14:39:51 -04:00
Brad Fitzpatrick b24cad68dd Cleanup: remove BlobHub and time.Duration waits from storage interface
Move up a layer to the HTTP.  Also, start to remove ContextWrapper
stuff.  We've done it differently for App Engine instead, and will do
it differently yet moving forward.

Also add blobserver.Receive and use it in most places, moving checksum
verification up a layer.

Bunch of other cleanup and TODO fixing too.

Much simpler and cleaner.

Change-Id: I12e56c5d4e53bfcf82bdd8fb0b6d57c248ff605c
2013-08-21 13:57:28 -07:00
Brad Fitzpatrick 0bdf20884b all: delete pkg/blobref; convert all from *blobref.BlobRef to new blob.Ref
Change-Id: Id2dfb7f19452bedf4f3c9310b36227fd8117b225
2013-08-03 19:54:30 -07:00
Brad Fitzpatrick 9468e5ba70 More docs. Every package is documented now.
misc.CountingReader moves into readerutil.

pkg/atomics is folded into pkg/types.

pkg/test/testdep is folded into pkg/test, with better name/docs.

Old cruft from pkg/webserver is deleted.

Change-Id: I3f72d8b29804254ef944995fb085837c878f79f5
2013-07-07 21:12:30 -07:00
Brad Fitzpatrick 7fd16c5df4 remove debugging
Change-Id: If83580e85cfb350bba059dde9e7bccb0c7658e99
2013-06-10 19:23:34 +02:00