perkeep

Commit Graph

Author	SHA1	Message	Date
Felix Geller	036f8b19af	pkg/index: Pass whole file for finding modtime when needed When reading EXIF data for large tiff files, our optimistic file prefix sometimes isn't enough and we need to pass in the whole file. We already did this in several places (image decoding and indexing), this change adds it for finding the modtime (for which we try to use EXIF data when available). It pulls common functionality out into a separate func and changes the existing uses of this pattern to make use of the func. Change-Id: I2b786775f168f47f46fb5ac707e3744991139a21	2015-05-26 08:16:10 +12:00
Felix Geller	f4ff53bbac	pkg/index: Retry whole file when EXIF data can't be read When reading EXIF data from larger TIFF files, we might fail to read the EXIF data when we only pass in the in-memory prefix. This change identifies when the third-party library encounters a short read on a tag/EXIF data and triggers a retry with the whole file by returning an ErrUnexpectedEOF. Change-Id: Ie5cdc1613db6ccac49d91a69827f11ca3406a74b	2015-05-25 07:52:50 +12:00
Brad Fitzpatrick	8229c19850	search, index: add WholeRef to pkg camtypes' FileInfo struct So when you describe a file, you also get its wholeref. TODO: we'll need to migrate old indexes to this new format on start-up. Change-Id: I4a3fb000d68bde46474275c2070ef285a6d6ecfc	2015-02-04 21:04:39 -08:00
mpl	fda1399e9c	index,camtool: try and cope better with broken exif http://camlistore.org/issue/493 Change-Id: I40aebd67252cf82a3a5a143af6c258d7ed2aecda	2014-11-10 19:05:35 +01:00
Salmān Aljammāz	0d6e0c6425	index: avoid shadowing err when retrying to index a full file Change-Id: Ie683739039116dfb2758c6647382afebaa6e1ece	2014-10-08 18:40:00 +01:00
mpl	f15c5a7cd2	index/receive: address last comments from http://camlistore.org/r/3271 Change-Id: Id41278e5e01b9ea9310b392859709a3261dc3f52	2014-10-07 17:21:14 +02:00
Salmān Aljammāz	e14c122c52	indexer: images: try a FileReader if the prefix is too small for DecodeConfig Go's image.DecodeConfig needs more than 1MiB on some images (e.g. some Lens Blur pics taken with Google Camera). Now we first try a 512KiB header and retry with a full FileReader if that fails. https://camlistore.org/bugs/477 Change-Id: I286d15d86a69951737d94dd3692d4e9e1992b324	2014-10-07 12:13:33 +00:00
Fabian Wickborn	59a451c2dc	Merge upstream goexif This pulls the changes from the current HEAD of https://github.com/rwcarlsen/goexif (eb2943811adc24a1a40d6dc0525995d4f8563d08) Notable changes: - Removed explicit panics in favor of error returns - renamed TypeCategory to Format and made format calculated upon decoding rather than repeatedly for every format call - Merged contributions from Camlistore (exif.LatLong(), exif.DateTime() etc.) - Change String method to just return the string value - and don't have square brackets if only a single value - add separate Int and Int64 retrieval methods - Doc updates Minor changes in camlistore.org/pkg/* were necessary to reflect changes in the API (handling of returned errors) and in names of exported fields and methods. Change-Id: I50412b5e68d2c9ca766ff2ad1a4ac26926baccab	2014-09-17 10:40:38 +02:00
Fabian Wickborn	2aed1b8241	Renamed goexif folder to match upstream URL In anticipation of github.com/camlistore/goexif being closed, this commit renames the goexif folder in third_party to match the upstream on GitHub. The affected import paths have been rewritten accordingly. Change-Id: I5a8871efd01987944b7f5e93979307857ae16fe7	2014-09-05 17:27:59 +02:00
Fabian Wickborn	f0d9c04bc2	Merge goexif with upstream package This pulls the changes from the current HEAD of https://github.com/rwcarlsen/goexif (rev cf045e9d6ba052fd348f82394d364cca4937589a) Changes to goexif from upstream: - Add support for reading Nikon and Canon maker notes - Adds parser registration to exif package - Renamed cmd to exifstat - Renamed exported fields and methods in goexif/tiff - adds support for bare tiff images. bug fix and minor cosmetics - support pulling the thumbnail - adds thumbnail support to exif pkg - tiff defines DataType and constants for the datatype values - Update covnertVals and TypeCategory to use new constants for DataType - Renamed test data dir in exif tests - created type+constants for raw tiff tag data types Not merged from upstream: - ~1 MB of test JPGs in goexif/exif/samples Minor changes in camlistore.org/pkg/* were necessary to reflect the name changes in the exported fields and methods. Change-Id: I0fdcad2d7b5e01e0d4160a5eb52b8ec750d353cf	2014-09-05 08:36:42 +02:00
mpl	f1953edb88	Merge "index: actually reindex when out of order"	2014-08-14 20:00:00 +00:00
mpl	0628249db1	index: actually reindex when out of order problem: the out-of-order mechanism based on the outOfOrderIndexerLoop was not working for some claims. Let C be a delete claim on permanode P. If C was received before P was, C was marked as being received with the "have" index row. However, for the deletion to be marked in the index, some information about P is needed (its meta row), so C could not be fully indexed upon reception. Then, when P was finally received, the outOfOrderIndexerLoop would kick in and retry indexing C. Which would fail, because a test based on the "have" row would (wrongly) detect that C is already indexed and return early. In this patch: -we introduce the "\|indexed" suffix to the "have" - value part - row (receive.go). If a blob is received but some of its dependencies are missing, the have row value is written without the suffix. Upon reception of a blob, we now test for the presence of the suffix in the have row. If missing, the reception continues instead of returning early. The existing mechanism that was detecting missing dependencies for file blobs has been adapted to work with this suffix too. -the index enumeration (enumstat.go), which relies on "have" rows, has been adapted to work with the new "have" row format, while staying compatible with the old format. And related tests have been added. http://camlistore.org/issue/454 Change-Id: I2559d08a12b2a4e0f0691fc7e31f1ed1f874625e	2014-08-14 17:03:26 +02:00
Brad Fitzpatrick	286b53f119	index: use Exif.LatLong accessor This code is now moved to the exif package. Change-Id: Ifba2e0b6a96c076e75179528e8ea9a4c0641d843	2014-07-13 10:25:04 -07:00
Bill Thiede	eb7f66fe28	jpeg: enable images/jpeg imported from Go tip. Addresses https://camlistore.org/issue/463 Change-Id: Ie7b8f937ded78d95875f4cd13b024d0429136981	2014-07-02 21:22:15 -07:00
mpl	443f405962	index: fix data race on BlobSource, make it private. index.New was starting outOfOrderIndexerLoop in a goroutine. And outOfOrderIndexerLoop had an if index.BlobSource == nil check, on which it relied to go on. However, since BlobSource was public and unguarded, the following sequence was possible: ix, _ := index.New() ix.BlobSource = bs which is racy because the BlobSource assignment may or may not happen before the check within outOfOrderIndexerLoop. TestOutOfOrderIndexing was relying on the fact that apparently most of the time the assignment seems to be happening before the check. This patch: -makes BlobSource (now blobSource) private, rendering the race impossible out of the index package. -moves the initialization of blobSource, as well as the execution of outOfOrderIndexerLoop at a unique point, in InitBlobSource (new method). -makes sure all accesses to blobSource are guarded with the index mutex (now a RWMutex). Context: while working on tests for http://camlistore.org/issue/454 Change-Id: I9605f26b41abd62b42880be0620b06ce143761bc	2014-06-27 22:29:29 +02:00
Daniel Erat	aa391ecdd1	index: Index MusicBrainz album IDs from music files. Index "MusicBrainz Album ID" ID3v2 frames as "musicbrainzalbumid" media tags to facilitate downloading cover art from coverartarchive.org. Change-Id: Ie81017dd6f76ec355ee0d1daedfb7180cb70ad59	2014-05-05 20:45:58 -07:00
Brad Fitzpatrick	bf2764cdfe	index: rename the reindex method to indexBlob, to be less confusing. Also, upon server --reindex, check that no out-of-order blobs are pending. From a quick reading, they shouldn't be, but I'm curious to see. Will do a full reindex of my data later. Change-Id: Idebf93cc264e55512afcfb99e47320dd0ae745d1	2014-04-06 14:03:38 -07:00
Brad Fitzpatrick	bf2f09cab3	index: reschedule indexing a claim blob if public key blob isn't yet available Change-Id: Ie0174bf830eb4790080b2b5e7cdc4ea0af25406f	2014-04-02 13:39:36 -07:00
Brad Fitzpatrick	bfc607fee7	index: reindex blobs when dependent blobs arrive out-of-order Keep track of missing dependencies both in memory and in the index's underlying sorted.KeyValue. When we see a dependent blob arrive, see if we can reindex things. Fixes camlistore.org/issue/102 Change-Id: I3d8cfc463e4b8c9d158be8f9656e772839b093b9	2014-03-15 08:44:09 -07:00
Brad Fitzpatrick	bf94a73859	Get rid of SeekFetcher vs StreamingFetcher distinction and complexity. StreamingFetcher is now just Fetcher, and its FetchStreaming is now just Fetch. SeekFetcher is gone. Blobs are max 16 MB anyway, so we can slurp to memory when needed. The main thing that cared about SeekFetcher was the GET handler, ServeBlobref, because http.ServeContent needed one for range requests. That's rewritten in an earlier commit, using the FakeSeeker from another earlier commit. Lot of code got simpler as a result. Change-Id: Ib819413e48a8f9b8d97f596d0fbf771dab211f11	2014-03-14 12:29:13 -07:00
Brad Fitzpatrick	bf01b14961	index: move seekFetcherMissTracker up a layer In prep for missing blob dependency rescheduling in indexer. Change-Id: I1d492e6aa64cfb658daec17e4621d1453c6d3607	2014-03-14 09:14:46 -07:00
Tamás Gulácsi	97520583b8	Use 'uint32' instead of 'int64' for blob sizes everywhere. Not just in blob.SizedRef, but in blobserver.Fetch and blobserver.FetchStreaming, too. Blobs have a max size of 10-32 MB anyway, and the index.Corpus is now using uint32 to save memory. Change-Id: I1172445c2f9463fdaee55bfe0f1218d44be4aa53	2014-02-08 17:58:12 +01:00
Daniel Erat	5603ea8e0d	pkg/index: Index audio duration. Add pkg/media with code to calculate MPEG audio duration. Index it in a "durationms" property. Change-Id: Ifb6251657cadc365ef3f5667a0512fde17575560	2014-01-25 10:40:06 -08:00
Daniel Erat	404548d31a	pkg/index: Index more music-related properties. Add disc and mediaref (a hash of the audio portion of the file). Also relocate taglib code to third_party/github.com/hjfreyer/taglib-go. Change-Id: I58364f525b787484af894663125163095256d7c6	2014-01-22 21:25:05 -08:00
Daniel Erat	704d3c6bfc	pkg/index: Rename audiotag to mediatag. Also fix up keys and values and add tests. Change-Id: I7e6c5c4315705442e3517456f2ba16419af49f2f	2014-01-20 21:46:39 -08:00
Brad Fitzpatrick	5b03c3f8fb	search, index: let media tags be searchable too. git push from Dolores Park. Sorry, no tests. Dan Erat will tell me if this doesn't work. Change-Id: I557cc3d07983390b8a15b7756ee0825fced2f503	2014-01-20 15:47:36 -08:00
Brad Fitzpatrick	14b950496f	index, corpus: prevent indexing dup blobs With the sync handler + indexer in same process subscribing to all incoming blobs, we were indexing everything twice. Fixes camlistore.org/issue/306 Change-Id: I7da54a0e18ac613eeae36d6db29b6cdb73a37196	2013-12-30 20:17:47 -08:00
Brad Fitzpatrick	a11ff22b8e	camlistored: add --reindex flag; make sqlkv a sorted.Wiper Change-Id: I6b16c1c32187fb754d3acdbe852d02a506236078	2013-12-23 19:07:17 -08:00
Brad Fitzpatrick	a7b3f4ee01	index: index all photo EXIF tags Change-Id: I00b2eebfc75de38eed5c212ac6d52e0da07297bc	2013-12-23 16:21:19 -08:00
Bill Thiede	2d4fb25c34	images: fix Decode when resize + rotate + max W/H. Adds more tests to cover rotations with resize when used with MaxWidth/MaxHeight, previously only ScaledWidth/ScaledHeight were tested. Improve tests to compare bounds when determining equality, otherwise an image sized 0x0 is equal to all other images. Sort test image filenames so test order is stable and obvious. Keep more data in memory when indexing images upon receive. Some largish CR2 files need more data or the EXIF parsing will fail. Should address some or all of https://camlistore.org/issue/274 Change-Id: I80d90c33538c9d62ce4480ccb58c003e18ee6629	2013-12-16 10:01:07 -08:00
Brad Fitzpatrick	91d735df4b	index: start of re-indexing smartly when dependent blobs are missing See https://camlistore.org/issue/102 Change-Id: Ia5f69475d8f47398bc228a96e7694d59edf277bf	2013-11-30 23:15:17 -08:00
mpl	6c75ceb8b5	pkg/index: do not record a keySignerAttrValue on DelAttributeClaim Change-Id: Ib1f81fe4879de2be7d484a5a40cc6bf0449893d5	2013-11-30 00:56:09 +01:00
mpl	1ee5fd20c5	search: deletions are not modifications 1) pkg/search: documented that deletions times do not qualify as modtimes 2) pkg/index: got rid of DeletedAt, and keyDeletes http://camlistore.org/issue/191 Change-Id: I39578913345454d36af4599e29e7053f46577846	2013-11-29 00:29:57 +01:00
mpl	42e37d4456	pkg/index: update the deletes cache when receiving a delete claim http://camlistore.org/issue/191 Change-Id: I49da2ef4e43675fba6a80db29ba96a473c159403	2013-11-27 18:44:39 +01:00
mpl	c81f3147f6	pkg/index: write relevant keys when receiving a delete claim This change: 1) Checks if the incoming claim is a delete claim with the use of GetBlobMeta. 2) Writes the keyDeleted and keyDeletes keys when it's a delete claim, plus the usual keys when the target is a permanode. Yet to be done in the next CLs: 1) update the index deletes cache upon reception of a delete claim 2) update most of the search functions so they use deletedAt properly 3) add new keys necessary for GetRecentPermanodes to give a fully correct result. I also made indextest.DumpIndex public because it turned out to be useful to debug within pkg/search/ as well. http://camlistore.org/issue/191 Change-Id: I8d8b9d12a535b8b1de0018b4a0e359241f14d52a	2013-11-19 18:02:12 +01:00
Brad Fitzpatrick	e8603b1293	Put claims in memory too for in-memory search. Required index schema version bump. Change-Id: I194d65476bddea111277cd0b1472c56b5527226b	2013-11-17 16:52:51 -08:00
Brad Fitzpatrick	3eb493599e	in-memory search: better structure for keeping memory corpus and kv index in sync, both at start-up and while running and receiving blobs. They both use the same mechanism now. Also adds KeyId to the index and Corpus, as the next step. Plenty more row types remain... Change-Id: Id79955ba25dc79d5fbd94b0e5248d33dcf71d97e	2013-11-17 09:41:45 -08:00
Brad Fitzpatrick	f3cc3c7ed9	search: more in-memory search work. make tests verify Scan doesn't hit Storage. also some string interning work. Change-Id: I7864b56eb97318bce943afdca3b1212f4729a9a8	2013-11-16 18:50:01 -08:00
Brad Fitzpatrick	2984897ac7	search: more in-memory search work. keep blob metadata in memory, and start of testing all search queries in three modes: classic index.Storage scanning, all in-memory with corpus scanned from the index.Storage, and the in-memory corpus built up over time as blobs arrive. Change-Id: I40536e498a63bece5bd4897cdbbd0cef78085f44	2013-11-16 17:24:02 -08:00
Brad Fitzpatrick	705107ad80	search/index: invert dependency. search now depends on index. creates new package types/camtypes for misc types needed by both. might eventually go away as search matures. Change-Id: Ib771ead7bea39936ba478b7e5d58de997060861b	2013-11-16 15:00:30 -08:00
mpl	e03d923fe1	pkg/index: use a map to populate the mutations When indexing upon a blob reception, we first populate all the mutations in a map instead of in a batch mutation. Then we transfer all the mutations in a batch and commit it immediately. This makes the window when the batch mutation is open much shorter, and will ease future indexing because it allows reading from the index while writing the mutations to the map. Change-Id: I276282388f59ca543835bfa5ec64986453b23fe1	2013-11-15 01:23:21 +01:00
mpl	5031b01880	pkg/index: keyType keyPermanodeClaim for "claim" index entry The index entry prefixed by "claim" had no keyType and was always built "by hand" with pipes concatenation. This change adds the documented keyPermanodeClaim to fix that. Change-Id: Ic59f7dbcccc6b223b155d5bffbf8e636209800cb	2013-11-08 16:20:43 +01:00
Brad Fitzpatrick	8319411ab4	Convert more ReceiveBlob into blobserver.Receive or blobserver.ReceiveNoHash Change-Id: I9199555324b617167a6062a8b55ed09b449bae4f	2013-09-16 15:57:14 +01:00
mpl	d488c576fc	search: support for static directory children This change introduces a new index entry to help with finding the children of a static directory. It also fixes ResolvePrefixHop so that it takes into account static directories, and not only collections. This is the first step to support publishing static directories. http://camlistore.org/issue/179 Change-Id: I5666e5caa6c782004054ae4c19a6b6119d4fda8b	2013-09-10 23:06:48 +02:00
Brad Fitzpatrick	00d8ff5275	index: remove no-longer-necessary blob hash check Change-Id: Ia2a79655832a840d37666b94a1f101042861c8ff	2013-09-08 12:38:20 -07:00
Hunter Freyer	6940b3991f	Basic code to index id3 (and other audio) tags. Does a few things: 1) Adds gotaglib to third_party. If you'd like to review that, feel free, though there's a bit of organization I'd like to do first. 2) Adds an "audioTag" key type. 3) Indexes wholerefs by various audio tags. Doesn't yet add a map from wholeref to tags, but I can add that next. Change-Id: I8e2a5bc27260086bad3351ac57973d1ac23cff44	2013-09-02 14:39:51 -04:00
Brad Fitzpatrick	b24cad68dd	Cleanup: remove BlobHub and time.Duration waits from storage interface Move up a layer to the HTTP. Also, start to remove ContextWrapper stuff. We've done it differently for App Engine instead, and will do it differently yet moving forward. Also add blobserver.Receive and use it in most places, moving checksum verification up a layer. Bunch of other cleanup and TODO fixing too. Much simpler and cleaner. Change-Id: I12e56c5d4e53bfcf82bdd8fb0b6d57c248ff605c	2013-08-21 13:57:28 -07:00
Brad Fitzpatrick	0bdf20884b	all: delete pkg/blobref; convert all from *blobref.BlobRef to new blob.Ref Change-Id: Id2dfb7f19452bedf4f3c9310b36227fd8117b225	2013-08-03 19:54:30 -07:00
Brad Fitzpatrick	9468e5ba70	More docs. Every package is documented now. misc.CountingReader moves into readerutil. pkg/atomics is folded into pkg/types. pkg/test/testdep is folded into pkg/test, with better name/docs. Old cruft from pkg/webserver is deleted. Change-Id: I3f72d8b29804254ef944995fb085837c878f79f5	2013-07-07 21:12:30 -07:00
Brad Fitzpatrick	7fd16c5df4	remove debugging Change-Id: If83580e85cfb350bba059dde9e7bccb0c7658e99	2013-06-10 19:23:34 +02:00

64 Commits