/*
Copyright 2011 Google Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package server

import (
	"bytes"
	"errors"
	"fmt"
	"html"
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"os"
	"sort"
	"strconv"
	"strings"
	"sync"
	"time"

	"camlistore.org/pkg/auth"
	"camlistore.org/pkg/blob"
	"camlistore.org/pkg/blobserver"
	"camlistore.org/pkg/constants"
	"camlistore.org/pkg/index"
	"camlistore.org/pkg/sorted"
	"camlistore.org/pkg/types/camtypes"
	"code.google.com/p/xsrftoken"
	"go4.org/jsonconfig"
	"golang.org/x/net/context"

	"go4.org/syncutil"
)

const (
	maxRecentErrors   = 20
	queueSyncInterval = 5 * time.Second
)

type blobReceiverEnumerator interface {
	blobserver.BlobReceiver
	blobserver.BlobEnumerator
}
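
// The destination must support enumeration as well as receiving:
// enumeration is what the validation pass walks to compare the
// destination against the source (see the vdest* counters on SyncHandler).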

// The SyncHandler handles async replication in one direction between
// a pair of storage targets: a source and a destination.
//
// SyncHandler is a BlobReceiver but doesn't actually store incoming
// blobs; instead, it records blobs it has received and queues them
// for async replication soon, or whenever it can.
type SyncHandler struct {
	// TODO: rate control tunables
	fromName, toName string
	from             blobserver.Storage
	to               blobReceiverEnumerator
	queue            sorted.KeyValue
	toIndex          bool // whether this sync is from a blob storage to an index
	idle             bool // if true, the handler does nothing other than providing discovery
	copierPoolSize   int

	// wakec wakes up the blob syncer loop when a blob is received.
	wakec chan bool

	mu             sync.Mutex // protects following
	status         string
	copying        map[blob.Ref]*copyStatus // to start time
	needCopy       map[blob.Ref]uint32      // blobs needing to be copied. some might be in lastFail too.
	lastFail       map[blob.Ref]failDetail  // subset of needCopy that previously failed, and why
	bytesRemain    int64                    // sum of needCopy values
	recentErrors   []blob.Ref               // up to maxRecentErrors, recent first. valid if still in lastFail.
	recentCopyTime time.Time
	totalCopies    int64
	totalCopyBytes int64
	totalErrors    int64
	vshards        []string // validation shards. if empty, validation is not running
	vshardDone     int      // shards validated
	vshardErrs     []string
	vmissing       int64 // missing blobs found during validate
	vdestCount     int   // number of blobs seen on dest during validate
	vdestBytes     int64 // number of blob bytes seen on dest during validate
	vsrcCount      int   // number of blobs seen on src during validate
	vsrcBytes      int64 // number of blob bytes seen on src during validate

	// syncLoop tries to send on alarmIdlec each time we've slept for a full
	// queueSyncInterval. Initialized as a synchronous chan if we're not an
	// idle sync handler, otherwise nil.
	alarmIdlec chan struct{}
}

var (
	_ blobserver.Storage       = (*SyncHandler)(nil)
	_ blobserver.HandlerIniter = (*SyncHandler)(nil)
)

func (sh *SyncHandler) String() string {
	return fmt.Sprintf("[SyncHandler %v -> %v]", sh.fromName, sh.toName)
}

func (sh *SyncHandler) logf(format string, args ...interface{}) {
	log.Printf(sh.String()+" "+format, args...)
}

func init() {
	blobserver.RegisterHandlerConstructor("sync", newSyncFromConfig)
}

// TODO: this is temporary. Should delete, or decide when it's on by default (probably always).
// Then we'd need a genconfig option to disable it.
var validateOnStartDefault, _ = strconv.ParseBool(os.Getenv("CAMLI_SYNC_VALIDATE"))
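
// For reference, a sketch of the low-level jsonconfig this constructor
// consumes. The prefix name and queue file path below are made-up
// illustrations, not values from this file:
//
//	"/sync-to-index/": {
//	    "handler": "sync",
//	    "handlerArgs": {
//	        "from": "/bs/",
//	        "to": "/index/",
//	        "queue": {"type": "kv", "file": "/path/to/sync-to-index-queue.kv"}
//	    }
//	}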

func newSyncFromConfig(ld blobserver.Loader, conf jsonconfig.Obj) (http.Handler, error) {
	var (
		from           = conf.RequiredString("from")
		to             = conf.RequiredString("to")
		fullSync       = conf.OptionalBool("fullSyncOnStart", false)
		blockFullSync  = conf.OptionalBool("blockingFullSyncOnStart", false)
		idle           = conf.OptionalBool("idle", false)
		queueConf      = conf.OptionalObject("queue")
		copierPoolSize = conf.OptionalInt("copierPoolSize", 5)
		validate       = conf.OptionalBool("validateOnStart", validateOnStartDefault)
	)
	if err := conf.Validate(); err != nil {
		return nil, err
	}
	if idle {
		return newIdleSyncHandler(from, to), nil
	}
	if len(queueConf) == 0 {
		return nil, errors.New(`Missing required "queue" object`)
	}
	q, err := sorted.NewKeyValue(queueConf)
	if err != nil {
		return nil, err
	}

	isToIndex := false
	fromBs, err := ld.GetStorage(from)
	if err != nil {
		return nil, err
	}
	toBs, err := ld.GetStorage(to)
	if err != nil {
		return nil, err
	}
	if _, ok := fromBs.(*index.Index); !ok {
		if _, ok := toBs.(*index.Index); ok {
			isToIndex = true
		}
	}

	sh := newSyncHandler(from, to, fromBs, toBs, q)
	sh.toIndex = isToIndex
	sh.copierPoolSize = copierPoolSize
	if err := sh.readQueueToMemory(); err != nil {
		return nil, fmt.Errorf("Error reading sync queue to memory: %v", err)
	}

	if fullSync || blockFullSync {
		sh.logf("Doing full sync")
		didFullSync := make(chan bool, 1)
		go func() {
			for {
				n := sh.runSync("pending blobs queue", sh.enumeratePendingBlobs)
				if n > 0 {
					sh.logf("Queue sync copied %d blobs", n)
					continue
				}
				break
			}
			n := sh.runSync("full", blobserverEnumerator(context.TODO(), fromBs))
			sh.logf("Full sync copied %d blobs", n)
			didFullSync <- true
			sh.syncLoop()
		}()
		if blockFullSync {
			sh.logf("Blocking startup, waiting for full sync from %q to %q", from, to)
			<-didFullSync
			sh.logf("Full sync complete.")
		}
	} else {
		go sh.syncLoop()
	}

	if validate {
		go sh.startFullValidation()
	}

	blobserver.GetHub(fromBs).AddReceiveHook(sh.enqueue)
	return sh, nil
}

func (sh *SyncHandler) InitHandler(hl blobserver.FindHandlerByTyper) error {
	_, h, err := hl.FindHandlerByType("root")
	if err == blobserver.ErrHandlerTypeNotFound {
		// It's optional. We register ourselves if it's there.
		return nil
	}
	if err != nil {
		return err
	}
	h.(*RootHandler).registerSyncHandler(sh)
	return nil
}

func newSyncHandler(fromName, toName string,
	from blobserver.Storage, to blobReceiverEnumerator,
	queue sorted.KeyValue) *SyncHandler {
	return &SyncHandler{
		copierPoolSize: 5,
		from:           from,
		to:             to,
		fromName:       fromName,
		toName:         toName,
		queue:          queue,
		wakec:          make(chan bool),
		status:         "not started",
		needCopy:       make(map[blob.Ref]uint32),
		lastFail:       make(map[blob.Ref]failDetail),
		copying:        make(map[blob.Ref]*copyStatus),
		alarmIdlec:     make(chan struct{}),
	}
}

// NewSyncHandler returns a handler that will asynchronously and continuously
// copy blobs from src to dest, if missing on dest.
// Blobs waiting to be copied are stored on pendingQueue. srcName and destName
// are only used for status and debugging messages.
// N.B.: blobs should be added to src with a method that notifies the blob hub,
// such as blobserver.Receive.
func NewSyncHandler(srcName, destName string,
	src blobserver.Storage, dest blobReceiverEnumerator,
	pendingQueue sorted.KeyValue) *SyncHandler {
	sh := newSyncHandler(srcName, destName, src, dest, pendingQueue)
	go sh.syncLoop()
	blobserver.GetHub(sh.from).AddReceiveHook(sh.enqueue)
	return sh
}
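
// A minimal usage sketch (hypothetical; srcStorage and destStorage stand
// for any two storage implementations, and sorted.NewMemoryKeyValue is
// used only for illustration):
//
//	q := sorted.NewMemoryKeyValue()
//	sh := NewSyncHandler("/src/", "/dest/", srcStorage, destStorage, q)
//	// ... add blobs to srcStorage via blobserver.Receive, which
//	// notifies the blob hub and hence sh.enqueue ...
//	sh.IdleWait() // returns once the pending queue has drained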

// IdleWait waits until the sync handler has finished processing the currently
// queued blobs.
func (sh *SyncHandler) IdleWait() {
	if sh.idle {
		return
	}
	<-sh.alarmIdlec
}

func (sh *SyncHandler) signalIdle() {
	select {
	case sh.alarmIdlec <- struct{}{}:
	default:
	}
}

func newIdleSyncHandler(fromName, toName string) *SyncHandler {
	return &SyncHandler{
		fromName: fromName,
		toName:   toName,
		idle:     true,
		status:   "disabled",
	}
}

func (sh *SyncHandler) discovery() camtypes.SyncHandlerDiscovery {
	return camtypes.SyncHandlerDiscovery{
		From:    sh.fromName,
		To:      sh.toName,
		ToIndex: sh.toIndex,
	}
}

// syncStatus is a snapshot of the current status, for display by the
// status handler (status.go) in both JSON and HTML forms.
type syncStatus struct {
	sh *SyncHandler

	From           string `json:"from"`
	FromDesc       string `json:"fromDesc"`
	To             string `json:"to"`
	ToDesc         string `json:"toDesc"`
	DestIsIndex    bool   `json:"destIsIndex,omitempty"`
	BlobsToCopy    int    `json:"blobsToCopy"`
	BytesToCopy    int64  `json:"bytesToCopy"`
	LastCopySecAgo int    `json:"lastCopySecondsAgo,omitempty"`
}
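
// For illustration, the JSON form of a syncStatus might look like
// (all values made up):
//
//	{"from":"/bs/","fromDesc":"localdisk","to":"/index/","toDesc":"index",
//	 "destIsIndex":true,"blobsToCopy":2,"bytesToCopy":4096,"lastCopySecondsAgo":10}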

func (sh *SyncHandler) currentStatus() syncStatus {
	sh.mu.Lock()
	defer sh.mu.Unlock()
	ago := 0
	if !sh.recentCopyTime.IsZero() {
		ago = int(time.Now().Sub(sh.recentCopyTime).Seconds())
	}
	return syncStatus{
		sh:             sh,
		From:           sh.fromName,
		FromDesc:       storageDesc(sh.from),
		To:             sh.toName,
		ToDesc:         storageDesc(sh.to),
		DestIsIndex:    sh.toIndex,
		BlobsToCopy:    len(sh.needCopy),
		BytesToCopy:    sh.bytesRemain,
		LastCopySecAgo: ago,
	}
}

// readQueueToMemory slurps in the pending queue from disk (or
// wherever) to memory. Even with millions of blobs, it's not much
// memory. The point of the persistent queue is to survive restarts if
// the "fullSyncOnStart" option is off. With "fullSyncOnStart" set to
// true, this is a little pointless (we'd figure out what's missing
// eventually), but this might save us a few minutes (let us start
// syncing missing blobs a few minutes earlier) since we won't have to
// wait to figure out what the destination is missing.
func (sh *SyncHandler) readQueueToMemory() error {
	errc := make(chan error, 1)
	blobs := make(chan blob.SizedRef, 16)
	intr := make(chan struct{})
	defer close(intr)
	go func() {
		errc <- sh.enumerateQueuedBlobs(blobs, intr)
	}()
	n := 0
	for sb := range blobs {
		sh.addBlobToCopy(sb)
		n++
	}
	sh.logf("Added %d pending blobs from sync queue to pending list", n)
	return <-errc
}

func (sh *SyncHandler) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
	if req.Method == "POST" {
		if req.FormValue("mode") == "validate" {
			token := req.FormValue("token")
			if xsrftoken.Valid(token, auth.Token(), "user", "runFullValidate") {
				sh.startFullValidation()
				http.Redirect(rw, req, "./", http.StatusFound)
				return
			}
		}
		http.Error(rw, "Bad POST request", http.StatusBadRequest)
		return
	}

	// TODO: remove this lock and instead just call currentStatus,
	// and transition to using that here.
	sh.mu.Lock()
	defer sh.mu.Unlock()
	f := func(p string, a ...interface{}) {
		fmt.Fprintf(rw, p, a...)
	}
	now := time.Now()
	f("<h1>Sync Status (for %s to %s)</h1>", sh.fromName, sh.toName)
	f("<p><b>Current status: </b>%s</p>", html.EscapeString(sh.status))
	if sh.idle {
		return
	}

	f("<h2>Stats:</h2><ul>")
	f("<li>Source: %s</li>", html.EscapeString(storageDesc(sh.from)))
	f("<li>Target: %s</li>", html.EscapeString(storageDesc(sh.to)))
	f("<li>Blobs synced: %d</li>", sh.totalCopies)
	f("<li>Bytes synced: %d</li>", sh.totalCopyBytes)
	f("<li>Blobs yet to copy: %d</li>", len(sh.needCopy))
	f("<li>Bytes yet to copy: %d</li>", sh.bytesRemain)
	if !sh.recentCopyTime.IsZero() {
		f("<li>Most recent copy: %s (%v ago)</li>", sh.recentCopyTime.Format(time.RFC3339), now.Sub(sh.recentCopyTime))
	}
	clarification := ""
	if len(sh.needCopy) == 0 && sh.totalErrors > 0 {
		clarification = "(all since resolved)"
	}
	f("<li>Previous copy errors: %d %s</li>", sh.totalErrors, clarification)
	f("</ul>")

	f("<h2>Validation</h2>")
	if len(sh.vshards) == 0 {
		f("Validation disabled")
		token := xsrftoken.Generate(auth.Token(), "user", "runFullValidate")
		f("<form method='POST'><input type='hidden' name='mode' value='validate'><input type='hidden' name='token' value='%s'><input type='submit' value='Start validation'></form>", token)
	} else {
		f("<p>Background scan of source and destination to ensure that the destination has everything the source does, or is at least enqueued to sync.</p>")
		f("<ul>")
		f("<li>Shards complete: %d/%d (%.1f%%)</li>",
			sh.vshardDone,
			len(sh.vshards),
			100*float64(sh.vshardDone)/float64(len(sh.vshards)))
		f("<li>Source blobs seen: %d</li>", sh.vsrcCount)
		f("<li>Source bytes seen: %d</li>", sh.vsrcBytes)
		f("<li>Dest blobs seen: %d</li>", sh.vdestCount)
		f("<li>Dest bytes seen: %d</li>", sh.vdestBytes)
		f("<li>Blobs found missing & enqueued: %d</li>", sh.vmissing)
		if len(sh.vshardErrs) > 0 {
			f("<li>Validation errors:<ul>\n")
			for _, e := range sh.vshardErrs {
				f(" <li>%s</li>\n", html.EscapeString(e))
			}
			f("</ul></li>\n")
		}
		f("</ul>")
	}

	if len(sh.copying) > 0 {
		f("<h2>Currently Copying</h2><ul>")
		copying := make([]blob.Ref, 0, len(sh.copying))
		for br := range sh.copying {
			copying = append(copying, br)
		}
		sort.Sort(blob.ByRef(copying))
		for _, br := range copying {
			f("<li>%s</li>\n", sh.copying[br])
		}
		f("</ul>")
	}

	recentErrors := make([]blob.Ref, 0, len(sh.recentErrors))
	for _, br := range sh.recentErrors {
		if _, ok := sh.needCopy[br]; ok {
			// Only show it in the web UI if it's still a problem. Blobs that
			// have since succeeded just confused people.
			recentErrors = append(recentErrors, br)
		}
	}
	if len(recentErrors) > 0 {
		f("<h2>Recent Errors</h2><p>Blobs that haven't successfully copied over yet, and their last errors:</p><ul>")
		for _, br := range recentErrors {
			fail := sh.lastFail[br]
			f("<li>%s: %s: %s</li>\n",
				br,
				fail.when.Format(time.RFC3339),
				html.EscapeString(fail.err.Error()))
		}
		f("</ul>")
	}
}

func (sh *SyncHandler) setStatusf(s string, args ...interface{}) {
	s = time.Now().UTC().Format(time.RFC3339) + ": " + fmt.Sprintf(s, args...)
	sh.mu.Lock()
	defer sh.mu.Unlock()
	sh.status = s
}

type copyResult struct {
	sb  blob.SizedRef
	err error
}
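
// blobserverEnumerator adapts a BlobEnumerator to the enumeration
// function shape that runSync consumes, forwarding each enumerated
// blob to dst until the intr channel is closed.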
func blobserverEnumerator(ctx context.Context, src blobserver.BlobEnumerator) func(chan<- blob.SizedRef, <-chan struct{}) error {
	return func(dst chan<- blob.SizedRef, intr <-chan struct{}) error {
		return blobserver.EnumerateAll(ctx, src, func(sb blob.SizedRef) error {
			select {
			case dst <- sb:
			case <-intr:
				return errors.New("interrupted")
			}
			return nil
		})
	}
}

// enumeratePendingBlobs yields blobs from the in-memory pending list (needCopy).
// This differs from enumerateQueuedBlobs, which pulls in the on-disk sorted.KeyValue store.
func (sh *SyncHandler) enumeratePendingBlobs(dst chan<- blob.SizedRef, intr <-chan struct{}) error {
	defer close(dst)
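	// Snapshot up to maxBatch refs while holding the lock, then send
	// them with the lock released, so copy workers can keep updating
	// needCopy concurrently.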
	sh.mu.Lock()
	var toSend []blob.SizedRef
	{
		n := len(sh.needCopy)
		const maxBatch = 1000
		if n > maxBatch {
			n = maxBatch
		}
		toSend = make([]blob.SizedRef, 0, n)
		for br, size := range sh.needCopy {
			toSend = append(toSend, blob.SizedRef{Ref: br, Size: size})
			if len(toSend) == n {
				break
			}
		}
	}
	sh.mu.Unlock()
	for _, sb := range toSend {
		select {
		case dst <- sb:
		case <-intr:
			return nil
		}
	}
	return nil
}

// enumerateQueuedBlobs yields blobs from the on-disk sorted.KeyValue store.
// This differs from enumeratePendingBlobs, which sends from the in-memory pending list.
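// Each queue row maps a blobref key to its size in decimal, e.g.
// (hypothetical row): "sha1-0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33" => "4163".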
func (sh *SyncHandler) enumerateQueuedBlobs(dst chan<- blob.SizedRef, intr <-chan struct{}) error {
	defer close(dst)
	it := sh.queue.Find("", "")
	for it.Next() {
		br, ok := blob.Parse(it.Key())
		size, err := strconv.ParseUint(it.Value(), 10, 32)
		if !ok || err != nil {
			sh.logf("ERROR: bogus sync queue entry: %q => %q", it.Key(), it.Value())
			continue
		}
		select {
		case dst <- blob.SizedRef{Ref: br, Size: uint32(size)}:
		case <-intr:
			return it.Close()
		}
	}
	return it.Close()
}

func (sh *SyncHandler) runSync(syncType string, enumSrc func(chan<- blob.SizedRef, <-chan struct{}) error) int {
	enumch := make(chan blob.SizedRef, 8)
	errch := make(chan error, 1)
	intr := make(chan struct{})
	defer close(intr)
	go func() { errch <- enumSrc(enumch, intr) }()

	nCopied := 0
	toCopy := 0

	workch := make(chan blob.SizedRef, 1000)
	resch := make(chan copyResult, 8)
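	// Spin up at most copierPoolSize copy workers and feed them up to
	// 1000 blobs (workch's capacity) per pass; anything beyond that is
	// picked up by the caller's next runSync round.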
FeedWork:
	for sb := range enumch {
		if toCopy < sh.copierPoolSize {
			go sh.copyWorker(resch, workch)
		}
		select {
		case workch <- sb:
			toCopy++
		default:
			// Buffer full. Enough for this batch. Will get it later.
			break FeedWork
		}
	}
	close(workch)
	for i := 0; i < toCopy; i++ {
		sh.setStatusf("Copying blobs")
		res := <-resch
		if res.err == nil {
			nCopied++
		}
	}

	if err := <-errch; err != nil {
		sh.logf("error enumerating for %v sync: %v", syncType, err)
	}
	return nCopied
}
|
|
|
|
|
2014-03-05 16:51:22 +00:00
|
|
|
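// syncLoop runs runSync until a pass copies nothing, then sleeps for
// the remainder of queueSyncInterval (or until woken early by a send
// on sh.wakec) and repeats.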
func (sh *SyncHandler) syncLoop() {
	for {
		t0 := time.Now()

		for sh.runSync(sh.fromName, sh.enumeratePendingBlobs) > 0 {
			// Loop, before sleeping.
		}
		sh.setStatusf("Sleeping briefly before next long poll.")

		d := queueSyncInterval - time.Since(t0)
		select {
		case <-time.After(d):
			sh.signalIdle()
		case <-sh.wakec:
		}
	}
}

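// copyWorker copies each blob received on work until the channel is
// closed, reporting every copy's outcome on res.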
func (sh *SyncHandler) copyWorker(res chan<- copyResult, work <-chan blob.SizedRef) {
	for sb := range work {
		res <- copyResult{sb, sh.copyBlob(sb)}
	}
}

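// copyBlob copies a single blob from sh.from to sh.to, checking that
// the size and digest of the bytes read match the enumerated ref.
// Progress is tracked in a copyStatus; the deferred setError call
// records the final outcome.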
func (sh *SyncHandler) copyBlob(sb blob.SizedRef) (err error) {
	cs := sh.newCopyStatus(sb)
	defer func() { cs.setError(err) }()
	br := sb.Ref

	sh.mu.Lock()
	sh.copying[br] = cs
	sh.mu.Unlock()

	if sb.Size > constants.MaxBlobSize {
		return fmt.Errorf("blob size %d too large; max blob size is %d", sb.Size, constants.MaxBlobSize)
	}

	cs.setStatus(statusFetching)
	rc, fromSize, err := sh.from.Fetch(br)
	if err != nil {
		return fmt.Errorf("source fetch: %v", err)
	}
	if fromSize != sb.Size {
		rc.Close()
		return fmt.Errorf("source fetch size mismatch: get=%d, enumerate=%d", fromSize, sb.Size)
	}

	buf := make([]byte, fromSize)
	hash := br.Hash()
	cs.setStatus(statusReading)
	n, err := io.ReadFull(io.TeeReader(rc,
		io.MultiWriter(
			incrWriter{cs, &cs.nread},
			hash,
		)), buf)
	rc.Close()
	if err != nil {
		return fmt.Errorf("read error after %d/%d bytes: %v", n, fromSize, err)
	}
	if !br.HashMatches(hash) {
		return fmt.Errorf("read data has unexpected digest %x", hash.Sum(nil))
	}

	cs.setStatus(statusWriting)
	newsb, err := sh.to.ReceiveBlob(br, io.TeeReader(bytes.NewReader(buf), incrWriter{cs, &cs.nwrite}))
	if err != nil {
		return fmt.Errorf("dest write: %v", err)
	}
	if newsb.Size != sb.Size {
		return fmt.Errorf("write size mismatch: source_read=%d but dest_write=%d", sb.Size, newsb.Size)
	}
	return nil
}

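// ReceiveBlob discards the blob's contents, noting only its size, and
// enqueues the blob to be copied from the source asynchronously.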
func (sh *SyncHandler) ReceiveBlob(br blob.Ref, r io.Reader) (sb blob.SizedRef, err error) {
	n, err := io.Copy(ioutil.Discard, r)
	if err != nil {
		return
	}
	sb = blob.SizedRef{Ref: br, Size: uint32(n)}
	return sb, sh.enqueue(sb)
}

// addBlobToCopy adds a blob to copy to memory (not to disk: that's enqueue).
// It returns true if it was added, or false if it was a duplicate.
func (sh *SyncHandler) addBlobToCopy(sb blob.SizedRef) bool {
	sh.mu.Lock()
	defer sh.mu.Unlock()
	if _, dup := sh.needCopy[sb.Ref]; dup {
		return false
	}

	sh.needCopy[sb.Ref] = sb.Size
	sh.bytesRemain += int64(sb.Size)

	// Non-blocking send to wake up looping goroutine if it's
	// sleeping...
	select {
	case sh.wakec <- true:
	default:
	}
	return true
}

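// enqueue records sb both persistently (in sh.queue) and in memory as
// a blob still needing to be copied. Blobs already pending are
// ignored.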
func (sh *SyncHandler) enqueue(sb blob.SizedRef) error {
	if !sh.addBlobToCopy(sb) {
		// Dup
		return nil
	}
	// TODO: include current time in encoded value, to attempt to
	// do in-order delivery to remote side later? Possible
	// friendly optimization later. Might help peer's indexer have
	// less missing deps.
	if err := sh.queue.Set(sb.Ref.String(), fmt.Sprint(sb.Size)); err != nil {
		return err
	}
	return nil
}

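// startFullValidation kicks off a full source/destination comparison
// in the background. It is a no-op if a validation run is already in
// progress (sh.vshards non-empty).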
func (sh *SyncHandler) startFullValidation() {
	sh.mu.Lock()
	if len(sh.vshards) != 0 {
		sh.mu.Unlock()
		return
	}
	sh.mu.Unlock()

	sh.logf("Running full validation; determining validation shards...")
	shards := sh.shardPrefixes()

	sh.mu.Lock()
	if len(sh.vshards) != 0 {
		sh.mu.Unlock()
		return
	}
	sh.vshards = shards
	sh.mu.Unlock()

	go sh.runFullValidation()
}

func (sh *SyncHandler) runFullValidation() {
	var wg sync.WaitGroup

	sh.mu.Lock()
	shards := sh.vshards
	wg.Add(len(shards))
	sh.mu.Unlock()

	sh.logf("full validation beginning with %d shards", len(shards))

	const maxShardWorkers = 30 // arbitrary
	gate := syncutil.NewGate(maxShardWorkers)

	for _, pfx := range shards {
		pfx := pfx
		gate.Start()
		go func() {
			// Defer, so wg.Wait doesn't return before the
			// shard has actually been validated.
			defer wg.Done()
			defer gate.Done()
			sh.validateShardPrefix(pfx)
		}()
	}
	wg.Wait()
	sh.logf("Validation complete")
}

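// validateShardPrefix enumerates one shard of the source and the
// destination in parallel, collects blobs missing from the
// destination, and re-enqueues them. The deferred func records the
// shard's success or failure.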
func (sh *SyncHandler) validateShardPrefix(pfx string) (err error) {
	defer func() {
		sh.mu.Lock()
		if err != nil {
			errs := fmt.Sprintf("Failed to validate prefix %s: %v", pfx, err)
			sh.logf("%s", errs)
			sh.vshardErrs = append(sh.vshardErrs, errs)
		} else {
			sh.vshardDone++
		}
		sh.mu.Unlock()
	}()
	ctx, cancel := context.WithCancel(context.TODO())
	defer cancel()
	src, serrc := sh.startValidatePrefix(ctx, pfx, false)
	dst, derrc := sh.startValidatePrefix(ctx, pfx, true)
	srcErr := &chanError{
		C: serrc,
		Wrap: func(err error) error {
			return fmt.Errorf("error enumerating source %s for validating shard %s: %v", sh.fromName, pfx, err)
		},
	}
	dstErr := &chanError{
		C: derrc,
		Wrap: func(err error) error {
			return fmt.Errorf("error enumerating target %s for validating shard %s: %v", sh.toName, pfx, err)
		},
	}

	missingc := make(chan blob.SizedRef, 8)
	go blobserver.ListMissingDestinationBlobs(missingc, func(blob.Ref) {}, src, dst)

	var missing []blob.SizedRef
	for sb := range missingc {
		missing = append(missing, sb)
	}

	if err := srcErr.Get(); err != nil {
		return err
	}
	if err := dstErr.Get(); err != nil {
		return err
	}

	for _, sb := range missing {
		if enqErr := sh.enqueue(sb); enqErr != nil {
			if err == nil {
				err = enqErr
			}
		} else {
			sh.mu.Lock()
			sh.vmissing++
			sh.mu.Unlock()
		}
	}
	return err
}

var errNotPrefix = errors.New("sentinel error: hit a blob in the next shard")

// doDest is false for source and true for dest.
func (sh *SyncHandler) startValidatePrefix(ctx context.Context, pfx string, doDest bool) (<-chan blob.SizedRef, <-chan error) {
	var e blobserver.BlobEnumerator
	if doDest {
		e = sh.to
	} else {
		e = sh.from
	}
	c := make(chan blob.SizedRef, 64)
	errc := make(chan error, 1)
	go func() {
		defer close(c)
		var last string // last blobref seen; to double check storage's enumeration works correctly.
		err := blobserver.EnumerateAllFrom(ctx, e, pfx, func(sb blob.SizedRef) error {
			// Just double-check that the storage target is returning sorted results correctly.
			brStr := sb.Ref.String()
			if brStr < pfx {
				log.Fatalf("Storage target %T enumerate not behaving: %q < requested prefix %q", e, brStr, pfx)
			}
			if last != "" && last >= brStr {
				log.Fatalf("Storage target %T enumerate not behaving: previous %q >= current %q", e, last, brStr)
			}
			last = brStr

			// TODO: could add a more efficient method on blob.Ref to do this,
			// one that doesn't involve calling String().
			if !strings.HasPrefix(brStr, pfx) {
				return errNotPrefix
			}
			select {
			case c <- sb:
				sh.mu.Lock()
				if doDest {
					sh.vdestCount++
					sh.vdestBytes += int64(sb.Size)
				} else {
					sh.vsrcCount++
					sh.vsrcBytes += int64(sb.Size)
				}
				sh.mu.Unlock()
				return nil
			case <-ctx.Done():
				return ctx.Err()
			}
		})
		if err == errNotPrefix {
			err = nil
		}
		if err != nil {
			// Send a zero value to shut down ListMissingDestinationBlobs.
			c <- blob.SizedRef{}
		}
		errc <- err
	}()
	return c, errc
}

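// shardPrefixes returns the blobref prefixes used to split a full
// validation run into independently enumerable shards.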
func (sh *SyncHandler) shardPrefixes() []string {
	var pfx []string
	// TODO(bradfitz): do limit=1 enumerates against sh.from and sh.to with varying
	// "after" values to determine all the blobref types on both sides.
	// For now, be lazy and assume only sha1:
	for i := 0; i < 256; i++ {
		pfx = append(pfx, fmt.Sprintf("sha1-%02x", i))
	}
	return pfx
}

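// newCopyStatus returns a copyStatus for sb in the starting state,
// stamped with the current time.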
func (sh *SyncHandler) newCopyStatus(sb blob.SizedRef) *copyStatus {
	now := time.Now()
	return &copyStatus{
		sh:    sh,
		sb:    sb,
		state: statusStarting,
		start: now,
		t:     now,
	}
}

// copyStatus is an in-progress copy.
type copyStatus struct {
	sh    *SyncHandler
	sb    blob.SizedRef
	start time.Time

	mu     sync.Mutex
	state  string    // one of statusFoo, below
	t      time.Time // last status update time
	nread  uint32
	nwrite uint32
}

const (
	statusStarting = "starting"
	statusFetching = "fetching source"
	statusReading  = "reading"
	statusWriting  = "writing"
)

func (cs *copyStatus) setStatus(s string) {
	now := time.Now()
	cs.mu.Lock()
	defer cs.mu.Unlock()
	cs.state = s
	cs.t = now
}

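// setError records the final result of a copy. On success it deletes
// the blob from the persistent queue and the in-memory bookkeeping;
// on failure it logs the error and appends it to the recent-errors
// list.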
func (cs *copyStatus) setError(err error) {
	now := time.Now()
	sh := cs.sh
	br := cs.sb.Ref
	if err == nil {
		// This is somewhat slow, so do it before we acquire the lock.
		// The queue is thread-safe.
		if derr := sh.queue.Delete(br.String()); derr != nil {
			sh.logf("queue delete of %v error: %v", cs.sb.Ref, derr)
		}
	}

	sh.mu.Lock()
	defer sh.mu.Unlock()
	if _, needCopy := sh.needCopy[br]; !needCopy {
		sh.logf("IGNORING DUPLICATE UPLOAD of %v = %v", br, err)
		return
	}
	delete(sh.copying, br)
	if err == nil {
		delete(sh.needCopy, br)
		delete(sh.lastFail, br)
		sh.recentCopyTime = now
		sh.totalCopies++
		sh.totalCopyBytes += int64(cs.sb.Size)
		sh.bytesRemain -= int64(cs.sb.Size)
		return
	}

	sh.totalErrors++
	sh.logf("error copying %v: %v", br, err)
	sh.lastFail[br] = failDetail{
		when: now,
		err:  err,
	}

	// Kinda lame. TODO: use a ring buffer or container/list instead.
	if len(sh.recentErrors) == maxRecentErrors {
		// Shift the oldest entry out to make room for the new one.
		copy(sh.recentErrors, sh.recentErrors[1:])
		sh.recentErrors = sh.recentErrors[:maxRecentErrors-1]
	}
	sh.recentErrors = append(sh.recentErrors, br)
}

func (cs *copyStatus) String() string {
	var buf bytes.Buffer
	now := time.Now()
	buf.WriteString(cs.sb.Ref.String())
	buf.WriteString(": ")

	cs.mu.Lock()
	defer cs.mu.Unlock()
	sinceStart := now.Sub(cs.start)
	sinceLast := now.Sub(cs.t)

	switch cs.state {
	case statusReading:
		buf.WriteString(cs.state)
		fmt.Fprintf(&buf, " (%d/%dB)", cs.nread, cs.sb.Size)
	case statusWriting:
		if cs.nwrite == cs.sb.Size {
			buf.WriteString("wrote all, waiting ack")
		} else {
			buf.WriteString(cs.state)
			fmt.Fprintf(&buf, " (%d/%dB)", cs.nwrite, cs.sb.Size)
		}
	default:
		buf.WriteString(cs.state)
	}
	if sinceLast > 5*time.Second {
		fmt.Fprintf(&buf, ", last change %v ago (total elapsed %v)", sinceLast, sinceStart)
	}
	return buf.String()
}

type failDetail struct {
	when time.Time
	err  error
}

// incrWriter is an io.Writer that locks mu and increments *n.
type incrWriter struct {
	cs *copyStatus
	n  *uint32
}

func (w incrWriter) Write(p []byte) (n int, err error) {
	w.cs.mu.Lock()
	*w.n += uint32(len(p))
	w.cs.t = time.Now()
	w.cs.mu.Unlock()
	return len(p), nil
}

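// storageDesc returns a human-readable description of v, preferring
// its fmt.Stringer form and falling back to its Go type.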
func storageDesc(v interface{}) string {
	if s, ok := v.(fmt.Stringer); ok {
		return s.String()
	}
	return fmt.Sprintf("%T", v)
}

// TODO(bradfitz): implement these? what do they mean? possibilities:
// a) proxy to sh.from
// b) proxy to sh.to
// c) merge intersection of sh.from, sh.to, and sh.queue: that is, a blob this pair
//    currently or eventually will have. The only missing blob would be one that
//    sh.from has, sh.to doesn't have, and isn't in the queue to be replicated.
//
// For now, don't implement them. Wait until we need them.

func (sh *SyncHandler) Fetch(blob.Ref) (file io.ReadCloser, size uint32, err error) {
	panic("Unimplemented blobserver.Fetch called")
}

func (sh *SyncHandler) StatBlobs(dest chan<- blob.SizedRef, blobs []blob.Ref) error {
	sh.logf("Unexpected StatBlobs call")
	return nil
}

func (sh *SyncHandler) EnumerateBlobs(ctx context.Context, dest chan<- blob.SizedRef, after string, limit int) error {
	defer close(dest)
	sh.logf("Unexpected EnumerateBlobs call")
	return nil
}

func (sh *SyncHandler) RemoveBlobs(blobs []blob.Ref) error {
	panic("Unimplemented RemoveBlobs")
}

// chanError is a Future around an incoming error channel of one item.
// It can also wrap its error in something more descriptive.
type chanError struct {
	C        <-chan error
	Wrap     func(error) error // optional
	err      error
	received bool
}

func (ce *chanError) Set(err error) {
	if ce.Wrap != nil && err != nil {
		err = ce.Wrap(err)
	}
	ce.err = err
	ce.received = true
}

func (ce *chanError) Get() error {
	if ce.received {
		return ce.err
	}
	ce.Set(<-ce.C)
	return ce.err
}