stash/pkg/database/custom_migrations.go

package database

import (
	"database/sql"
"errors"
"fmt"
"strings"
"github.com/jmoiron/sqlx"
"github.com/stashapp/stash/pkg/logger"
)
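
// runCustomMigrations runs migrations that are implemented in code rather
// than as schema migration scripts, such as index creation that needs
// fallback handling at startup.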
func runCustomMigrations() error {
	if err := createImagesChecksumIndex(); err != nil {
		return err
	}

	return nil
}
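
// createImagesChecksumIndex creates a unique index on images.checksum if it
// does not exist yet. If duplicate checksums prevent this, it falls back to
// creating a non-unique surrogate index so scanning still benefits, and logs
// the duplicate checksums so the user can remove them and restart.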
func createImagesChecksumIndex() error {
	return WithTxn(func(tx *sqlx.Tx) error {
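		// Check sqlite_master to see whether the unique index already exists.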
		row := tx.QueryRow("SELECT 1 AS found FROM sqlite_master WHERE type = 'index' AND name = 'images_checksum_unique'")

		err := row.Err()
		if err != nil && !errors.Is(err, sql.ErrNoRows) {
			return err
		}

		if err == nil {
			var found bool
			if err := row.Scan(&found); err != nil && !errors.Is(err, sql.ErrNoRows) {
				return fmt.Errorf("error while scanning for index: %w", err)
			}
			if found {
				return nil
			}
		}
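
		// Try to create the unique index. This fails if the images table
		// already contains duplicate checksums.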
_, err = tx.Exec("CREATE UNIQUE INDEX images_checksum_unique ON images (checksum)")
if err == nil {
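			// The unique index now exists, so a non-unique surrogate index
			// left over from a previous startup is no longer needed.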
_, err = tx.Exec("DROP INDEX IF EXISTS index_images_checksum")
if err != nil {
logger.Errorf("Failed to remove surrogate images.checksum index: %s", err)
}
logger.Info("Created unique constraint on images table")
return nil
}
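
		// Creating the unique index failed, most likely because of duplicate
		// checksums. Create a non-unique surrogate index instead so scanning
		// still gets the performance benefit.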
_, err = tx.Exec("CREATE INDEX IF NOT EXISTS index_images_checksum ON images (checksum)")
if err != nil {
logger.Errorf("Unable to create index on images.checksum: %s", err)
}
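
		// Find the duplicate checksums so they can be reported to the user.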
		var result []struct {
			Checksum string `db:"checksum"`
		}
		err = tx.Select(&result, "SELECT checksum FROM images GROUP BY checksum HAVING COUNT(1) > 1")
		if err != nil && !errors.Is(err, sql.ErrNoRows) {
logger.Errorf("Unable to determine non-unique image checksums: %s", err)
return nil
}

		checksums := make([]string, len(result))
		for i, res := range result {
			checksums[i] = res.Checksum
		}
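
		// The user can locate the offending rows with a query along these
		// lines (hypothetical example; substitute one of the logged checksums):
		//
		//	SELECT id, path FROM images WHERE checksum = '<logged checksum>';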
		logger.Warnf("The following duplicate image checksums have been found. Please remove the duplicates and restart. %s", strings.Join(checksums, ", "))

		return nil
	})
}